
International student assignment: A study of data mining techniques for decision-making in the insurance industry

Paper price: free  Date: 2013-08-02 10:11:18  Source: www.ukassignment.org  Author: 留学作业网

1 Introduction

 

With the rapid development of database technology and the widespread use of database management systems, every industry is accumulating more and more data. Important information lies hidden in this growing volume of data, and people want to analyze it at a higher level in order to make better use of it. Current database systems handle data entry, querying, and statistics efficiently, but they cannot discover the relationships and rules within the data, nor can they predict future trends from existing data. This lack of means for mining the knowledge hidden in the data has led to the phenomenon of "data explosion but knowledge poverty."


With the development of computer and network technology, obtaining the relevant data for a particular industry has become feasible. For data that is large in volume and wide in scope, however, traditional statistical methods that rely on simple summaries or on analysis against a predefined model cannot complete the analysis. An intelligent information analysis technology, "data mining," therefore came into being.


Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random data. It is also the process of discovering meaningful new association patterns and trends by mining the large volumes of data stored in a data warehouse. Data mining is a new business information processing technology: it extracts, transforms, analyzes, and models the large amounts of business data held in commercial databases in order to obtain the key data that supports business decisions, helping enterprises gain the initiative in fierce market competition. In the insurance industry, there is currently broad market demand for it.


2 Project Description
The project developed the "Insurance Industry Decision System V1.0". The main interface is programmed in ASP and provides data preprocessing, customer insurance purchase analysis, customer buying habits analysis, and result output. The back-end database is implemented in SQL Server 2005, and the mining tool is SPSS Clementine 11.0. During the experimental stage of the study, the Apriori algorithm showed two major drawbacks: high storage complexity and a large number of redundant rules. The project improves the algorithm by using a pattern tree structure, which reduces the storage complexity of Apriori and the number of redundant rules.
The main functional blocks of the system are data preprocessing, customer insurance purchase analysis, customer buying habits analysis, and result output.


(1) The "data preprocessing" module includes upload, data platform selection, data processing, statistics, and data set generation.
● Upload: each branch of the insurance company uploads its data.
● Data platform: lets the user choose the data platform before uploading.
● Data processing: cleans the data, converts formats, and performs similar operations.
● Statistics: analyzes the preprocessed data and extracts the valid data.
● Generate data set: builds a data set from the valid data extracted in the statistics step, providing a higher-quality data source for mining.
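
The data processing and data set generation steps could look roughly like the following sketch. The field names (`customer_id`, `product`, `purchase_date`), the accepted date formats, and the helper names are illustrative assumptions; the source does not describe the actual record layout.

```python
from datetime import datetime

# Hypothetical date formats that different branches might upload.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")


def parse_date(text):
    """Convert a date string in any known format to ISO format, or return None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_records(records):
    """Drop incomplete or duplicate rows and normalise the remaining fields."""
    seen = set()
    cleaned = []
    for row in records:
        customer = (row.get("customer_id") or "").strip()
        product = (row.get("product") or "").strip().lower()
        date = parse_date(row.get("purchase_date") or "")
        if not customer or not product or date is None:
            continue  # cleaning: skip rows with missing or unparseable values
        key = (customer, product, date)
        if key in seen:
            continue  # cleaning: skip exact duplicates
        seen.add(key)
        cleaned.append(
            {"customer_id": customer, "product": product, "purchase_date": date}
        )
    return cleaned


def to_transactions(cleaned):
    """Group products by customer to form the data set used for mining."""
    transactions = {}
    for row in cleaned:
        transactions.setdefault(row["customer_id"], set()).add(row["product"])
    return transactions
```

Grouping by customer yields one transaction per customer, which is the shape an association analysis of "products bought by the same customer" would need.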


(2) The "customer insurance purchase analysis" module includes data import, parameter setting, and result analysis.
● Data import: in this interface, the user selects a data platform and imports the data set generated by "data preprocessing".
● Parameter setting: in this interface, the user sets "support", "confidence", and other parameters, and filters the data records by value range so that the data set can be analyzed effectively.
● Result analysis: this interface displays the final results of the "customer insurance purchase analysis" as reports and charts. The results tell the company which customers buy several of its insurance products (sub-products), providing a basis for decisions on winning customers.
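
To make the "support" and "confidence" parameters concrete, here is a minimal sketch using the standard definitions for association rules. The function names and the threshold parameters `min_support` and `min_confidence` are illustrative assumptions, not the system's actual interface.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


def rule_passes(transactions, antecedent, consequent, min_support, min_confidence):
    """Keep a rule only if it meets both user-supplied thresholds."""
    both = set(antecedent) | set(consequent)
    return (
        support(transactions, both) >= min_support
        and confidence(transactions, antecedent, consequent) >= min_confidence
    )
```

With four customers holding {life, auto}, {life, auto, health}, {life}, and {health}, the rule life -> auto has support 2/4 and confidence 2/3, so it passes thresholds of 0.4 and 0.6 but not a confidence threshold of 0.7.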


(3) The "customer buying habits analysis" module includes data import, parameter setting, and result analysis.
● Data import: same as "Data import" in module (2), "customer insurance purchase analysis".
● Parameter setting: here the user sets the "input parameters" (basic customer information such as age, gender, and occupation) and the "output parameters" (the customer's insurance purchase information).
● Result analysis: this interface presents the results of the customer buying habits analysis, providing a basis for decisions on retaining customers.
(4) The "analysis result output" module prints the results of the "customer insurance purchase analysis" and the "customer buying habits analysis".
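
The source does not say which method the buying habits analysis in module (3) uses to relate input parameters to output parameters. As one simple possibility, the sketch below tabulates, for each value of a chosen input attribute, the share of customers who hold each product. The function name `purchase_profile` and the record keys (`age_group`, `products`) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def purchase_profile(customers, attribute):
    """For each value of `attribute`, return the share of customers holding each product."""
    group_sizes = Counter()
    product_counts = defaultdict(Counter)
    for customer in customers:
        value = customer[attribute]
        group_sizes[value] += 1
        # Count each product at most once per customer.
        for product in set(customer["products"]):
            product_counts[value][product] += 1
    return {
        value: {p: n / size for p, n in product_counts[value].items()}
        for value, size in group_sizes.items()
    }


customers = [
    {"age_group": "20-30", "products": ["auto"]},
    {"age_group": "20-30", "products": ["auto", "life"]},
    {"age_group": "50-60", "products": ["health"]},
]
profile = purchase_profile(customers, "age_group")
```

Here every customer in the hypothetical "20-30" group holds auto insurance and half of them hold life insurance, which is the kind of summary a retention decision could start from.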


3 Improved Algorithm
The Apriori algorithm has two major defects: high time and space complexity, and a large number of redundant rules. This project therefore uses a pattern tree structure to reduce the storage complexity of Apriori and the number of redundant rules.
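
For reference, a minimal sketch of the standard level-wise Apriori procedure shows where the cost comes from: each level k needs a full pass over the transaction database and a fresh set of candidates held in memory. The function name `apriori`, the `min_count` parameter, and the scan counter are illustrative assumptions and are not taken from the project.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return ({frequent itemset: count}, number of database scans)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    scans = 0
    k = 1
    while candidates:
        # One full pass over the database per level k.
        scans += 1
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every k-item subset of a candidate must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent, scans
```

The number of scans grows with the size of the longest frequent itemset, and the candidate lists at each level are the storage cost that a tree-based representation tries to avoid.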


3.1 Pattern Tree Structure
The pattern tree consists of a root labeled "null", a set of item-prefix subtrees that are the children of the root, and an item header table. Each node in the tree has four fields: user_id, count, node_link, and node_next. user_id is the user tag (it uniquely identifies a user); count is the number of paths that reach the node; node_link points to the next node in the tree with the same user_id, and is null if no such node exists; node_next points to the node's children. Each entry of the item header table has three fields: user_id, count, and head of node. user_id has the same meaning as in the tree; count is the total count of all nodes in the tree with that user_id; head of node points to the first node in the tree with that user_id.
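
The node and header-table layout described above can be written down as a small data structure. This is a sketch: the class names, the use of `None` for the "null" root label, and the choice of a dictionary for `node_next` are assumptions; only the field names come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PatternNode:
    user_id: Optional[str]                     # None plays the role of the "null" root label
    count: int = 0                             # number of paths that reach this node
    node_link: Optional["PatternNode"] = None  # next node in the tree with the same user_id
    # Children of this node, keyed by user_id (assumed representation).
    node_next: Dict[str, "PatternNode"] = field(default_factory=dict)


@dataclass
class HeaderEntry:
    user_id: str
    count: int = 0                              # total count over all nodes with this user_id
    head_of_node: Optional[PatternNode] = None  # first node in the tree with this user_id


def linked_nodes(entry):
    """Walk the node_link chain that starts at a header entry."""
    node = entry.head_of_node
    while node is not None:
        yield node
        node = node.node_link
```

Following `head_of_node` and then `node_link` visits every occurrence of one user_id without traversing the whole tree, which is what makes the header table useful during pruning.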


3.2 Creating Pattern Tree
The algorithm is as follows. Let A be the transaction database and Ai one of its itemsets.
Algorithm: Patterntree(tree, p), constructs the pattern tree
Input: user transaction database A
Output: user pattern tree
Procedure Patterntree(T, p)
{
    Create_tree(T);    // create the Pattern-Tree root node, labeled "null"
    t = T;             // t is the current node
    while A <> null do
    {
        read an itemset Ai from the transaction database;
        while p != null do
        {
            if p.user_id == n.user_id for an ancestor n of t
            then
            {
                n.count = n.count + 1;
                t = n;
            }
            elseif p.user_id == c.user_id for a child c of t
            then
            {
                c.count = c.count + 1;
                t = c;
            }
            else
                insert_Patterntree(T, p);    // insert p into the tree as a new child of the current node
            p = p.next;
        }
    }
}
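
A runnable approximation of the construction is sketched below. It follows the usual prefix-tree (FP-tree-style) insertion rather than reproducing the procedure above line by line: it omits the ancestor check, restarts from the root for every transaction, and sorts the items of each transaction into a fixed order. These choices, the `Node` class, and the dictionary-based header table are assumptions made for illustration.

```python
class Node:
    def __init__(self, user_id):
        self.user_id = user_id
        self.count = 0
        self.node_link = None   # next node with the same user_id
        self.node_next = {}     # children keyed by user_id


def build_pattern_tree(transactions):
    """Build the tree and header table in one pass over the transactions."""
    root = Node(None)           # root labeled "null"
    header = {}                 # user_id -> {"count", "head_of_node", "tail"}
    for transaction in transactions:
        t = root                # assumption: each transaction starts at the root
        for item in sorted(set(transaction)):   # assumption: fixed item order
            child = t.node_next.get(item)
            if child is None:
                # New branch: create the node and append it to the node_link chain.
                child = Node(item)
                t.node_next[item] = child
                entry = header.setdefault(
                    item, {"count": 0, "head_of_node": None, "tail": None}
                )
                if entry["head_of_node"] is None:
                    entry["head_of_node"] = child
                else:
                    entry["tail"].node_link = child
                entry["tail"] = child
            child.count += 1
            header[item]["count"] += 1
            t = child
    return root, header


root, header = build_pattern_tree([["b", "a"], ["a", "b", "c"], ["a", "c"], ["b"]])
```

Transactions that share a prefix share nodes, so the shared prefix is stored once with a count instead of once per transaction, and the database is read only once.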


3.3 Pruning the Pattern Tree
Once the pattern tree is built, it may contain many redundant branches. To keep the noise produced by these branches from affecting the mining results, the tree must be pruned to remove the noise.
Algorithm: SPT(Tree, a), prunes the pattern tree
// SPT is the supported pattern tree (Supported Access Pattern Tree); a is the item header table
Input: pattern tree PatternTree, Min_Sup (minimum support of the pattern tree)
Output: the pruned supported pattern tree SPT, patterns B = {bi | i = 1, 2, 3, ..., n}
SPT(Tree, a)
{
    i = 1;
    while (ai != null)    // for each entry ai of the item header table a
    {
        if (ai.count >= Min_Sup)
        then
        {
            pattern bi = ai.head of node;
            p = ai.head of node;    // p points to the position of ai in the pattern tree
            while (p != null and ai.count >= Min_Sup)
            {
                find the prefix base of p and join p with its prefix base to form pattern b;
                if (bi.count >= Min_Sup)
                then
                {
                    // bi.count is the minimum count over p and the prefix base of p in pattern b
                    keep p and its prefix base in pattern bi;
                    bi = bi.node_link
                }
                else
                {
                    delete the nodes of PatternTree that correspond to p and the prefix base in pattern b,
                    reconnect the child nodes to the parent node, and update header table entry ai;
                    p = p.node_next    // p points to the next position in the pattern tree
                }
            }
        }
        else
        {
            update the value of header table entry ai;
            delete the corresponding nodes and prefix base from the pattern tree, and reconnect the child nodes;
        }
        i++;
    }
}
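
The sketch below illustrates only the simpler branch of the pruning step: nodes whose total count in the header table is below Min_Sup are deleted and their children are reconnected to the parent. It does not implement the per-pattern check on bi.count. The class and function names, the `children` dictionary, and the merging of reconnected children that share a user_id are assumptions made for illustration.

```python
class Node:
    def __init__(self, user_id, parent=None):
        self.user_id, self.parent = user_id, parent
        self.count = 0
        self.children = {}


def build(transactions):
    """Prefix-tree construction with items in a fixed (sorted) order."""
    root = Node(None)
    for transaction in transactions:
        t = root
        for item in sorted(set(transaction)):
            t = t.children.setdefault(item, Node(item, t))
            t.count += 1
    return root


def totals(node, acc=None):
    """Header-table counts: total count per user_id over the whole tree."""
    acc = {} if acc is None else acc
    for child in node.children.values():
        acc[child.user_id] = acc.get(child.user_id, 0) + child.count
        totals(child, acc)
    return acc


def merge_into(parent, node):
    """Attach `node` under `parent`, merging counts if that user_id already exists there."""
    existing = parent.children.get(node.user_id)
    if existing is None:
        node.parent = parent
        parent.children[node.user_id] = node
        return
    existing.count += node.count
    for grandchild in list(node.children.values()):
        merge_into(existing, grandchild)


def prune(node, infrequent):
    """Remove nodes whose user_id is infrequent and reconnect their children to the parent."""
    for child in list(node.children.values()):
        prune(child, infrequent)    # prune bottom-up
    for child in list(node.children.values()):
        if child.user_id in infrequent:
            del node.children[child.user_id]
            for grandchild in list(child.children.values()):
                merge_into(node, grandchild)


def patterns(node, prefix=()):
    """Yield (path, count) for every node left in the tree."""
    for child in node.children.values():
        path = prefix + (child.user_id,)
        yield path, child.count
        yield from patterns(child, path)


def pruned_patterns(transactions, min_sup):
    root = build(transactions)
    infrequent = {u for u, n in totals(root).items() if n < min_sup}
    prune(root, infrequent)
    return dict(patterns(root))
```

Calling `pruned_patterns(data, min_sup)` on a list of transactions returns the surviving paths with their counts; raising `min_sup` collapses branches into their frequent remainder.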


Building the pattern tree avoids scanning the transaction database multiple times. The count field retains the number of occurrences of each itemset, which avoids generating a large number of candidate frequent itemsets and helps reduce time and space complexity. The tree structure also avoids a large number of redundant rules.
Pruning removes the many redundant branches produced while the pattern tree is generated, which reduces space complexity. Rules can then be produced from the output patterns B, which avoids generating large numbers of frequent itemsets and reduces time complexity.


4 Conclusion
The project improves the Apriori algorithm with a pattern tree structure to make up for its defects. The method improves on Apriori in both time and space complexity while avoiding the generation of intermediate rules. The study shows that using a pattern tree structure to reduce the storage complexity of Apriori and the number of redundant rules is an effective way to improve the algorithm.
