
UK management science assignment guidance - Data Mining Analysis

Price: free   Date: 2011-05-14 16:51:50   Source: www.ukassignment.org   Author: 留学作业网 (Study Abroad Assignment Network)

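Title: Data Mining Analysis
Language: English
Research area: Management Science
Data processing required: Yes
Country: UK
School background: leading UK university, ranked 10th
Required length: 35 pages
Purpose: undergraduate coursework (BA Assignment)
Blind review required (PhD or Master's students only):
Additional requirements and notes: analyse the HMEQ data with SAS Enterprise Miner and write a report on the analysis.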
Hi, I need to finish a piece of coursework on Data Mining Analysis. I was wondering whether there is a writer at your company who can use the software "SAS 9.1.3", as the coursework requires analysing one of the data sets with this software.

Also, this coursework will be about 35 pages. Is that a problem, given that your maximum is 20 pages?

Please reply as soon as possible. Thanks.

The topic is shown below:
Using SAS software, develop the most suitable predictions for the test set of the HMEQ dataset and prepare a technical report documenting your modelling process and results. Use your knowledge of all aspects of model building, including the earlier phases of data sampling, partitioning, transformation and replacement, as well as the different algorithms. Discuss and justify your choices of preprocessing and methods in a report (see instructions at the end). The report should be no longer than 30 pages (excluding appendices) and should include graphs and appropriate analysis. (Random seed: 47701.)
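The brief fixes the random seed so that partitions are reproducible. As a plain-Python sketch of the idea (not SAS Enterprise Miner's Data Partition node), the function below splits row indices into training, validation and test sets, stratified by the target so each partition keeps the same default rate. The 40/30/30 fractions and the function name are assumptions for illustration. Python's random generator differs from SAS's, so the same seed will not reproduce a SAS partition.

```python
import random
from collections import defaultdict

def stratified_partition(labels, fractions=(0.4, 0.3, 0.3), seed=47701):
    """Split row indices into train/validation/test, stratified by label.

    The 40/30/30 fractions are an illustrative assumption, not part of the brief.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, valid, test = [], [], []
    for idx_list in by_class.values():
        rng.shuffle(idx_list)  # shuffle within each class, then cut
        n = len(idx_list)
        n_train = round(n * fractions[0])
        n_valid = round(n * fractions[1])
        train += idx_list[:n_train]
        valid += idx_list[n_train:n_train + n_valid]
        test += idx_list[n_train + n_valid:]
    return sorted(train), sorted(valid), sorted(test)

# Toy labels: 1 = default, 0 = repaid; 20 rows, 4 of them defaults.
labels = [1 if i % 5 == 0 else 0 for i in range(20)]
train, valid, test = stratified_partition(labels)
print(len(train), len(valid), len(test))  # 8 6 6
print(sum(labels[i] for i in train), sum(labels[i] for i in valid), sum(labels[i] for i in test))  # 2 1 1
```

Stratifying matters here because defaults are the minority class. A purely random split could leave the validation or test partition with too few defaults to estimate error reliably.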


Some ideas / questions which you may want to address:
--Does the dataset contain outliers, missing values, etc.? How were those issues addressed in data preprocessing? You should evaluate at least two different candidate sets of relevant variables (argue why these could be relevant) and also at least two different preprocessing, transformation and replacement schemes for these variables. Use the findings from the data analysis to evaluate the different preprocessing and transformation candidates. Which are the most important variables, i.e. those with the highest discriminatory power (you may need to run models to determine this)? What is the impact of these preprocessing choices on the performance of the different algorithms? Note that documenting preprocessing steps that do not have a significant impact on the final accuracy may also be of interest.
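To make "two different replacement and transformation schemes" concrete, here is a minimal Python sketch of two common choices: median imputation with a missing-value indicator, and a log transform for a right-skewed variable with large outliers. The column name DEBTINC and the toy values are assumptions for illustration only, not real HMEQ rows. The `reference` argument lets the median be taken from the training partition alone, which avoids leaking validation or test information into preprocessing.

```python
import math
import statistics

def impute_median_with_flag(values, reference=None):
    """Replace None with the median of `reference` (default: the observed values).

    Returns (filled_values, missing_flags); the flag column lets a model learn
    from the fact that a value was missing.
    """
    source = reference if reference is not None else values
    observed = [v for v in source if v is not None]
    median = statistics.median(observed)
    filled = [median if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags

def log1p_transform(values):
    """log(1 + x): compresses a right-skewed, non-negative variable with large outliers."""
    return [math.log1p(v) for v in values]

# Toy DEBTINC-like column (values invented for illustration, not real HMEQ data).
debtinc = [34.0, None, 40.0, 28.0, None, 120.0]
filled, flags = impute_median_with_flag(debtinc)
print(filled)  # [34.0, 37.0, 40.0, 28.0, 37.0, 120.0]
print(flags)   # [0, 1, 0, 0, 1, 0]
print([round(v, 2) for v in log1p_transform(filled)])
```

Comparing a model trained on `filled` alone against one trained on `filled` plus `flags` (and raw against log-transformed inputs) is one way to produce the two preprocessing candidates the brief asks for.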

--What is the best method for predicting this dataset? Are particular methods more or less suitable for this task? What could serve as a simple baseline solution? You are expected to build and evaluate at least four different candidate models (e.g. different types of decision trees, logistic regression, neural networks, or just four types of neural nets) and justify your choice of algorithm parameters. How sensitive are the algorithms to the options used to set them up?
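One common simple baseline is a majority-class predictor that ignores all inputs. The following Python sketch, using invented toy labels, shows why such a baseline can look deceptively good on an imbalanced target: it reaches a high classification rate while never predicting a single default. The function names are hypothetical.

```python
from collections import Counter

def majority_baseline(train_labels):
    """Return a predictor that ignores its inputs and outputs the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda n_rows: [majority] * n_rows

def classification_rate(y_true, y_pred):
    """Share of instances whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

train_y = [0] * 8 + [1] * 2                 # toy labels: 1 = default, 0 = repaid
test_y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
preds = majority_baseline(train_y)(len(test_y))
print(classification_rate(test_y, preds))   # 0.8
print(sum(p == 1 and t == 1 for t, p in zip(test_y, preds)))  # defaults caught: 0
```

Any candidate model should beat this baseline on a metric that rewards catching defaults, not only on raw classification rate.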

--Interpret how well your models are performing. What is a suitable benchmark? What is an appropriate performance metric: classification rate, lift, ROC, costs...? Use at least two metrics and justify your choice. Carefully choose, describe and justify the data partitions on which you build, validate and evaluate your models. Provide evidence of the errors on all data partitions!
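Two of the metrics the brief names can be computed directly from predicted default scores. The following sketch implements ROC AUC via its rank interpretation (the probability that a randomly chosen default is scored above a randomly chosen non-default) and cumulative lift for the top-scored fraction of cases. The toy scores are invented, and the pairwise AUC loop is O(n²), which is adequate for illustration but not an efficient implementation.

```python
def roc_auc(y_true, scores):
    """AUC as P(random default outranks random non-default); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def lift_at(y_true, scores, fraction=0.1):
    """Default rate among the top-scored fraction divided by the overall default rate."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate

y_toy = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.01]
print(roc_auc(y_toy, scores))       # 0.9375
print(lift_at(y_toy, scores, 0.2))  # 2.5
```

AUC summarises ranking quality across all thresholds, while lift focuses on the highest-risk cases. That difference is one possible justification for reporting both.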

--Consider whether this is a balanced or an imbalanced classification problem. Are the relevant classes balanced? Evaluate two different sampling strategies. Also, are the costs of misclassifying individual instances symmetric or asymmetric? Take this into account when setting up your experiments, using different target profiles to rebalance asymmetries and/or costs. Assume a cost of “10” for failing to predict a default and “1” for failing to predict that a loan will be repaid. Document whether and how this improves your results!
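With the 10:1 costs above, predicting "default" has the lower expected cost whenever (1 - p) · 1 < p · 10, i.e. when p > 1/11 ≈ 0.091, far below the usual 0.5 cut-off. The following Python sketch computes that threshold, compares total cost at both cut-offs on invented toy scores, and adds random oversampling of the minority class as one possible sampling strategy. This is only an illustration of the arithmetic. In SAS Enterprise Miner the equivalent would presumably be set through the decision/cost settings of a target profile, which this code does not model. Oversampling should be applied to the training partition only, and it shifts predicted probabilities, so those may need adjusting back to the original priors.

```python
import random

COST_FN = 10.0  # cost of failing to predict a default (from the brief)
COST_FP = 1.0   # cost of failing to predict that a loan is repaid (from the brief)

def cost_threshold(cost_fn=COST_FN, cost_fp=COST_FP):
    """P(default) above which predicting 'default' has the lower expected cost.

    Predict default when p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def total_cost(y_true, p_default, threshold):
    """Misclassification cost of thresholded predictions under the 10:1 cost matrix."""
    cost = 0.0
    for y, p in zip(y_true, p_default):
        pred = 1 if p > threshold else 0
        if y == 1 and pred == 0:
            cost += COST_FN
        elif y == 0 and pred == 1:
            cost += COST_FP
    return cost

def oversample_minority(labels, seed=47701):
    """Row indices after random oversampling (with replacement) of the minority class."""
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return sorted(pos + neg + extra)

y_toy = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
p_toy = [0.9, 0.8, 0.3, 0.3, 0.2, 0.05, 0.05, 0.05, 0.05, 0.05]
t = cost_threshold()
print(round(t, 4))                    # 0.0909
print(total_cost(y_toy, p_toy, 0.5))  # 11.0
print(total_cost(y_toy, p_toy, t))    # 3.0
```

On the toy data, lowering the cut-off trades one missed default (cost 10) for two extra false alarms (cost 1 each), which is exactly the asymmetry the brief asks you to document.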


--Clustering algorithms (unsupervised learning) enable a type of modelling, and answers or solutions to questions, distinctly different from those of classification. Critically discuss the difference between clustering and classification, and where clustering algorithms could be employed within the scope of the questions above on the HMEQ dataset. The essay should be no longer than 1,500 words and should make adequate use of current academic literature, case-study evidence and examples to support your arguments.
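The core difference is that clustering never sees the target. The following minimal k-means sketch (Lloyd's algorithm in plain Python, not SAS's Cluster node) groups toy two-feature points purely by distance. Any link to default would have to be established afterwards, for example by profiling each cluster's default rate. The points, seed and function names are illustrative assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for each point (ties go to the lower index)."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c])) for p in points]

def kmeans(points, k, iters=20, seed=47701):
    """Plain Lloyd's k-means; no target label is used anywhere."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new.append(centroids[c])  # keep the old centroid if a cluster empties
        if new == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new
    return centroids, assign(points, centroids)

# Toy two-feature data (e.g. two standardised inputs); no default target is used.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

Cluster indices are arbitrary labels. Unlike a classifier's output, they carry no meaning such as "will default" until someone interprets them, which is one angle for the critical discussion.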
 
