指导
网站地图
澳洲代写assignment 代写英国assignment Assignment格式 如何写assignment
返回首页

英国爱丁堡大学计算机实验室使用留学生指导-Computer Lab Workshop

论文价格: 免费 时间:2010-12-25 17:44:23 来源:www.ukassignment.org 作者:留学作业网

Credit Scoring and Data Mining

Computer Lab Workshop 2 using Excel
 
A. Using Pivot Tables in Excel to help coarse classifying variables
 

We will use the data on 100 customers that is found on cc100.xls.

Open that Excel file and note what the data means.

Most of the data describes the customer attributes but some describe the subsequent history.

DL1 is number of times in year 1 month overdue, DL2 is number of times 2 months overdue etc. Definition of bad=1 is at least once 1 month overdue  (i,e. 1 or more in DL1).

 

We use Pivot Tables in Excel to display analysis. For real scorecard building one would use the Chi module is SPSS, or the Enterprise Miner module in SAS ( though that chooses classes automatically so some just use the standard SAS module) or dedicated credit scoring systems like Paragon’s DSS..

 

#p#分页标题#e#1. To use the pivot tables proceed as follows.

Highlight the whole table from a1: w101.

Go to Pivot Table and Pivot Chart Report in Data

When the first screen of the Wizard appears you want to create to click on data is in Microsoft Excel  list and create Pivot table

Then go to next screen and make sure you are using data in A1:W101.

Go to next screen and say you want to go to existing worksheet and chose to start table at A120.

Then scroll down to A120.

Drag  res ( residential status ) to pivot column

Drag good=1 and bad=1 to data section.http://www.ukassignment.org/yingguozuoye/ying_liu_zuo_dai_/

 

You should get that res status O ( owner) has 37 goods and 20 bads, residential status P ( with parents) has 10 goods and 7 bads; residential status T( furnished tenants) has 7 goods and 4 bads.

To get ratio of goods to bads in cell b124 type =b122/b123. Copy this across to columns C,D, E and F.

#p#分页标题#e#

Which outcomes of residential status would you combine? My suggestion would be to have classes of {O}, {P,T,X} and {U}. Do you agree?

 

2.The pivot table for “children” has already been done just above. What classes would you combine of these?

Similarly spouses employment category has a table above. What classes would you form there?  Look at the definitions of the attributes as well as the good:bad ratio.

 

3. You might want to look at combinations of variables. So create a two dimensional pivot table to look at the interaction between residential status and having a cheque guarantee card.

Again highlight the whole table from a1: w101.

Go to Pivot Table and Pivot Chart Report in Data

When the first screen of the Wizard appears you want to create to click on data is in Microsoft Excel  list and create Pivot table

Then go to next screen and make sure you are using data in A1:W101.

Go to next screen and say you want to go to existing worksheet and chose to start table at A130.#p#分页标题#e#

Then scroll down to A130.

Drag  res ( residential status ) to pivot row and drag cheque to pivot column

Drag good=1 and bad=1 to data section.

 

You should get that

Of owners with cheque guarantee card 28 are good and 16 are bad

Of owners with no cheque guarantee card 9 are good and 4 are bad;

Of with parents with cheque guarantee card 4 are good and 6 are bad

Of with parents with no cheque guarantee card 6 are good and 1 is bad

 

Which groups would you put together to form classes now.

I would be tempted to make it { owners}, { P or T or X with no cheque guarantee card}, {P or T with card and U with no card} and { U or X with card}. What groups would you choose.

 #p#分页标题#e#

 

4. For binary variables decide what would be good/bad ratio for interactions of

cheque account

deposit account

current account

phone

Can you work out more than two interactions?

 

5. For continuous variables do same thing and then combine to get sensible coarse classifying for age

Why does pivot tables not work as well for this.

Can you construct the best monotone coarse classifier?

 

 

B: Constructing KS and ROC Curves#p#分页标题#e#

 

Suppose one has constructed a scorecard how does one then calculate the KS and ROC curves?

We will use the linear regression based scorecard we obtained in the previous workshop where

 

100Y= -1 +17X(18-33) -11X(33-48) -31X(49-54) +19X(55 -63) + 68X(0child) +63X(1child) +67X(2child) +30X(B,Y) +6X(R,H,X) –14X(Z, no response) +23X(P,E,L)

 

Go to sheet 2 where the data used in constructing this scorecard is. In row 104 type in the scoring values starting in column E. So type in 17 in E104, -11 in F104 , -31 in G104 and so in until one finishes with 23 in O104.

Go to  T1 and type in “score”.

In T2 we calculate the score of that person by typing in   =SUMPRODUCT(E2:O2,E$104:O$104)

This should give a score of 3.

Copy cell T2 down the column to T101, che

Copy column D, the good=1 column into column U by highlighting the top of column D ( the cell with a D in it , clicking copy and then going to the top of U ( the cell with U) in it and paste.#p#分页标题#e#

 

These, score and whether person was good  are the two pieces of information needed to evaluate the ROC and KS curve. We will use another sheet to evaluate the curves on.

Highlight columns T and U. Press copy go to Sheet 4 and with cursor on the first column go to Paste Special in the Edit menu. Click the values box ( otherwise we only get the formulae which don’t make sense in this sheet)  and then click on OK.

Highlighting the two columns A and B go to Sort in the Data menu and sort on score column with scores ascending.

In cell B102 ( at bottom of good column) use Summation sign to get =SUM(B2:B101), the number of goods, In c102 type =100-b102 and in d102 type =b102. We will use the C and D column to find how many bads ( C) and How many goods(D) have values less than that customer.

Type 0s in C1 and D1

In C2, type =IF(b2=0,C1+1,C1) which adds one if the b column is 0 ( a bad). Copy this down to C101.

In D2 type =IF(b2=1,d1+1,D1) which does the same thing for the goods.

To get these into percentages in E1 type %of bads, in F1 type % of goods

In E2 type =C2/C$102 and copy down the E column.

#p#分页标题#e#In F2 type =D2/D$102 and copy down the F column

 

To get KS value in g column type in G2 =E2-F2 and copy this down the whole column. The largest value ( 0.66) is the KS statistic

 

To get KS chart go to Chart Wizard

Choose line  and  in data range type in  or highlight E1 to F101 ( E1:F101). Let this chart be in the current sheet and tidy up the colours and background fill. This is the KS chart

 

To get the ROC chart go to Chart Wizard again

Choose scatter this time and choose the variant where the data points are connected by a line, The data range is E2:F101 and when it comes to labelling the axes label the x axis % bads and the Y axis % goods.

 

 

 

Exercise: Can you work out how to calculate the Gini coefficient form the numbers given in the spreadsheet?#p#分页标题#e# 

 
 

此论文免费


如果您有论文代写需求,可以通过下面的方式联系我们
点击联系客服
如果发起不了聊天 请直接添加QQ 923678151
923678151
推荐内容
  • 英文Assignment和D...

    英文Assignment和Dissertation的写作细节(珍藏版)-Dissertation大体结构-Dissertation写作思路-Dissertati......

  • 从女性黑人说唱音乐中看美国传...

    本文是本站代做的assignment范文,有关女性解放问题。人们都认为黑人女说唱音乐应该不受传统观念的束缚,它应当是创新的、能够促进黑人女性解放的,并且能够提高......

  • 英国assignment格式...

    这是一个动态的模块,这里的学生都将参加在分析现实世界的例子,利用直接观察获得的信息。 本模块考虑的问题,实践文化管理都可能遇到,在他们的组织内,并有助于认识到......

  • 英语专业课程作业assign...

    提供英语专业课程作业assignment格式范例(商务、财经、法律英语方向)-本范例涵盖项目设计及论文写作课程(商务、财经、法律英语方向)第二次作业前五个部分。......

  • 英国assignment指导...

    核心提示:英国assignment指导要怎么写Report(British assignment writing to how to write Report ......

  • 英国法学论文:现代民法变迁来...

    19世纪到20世纪发生了剧烈的社会变迁,以此为基础,民法也发生了相应的变化和调整。如民法的社会化、去法典化以及自由法运动的兴起等等。英国民法应当从这些变化中汲取......

923678151