Use Data Science To Find Credit Worthy Customers

Ranjit Mishra
14/07/2017

The k-nearest neighbor (KNN) classifier is one of the simplest classifiers to use, and hence is widely used for classifying dynamic datasets. The R session below shows how easy it is to classify credit-worthy vs. credit-risk customers:

gc 
##   Default checkingstatus1 duration history purpose amount savings employ
## 1       0             A11        6     A34     A43   1169     A65    A75
## 2       1             A12       48     A32     A43   5951     A61    A73
## 3       0             A14       12     A34     A46   2096     A61    A74
## 4       0             A11       42     A32     A42   7882     A61    A74
## 5       1             A11       24     A33     A40   4870     A61    A73
## 6       0             A14       36     A32     A46   9055     A65    A73
##   installment status others residence property age otherplans housing
## 1           4    A93   A101         4     A121  67       A143    A152
## 2           2    A92   A101         2     A121  22       A143    A152
## 3           2    A93   A101         3     A121  49       A143    A152
## 4           2    A93   A103         4     A122  45       A143    A153
## 5           3    A93   A101         4     A124  53       A143    A153
## 6           2    A93   A101         4     A124  35       A143    A153
##   cards  job liable tele foreign
## 1     2 A173      1 A192    A201
## 2     1 A173      1 A191    A201
## 3     1 A172      2 A191    A201
## 4     1 A173      2 A191    A201
## 5     2 A173      2 A191    A201
## 6     1 A172      2 A192    A201
## Taking back-up of the input file, in case the original data is required later

gc.bkup 
##      duration.V1          amount.V1         installment.V1   
##  Min.   :-1.401713   Min.   :-1.070329   Min.   :-1.7636311  
##  1st Qu.:-0.738298   1st Qu.:-0.675145   1st Qu.:-0.8697481  
##  Median :-0.240737   Median :-0.337176   Median : 0.0241348  
##  Mean   : 0.000000   Mean   : 0.000000   Mean   : 0.0000000  
##  3rd Qu.: 0.256825   3rd Qu.: 0.248338   3rd Qu.: 0.9180178  
##  Max.   : 4.237315   Max.   : 5.368103   Max.   : 0.9180178
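The summary above shows each numeric column with mean 0, which is what standardizing with `scale()` produces. A minimal sketch of that step, assuming it was done with `scale()`; the data frame `toy` is a made-up stand-in built from the six rows printed earlier, not the full dataset:

```r
# Toy stand-in for the three numeric columns (first six rows shown above)
toy <- data.frame(
  duration    = c(6, 48, 12, 42, 24, 36),
  amount      = c(1169, 5951, 2096, 7882, 4870, 9055),
  installment = c(4, 2, 2, 2, 3, 2)
)

# scale() subtracts each column's mean and divides by its standard deviation
toy.scaled <- as.data.frame(scale(toy))

summary(toy.scaled)

# After scaling, every column has mean 0 and standard deviation 1
print(round(colMeans(toy.scaled), 10))
print(round(apply(toy.scaled, 2, sd), 10))
```

Standardizing matters for KNN because distances are otherwise dominated by the column with the largest raw values (here, `amount`).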
## Let's predict on a test set of 100 observations. The rest will be used as the training set.

set.seed(123) 
test 
100 * sum(test.def == knn.1)/100  # For knn = 1
## [1] 68
100 * sum(test.def == knn.5)/100  # For knn = 5
## [1] 74
100 * sum(test.def == knn.20)/100 # For knn = 20
## [1] 81
## If we look at the above proportions, it's quite evident that K = 1 correctly classifies 68% of the outcomes, K = 5 correctly classifies 74% and K = 20 does it for 81% of the outcomes. 
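The split-and-predict step can be sketched as follows. This is an illustration on synthetic data rather than the credit data used above; it assumes `knn()` from the `class` package, and the data frame `x` and the rule generating `default` are invented for the example:

```r
library(class)  # provides knn()

set.seed(123)
n <- 1000

# Synthetic stand-in for the scaled predictors (NOT the credit data)
x <- data.frame(duration = rnorm(n), amount = rnorm(n), installment = rnorm(n))

# Synthetic label: default is more likely for long, large loans
default <- factor(as.integer(x$duration + x$amount + rnorm(n) > 1))

# Hold out 100 randomly chosen rows for testing; train on the rest
test      <- sample(1:n, 100)
train.x   <- x[-test, ]
test.x    <- x[test, ]
train.def <- default[-test]
test.def  <- default[test]

# Fit and score KNN for several values of k
for (k in c(1, 5, 20)) {
  pred <- knn(train.x, test.x, cl = train.def, k = k)
  cat("k =", k, "accuracy:", 100 * mean(pred == test.def), "%\n")
}
```

`mean(pred == test.def)` is the same proportion as `sum(test.def == pred)/100` when the test set has 100 rows.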

## We should also look at how the success rate changes as K increases.

table(knn.1 ,test.def)
##      test.def
## knn.1  0  1
##     0 54 11
##     1 21 14
## For K = 1, 65 customers are predicted as good (0), of which 54, or 83%, actually are good; this is the success rate. Let's look at K = 5 now

table(knn.5 ,test.def)
##      test.def
## knn.5  0  1
##     0 62 13
##     1 13 12
## For K = 5, 75 customers are predicted as good, of which 62, or 83%, actually are good. Let's look at K = 20 now

table(knn.20 ,test.def)
##       test.def
## knn.20  0  1
##      0 69 13
##      1  6 12
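## For K = 20, 82 customers are predicted as good, of which 69, or 84%, actually are good.

## Increasing K improves the overall classification accuracy, while the success rate stays roughly flat (between 82% and 84%). It is worse to class a customer as good when it is bad than it is to class a customer as bad when it is good.
## By that criterion, K = 1 lets the fewest bad customers through as good (11, against 13 for both K = 5 and K = 20), while K = 20 gives the best accuracy and success rate. The optimum K therefore depends on how the two kinds of error are weighed.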
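The accuracy and success-rate figures can be recomputed directly from the three `table()` outputs printed above. A small sketch; the helper names `make_cm` and `rates` are made up for illustration, and the counts are copied from those tables:

```r
# Build a 2x2 confusion matrix: rows = predicted class, columns = actual class
make_cm <- function(v) {
  matrix(v, nrow = 2,
         dimnames = list(pred = c("0", "1"), actual = c("0", "1")))
}

# Counts copied from the table() outputs (filled column by column)
cm <- list(
  "K=1"  = make_cm(c(54, 21, 11, 14)),
  "K=5"  = make_cm(c(62, 13, 13, 12)),
  "K=20" = make_cm(c(69,  6, 13, 12))
)

rates <- function(m) {
  c(accuracy     = 100 * sum(diag(m)) / sum(m),          # all correct predictions
    success_rate = 100 * m["0", "0"] / sum(m["0", ]),    # predicted good that are good
    bad_as_good  = m["0", "1"])                          # bad customers classed as good
}

res <- sapply(cm, rates)
print(round(res, 1))
# accuracy:     68,   74,   81
# success_rate: 83.1, 82.7, 84.1
# bad_as_good:  11,   13,   13
```

Reading the rows side by side makes the trade-off explicit: accuracy rises with K, while the number of bad customers classed as good is lowest at K = 1.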
## We can make a plot of the data with the training set in hollow shapes and the new ones filled in. 
## Plot for K = 1 can be created as follows - 

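# Training set: hollow symbols, coloured by installment rate, shaped by actual default
plot(train.gc[, c("amount", "duration")],
     col = c(4, 3, 6, 2)[gc.bkup[-test, "installment"]],
     pch = c(1, 2)[as.numeric(train.def)],
     main = "Predicted Default, by 1 Nearest Neighbor", cex.main = .95)

# Test set: filled symbols, shaped by the K = 1 prediction.
# The fill colour is indexed by the test rows (test), not the training rows (-test).
points(test.gc[, c("amount", "duration")],
       bg = c(4, 3, 6, 2)[gc.bkup[test, "installment"]],
       pch = c(21, 24)[as.numeric(knn.1)], cex = 1.2, col = grey(.7))

legend("bottomright", pch = c(1, 16, 2, 17), bg = c(1, 1, 1, 1),
       legend = c("data 0", "pred 0", "data 1", "pred 1"),
       title = "default", bty = "n", cex = .8)

legend("topleft", fill = c(4, 3, 6, 2), legend = c(1, 2, 3, 4),
       title = "installment %", horiz = TRUE, bty = "n", col = grey(.7), cex = .8)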

Saumya Rajen Shah | 28/07/2017

Why didn't you use K-means instead? For KNN, you are supposed to have labels beforehand; what if you never know who was credit-worthy?
