Data Mining Using RapidMiner

Part B: Exploratory analysis

To perform the exploratory analysis of the loan-delinq-train.csv data set, the researcher first tried to identify whether any unusual patterns exist. As shown in Appendix One, the data set did not contain any unusual patterns. The analysis also showed that there are no missing values in the data set.
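The missing-value part of this check can be sketched outside RapidMiner as follows. This is a minimal illustration, not the procedure used in the appendix: it assumes missing cells appear either as empty fields or as the string "NA", and the three-row sample and its column subset are hypothetical rather than taken from loan-delinq-train.csv.

```python
import csv
import io


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_value_counts(rows):
    """Count cells per column that are empty or hold the (assumed) 'NA' marker."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if column is None:
                continue  # extra unnamed fields beyond the header row
            is_missing = value is None or value.strip() in ("", "NA")
            counts[column] = counts.get(column, 0) + int(is_missing)
    return counts


# Hypothetical three-row extract (not real data). For the real file, open
# "loan-delinq-train.csv" with newline="" and pass it to csv.DictReader.
sample = """SeriousDlqin2yrs,age,DebtRatio
0,45,0.80
1,NA,0.12
0,38,
"""

rows = load_rows(sample)
print(missing_value_counts(rows))  # {'SeriousDlqin2yrs': 0, 'age': 1, 'DebtRatio': 1}
```

A result of zero for every column would correspond to the finding above that the data set has no missing values.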

The analysis also found that the characteristics of the training data set (loan-delinq-train.csv) are consistent with those of the test data set (loan-delinq-test.csv). To understand whether there are any interesting relationships between the potential predictor variables and the target variable SeriousDlqin2yrs, a correlation test was performed; the details are shown in the appendix. According to the correlation matrix, Number Of Time 30-59 Days Past Due Not Worse showed a perfect relationship with the target variable. In addition, Number Of Times 90 Days Late, age, and Number Of Time 60-89 Days Past Due Not Worse also have strong positive associations with the target variable. The other variables do not have a significant influence on SeriousDlqin2yrs. Taken together, these associations indicate which customers are likely to default on a loan and become delinquent. Finally, with regard to loan delinquency among ACME Bank customers, it can be argued that the variables identified above are the true predictors of loan delinquency. The literature review section also supports this view.
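Each cell of a correlation matrix like the one in the appendix is typically a Pearson coefficient. The sketch below computes one such coefficient between a predictor and the binary target; with a 0/1 target this is the point-biserial correlation. It assumes RapidMiner's correlation matrix uses Pearson's r, and the six toy values are hypothetical, not drawn from the data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, not taken from loan-delinq-train.csv.
target = [0, 0, 1, 0, 1, 1]          # SeriousDlqin2yrs
past_due_30_59 = [0, 1, 3, 0, 2, 4]  # Number Of Time 30-59 Days Past Due Not Worse
print(round(pearson(past_due_30_59, target), 3))  # 0.894
```

A "perfect" relationship in this sense would be a coefficient of exactly 1 (or -1).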

Part C: Decision tree analysis

The details of the decision tree model, together with its inputs and the output results produced in RapidMiner, are shown in the appendix. According to this analysis, the number of open credit lines and loans is treated as the target variable, while Number Of Time 30-59 Days Past Due Not Worse, Number Of Times 90 Days Late, age, debt ratio and Number Of Time 60-89 Days Past Due Not Worse are treated as the predictor variables.

According to the decision tree, the first split is on the number of open credit lines and loans. If this value falls in range 1, the customer is assigned to the first group. If it falls in range 2, the debt-to-equity ratio is examined next. If the debt-to-equity ratio is less than or equal to 0.718, the customer is again assigned to the first group. If it is greater than 0.718, the revolving utilization of unsecured loans is examined. If the revolving utilization is less than or equal to 0.003 or greater than 0.039, the customer is assigned to the first group. Any value above 0.003 but less than or equal to 0.039 requires an assessment of SeriousDlqin2yrs: if its value is zero, the customer is categorized under the first group; otherwise, under the second group.
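The decision path described above can be written as a chain of if-statements, which makes the order of the splits and the thresholds easy to check. This is a hand-written sketch of the rules as stated in the text, not code exported from RapidMiner. The labels "range1" and "range2" for the discretized number of open credit lines and loans are hypothetical, and it assumes the group reached at range 1 is the same "first group" reached at the other leaves.

```python
def classify(open_credit_range, debt_ratio, revolving_utilization, serious_dlq_2yrs):
    """Follow the decision path described for the tree.

    open_credit_range: "range1" or "range2" (hypothetical labels for the
    discretized number of open credit lines and loans).
    Returns "first group" or "second group".
    """
    if open_credit_range not in ("range1", "range2"):
        raise ValueError(f"unknown range: {open_credit_range!r}")
    if open_credit_range == "range1":
        return "first group"
    # Range 2: the debt-to-equity ratio is examined next.
    if debt_ratio <= 0.718:
        return "first group"
    # Ratio above 0.718: revolving utilization of unsecured loans is examined.
    if revolving_utilization <= 0.003 or revolving_utilization > 0.039:
        return "first group"
    # 0.003 < utilization <= 0.039: SeriousDlqin2yrs decides the group.
    return "first group" if serious_dlq_2yrs == 0 else "second group"


print(classify("range2", 0.9, 0.02, 1))  # second group
```

Only one path through the tree reaches the second group: range 2, a ratio above 0.718, utilization in (0.003, 0.039], and a non-zero SeriousDlqin2yrs.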

Part D: Logistic regression analysis

The logistic regression analysis shows that a total of four attributes have a significant influence on loan delinquency: debt-to-equity ratio, number of open credit lines and loans, revolving utilization of unsecured loans, and serious delinquency in a 2-year period. According to the Kernel Model, the weights of these attributes are as follows:

W(debt-to-equity ratio) = 0.708

W(number of open credit lines and loans) = 0.720

W(revolving utilization of unsecured loans) = 0.264

W(serious delinquency in 2 years period) = 1.052

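These weights indicate that all four factors contribute to the prediction of loan delinquency, with serious delinquency in a 2-year period carrying the largest weight and revolving utilization of unsecured loans the smallest.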
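To show how such weights are used, the sketch below plugs them into the standard logistic function, p = 1 / (1 + exp(-(b + sum of w_i * x_i))). It assumes the Kernel Model uses a linear (dot) kernel, so that the weights act as ordinary logistic-regression coefficients. The bias b is not reported in this analysis, so it defaults to a hypothetical 0.0; the feature keys and the example customer are also hypothetical, and inputs are assumed to be on the same scale used in training.

```python
import math

# Weights reported by the Kernel Model (assumed linear kernel).
WEIGHTS = {
    "debt_to_equity_ratio": 0.708,
    "open_credit_lines_and_loans": 0.720,
    "revolving_utilization_unsecured": 0.264,
    "serious_dlq_2yrs": 1.052,
}


def delinquency_probability(features, bias=0.0):
    """Logistic model p = 1 / (1 + exp(-(bias + sum(w_i * x_i)))).

    bias: not reported in the analysis; 0.0 is a hypothetical placeholder.
    Attributes omitted from `features` contribute nothing to the score.
    """
    score = bias + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical customer, not taken from the data set.
customer = {
    "debt_to_equity_ratio": 0.9,
    "revolving_utilization_unsecured": 0.02,
    "serious_dlq_2yrs": 1,
}
print(round(delinquency_probability(customer), 3))
```

Because every weight is positive, increasing any single attribute (others fixed) raises the score and therefore the predicted probability of the positive class.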

Part E: Evaluation

The two analyses above show that debt-to-equity ratio, number of open credit lines and loans, revolving utilization of unsecured loans and serious delinquency in 2 years are the key variables. For a specific ACME Bank customer, the first thing to check when judging whether loan delinquency will have a true outcome is whether the debt-to-equity ratio is greater than 0.718. If, in addition, the revolving utilization of unsecured loans is above 0.003 and at or below 0.039, and the serious delinquency in 2 years value is non-zero, then it can be argued that delinquency will have a true outcome.