Methods for finding one-year probability of default in credit risk modeling
Abstract
Machine Learning is becoming one of the most important fields in our world. Thanks
to the growth of technology, it is getting easier to collect data on individuals,
objects, and phenomena. This enormous volume of data enables scientists to predict
outcomes from the corresponding variables. Furthermore, the need
for prediction is becoming more and more important in our daily lives. A case in point
is a bank that needs to decide whether it should lend money to a customer. To reach
a final decision, data scientists are asked to build models based on information
about the customer. This information may include annual income, age, gender, marital
status, number of children, loan repayment history, and many other types of data.
Using these variables, scientists can advise the bank on whether the person is likely
to pay back the loan. Such a prediction is not guaranteed to be correct. However,
there is a high chance that the actual outcome will match the predicted one.
This example demonstrates only a fraction of how Machine Learning can be used in
real life, but it shows the potential of the field. A professor once told me that
Machine Learning in Vietnam was like a "baby" that would definitely grow up in a few
years. Moreover, he said that there was a high chance this "baby" could be a "genius".
In other words, Machine Learning could grow significantly to become the backbone of
Vietnam's industrial development.
Because of this potential for growth, I aim to give an introduction to Machine
Learning, including Logistic Regression, Decision Trees, Bagging, and Random Forest,
and to show how to apply these methods in Credit Risk Modelling. In contrast to what
has been done before by other researchers, my goal is to clarify the idea behind each
approach. In R, all of the methods are available as compact functions, which can
prevent students from understanding how the code works. To avoid blindly applying
these functions, the thesis carries out every single step of each method explicitly
and then checks the results against the corresponding R functions. Finally, a diagram
is drawn to visualize the structure of the dissertation.