dc.description.abstract | Credit risk is one of the major financial challenges facing the banking system and
financial institutions. This thesis proposes a machine-learning approach based on boosting
algorithms to solve the default risk problem. Boosting is the general name for a family of
ensemble methods. There are many different boosting algorithms; later versions
address the shortcomings of earlier ones and are designed to work with
complicated and heterogeneous data, in particular to handle sparsity and large-scale data.
This thesis mainly introduces AdaBoost and Gradient Boosted Decision Trees (GBDTs).
While AdaBoost is one of the earliest boosting algorithms, GBDTs prove to be
a clever approach with considerable potential for further improvement. Any discussion of GBDTs
must mention three powerful implementations: XGBoost, LightGBM
and CatBoost.
Applied to the Home Credit dataset to solve the credit default risk problem, XGBoost,
LightGBM and CatBoost achieved AUC scores above 0.75 (AdaBoost was more modest
at 0.71), while the current highest score is 0.8.
At the end of this thesis, after improving CatBoost with some advanced techniques, our
model reached an AUC score of 0.79. Beyond this result, understanding the foundations of these
algorithms can help us continue to research and improve their performance.
Keywords:
Default Risk, Machine Learning, Boosting Algorithms, AdaBoost, Gradient Boosting Machine,
Gradient Boosted Decision Trees, XGBoost, LightGBM, CatBoost. | en_US |