dc.description.abstract | Type 2 diabetes mellitus (T2DM) is a severe chronic metabolic disorder that threatens human health and
has a high incidence worldwide. Effective prediction models are needed to diagnose and
prevent T2DM in time. Dipeptidyl peptidase IV (DPPIV) is a promising
T2DM drug target whose inhibitors prolong the action of glucagon-like peptide-1 (GLP-1) and
gastric inhibitory peptide (GIP). This study reports a new strategy for predicting DPPIV
inhibitors with machine learning and ensemble learning methods such as Random Forest, Extreme Gradient
Boosting and Support Vector Machine. Data mining methods, with their classification and regression
capabilities, have become increasingly important in the field of T2DM diagnosis.
This study proposes a risk prediction model for T2DM based on ensemble
learning methods. These models combine the predictions of several base models to achieve higher
out-of-sample classification accuracy than the base models. The purpose is to develop a strategy
for combining ensemble classifiers that yields higher classification accuracy than the constituent
ensemble models. Two of the best-performing tree-based ensemble methods, Random Forest (RF) and
eXtreme Gradient Boosting (XGB), together with the best-performing distance-based learner, Support
Vector Machine (SVM), were applied to generate a set of base models. The effectiveness of the
methods was validated by comparing the various performance metrics and the outcomes of different
comparative experiments. Among the classification models, SVM combined with Substructure fingerprints
achieved the best results in Layer 1 (AUC of 86%, MCC of 76%), and Layer 1 performed better
than Layer 2. In Layer 3, Random Forest gave the strongest regression results, with an MSE of 0.19
and an R-squared of 0.193, although overall the three regression models did not achieve
high performance. This research was designed to investigate whether there is an approach that can integrate
ensemble-based models to achieve even better classification accuracy. | en_US |