Show simple item record

dc.contributor.advisor: Le, Thi Ly
dc.contributor.author: Vo, Thi Le Uyen
dc.date.accessioned: 2024-03-19T03:09:24Z
dc.date.available: 2024-03-19T03:09:24Z
dc.date.issued: 2020-02
dc.identifier.uri: http://keep.hcmiu.edu.vn:8080/handle/123456789/4765
dc.description.abstract: Type 2 diabetes mellitus (T2DM) is a severe chronic metabolic disorder that threatens human health and has a high incidence worldwide. Effective prediction models are needed to diagnose and prevent T2DM in time. Dipeptidyl peptidase IV (DPPIV) is a promising T2DM drug target, and DPPIV inhibitors prolong the action of glucagon-like peptide-1 (GLP-1) and gastric inhibitory peptide (GIP). In this study, a new strategy is reported to predict DPPIV inhibitors with machine learning approaches based on ensemble learning, such as Random Forest, Extreme Gradient Boosting and Support Vector Machine. Data mining methods, with their classification and regression capabilities, have become an increasingly crucial technology in the field of T2DM diagnosis. This study proposes a risk prediction model for T2DM based on ensemble learning methods. These models combine the predictions of several base models to achieve higher out-of-sample classification accuracy than the base models. The purpose is to develop a strategy for combining ensemble classifiers that results in higher classification accuracy than the constituent ensemble models. Two of the best-performing tree-based ensemble methods, Random Forest (RF) and eXtreme Gradient Boosting (XGB), and the best-performing distance-based method, Support Vector Machine (SVM), were applied to generate a set of base models. The effectiveness of the methods was validated by comparing various performance metrics and the outcomes of different contrast experiments. Among the classification models, SVM combined with Substructure fingerprints achieved the best results in Layer 1, with an AUC of 86% and an MCC of 76%; Layer 1 performed better than Layer 2. In Layer 3, Random Forest gave the strongest results, with an MSE of 0.19 and an R-squared of 0.193, although none of the three regression models performed well overall.
This research is designed to investigate whether there is an approach that can integrate ensemble-based models to achieve even better classification accuracy. (en_US)
dc.language.iso: en (en_US)
dc.subject: DPPIV inhibitors (en_US)
dc.subject: Type 2 diabetes mellitus (en_US)
dc.subject: IC50 (en_US)
dc.subject: Random Forest (en_US)
dc.subject: eXtreme Gradient Boosting (en_US)
dc.subject: Support Vector Machine (en_US)
dc.title: Idppiv-3l: Identifying Dppiv Inhibitors And Their Strength Using Ensemble Learning Models (en_US)
dc.type: Thesis (en_US)
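
The abstract describes combining the predictions of several base models (RF, XGB, SVM) and judging them by metrics such as MCC, MSE and R-squared. The sketch below illustrates that general idea only. It assumes a simple majority vote as the combination rule, which the record does not confirm as the thesis's method, and it uses made-up toy labels rather than any data or results from the thesis. The function names and variables are hypothetical.

```python
from math import sqrt


def majority_vote(predictions):
    """Combine several models' 0/1 label lists into one list by majority vote."""
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when the denominator is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mse(y_true, y_pred):
    """Mean squared error for a regression output."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy labels (1 = inhibitor, 0 = non-inhibitor); illustrative only.
y_true = [1, 0, 1, 1, 0, 0]
rf_pred = [1, 0, 1, 0, 0, 1]   # hypothetical Random Forest output
xgb_pred = [1, 0, 0, 1, 0, 0]  # hypothetical XGBoost output
svm_pred = [1, 1, 1, 1, 0, 0]  # hypothetical SVM output

ensemble_pred = majority_vote([rf_pred, xgb_pred, svm_pred])

for name, pred in [("RF", rf_pred), ("XGB", xgb_pred),
                   ("SVM", svm_pred), ("vote", ensemble_pred)]:
    print(name, round(mcc(y_true, pred), 3))
```

In this toy case each base model makes its mistakes on different samples, so the vote corrects them. That is the situation in which an ensemble can beat its constituents; when base models make the same mistakes, voting gives little or no gain.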

