dc.contributor.advisor: Le, Thi Ly
dc.contributor.author: Tran, Ha Phuong Anh
dc.date.accessioned: 2024-03-19T03:00:13Z
dc.date.available: 2024-03-19T03:00:13Z
dc.date.issued: 2020-02
dc.identifier.uri: http://keep.hcmiu.edu.vn:8080/handle/123456789/4760
dc.description.abstract: Machine learning approaches, which have developed rapidly in recent years, are considered robust and effective solutions to a range of biological and chemical problems, including drug discovery, because they can save time, cost, and human effort. This study aims to develop an ensemble learning model that relates the chemical structures of natural compounds to their anticancer inhibitory activities. To this end, three commonly used models, Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGB), were selected to address two classification problems and one regression problem. The compound data were collected from the Naturally occurring Plant-based Anti-cancer Compound-Activity-Target (NPACT) database and the Anticancer Herbs database of Systems Pharmacology (CancerHSP). Each molecule was represented by 12 types of molecular fingerprints, which served as model inputs for identifying anticancer compounds and evaluating their therapeutic efficacy. Well-fitted models of this kind can help experimental scientists screen for potent natural compounds with strong anticancer activity. The results show that the three models classify inhibitors and non-inhibitors with an accuracy of up to 0.7. The regression tasks did not yield good results, with a low coefficient of determination (R² < 0.5) and a high mean squared error (MSE > 11). The study further suggests that XGB is the best-performing model on almost all evaluation metrics in both the classification and regression problems. [en_US]
dc.language.iso: en [en_US]
dc.subject: Anticancer compounds [en_US]
dc.subject: herbal database [en_US]
dc.subject: Random Forest (RF) [en_US]
dc.subject: Support Vector Machine (SVM) [en_US]
dc.subject: Extreme Gradient Boosting (XGB) [en_US]
dc.subject: classification problem [en_US]
dc.subject: regression problem [en_US]
dc.title: Inacc - 3l: Identification Of Anticancer Compounds From Herbal Database And Evaluation Of Their Therapeutic Efficacy Using Ensemble Learning Models [en_US]
dc.type: Thesis [en_US]

