Inacc - 3l: Identification Of Anticancer Compounds From Herbal Database And Evaluation Of Their Therapeutic Efficacy Using Ensemble Learning Models
Abstract
Machine learning approaches, which have developed rapidly in recent years,
are considered robust and effective solutions to various biological and
chemical problems, including drug discovery, because they can save time, cost,
and human effort. This study aims to develop an ensemble learning model to
establish a relationship between chemical structures of natural compounds and their
anticancer inhibitory activities. To this end, three commonly used models,
Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient
Boosting (XGB), were applied to two classification problems and one
regression problem. The compound data were collected from the Naturally Occurring
Plant-based Anti-cancerous Compound-Activity-Target (NPACT) database and the
Anticancer Herbs database of Systems Pharmacology (CancerHSP). Each molecule was
represented using 12 types of molecular fingerprints, which served as model inputs
for identifying anticancer compounds and evaluating their therapeutic efficacy. The
well-fitted models can assist experimental scientists in screening for potent
natural compounds with strong anticancer activity. The results show that the three models
efficiently classify inhibitors and non-inhibitors with an accuracy of up to 0.7. The
regression tasks, however, did not yield good results, showing a low coefficient of
determination (R² < 0.5) and a high mean squared error (MSE > 11). Furthermore, the
XGB model performed best on almost all evaluation metrics in both the classification
and regression problems.
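
For readers less familiar with the three metrics quoted above, the following is a minimal, dependency-free sketch of how accuracy, mean squared error, and the coefficient of determination are commonly computed. The function names and the toy labels and values are hypothetical and chosen only for illustration; they are not the study's code or data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared differences between true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Toy classification example (1 = inhibitor, 0 = non-inhibitor); made-up labels.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
toy_accuracy = accuracy(labels, predicted)

# Toy regression example; made-up activity values and predictions.
activity = [1.0, 2.0, 3.0, 4.0]
estimate = [2.0, 2.0, 2.0, 2.0]
toy_mse = mean_squared_error(activity, estimate)
toy_r2 = r2_score(activity, estimate)
```

Note that R² can be negative when a model predicts worse than simply returning the mean of the observed values, so a value below 0.5 covers a wide range of poor fits.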