Predicting novel genes associated with a disease is an important problem in biomedical research. Early on, annotation-based methods were proposed for this problem. Later, with the advent of high-throughput technologies, gene/protein interaction data grew quickly to cover almost the entire genome and proteome, and network-based methods have therefore become prominent. Besides these two families of methods, the prediction problem can also be approached with machine learning techniques, because it can be formulated as a classification task. To date, a number of supervised learning techniques and various types of gene/protein annotation data have been used to solve the disease gene classification/prediction problem. However, to the best of our knowledge, no study has compared these methods on comprehensive biomedical annotation data. In addition, it is generally accepted that no single classifier outperforms all others on every classification problem. Therefore, in this study, we compare the disease gene prediction performance of several supervised learning techniques that have been used in the literature, such as Decision Tree Learning, k-Nearest Neighbor, Naive Bayesian, Artificial Neural Networks, and Support Vector Machines. We additionally assess Random Forest, a relatively new decision-tree-based ensemble learning method. The simulation results indicate that Random Forest achieved the best performance of all the methods. Moreover, all methods are stable with respect to changes in the set of known disease genes used as positive training samples.
- A Comparative Study of Classification-Based Machine Learning Methods for Novel Disease Gene Prediction
Nguyen Xuan Hoai