2018 | Original Paper | Book Chapter
Published in: Bioinformatics and Biomedical Engineering
Owing to its robustness and built-in feature selection capability, random forest is frequently employed in omics studies for biomarker discovery and predictive modeling. However, random forest treats all features as equally important a priori, whereas domain knowledge may justify prioritizing the more relevant ones. Furthermore, it has been shown that a preceding feature selection step can improve the performance of random forest by reducing noise and shrinking the search space. In this paper, we present a novel domain-knowledge-guided regularized random forest (Know-GRRF) method that incorporates domain knowledge into a random forest framework for feature selection. Through rigorous simulations, we show that Know-GRRF outperforms existing methods by correctly identifying informative features and improving the accuracy of subsequent predictive models. Know-GRRF is responsive to a wide range of tuning parameters that help to better differentiate candidate features. Know-GRRF is also stable from run to run, making it robust to noise. We further prove that Know-GRRF is a generalized form of two existing methods, RRF and GRRF. We applied Know-GRRF to a real-world radiation biodosimetry study that uses non-human primate data to discover biomarkers for human applications. Using cross-species correlation as domain knowledge, Know-GRRF identified three gene markers that significantly improved cross-species prediction accuracy. We implemented Know-GRRF as an R package that is available through the CRAN archive.
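The abstract describes Know-GRRF as a generalization of RRF and GRRF but does not give its formulas. As background only, here is a minimal sketch of the regularized split rule the two earlier methods share, based on our understanding of Deng and Runger's formulation. At each tree node, the gain of a feature not yet in the selected set F is multiplied by a penalty coefficient λ_i in [0, 1], so a new feature enters F only if it clearly beats the features already there. RRF uses a single constant λ. GRRF sets λ_i = (1 - γ)·λ_0 + γ·Imp'_i, where Imp'_i is the feature's ordinary random forest importance divided by the largest importance. How Know-GRRF folds domain knowledge into this rule is not stated here, so the sketch stops at GRRF. The function names are hypothetical and are not the API of the authors' R package.

```python
from typing import List, Sequence, Set


def grrf_lambdas(importance: Sequence[float], lambda0: float, gamma: float) -> List[float]:
    """GRRF-style penalty coefficients: lambda_i = (1 - gamma) * lambda0 + gamma * Imp'_i.

    Imp'_i is the ordinary random-forest importance of feature i divided by the
    largest importance, so it lies in [0, 1]. gamma = 0 gives RRF's constant lambda.
    """
    top = max(importance)
    if top <= 0:
        raise ValueError("at least one importance score must be positive")
    return [(1 - gamma) * lambda0 + gamma * (imp / top) for imp in importance]


def regularized_gain(gain: float, i: int, lambdas: Sequence[float], selected: Set[int]) -> float:
    """Features already in the selected set F keep their full gain; new ones are penalized."""
    return gain if i in selected else lambdas[i] * gain


def split_node(gains: Sequence[float], lambdas: Sequence[float], selected: Set[int]) -> int:
    """Pick the split feature at one tree node and add it to F if it is new.

    `gains` holds the best impurity reduction each candidate feature achieves at this node.
    """
    scores = [regularized_gain(g, i, lambdas, selected) for i, g in enumerate(gains)]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] > 0:
        selected.add(best)  # no-op if the feature was already in F
    return best


if __name__ == "__main__":
    lambdas = grrf_lambdas([1.0, 0.2, 0.5], lambda0=0.8, gamma=0.5)  # about [0.9, 0.5, 0.65]
    F = {1}
    # Feature 0 has the larger raw gain, but its penalized gain 0.9 * 0.30 = 0.27
    # loses to the unpenalized 0.28 of feature 1, which is already in F.
    print(split_node([0.30, 0.28, 0.10], lambdas, F), F)  # prints: 1 {1}
```

Because `split_node` accepts any vector of per-feature coefficients, a knowledge-derived λ vector could be passed in the same way; whether Know-GRRF actually works like this is an assumption, not something the abstract confirms.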
- Title: Know-GRRF: Domain-Knowledge Informed Biomarker Discovery with Random Forests
- DOI: https://doi.org/10.1007/978-3-319-78759-6_1
- Authors: Xin Guan, Li Liu
- Sequence number: 1