2019 | OriginalPaper | Chapter

Deep Learning and Random Forest-Based Augmentation of sRNA Expression Profiles

Authors: Jelena Fiosina, Maksims Fiosins, Stefan Bonn

Published in: Bioinformatics Research and Applications

Publisher: Springer International Publishing


Abstract

The lack of well-structured annotations in a growing body of RNA expression data complicates data interoperability and reusability. Commonly used text-mining methods extract annotations from existing unstructured data descriptions and often produce inaccurate output that requires manual curation. Automatic data-based augmentation (generating annotations on the basis of expression data) can considerably improve annotation quality but has not been well studied. We formulate the automatic augmentation of small RNA-seq expression data as a classification problem and investigate deep learning (DL) and random forest (RF) approaches to solve it. We generate tissue and sex annotations from small RNA-seq expression data for tissues and cell lines of Homo sapiens. We validate our approach on 4243 annotated small RNA-seq samples from the Small RNA Expression Atlas (SEA) database. The average prediction accuracy is 98% for tissue groups (DL), 96.5% for tissues (DL), and 77% for sex (DL). The “one dataset out” average accuracy for tissue group prediction is 83% (DL) and 59% (RF). On average, DL outperforms RF and considerably improves classification performance for ‘unseen’ datasets.
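The “one dataset out” evaluation mentioned in the abstract can be illustrated with a minimal sketch: each dataset is held out in turn, a classifier is trained on the remaining datasets, and accuracy is measured on the held-out samples. Everything below is an assumption made for illustration only. The toy expression values, the dataset identifiers, the function names, and the nearest-centroid classifier (a simple stand-in for the DL and RF models, not the authors' method) are hypothetical, and averaging per-dataset accuracies is one plausible reading of "average accuracy", not a detail confirmed by the abstract.

```python
from collections import defaultdict
from statistics import mean


def fit_centroids(X, y):
    """Stand-in classifier (NOT the paper's DL/RF): one mean vector per class."""
    rows_by_label = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_label[label].append(row)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in rows_by_label.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def one_dataset_out_accuracy(X, y, dataset_ids):
    """Hold out each dataset in turn, train on the rest, score the held-out one."""
    per_dataset = {}
    for held_out in sorted(set(dataset_ids)):
        train = [i for i, d in enumerate(dataset_ids) if d != held_out]
        test = [i for i, d in enumerate(dataset_ids) if d == held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        per_dataset[held_out] = hits / len(test)
    # Assumption: the reported average is the mean over held-out datasets.
    return per_dataset, mean(per_dataset.values())


# Hypothetical toy data: two expression features, two tissue groups, three datasets.
X = [[9.0, 1.0], [8.5, 1.5], [1.0, 9.0], [1.5, 8.0],
     [9.5, 0.5], [0.5, 9.5], [8.0, 2.0], [2.0, 8.5]]
y = ["brain", "brain", "blood", "blood", "brain", "blood", "brain", "blood"]
dataset_ids = ["A", "A", "A", "A", "B", "B", "C", "C"]

per_dataset, average = one_dataset_out_accuracy(X, y, dataset_ids)
print(per_dataset, average)
```

Splitting by dataset rather than by sample keeps every sample from a given study on one side of the split, which is why this protocol estimates performance on ‘unseen’ datasets more honestly than a random split would.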
Metadata
Title: Deep Learning and Random Forest-Based Augmentation of sRNA Expression Profiles
Authors: Jelena Fiosina, Maksims Fiosins, Stefan Bonn
Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-20242-2_14
