
2015 | OriginalPaper | Book chapter

TNorm: An Unsupervised Batch Effects Correction Method for Gene Expression Data Classification

Authors: Praisan Padungweang, Worrawat Engchuan, Jonathan H. Chan

Published in: Neural Information Processing

Publisher: Springer International Publishing


Abstract

In biomedical research, gene expression analysis helps identify disease-related genes that can serve as genetic markers for diagnosis. Given the large number of publicly available gene expression datasets, an ongoing challenge is to use these data effectively. Merging microarray datasets from different batches to increase the statistical power of a study is an active research topic. However, many studies have highlighted the problem of batch effects: variation in gene expression levels induced by differing experimental environments. Ignoring this variation can lead to erroneous findings. This work proposes a batch effects correction method that maps the underlying topology of the different batches. The mapping for cross-batch normalization is examined using a basic linear transformation. A comparative study on three cancers compares the proposed method with a well-established batch effects correction method. The results show that our method outperforms the existing method in most cases.
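The abstract says only that cross-batch normalization is examined with a basic linear transformation; it does not spell out the TNorm procedure itself. As a minimal sketch of that general idea, under assumptions that are ours and not the authors', the Python below fits, for each gene, a least-squares line that maps a source batch onto a reference batch. It summarizes each batch by a few matched empirical quantiles, a hypothetical stand-in for the "underlying topology" that TNorm maps, and uses no class labels, so it stays unsupervised. The names `linear_batch_map`, `empirical_quantiles`, `fit_line` and the parameter `n_points` are illustrative only.

```python
"""Hypothetical per-gene linear cross-batch mapping (illustrative, not TNorm)."""


def empirical_quantiles(values, n_points):
    """Return n_points evenly spaced empirical quantiles of values,
    using linear interpolation between sorted observations."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    s = sorted(values)
    m = len(s)
    if m == 0:
        raise ValueError("values must be non-empty")
    if m == 1:
        return [s[0]] * n_points
    out = []
    for k in range(n_points):
        pos = k * (m - 1) / (n_points - 1)  # fractional rank in [0, m - 1]
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(s[lo] + frac * (s[hi] - s[lo]))
    return out


def fit_line(x, y):
    """Ordinary least-squares fit y ~ a * x + b; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        # Constant source gene: no scale information, so only shift it.
        return 1.0, my - mx
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def linear_batch_map(source, target, n_points=5):
    """Linearly map each gene of `source` onto the scale of `target`.

    source, target: dict mapping gene id -> list of expression values,
    one value per sample; the batches may have different sample counts.
    Genes missing (or empty) in either batch are skipped.
    Returns a dict mapping gene id -> corrected source values.
    """
    corrected = {}
    for gene, src_vals in source.items():
        tgt_vals = target.get(gene)
        if not src_vals or not tgt_vals:
            continue
        # Summarize each batch's distribution of this gene by matched quantiles.
        q_src = empirical_quantiles(src_vals, n_points)
        q_tgt = empirical_quantiles(tgt_vals, n_points)
        # Fit the linear transformation on the quantiles, apply it to all samples.
        a, b = fit_line(q_src, q_tgt)
        corrected[gene] = [a * v + b for v in src_vals]
    return corrected


if __name__ == "__main__":
    src = {"g1": [1.0, 2.0, 3.0, 4.0, 5.0], "g2": [7.0, 7.0, 7.0]}
    tgt = {"g1": [12.0, 14.0, 16.0, 18.0, 20.0], "g2": [3.0, 5.0]}
    print(linear_batch_map(src, tgt))
    # prints {'g1': [12.0, 14.0, 16.0, 18.0, 20.0], 'g2': [4.0, 4.0, 4.0]}
```

Matching quantiles rather than individual samples means the two batches need not share samples or have equal sizes, which is typical when merging public datasets. The actual TNorm method, which maps the batches' underlying topology, may summarize and align each batch quite differently.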


DOI: https://doi.org/10.1007/978-3-319-26532-2_45