2021 | OriginalPaper | Chapter

Identification of Outliers in Gene Expression Data

Authors: Md. Manzur Rahman Farazi, A. H. M. Rahmatullah Imon

Published in: Data Science and SDGs

Publisher: Springer Singapore

Abstract

Identifying outliers remains a major challenge in big data, even though it has drawn a great deal of attention in recent years. Among big data problems, the detection of outliers in gene expression data warrants extra attention because of its inherent complexity. Although a variety of outlier detection methods are available in the literature, Tomlins et al. (Science 310:644–648, 2005) argued that traditional analytical methods such as the two-sample t-statistic, which search for common activation of genes across a class of cancer samples, will fail to detect cancer genes that show differential expression in only a subset of cancer samples (cancer outliers). They developed the cancer outlier profile analysis (COPA) method to detect cancer genes and outliers. Inspired by the COPA statistic, other authors have proposed methods for detecting cancer-related genes with cancer outlier profiles in the framework of multiple testing (Tibshirani and Hastie, Biostatistics 8:2–8, 2007; Wu, Biostatistics 8:566–575, 2007; Lian, Biostatistics 9:411–418, 2008; Wang and Rekaya, Biomarker Insights 5:69–78, 2010). Such cancer outlier analyses face several problems. In particular, when the dataset contains outliers, classical measures of location and scale are seriously affected, so a test statistic built on these measures may not be appropriate for detecting outliers. In this study, we try to robustify one existing method. We propose a new technique called the expressed robust t-statistic (ERT) for the identification of outliers. The usefulness of the proposed method is then investigated through a Monte Carlo simulation.
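
The abstract's point that outliers distort classical location and scale can be illustrated with a small sketch. It is not the chapter's ERT statistic, which the abstract does not define. It only contrasts mean/SD standardization with median/MAD standardization and computes a COPA-style score (a percentile of the median-centered, MAD-scaled cancer values). The toy data, the function names, the 1.4826 scaling constant, and the choice of the 90th percentile are all assumptions made for illustration.

```python
import numpy as np

# Common consistency constant that makes the MAD comparable to the SD
# under normality (an assumption; not taken from the chapter).
MAD_SCALE = 1.4826


def classical_z(x):
    """Standardize with the sample mean and sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def robust_z(x):
    """Standardize with the median and the scaled median absolute deviation."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = MAD_SCALE * np.median(np.abs(x - med))
    return (x - med) / mad


def copa_like_score(normal, cancer, q=90):
    """COPA-style score for one gene (hypothetical helper).

    Median and MAD come from all samples pooled; the score is the q-th
    percentile of the standardized cancer samples.
    """
    normal = np.asarray(normal, dtype=float)
    cancer = np.asarray(cancer, dtype=float)
    pooled = np.concatenate([normal, cancer])
    med = np.median(pooled)
    mad = MAD_SCALE * np.median(np.abs(pooled - med))
    return np.percentile((cancer - med) / mad, q)


# Toy expression values for a single gene (made up for illustration).
normal = np.linspace(-1.0, 1.0, 20)
null_cancer = np.linspace(-1.0, 1.0, 20)
# Only 4 of the 20 cancer samples are over-expressed.
outlier_cancer = np.concatenate([np.linspace(-1.0, 1.0, 16), [8.0, 9.0, 10.0, 11.0]])

print("largest classical z:", classical_z(outlier_cancer).max())
print("largest robust z:   ", robust_z(outlier_cancer).max())
print("COPA-like score, outlier gene:", copa_like_score(normal, outlier_cancer))
print("COPA-like score, null gene:   ", copa_like_score(normal, null_cancer))
```

On this toy gene, the four over-expressed samples inflate the mean and standard deviation so much that even the largest value has a classical z-score below 3, while its median/MAD score is well above 3. This is the kind of masking that motivates robust measures of location and scale.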

Literature
Barnett, V., & Lewis, T. B. (1994). Outliers in statistical data (3rd ed.). Wiley.
Breunig, M. M., Kriegel, H. P., Ng, & Sander, J. R. (1999). OPTICS-OF: Identifying local outliers. In Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery (pp. 262–270).
Fan, H., Zaïane, O. R., Foss, A., & Wu, J. (2006). A nonparametric outlier detection for efficiently discovering top-n outliers from engineering data. In Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining, Singapore (pp. 557–566).
Filzmoser, P., Ruiz-, A., & Thomas-, C. (2014). Identification of local multivariate outliers. Statistical Papers, 55, 29–47.
Hadi, A. S., & Imon, A. H. M. R. (2018). Identification of multiple outliers in spatial data. International Journal of Statistical Sciences, 16, 87–96.
Hadi, A. S., Imon, A. H. M. R., & Werner, M. (2009). Detection of outliers. Wiley Interdisciplinary Reviews: Computational Statistics, 1, 57–70.
Imon, A. H. M. R., & Hadi, A. S. (2013). Identification of multiple high leverage points in logistic regression. Journal of Applied Statistics, 40, 2601–2616.
Imon, A. H. M. R., & Hadi, A. S. (2020). Identification of multiple unusual observations in spatial regression. Journal of Statistics and Applications (A Special Issue in Honour of Prof. Bimal K Sinha and Prof. Bikas K Sinha), 18, 155–162.
Knorr, E., & Ng, R. (1997). A unified notion of outliers: Properties and computation. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (pp. 219–222).
Knorr, E., & Ng, R. (1998). Algorithms for mining distance-based outliers in large datasets. In Proceedings of 24th International Conference on Very Large Data Bases (pp. 392–403).
Lian, H. (2008). MOST: Detecting cancer differential gene expression. Biostatistics, 9, 411–418.
Liu, F. T., Ting, K. M., & Zhou, Z. (2008). Isolation forest. In Eighth IEEE International Conference on Data Mining (pp. 413–422).
Nurunnabi, A. A. M., Imon, A. H. M. R., & Nasser, M. (2010). Identification of multiple influential observations in logistic regression. Journal of Applied Statistics, 37, 1605–1624.
Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 427–438).
Tibshirani, R., & Hastie, T. (2007). Outlier sums for differential gene expression analysis. Biostatistics, 8, 2–8.
Tomlins, S. A., Rhodes, D. R., & Perner, S. (2005). Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 310, 644–648.
Wang, Y., & Rekaya, R. (2010). LSOSS: Detection of cancer outlier differential gene expression. Biomarker Insights, 5, 69–78.
Wu, B. (2007). Cancer outlier differential gene expression detection. Biostatistics, 8, 566–575.
Metadata
Copyright Year
2021
DOI
https://doi.org/10.1007/978-981-16-1919-9_11