2021 | OriginalPaper | Chapter

Identification of Outliers in Gene Expression Data

Authors: Md. Manzur Rahman Farazi, A. H. M. Rahmatullah Imon

Published in: Data Science and SDGs

Publisher: Springer Singapore

Abstract

Identifying outliers remains a major challenge in big data, even though it has drawn a great deal of attention in recent years. Among big data problems, the detection of outliers in gene expression data warrants extra attention because of its inherent complexity. Although a variety of outlier detection methods are available in the literature, Tomlins et al. (Science 310:644–648, 2005) argued that traditional analytical methods, such as the two-sample t-statistic, which search for common activation of genes across a class of cancer samples, will fail to detect cancer genes that show differential expression in only a subset of cancer samples, that is, cancer outliers. They developed the cancer outlier profile analysis (COPA) method to detect cancer genes and outliers. Inspired by the COPA statistic, other authors have proposed methods for detecting cancer-related genes with cancer outlier profiles in the framework of multiple testing (Tibshirani and Hastie, Biostatistics 8:2–8, 2007; Wu, Biostatistics 8:566–575, 2007; Lian, Biostatistics 9:411–418, 2008; Wang and Rekaya, Biomarker Insights 5:69–78, 2010). Such cancer outlier analyses face several problems. In particular, when the dataset contains outliers, classical measures of location and scale are seriously affected, so a test statistic built on these estimates may not be appropriate for detecting outliers. In this study, we robustify one existing method and propose a new technique, the expressed robust t-statistic (ERT), for the identification of outliers. The usefulness of the proposed method is then investigated through a Monte Carlo simulation.
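The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the chapter's ERT statistic, whose definition is not given on this page. It only compares a classical pooled two-sample t-statistic with a COPA-style score (median centering, MAD scaling, then an upper percentile of the cancer samples) on a single hypothetical gene. The function names, the toy data, the 90th percentile and the 1.4826 consistency constant are all assumptions made for illustration.

```python
import numpy as np

def two_sample_t(normal, cancer):
    """Classical pooled-variance two-sample t-statistic for one gene."""
    n1, n2 = len(normal), len(cancer)
    pooled_var = ((n1 - 1) * np.var(normal, ddof=1)
                  + (n2 - 1) * np.var(cancer, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(cancer) - np.mean(normal)) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

def copa_style_score(normal, cancer, q=90):
    """COPA-style score (illustrative): center by the median and scale by the
    MAD of all samples, then take the q-th percentile of the cancer samples."""
    pooled = np.concatenate([normal, cancer])
    med = np.median(pooled)
    # 1.4826 makes the MAD comparable to a standard deviation under normality
    # (a common convention, assumed here).
    mad = 1.4826 * np.median(np.abs(pooled - med))
    return np.percentile((cancer - med) / mad, q)

# Hypothetical expression values for one gene: 20 normal and 20 cancer samples.
# Only 3 of the 20 cancer samples are over-expressed (the "cancer outliers").
normal = np.linspace(-1.0, 1.0, 20)
cancer = np.linspace(-1.0, 1.0, 20)
cancer[-3:] = 6.0

t_stat = two_sample_t(normal, cancer)
copa = copa_style_score(normal, cancer)

print("mean vs median of cancer samples:", np.mean(cancer), np.median(cancer))
print("two-sample t-statistic:", t_stat)
print("COPA-style score:", copa)
```

On this toy gene the three over-expressed samples pull the cancer mean upward and inflate the pooled variance, so the t-statistic stays below the usual cutoff of about 2, while the median and MAD are barely affected and the COPA-style score is large.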

Barnett, V., & Lewis, T. B. (1994). Outliers in statistical data (3rd ed.). Wiley.

Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (1999). OPTICS-OF: Identifying local outliers. In Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery (pp. 262–270).

Fan, H., Zaïane, O. R., Foss, A., & Wu, J. (2006). A nonparametric outlier detection for efficiently discovering top-n outliers from engineering data. In Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining, Singapore (pp. 557–566).

Filzmoser, P., Ruiz-Gazen, A., & Thomas-Agnan, C. (2014). Identification of local multivariate outliers. Statistical Papers, 55, 29–47.

Hadi, A. S., & Imon, A. H. M. R. (2018). Identification of multiple outliers in spatial data. International Journal of Statistical Sciences, 16, 87–96.

Hadi, A. S., Imon, A. H. M. R., & Werner, M. (2009). Detection of outliers. Wiley Interdisciplinary Reviews: Computational Statistics, 1, 57–70.

Hawkins, D. M. (1980). Identification of outliers. Chapman and Hall.

Imon, A. H. M. R., & Hadi, A. S. (2013). Identification of multiple high leverage points in logistic regression. Journal of Applied Statistics, 40, 2601–2616.

Imon, A. H. M. R., & Hadi, A. S. (2020). Identification of multiple unusual observations in spatial regression. Journal of Statistics and Applications (A Special Issue in Honour of Prof. Bimal K Sinha and Prof. Bikas K Sinha), 18, 155–162.

Knorr, E., & Ng, R. (1997). A unified notion of outliers: properties and computation. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (pp. 219–222).

Knorr, E., & Ng, R. (1998). Algorithms for mining distance-based outliers in large datasets. In Proceedings of 24th International Conference on Very Large Data Bases (pp. 392–403).

Lian, H. (2008). MOST: Detecting cancer differential gene expression. Biostatistics, 9, 411–418.

Liu, F. T., Ting, K. M., & Zhou, Z. (2008). Isolation forest. In Eighth IEEE International Conference on Data Mining (pp. 413–422).

Nurunnabi, A. A. M., Imon, A. H. M. R., & Nasser, M. (2010). Identification of multiple influential observations in logistic regression. Journal of Applied Statistics, 37, 1605–1624.

Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 427–438).

Tibshirani, R., & Hastie, T. (2007). Outlier sums for differential gene expression analysis. Biostatistics, 8, 2–8.

Tomlins, S. A., Rhodes, D. R., & Perner, S. (2005). Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 310, 644–648.

Wang, Y., & Rekaya, R. (2010). LSOSS: Detection of cancer outlier differential gene expression. Biomarker Insights, 5, 69–78.

Wu, B. (2007). Cancer outlier differential gene expression detection. Biostatistics, 8, 566–575.

Title: Identification of Outliers in Gene Expression Data
Authors: Md. Manzur Rahman Farazi, A. H. M. Rahmatullah Imon
Publisher: Springer Singapore
Book: Data Science and SDGs
Print ISBN: 978-981-16-1918-2

Electronic ISBN: 978-981-16-1919-9

Copyright Year: 2021
DOI: https://doi.org/10.1007/978-981-16-1919-9_11
