Published in: Evolutionary Intelligence 2/2021

24.09.2019 | Special Issue

Data augmentation for cancer classification in oncogenomics: an improved KNN based approach

Authors: Poonam Chaudhari, Himanshu Agarwal, Vikrant Bhateja


Abstract

There is currently a great need for research on gene expression data to support cancer classification in oncogenomics, especially because the disease occurs sporadically and often without symptoms. Gene expression data are typically imbalanced, with a large number of features and a small number of samples. A small sample size is likely to degrade classification accuracy, since a classifier's performance depends largely on the data it is trained on. There is therefore a pressing need to generate data that serve as better input to classifiers. Primitive augmentation techniques such as uniform random generation and noise addition do not guarantee a good probability distribution. Moreover, because the application is critical, the augmented data must closely resemble the original values. We therefore propose an improved variant of the K-nearest neighbor (KNN) rule. For each target sample, we use a Counting Quotient Filter, Euclidean distance, and the mean of the best values from the k neighbors to generate synthetic samples. A comparison is drawn among the raw data from the public domain (original data), data generated using the standard K-nearest neighbor rule, and data generated using the improved K-nearest neighbor rule. The data generated by these approaches are then classified using state-of-the-art classifiers: SVM, J48, and DNN. The samples generated through the improved technique yield better recall values than the standard implementation, ensuring sensitivity of the data. Averaged over the three classifiers, classification accuracy improves by 7.72% compared with the traditional KNN approach and by 16% compared with feeding the raw data to the classifiers. The proposed algorithm thus achieves two objectives: ensuring sensitivity of the data for critical applications and enhancing classification accuracy.
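The abstract only outlines the augmentation procedure, so the sketch below is an illustrative reconstruction rather than the authors' exact algorithm: for each target sample it selects the k nearest neighbors by Euclidean distance and emits their mean as a synthetic sample. A plain Python set stands in for the Counting Quotient Filter (used here only to suppress duplicate synthetic rows), and the function name knn_mean_augment and its parameters are hypothetical.

```python
import numpy as np

def knn_mean_augment(X, k=5, n_synthetic=None, rng=None):
    """Sketch of KNN-based augmentation: for each randomly chosen target
    sample, average its k nearest neighbors (Euclidean distance) to form a
    synthetic sample. A Python set approximates the role of the paper's
    Counting Quotient Filter, here only for duplicate suppression."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    n_synthetic = n if n_synthetic is None else n_synthetic
    seen = set()                            # stand-in for the Counting Quotient Filter
    synthetic = []
    targets = rng.choice(n, size=n_synthetic, replace=True)
    for i in targets:
        # Euclidean distances from the target to every sample
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # exclude the target itself
        neighbors = np.argsort(d)[:k]       # indices of the k nearest samples
        candidate = X[neighbors].mean(axis=0)
        key = candidate.round(6).tobytes()  # approximate membership test
        if key not in seen:
            seen.add(key)
            synthetic.append(candidate)
    return np.vstack(synthetic) if synthetic else np.empty((0, X.shape[1]))

# Example: augment a small gene-expression-like matrix (20 samples x 100 features)
X = np.random.default_rng(0).normal(size=(20, 100))
X_aug = knn_mean_augment(X, k=5)
print(X.shape, X_aug.shape)
```

Under these assumptions, the synthetic rows interpolate within local neighborhoods of the original distribution, which is the property the abstract emphasizes over uniform random generation or added noise.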


Metadata
Title
Data augmentation for cancer classification in oncogenomics: an improved KNN based approach
Authors
Poonam Chaudhari
Himanshu Agarwal
Vikrant Bhateja
Publication date
24.09.2019
Publisher
Springer Berlin Heidelberg
Published in
Evolutionary Intelligence / Issue 2/2021
Print ISSN: 1864-5909
Electronic ISSN: 1864-5917
DOI
https://doi.org/10.1007/s12065-019-00283-w
