Published in: Arabian Journal for Science and Engineering 4/2020

04-02-2020 | RESEARCH ARTICLE - SPECIAL ISSUE - INTELLIGENT COMPUTING and INTERDISCIPLINARY APPLICATIONS

An Efficient Filter-Based Feature Selection Model to Identify Significant Features from High-Dimensional Microarray Data

Authors: D. M. Deepak Raj, R. Mohanasundaram

Abstract

Feature subset selection for microarray data aims to discard irrelevant features so that the significant ones can be extracted from the dataset. Choosing appropriate features from a high-dimensional dataset can also improve the classification performance of the learning algorithm. Relief algorithms and their variants are successful attribute estimators. MultiSURF, the latest descendant of the Relief-based approaches, scores features while keeping a neighborhood that is determined relative to each target instance. However, a large number of redundant features can degrade MultiSURF’s ability to select relevant features from a dataset. To select informative features from highly redundant data, we propose a new feature weighting algorithm called boundary margin relief (BMR). The main idea of BMR is to estimate the feature weights from a measurement of the local hyperplane, an approach typically used in I-Relief techniques. The feature weights produced by the proposed method are highly robust when it comes to removing redundant features. To demonstrate the effectiveness of the method, comprehensive classification experiments were conducted on benchmark microarray datasets by combining the proposed technique with conventional classifiers, including support vector machine, k-nearest neighbor, and Naive Bayes. These experiments show that the proposed method has three notable properties: (1) high classification accuracy, (2) strong robustness to redundant features, and (3) good stability across different classification algorithms.
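
For readers unfamiliar with the Relief family mentioned in the abstract, the following is a minimal sketch of the classic two-class Relief weighting scheme, in which a feature gains weight when it differs on the nearest instance of the other class (the "miss") and loses weight when it differs on the nearest instance of the same class (the "hit"). It is not the BMR algorithm proposed in the article, nor MultiSURF or I-Relief. The function name `relief_weights`, the toy data, and the choice to use every instance once as the target instead of random sampling are assumptions made purely for illustration.

```python
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Classic Relief-style weights for two-class data with numeric features.

    Illustrative sketch only: every instance serves once as the target,
    so the result is deterministic (the original algorithm samples targets).
    """
    n, d = len(X), len(X[0])

    # Range of each feature, used to scale per-feature differences into [0, 1].
    spans = []
    for f in range(d):
        col = [row[f] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(f: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[f] - b[f]) / spans[f]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Manhattan distance over the scaled features.
        return sum(diff(f, a, b) for f in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # no neighbor of one kind: skip this target
        near_hit = X[min(hits, key=lambda j: dist(X[i], X[j]))]
        near_miss = X[min(misses, key=lambda j: dist(X[i], X[j]))]
        for f in range(d):
            # Reward separation from the nearest miss, penalize
            # separation from the nearest hit.
            weights[f] += (diff(f, X[i], near_miss) - diff(f, X[i], near_hit)) / n
    return weights


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.8, 0.4], [0.9, 0.8], [1.0, 0.2]]
y = [0, 0, 0, 1, 1, 1]

weights = relief_weights(X, y)
print(weights)
```

On this toy data the class-separating feature receives a positive weight and the noise feature a negative one. Note that an exact copy of a column would receive exactly the same weight as the original, because the update depends only on that column's values; this is one way to see why plain Relief weighting does not by itself remove redundant features.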

Metadata
Title
An Efficient Filter-Based Feature Selection Model to Identify Significant Features from High-Dimensional Microarray Data
Authors
D. M. Deepak Raj
R. Mohanasundaram
Publication date
04-02-2020
Publisher
Springer Berlin Heidelberg
Published in
Arabian Journal for Science and Engineering / Issue 4/2020
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI
https://doi.org/10.1007/s13369-020-04380-2
