Feature Selection in Gene Expression Data Using Principal Component Analysis and Rough Set Theory

Mishra, Debahuti; Dash, Rajashree; Rath, Amiya Kumar; Acharya, Milu

doi:10.1007/978-1-4419-7046-6_10

Part of the book series: Advances in Experimental Medicine and Biology (AEMB, volume 696)

Abstract

In many fields, such as data mining, machine learning, pattern recognition and signal processing, data sets containing a huge number of features are often involved. Feature selection is an essential data preprocessing technique for such high-dimensional data classification tasks. Traditional dimensionality reduction approaches fall into two categories: Feature Extraction (FE) and Feature Selection (FS). Principal component analysis (PCA) is an unsupervised linear FE method for projecting high-dimensional data into a low-dimensional space with minimum loss of information. It discovers the directions of maximal variance in the data. The Rough set approach to feature selection is used to discover data dependencies and to reduce the number of attributes contained in a data set using the data alone, requiring no additional information. For selecting discriminative features from principal components, Rough set theory can be applied jointly with PCA, which guarantees that the selected principal components will be the most adequate for classification. We call this method Rough PCA. The proposed method is successfully applied for choosing the principal features and then applying the Upper and Lower Approximations to find the reduced set of features from a gene expression data set.
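The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names (`pca_scores`, `discretize`, `approximations`, `dependency`, `quick_reduct`), the equal-width binning of principal component scores, the greedy forward search, and the synthetic data are all assumptions made for illustration. The sketch projects the data onto principal components, discretizes the scores so that indiscernibility classes can be formed, and then uses lower approximations to measure how well a subset of components determines the class label.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project rows of X onto the leading principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def discretize(scores, n_bins=3):
    """Equal-width binning per column (assumed; rough sets need discrete values)."""
    out = np.empty(scores.shape, dtype=int)
    for j in range(scores.shape[1]):
        col = scores[:, j]
        edges = np.linspace(col.min(), col.max(), n_bins + 1)[1:-1]
        out[:, j] = np.digitize(col, edges)
    return out

def approximations(table, attrs, target):
    """Lower and upper approximation of the object set `target` w.r.t. attrs."""
    classes = {}
    for i, row in enumerate(table):
        classes.setdefault(tuple(row[list(attrs)]), set()).add(i)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper

def dependency(table, attrs, labels):
    """Fraction of objects in the positive region (union of lower approximations)."""
    pos = set()
    for c in np.unique(labels):
        target = set(np.flatnonzero(labels == c).tolist())
        pos |= approximations(table, attrs, target)[0]
    return len(pos) / len(labels)

def quick_reduct(table, labels):
    """Greedy search: add the attribute that raises the dependency the most."""
    all_attrs = list(range(table.shape[1]))
    full = dependency(table, all_attrs, labels)
    chosen, best = [], 0.0
    while best < full:
        candidates = [(dependency(table, chosen + [a], labels), a)
                      for a in all_attrs if a not in chosen]
        best, a = max(candidates, key=lambda t: (t[0], -t[1]))
        chosen.append(a)
    return chosen, best

# Synthetic stand-in for an expression matrix: 40 samples, 200 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
labels = (X[:, :5].sum(axis=1) > 0).astype(int)

table = discretize(pca_scores(X, n_components=6))
selected, gamma = quick_reduct(table, labels)
print("selected principal components:", selected, "dependency:", gamma)
```

The greedy search stops as soon as the chosen components reach the same dependency degree as the full set of components, so the result is a reduced attribute set rather than a provably minimal one. The bin count and the number of retained components are free parameters here and would need tuning on real data.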

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Institute of Technical Education & Research, Siksha O Anusandhan University, Bhubaneswar, Orissa, India
Debahuti Mishra

Corresponding author

Correspondence to Debahuti Mishra.

Editor information

Editors and Affiliations

Dept. Computer Science, University of Georgia, Athens, 30602-7404, Georgia, USA
Hamid R. Arabnia

Department of Computer Science, Lamar University, Beaumont, 77710, Texas, USA
Quoc-Nam Tran

About this chapter

Cite this chapter

Mishra, D., Dash, R., Rath, A.K., Acharya, M. (2011). Feature Selection in Gene Expression Data Using Principal Component Analysis and Rough Set Theory. In: Arabnia, H., Tran, QN. (eds) Software Tools and Algorithms for Biological Systems. Advances in Experimental Medicine and Biology, vol 696. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7046-6_10

DOI: https://doi.org/10.1007/978-1-4419-7046-6_10
Published: 15 March 2011
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7045-9
Online ISBN: 978-1-4419-7046-6
eBook Packages: Biomedical and Life Sciences (R0)
