
2015 | OriginalPaper | Book chapter

A Novel Hybridized Rough Set and Improved Harmony Search Based Feature Selection for Protein Sequence Classification

Authors: M. Bagyamathi, H. Hannah Inbarani

Published in: Big Data in Complex Systems

Publisher: Springer International Publishing


Progress in bioinformatics and biotechnology has generated a large amount of sequence data that requires detailed analysis. Recent advances in next-generation sequencing technologies have sharply increased the rate at which protein sequence data are obtained. Big Data analysis is a clear bottleneck in many applications, especially in bioinformatics, because of the complexity of the data to be analyzed. Protein sequence analysis is a significant problem in functional genomics. Proteins play an essential role in organisms because they perform many important tasks in cells. In general, protein sequences are represented by feature vectors. A major problem with protein datasets is the complexity of their analysis, caused by their enormous number of features. Feature selection techniques are capable of dealing with this high-dimensional feature space. This chapter proposes a new feature selection algorithm for protein sequences that combines the Improved Harmony Search algorithm with Rough Set theory to tackle these big data problems. Improved Harmony Search (IHS) is a comparatively new population-based meta-heuristic optimization algorithm. It imitates the music improvisation process, in which each musician adjusts the pitch of their instrument in search of a perfect state of harmony, and it overcomes the limitations of the traditional Harmony Search (HS) algorithm. Improved Harmony Search is hybridized with Rough Set Quick Reduct for faster and better search capabilities. The feature vectors are extracted from a protein sequence database, based on amino acid composition and K-mer patterns (K-tuples), and feature selection is then carried out on the extracted feature vectors. The proposed algorithm is compared with two prominent algorithms, Rough Set Quick Reduct and Rough Set based PSO Quick Reduct.
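The feature extraction step described above can be illustrated with a minimal sketch. It assumes the 20 standard amino acid letters, overlapping K-mers, and raw K-mer counts; the function names and these choices are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard residues are used as the alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def kmer_counts(seq, k=2):
    """Return counts of every overlapping k-mer over the standard alphabet."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(AMINO_ACIDS, repeat=k)]


def feature_vector(seq, k=2):
    """Concatenate composition and k-mer counts into one feature vector."""
    return amino_acid_composition(seq) + kmer_counts(seq, k)
```

With k = 2 this yields 20 + 400 = 420 features per sequence, and the K-mer part grows as 20^k, which illustrates why the abstract treats feature selection as necessary.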
The experiments are carried out on protein primary single-sequence data sets derived from PDB on SCOP classification, based on structural class predictions such as all α, all β, all α + β and all α/β. The feature subsets of the protein sequences predicted by both the existing and the proposed algorithms are analyzed with decision tree classification algorithms.
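The Rough Set Quick Reduct baseline named in the abstract is commonly described as a greedy forward selection driven by the rough-set dependency degree, that is, the share of objects whose equivalence class under the chosen attributes carries a single decision label. The sketch below follows that common description and assumes discretized feature values; `dependency` and `quick_reduct` are hypothetical names, and the chapter's hybrid replaces this greedy search with Improved Harmony Search, which is not shown here.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Dependency degree: fraction of rows in the positive region of attrs."""
    label_sets = defaultdict(set)
    sizes = defaultdict(int)
    for row, y in zip(rows, labels):
        key = tuple(row[a] for a in attrs)  # equivalence class under attrs
        label_sets[key].add(y)
        sizes[key] += 1
    # A class belongs to the positive region if all its rows share one label.
    positive = sum(sizes[k] for k, ys in label_sets.items() if len(ys) == 1)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedily add the attribute that raises the dependency degree most."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct
```

The loop stops as soon as the selected attributes reach the dependency degree of the full attribute set, so the result is a reduct candidate rather than a guaranteed minimal one.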


Metadata
Title
A Novel Hybridized Rough Set and Improved Harmony Search Based Feature Selection for Protein Sequence Classification
Authors
M. Bagyamathi
H. Hannah Inbarani
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-11056-1_6
