Published in: Neural Computing and Applications 1/2014

01.01.2014 | ICONIP 2012

Clustering based on median and closest string via rank distance with applications on DNA

Authors: Liviu P. Dinu, Radu Tudor Ionescu

Abstract

This paper aims to present several clustering methods based on rank distance. Rank distance has applications in many different fields such as computational linguistics, biology and computer science. The K-means algorithm represents each cluster by a single mean vector. The mean vector is computed with respect to a distance measure. Two K-means algorithms based on rank distance are described in this paper. Hierarchical clustering builds models based on distance connectivity. This paper describes two hierarchical clustering techniques that use rank distance. Experiments using mitochondrial DNA sequences extracted from several mammals are performed to compare the results of the clustering methods. Results demonstrate the clustering performance and the utility of the proposed algorithms.
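
The abstract does not define rank distance, so the following is only a minimal sketch under a stated assumption: it uses the commonly cited string formulation in which each character is annotated with its occurrence index, shared annotated characters contribute the absolute difference of their 1-based positions, and unshared ones contribute their position. The function names `annotate` and `rank_distance` are illustrative and do not come from the paper.

```python
from collections import defaultdict


def annotate(s):
    """Map each (character, occurrence index) pair to its 1-based position.

    Example: 'AAB' -> {('A', 1): 1, ('A', 2): 2, ('B', 1): 3}
    """
    seen = defaultdict(int)
    positions = {}
    for pos, ch in enumerate(s, start=1):
        seen[ch] += 1
        positions[(ch, seen[ch])] = pos
    return positions


def rank_distance(x, y):
    """Rank distance between two strings (assumed definition, see lead-in)."""
    px, py = annotate(x), annotate(y)
    total = 0
    for key in px.keys() | py.keys():
        if key in px and key in py:
            # Annotated character occurs in both strings: compare positions.
            total += abs(px[key] - py[key])
        else:
            # Occurs in only one string: add its position there.
            total += px.get(key, 0) + py.get(key, 0)
    return total


print(rank_distance("AB", "BA"))   # 2
print(rank_distance("AAB", "AB"))  # 3
```

Under this assumption the distance can be computed in time linear in the string lengths, which is what would make it usable inside a K-means or hierarchical clustering loop over DNA sequences.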

Metadata
Title
Clustering based on median and closest string via rank distance with applications on DNA
Authors
Liviu P. Dinu
Radu Tudor Ionescu
Publication date
01.01.2014
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 1/2014
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-013-1468-x