2021 | OriginalPaper | Book chapter

Comparison of Hierarchical Clustering Methods for Binary Data From SSR and ISSR Molecular Markers

Written by: Emmanouil D. Pratsinakis, Lefkothea Karapetsi, Symela Ntoanidou, Angelos Markos, Panagiotis Madesis, Ilias Eleftherohorinos, George Menexes

Published in: Data Analysis and Rationality in a Complex World

Publisher: Springer International Publishing

Abstract

Data from molecular markers, which are used to construct dendrograms based on genetic distances between different plant species, are encoded as binary data. For the construction of the dendrograms, the most commonly used linkage method is UPGMA (Unweighted Pair Group Method with Arithmetic Mean) in combination with the squared Euclidean distance. In this scientific field, this combination appears to be the “golden standard” clustering method, in the sense that this methodological scheme is used in the vast majority of the corresponding studies. In this study, a comparison of 189 clustering methods (excluding the “golden standard”), that is, seven linkage methods combined with 27 appropriate distances, along with Benzécri’s chi-squared distance in combination with Ward’s linkage method, is attempted using data originating from molecular markers applied to pear tree species and Sinapis arvensis populations. Cluster analysis of the fruit trees was performed using SSR markers, while ISSR markers were used for clustering the Sinapis arvensis populations. The results showed that the “golden standard” is not the only appropriate method for dendrogram construction based on binary data derived from molecular markers. Ten other hierarchical clustering methods could be used for the construction of dendrograms from SSR markers, and thirty-seven other hierarchical clustering methods could be used for the construction of dendrograms from binary data resulting from ISSR markers.
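
The scheme the abstract calls the “golden standard” (UPGMA with the squared Euclidean distance on binary marker data) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The sample names, the 0/1 profiles, and the function names `sq_euclidean` and `upgma` are invented for illustration, and ties between equally close clusters are broken by taking the first pair found, which is an assumption of this sketch.

```python
from itertools import combinations


def sq_euclidean(u, v):
    """Squared Euclidean distance; for 0/1 vectors this counts mismatched bands."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def upgma(profiles):
    """Average-linkage (UPGMA) agglomeration on a dict of name -> 0/1 profile.

    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    where each cluster is a tuple of sample names.
    """
    # Start with every sample in its own cluster.
    clusters = [(name,) for name in profiles]
    # Pairwise distances between the original samples, keyed by unordered pair.
    base = {
        frozenset((a, b)): sq_euclidean(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def avg_dist(c1, c2):
        # UPGMA: arithmetic mean of all between-cluster sample distances.
        total = sum(base[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters (first pair wins ties).
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg_dist(*p))
        merges.append((c1, c2, avg_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical presence/absence (1/0) marker profiles, invented for illustration.
toy_profiles = {
    "A": [1, 1, 0, 0, 1, 0],
    "B": [1, 1, 0, 0, 0, 0],
    "C": [0, 0, 1, 1, 0, 1],
    "D": [0, 0, 1, 1, 1, 1],
}

history = upgma(toy_profiles)
for left, right, height in history:
    print(left, right, height)
```

On these toy profiles, A joins B and C joins D at height 1, and the two pairs merge at height 5.5. With SciPy available, the same linkage and distance combination would typically be requested as `scipy.cluster.hierarchy.linkage(X, method="average", metric="sqeuclidean")`; swapping `method` and `metric` is one way to explore other linkage and distance pairings of the kind the study compares.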

Metadata
Title
Comparison of Hierarchical Clustering Methods for Binary Data From SSR and ISSR Molecular Markers
Written by
Emmanouil D. Pratsinakis
Lefkothea Karapetsi
Symela Ntoanidou
Angelos Markos
Panagiotis Madesis
Ilias Eleftherohorinos
George Menexes
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-60104-1_26
