Skip to main content
Erschienen in: Pattern Analysis and Applications 3/2021

31.03.2021 | Theoretical advances

Customizable HMM-based measures to accurately compare tree sets

verfasst von: Sylvain Iloga

Erschienen in: Pattern Analysis and Applications | Ausgabe 3/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Trees have been topics of much interest since many decades due to various emerging applications using data represented as trees. Several techniques have been developed to compare two trees. But there is a serious lack of metrics to compare weighted trees. Existing approaches do not also allow to explicitly specify the targeted nodes properties on which the comparison should be performed. Furthermore, the problem of comparing two tree sets is not specifically addressed by existing techniques. This paper attempts to solve these problems by first proposing a distance and a similarity for the comparison of two finite sets of rooted ordered trees which can be labeled or not, as well as weighted or unweighted. To achieve this goal, a hidden Markov model is associated with each tree set for each targeted nodes property. The model associated with a tree set T for the targeted nodes property p learns how much the nodes of the trees in T verify property p. The resulting models are finally compared to derive a distance and similarity between the two sets of trees. The previous measures are then generalized for the comparison of unrooted and unordered trees. Flat classification experiments were carried out on two synthetic databases named FirstLast-L and FirstLast-LW available online. They both contain four classes of 100 rooted ordered trees whose specific and non-trivial nodes properties are clearly defined. When the distance proposed in this paper is selected as metric for the Nearest Neighbor classifier, a perfect accuracy of \(100\%\) is obtained for these two databases. This performance is \(41\%\) higher than the accuracy exhibited when the widespread tree Edit distance is selected for FirstLast-L.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Valiente G (2001) An efficient bottom-up distance between trees. In: spire, pages 212–219 Valiente G (2001) An efficient bottom-up distance between trees. In: spire, pages 212–219
2.
Zurück zum Zitat Bille P (2003) Tree edit distance, alignment distance and inclusion. Technical report, Citeseer Bille P (2003) Tree edit distance, alignment distance and inclusion. Technical report, Citeseer
3.
Zurück zum Zitat Liu T-L, Geiger D (1999) Approximate tree matching and shape similarity. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, volume 1, pages 456–462. IEEE Liu T-L, Geiger D (1999) Approximate tree matching and shape similarity. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, volume 1, pages 456–462. IEEE
4.
Zurück zum Zitat Bhavsar VC, Boley H, Yang L (2004) A weighted-tree similarity algorithm for multi-agent systems in e-business environments. Comput Intell 20(4):584–602MathSciNetCrossRef Bhavsar VC, Boley H, Yang L (2004) A weighted-tree similarity algorithm for multi-agent systems in e-business environments. Comput Intell 20(4):584–602MathSciNetCrossRef
6.
Zurück zum Zitat Zhang K, Shasha D (1989) Simple fast algorithms for the editing distance between trees and related problems. SIAM J Comput 18(6):1245–1262MathSciNetCrossRef Zhang K, Shasha D (1989) Simple fast algorithms for the editing distance between trees and related problems. SIAM J Comput 18(6):1245–1262MathSciNetCrossRef
7.
Zurück zum Zitat Zhang K, Statman R, Shasha D (1992) On the editing distance between unordered labeled trees. Inf Process Lett 42(3):133–139MathSciNetCrossRef Zhang K, Statman R, Shasha D (1992) On the editing distance between unordered labeled trees. Inf Process Lett 42(3):133–139MathSciNetCrossRef
8.
Zurück zum Zitat Zhang K, Jiang T (1994) Some max snp-hard results concerning unordered labeled trees. Inf Process Lett 49(5):249–254MathSciNetCrossRef Zhang K, Jiang T (1994) Some max snp-hard results concerning unordered labeled trees. Inf Process Lett 49(5):249–254MathSciNetCrossRef
9.
Zurück zum Zitat Klein PN (1998) Computing the edit-distance between unrooted ordered trees. In: European Symposium on Algorithms, pages 91–102. Springer Klein PN (1998) Computing the edit-distance between unrooted ordered trees. In: European Symposium on Algorithms, pages 91–102. Springer
10.
11.
12.
Zurück zum Zitat Demaine ED, Mozes S, Rossman B, Weimann O (2009) An optimal decomposition algorithm for tree edit distance. ACM Trans Algorithms (TALG) 6(1):2MathSciNetMATH Demaine ED, Mozes S, Rossman B, Weimann O (2009) An optimal decomposition algorithm for tree edit distance. ACM Trans Algorithms (TALG) 6(1):2MathSciNetMATH
13.
Zurück zum Zitat Pawlik M, Augsten N (2015) Efficient computation of the tree edit distance. ACM Trans Database Syst (TODS) 40(1):1–40MathSciNetCrossRef Pawlik M, Augsten N (2015) Efficient computation of the tree edit distance. ACM Trans Database Syst (TODS) 40(1):1–40MathSciNetCrossRef
14.
Zurück zum Zitat Pawlik M, Augsten N (2016) Tree edit distance: robust and memory-efficient. Inf Syst 56:157–173CrossRef Pawlik M, Augsten N (2016) Tree edit distance: robust and memory-efficient. Inf Syst 56:157–173CrossRef
15.
Zurück zum Zitat Schwarz S, Pawlik M, Augsten N (2017) A new perspective on the tree edit distance. In: International Conference on Similarity Search and Applications, pages 156–170. Springer Schwarz S, Pawlik M, Augsten N (2017) A new perspective on the tree edit distance. In: International Conference on Similarity Search and Applications, pages 156–170. Springer
16.
Zurück zum Zitat Zhang K (1995) Algorithms for the constrained editing distance between ordered labeled trees and related problems. Pattern Recogn 28(3):463–474CrossRef Zhang K (1995) Algorithms for the constrained editing distance between ordered labeled trees and related problems. Pattern Recogn 28(3):463–474CrossRef
17.
18.
Zurück zum Zitat Richter T (1997) A new measure of the distance between ordered trees and its applications. Inst für Informatik Richter T (1997) A new measure of the distance between ordered trees and its applications. Inst für Informatik
19.
Zurück zum Zitat Lu CL, Su Z-Y, Tang CY (2001) A new measure of edit distance between labeled trees. In: International Computing and Combinatorics Conference, pages 338–348. Springer Lu CL, Su Z-Y, Tang CY (2001) A new measure of edit distance between labeled trees. In: International Computing and Combinatorics Conference, pages 338–348. Springer
20.
Zurück zum Zitat Ouangraoua A, Ferraro P, Tichit L, Dulucq S (2007) Local similarity between quotiented ordered trees. J Discrete Algorithms 5(1):23–35MathSciNetCrossRef Ouangraoua A, Ferraro P, Tichit L, Dulucq S (2007) Local similarity between quotiented ordered trees. J Discrete Algorithms 5(1):23–35MathSciNetCrossRef
22.
Zurück zum Zitat Shin-Yee L (1979) A tree-to-tree distance and its application to cluster analysis. IEEE Trans Pattern Anal Mach Intell 2:219–224MATH Shin-Yee L (1979) A tree-to-tree distance and its application to cluster analysis. IEEE Trans Pattern Anal Mach Intell 2:219–224MATH
23.
Zurück zum Zitat Tanaka E, Tanaka K (1988) The tree-to-tree editing problem. Int J Pattern Recognit Artif Intell 2(02):221–240CrossRef Tanaka E, Tanaka K (1988) The tree-to-tree editing problem. Int J Pattern Recognit Artif Intell 2(02):221–240CrossRef
24.
Zurück zum Zitat Shasha D, Zhang K (1990) Fast algorithms for the unit cost editing distance between trees. J Algorithms 11(4):581–621MathSciNetCrossRef Shasha D, Zhang K (1990) Fast algorithms for the unit cost editing distance between trees. J Algorithms 11(4):581–621MathSciNetCrossRef
25.
Zurück zum Zitat Sridharamurthy R, Talha BM, Adhitya K, Vijay N (2018) Edit distance between merge trees. In: IEEE transactions on visualization and computer graphics, pages 1–14 Sridharamurthy R, Talha BM, Adhitya K, Vijay N (2018) Edit distance between merge trees. In: IEEE transactions on visualization and computer graphics, pages 1–14
26.
Zurück zum Zitat Jiang T, Wang L, Zhang K (1995) Alignment of trees–an alternative to tree edit. Theoret Comput Sci 143(1):137–148MathSciNetCrossRef Jiang T, Wang L, Zhang K (1995) Alignment of trees–an alternative to tree edit. Theoret Comput Sci 143(1):137–148MathSciNetCrossRef
27.
Zurück zum Zitat Jansson J, Lingas A (2001) A fast algorithm for optimal alignment between similar ordered trees. In: Annual Symposium on Combinatorial Pattern Matching, pages 232–240. Springer Jansson J, Lingas A (2001) A fast algorithm for optimal alignment between similar ordered trees. In: Annual Symposium on Combinatorial Pattern Matching, pages 232–240. Springer
28.
Zurück zum Zitat Kilpeläinen P, et al (1992) Tree matching problems with applications to structured text databases Kilpeläinen P, et al (1992) Tree matching problems with applications to structured text databases
29.
Zurück zum Zitat Alonso L, Schott R (1993) On the tree inclusion problem. In: International Symposium on Mathematical Foundations of Computer Science, pages 211–221. Springer Alonso L, Schott R (1993) On the tree inclusion problem. In: International Symposium on Mathematical Foundations of Computer Science, pages 211–221. Springer
30.
31.
Zurück zum Zitat Richter T (1997) A new algorithm for the ordered tree inclusion problem. In: Annual Symposium on Combinatorial Pattern Matching, pages 150–166. Springer Richter T (1997) A new algorithm for the ordered tree inclusion problem. In: Annual Symposium on Combinatorial Pattern Matching, pages 150–166. Springer
34.
Zurück zum Zitat Kosaraju SR (1989) Efficient tree pattern matching. In: 30th Annual Symposium on Foundations of Computer Science, pages 178–183. IEEE Kosaraju SR (1989) Efficient tree pattern matching. In: 30th Annual Symposium on Foundations of Computer Science, pages 178–183. IEEE
35.
Zurück zum Zitat Dubiner M, Galil Z, Magen E (1990) Faster tree pattern matching. In: Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science, pages 145–150. IEEE Dubiner M, Galil Z, Magen E (1990) Faster tree pattern matching. In: Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science, pages 145–150. IEEE
36.
Zurück zum Zitat Ramesh RAMAKRISHNAN, Ramakrishnan IV (1992) Nonlinear pattern matching in trees. J ACM (JACM) 39(2):295–316MathSciNetCrossRef Ramesh RAMAKRISHNAN, Ramakrishnan IV (1992) Nonlinear pattern matching in trees. J ACM (JACM) 39(2):295–316MathSciNetCrossRef
37.
Zurück zum Zitat Zhang KZ, Shasha D, Wang JT-L (1994) Approximate tree matching in the presence of variable length don’t cares. J Algorithms 16(1):33–66MathSciNetCrossRef Zhang KZ, Shasha D, Wang JT-L (1994) Approximate tree matching in the presence of variable length don’t cares. J Algorithms 16(1):33–66MathSciNetCrossRef
39.
Zurück zum Zitat Amir A, Keselman D (1997) Maximum agreement subtree in a set of evolutionary trees: metrics and efficient algorithms. SIAM J Comput 26(6):1656–1669MathSciNetCrossRef Amir A, Keselman D (1997) Maximum agreement subtree in a set of evolutionary trees: metrics and efficient algorithms. SIAM J Comput 26(6):1656–1669MathSciNetCrossRef
40.
Zurück zum Zitat Khanna S, Motwani R, Yao FF (1995) Approximation algorithms for the largest common subtree problem. Citeseer Khanna S, Motwani R, Yao FF (1995) Approximation algorithms for the largest common subtree problem. Citeseer
41.
Zurück zum Zitat Akutsu T, Halldórsson MM (2000) On the approximation of largest common subtrees and largest common point sets. Theor Comput Sci 233(1–2):33–50MathSciNetCrossRef Akutsu T, Halldórsson MM (2000) On the approximation of largest common subtrees and largest common point sets. Theor Comput Sci 233(1–2):33–50MathSciNetCrossRef
42.
43.
Zurück zum Zitat Nishimura N, Ragde P, Thilikos DM (2000) Finding smallest supertrees under minor containment. Int J Found Comput Sci 11(03):445–465MathSciNetCrossRef Nishimura N, Ragde P, Thilikos DM (2000) Finding smallest supertrees under minor containment. Int J Found Comput Sci 11(03):445–465MathSciNetCrossRef
44.
Zurück zum Zitat Tan P-N, Steinbach M, Kumar V et al (2006) Cluster analysis: basic concepts and algorithms. Intro Data Min 8:487–568 Tan P-N, Steinbach M, Kumar V et al (2006) Cluster analysis: basic concepts and algorithms. Intro Data Min 8:487–568
45.
Zurück zum Zitat Mucherino A, Papajorgji PJ, Pardalos PM (2009) Data Mining in Agriculture, volume 34, chapter k-Nearest Neighbor Classification. Springer, New York Mucherino A, Papajorgji PJ, Pardalos PM (2009) Data Mining in Agriculture, volume 34, chapter k-Nearest Neighbor Classification. Springer, New York
46.
Zurück zum Zitat Bondy JA, Uppaluri SRM, et al (1976) Graph theory with applications, volume 290. Macmillan London Bondy JA, Uppaluri SRM, et al (1976) Graph theory with applications, volume 290. Macmillan London
47.
Zurück zum Zitat Cheung T-Y (1983) Graph traversal techniques and the maximum flow problem in distributed computation. IEEE Trans Software Eng 4:504–512CrossRef Cheung T-Y (1983) Graph traversal techniques and the maximum flow problem in distributed computation. IEEE Trans Software Eng 4:504–512CrossRef
48.
49.
Zurück zum Zitat Matoušek J, Thomas R (1992) On the complexity of finding iso-and other morphisms for partial k-trees. Discrete Math 108(1–3):343–364MathSciNetCrossRef Matoušek J, Thomas R (1992) On the complexity of finding iso-and other morphisms for partial k-trees. Discrete Math 108(1–3):343–364MathSciNetCrossRef
50.
Zurück zum Zitat Torsello A, Hancock ER (2006) Learning shape-classes using a mixture of tree-unions. IEEE Trans Pattern Anal Mach Intell 28(6):954–967CrossRef Torsello A, Hancock ER (2006) Learning shape-classes using a mixture of tree-unions. IEEE Trans Pattern Anal Mach Intell 28(6):954–967CrossRef
51.
Zurück zum Zitat Torsello A, Rossi L (2011) Supervised learning of graph structure. In: International Workshop on Similarity-Based Pattern Recognition, pages 117–132. Springer Torsello A, Rossi L (2011) Supervised learning of graph structure. In: International Workshop on Similarity-Based Pattern Recognition, pages 117–132. Springer
52.
Zurück zum Zitat Rabiner LR (1989) A tutorial on hidden markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286CrossRef Rabiner LR (1989) A tutorial on hidden markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286CrossRef
53.
Zurück zum Zitat Iloga S, Romain O, Tchuenté M (2020) An efficient generic approach for automatic taxonomy generation using HMMs. Pattern Anal Appl 1–22 Iloga S, Romain O, Tchuenté M (2020) An efficient generic approach for automatic taxonomy generation using HMMs. Pattern Anal Appl 1–22
54.
Zurück zum Zitat Falkhausen M, Reininger H, Wolf D (1995) Calculation of distance measures between hidden markov models. In: Fourth European Conference on Speech Communication and Technology Falkhausen M, Reininger H, Wolf D (1995) Calculation of distance measures between hidden markov models. In: Fourth European Conference on Speech Communication and Technology
55.
Zurück zum Zitat Do MN (2003) Fast approximation of kullback-leibler distance for dependence trees and hidden markov models. IEEE Signal Process Lett 10(4):115–118CrossRef Do MN (2003) Fast approximation of kullback-leibler distance for dependence trees and hidden markov models. IEEE Signal Process Lett 10(4):115–118CrossRef
56.
Zurück zum Zitat Silva J, Narayanan S (2008) Upper bound kullback-leibler divergence for transient hidden markov models. IEEE Trans Signal Process 56(9):4176–4188MathSciNetCrossRef Silva J, Narayanan S (2008) Upper bound kullback-leibler divergence for transient hidden markov models. IEEE Trans Signal Process 56(9):4176–4188MathSciNetCrossRef
57.
Zurück zum Zitat Lyngso RB, Pedersen CN, Nielsen H (1999) Metrics and similarity measures for hidden markov models. In: Proc Int Conf Intell Syst Mol Biol, pages 178–186 Lyngso RB, Pedersen CN, Nielsen H (1999) Metrics and similarity measures for hidden markov models. In: Proc Int Conf Intell Syst Mol Biol, pages 178–186
58.
Zurück zum Zitat Zeng J, Duan J, Chengrong W (2010) A new distance measure for hidden markov models. Expert Syst Appl 37(2):1550–1555CrossRef Zeng J, Duan J, Chengrong W (2010) A new distance measure for hidden markov models. Expert Syst Appl 37(2):1550–1555CrossRef
59.
Zurück zum Zitat Iloga S, Romain O, Tchuenté M (2018) An accurate hmm-based similarity measure between finite sets of histograms. Pattern Anal Appl 1–26 Iloga S, Romain O, Tchuenté M (2018) An accurate hmm-based similarity measure between finite sets of histograms. Pattern Anal Appl 1–26
60.
Zurück zum Zitat Sahraeian SME, Yoon B-J (2011) A novel low-complexity hmm similarity measure. IEEE Signal Process Lett 18(2):87–90CrossRef Sahraeian SME, Yoon B-J (2011) A novel low-complexity hmm similarity measure. IEEE Signal Process Lett 18(2):87–90CrossRef
61.
Zurück zum Zitat Huang A (2008) Similarity measures for text document clustering. In: Proceedings of the sixth new zealand computer science research student conference (NZCSRSC2008), Christchurch, New Zealand, pages 49–56 Huang A (2008) Similarity measures for text document clustering. In: Proceedings of the sixth new zealand computer science research student conference (NZCSRSC2008), Christchurch, New Zealand, pages 49–56
62.
Zurück zum Zitat Nothman J, Qin H, Yurchak R (2018) Stop word lists in free open-source software packages. In: Proceedings of Workshop for NLP Open Source Software (NLP-OSS), pages 7–12 Nothman J, Qin H, Yurchak R (2018) Stop word lists in free open-source software packages. In: Proceedings of Workshop for NLP Open Source Software (NLP-OSS), pages 7–12
63.
Zurück zum Zitat Rico-Juan JR, Micó L (2003) Some results about the use of tree/string edit distances in a\(^\sim\) nearest neighbour classification task. In: Iberian Conference on Pattern Recognition and Image Analysis, pages 821–828. Springer Rico-Juan JR, Micó L (2003) Some results about the use of tree/string edit distances in a\(^\sim\) nearest neighbour classification task. In: Iberian Conference on Pattern Recognition and Image Analysis, pages 821–828. Springer
65.
Zurück zum Zitat Espinosa-Manzo ALA, Arias-Estrada MO (2001) Implementing hidden markov models in a hardware architecture. In: Proceedings of the International Meeting of Computer Science (ENC’01), Aguascalientes, Mexico, volume II, pages 1007–1016 Espinosa-Manzo ALA, Arias-Estrada MO (2001) Implementing hidden markov models in a hardware architecture. In: Proceedings of the International Meeting of Computer Science (ENC’01), Aguascalientes, Mexico, volume II, pages 1007–1016
Metadaten
Titel
Customizable HMM-based measures to accurately compare tree sets
verfasst von
Sylvain Iloga
Publikationsdatum
31.03.2021
Verlag
Springer London
Erschienen in
Pattern Analysis and Applications / Ausgabe 3/2021
Print ISSN: 1433-7541
Elektronische ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-021-00971-3

Weitere Artikel der Ausgabe 3/2021

Pattern Analysis and Applications 3/2021 Zur Ausgabe

Premium Partner