Skip to main content
Erschienen in: Discover Computing 3-4/2019

17.12.2018 | Knowledge Graphs and Semantics in Text Analysis and Retrieval

Automated assessment of knowledge hierarchy evolution: comparing directed acyclic graphs

verfasst von: Guruprasad Nayak, Sourav Dutta, Deepak Ajwani, Patrick Nicholson, Alessandra Sala

Erschienen in: Discover Computing | Ausgabe 3-4/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Automated construction of knowledge hierarchies from huge data corpora is gaining increasing attention in recent years, in order to tackle the infeasibility of manually extracting and semantically linking millions of concepts. As a knowledge hierarchy evolves with these automated techniques, there is a need for measures to assess its temporal evolution, quantifying the similarities between different versions and identifying the relative growth of different subgraphs in the knowledge hierarchy. In this paper, we focus on measures that leverage structural properties of the knowledge hierarchy graph to assess the temporal changes. We propose a principled and scalable similarity measure, based on Katz similarity between concept nodes, for comparing different versions of a knowledge hierarchy, modeled as a generic directed acyclic graph. We present theoretical analysis to depict that the proposed measure accurately captures the salient properties of taxonomic hierarchies, assesses changes in the ordering of nodes, along with the logical subsumption of relationships among concepts. We also present a linear time variant of the measure, and show that our measures, unlike previous approaches, are tunable to cater to diverse application needs. We further show that our measure provides interpretability, thereby identifying the key structural and logical difference in the hierarchies. Experiments on a real DBpedia and biological knowledge hierarchy showcase that our measures accurately capture structural similarity, while providing enhanced scalability and tunability. Also, we demonstrate that the temporal evolution of different subgraphs in this knowledge hierarchy, as captured purely by our structural measure, corresponds well with the known disruptions in the related subject areas.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Hierarchical ordering of DBpedia concepts and Wikipedia categories based on Simple Knowledge Organization System (SKOS)  : broader relationships between the corresponding Wikipedia pages is henceforth referred to as the DBpedia taxonomy.
 
2
For non-identical sets, nodes present in one but absent in other are added as singletons.
 
3
The Katz similarity vector of Eq. (1) is obtained by flattening this matrix into a vector.
 
4
DBpedia, derived from Wikipedia pages, contains over 6 million vertices and 25 million edges for a hierarchical ordering of Wikipedia concepts using the Simple Knowledge Organization System (SKOS) broader (i.e., skos:broader) relationships.
 
Literatur
Zurück zum Zitat Adams, E, I. I. I. (1972). Consensus techniques and the comparison of taxonomic trees. Systematic Zoology, 21(4), 390–397.CrossRef Adams, E, I. I. I. (1972). Consensus techniques and the comparison of taxonomic trees. Systematic Zoology, 21(4), 390–397.CrossRef
Zurück zum Zitat Bollacker, K., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on management of data, SIGMOD ’08 (pp. 1247–1250). ISBN 978-1-60558-102-6. Bollacker, K., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on management of data, SIGMOD ’08 (pp. 1247–1250). ISBN 978-1-60558-102-6.
Zurück zum Zitat Bordea, G., Buitelaar, P., Feralli, S., & Navigli, R. (2015). Semeval-2015 task 17: Taxonomy extraction evaluation (texeval). In SemEval-2015. Bordea, G., Buitelaar, P., Feralli, S., & Navigli, R. (2015). Semeval-2015 task 17: Taxonomy extraction evaluation (texeval). In SemEval-2015.
Zurück zum Zitat Bordea, G., Lefever, E., & Buitelaar, P. (2016). Semeval-2016 task 13: Taxonomy extraction evaluation (texeval-2). In SemEval-2016 (pp. 1081–1091). Bordea, G., Lefever, E., & Buitelaar, P. (2016). Semeval-2016 task 13: Taxonomy extraction evaluation (texeval-2). In SemEval-2016 (pp. 1081–1091).
Zurück zum Zitat Brandenburg, F., Gleiner, A., & Hofmeier, A. (2012). Comparing and aggregating partial orders with Kendall tau distances. In WALCOM: Algorithms and computation (pp. 88–99). Brandenburg, F., Gleiner, A., & Hofmeier, A. (2012). Comparing and aggregating partial orders with Kendall tau distances. In WALCOM: Algorithms and computation (pp. 88–99).
Zurück zum Zitat Brank, J., Grobelnik, M., & Mladenić, D. (2005) A survey of ontology evaluation techniques. In SiKDD. Brank, J., Grobelnik, M., & Mladenić, D. (2005) A survey of ontology evaluation techniques. In SiKDD.
Zurück zum Zitat Brank, J., Mladenic, D., & Grobelnik, M. (2006). Gold standard based ontology evaluation using instance assignment. In Workshop on evaluation of ontologies for the web, EON. Brank, J., Mladenic, D., & Grobelnik, M. (2006). Gold standard based ontology evaluation using instance assignment. In Workshop on evaluation of ontologies for the web, EON.
Zurück zum Zitat Cai, S., & Knight, K. (2013). Smatch: An evaluation metric for semantic feature structures. In Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 748–752). Association for Computational Linguistics. Cai, S., & Knight, K. (2013). Smatch: An evaluation metric for semantic feature structures. In Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 748–752). Association for Computational Linguistics.
Zurück zum Zitat Cross, V., Yu, X., & Hu, X. (2013). Unifying ontological similarity measures: A theoretical and empirical investigation. International Journal of Approximate Reasoning, 54(7), 861–875.CrossRef Cross, V., Yu, X., & Hu, X. (2013). Unifying ontological similarity measures: A theoretical and empirical investigation. International Journal of Approximate Reasoning, 54(7), 861–875.CrossRef
Zurück zum Zitat d’Aquin, M. (2009). Formally measuring agreement and disagreement in ontologies. In K-CAP (pp. 145–152). d’Aquin, M. (2009). Formally measuring agreement and disagreement in ontologies. In K-CAP (pp. 145–152).
Zurück zum Zitat David, J., Euzenat, J., & Šváb-Zamazal, O. (2010). Ontology similarity in the alignment space. In ISWC (pp. 129–144). David, J., Euzenat, J., & Šváb-Zamazal, O. (2010). Ontology similarity in the alignment space. In ISWC (pp. 129–144).
Zurück zum Zitat de Knijff, J., Meijer, K., Frasincar, F., & Hogenboom, F. (2011). Word sense disambiguation for automatic taxonomy construction from text-based web corpora. In WISE. de Knijff, J., Meijer, K., Frasincar, F., & Hogenboom, F. (2011). Word sense disambiguation for automatic taxonomy construction from text-based web corpora. In WISE.
Zurück zum Zitat Dellschaft, K., Staab, S. (2006). On how to perform a gold standard based evaluation of ontology learning. In International semantic web conference (pp. 228–241). Dellschaft, K., Staab, S. (2006). On how to perform a gold standard based evaluation of ontology learning. In International semantic web conference (pp. 228–241).
Zurück zum Zitat Elghawalby, H., & Hancock, E. R. (2008). Measuring graph similarity using spectral geometry. In ICIAR (pp. 517–526). Elghawalby, H., & Hancock, E. R. (2008). Measuring graph similarity using spectral geometry. In ICIAR (pp. 517–526).
Zurück zum Zitat Fensel, D. (2005). Spinning the semantic web: Bringing the World Wide Web to its full potential. Cambridge: MIT Press.CrossRef Fensel, D. (2005). Spinning the semantic web: Bringing the World Wide Web to its full potential. Cambridge: MIT Press.CrossRef
Zurück zum Zitat Foggia, P., Percannella, G., & Vento, M. (2014). Graph matching and learning in pattern recognition in the last 10 years. Pattern Recognition and AI, 28(01), 1450001.MathSciNetCrossRef Foggia, P., Percannella, G., & Vento, M. (2014). Graph matching and learning in pattern recognition in the last 10 years. Pattern Recognition and AI, 28(01), 1450001.MathSciNetCrossRef
Zurück zum Zitat Fowlkes, E., & Mallows, C. (1983). A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78, 553–569.CrossRefMATH Fowlkes, E., & Mallows, C. (1983). A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78, 553–569.CrossRefMATH
Zurück zum Zitat Gao, X., Xiao, B., Tao, D., & Li, X. (2010). A survey of graph edit distance. Pattern Analysis and Applications, 13(1), 113–129.MathSciNetCrossRef Gao, X., Xiao, B., Tao, D., & Li, X. (2010). A survey of graph edit distance. Pattern Analysis and Applications, 13(1), 113–129.MathSciNetCrossRef
Zurück zum Zitat Geffet, M., & Dagan, I. (2005). The distributional inclusion hypotheses and lexical entailment. In ACL (pp. 107–114). Geffet, M., & Dagan, I. (2005). The distributional inclusion hypotheses and lexical entailment. In ACL (pp. 107–114).
Zurück zum Zitat Goyal, P., & Ferrara, E. (2018). Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems, 151, 78–94.CrossRef Goyal, P., & Ferrara, E. (2018). Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems, 151, 78–94.CrossRef
Zurück zum Zitat Guarino, N., & Welty, C. (2002). Evaluating ontological decisions with ontoclean. Communications of ACM, 45, 61–65.CrossRef Guarino, N., & Welty, C. (2002). Evaluating ontological decisions with ontoclean. Communications of ACM, 45, 61–65.CrossRef
Zurück zum Zitat Harabagiu, S. M., Maiorano, S. J., & Paşca, M. A. (2003). Open-domain textual question answering techniques. Natural Language Engineering, 9(3), 231–267.CrossRef Harabagiu, S. M., Maiorano, S. J., & Paşca, M. A. (2003). Open-domain textual question answering techniques. Natural Language Engineering, 9(3), 231–267.CrossRef
Zurück zum Zitat Hulpuş, I., Prangnawarat, N., & Hayes, C. (2015). Path-based semantic relatedness on linked data and its use to word and entity disambiguation. In ISWC (pp. 442–457). Hulpuş, I., Prangnawarat, N., & Hayes, C. (2015). Path-based semantic relatedness on linked data and its use to word and entity disambiguation. In ISWC (pp. 442–457).
Zurück zum Zitat Jurgens, D., & Pilehvar, M. T. (2016). Semeval-2016: Semantic taxonomy enrichment. In SemEval (pp. 1092–1102). Jurgens, D., & Pilehvar, M. T. (2016). Semeval-2016: Semantic taxonomy enrichment. In SemEval (pp. 1092–1102).
Zurück zum Zitat Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18(1), 39–43.CrossRefMATH Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18(1), 39–43.CrossRefMATH
Zurück zum Zitat Koutra, D., Ke, T.-Y., Kang, U., Chau, D., Pao, H.-K., & Faloutsos, C. (2011). Unifying guilt-by-association approaches: Theorems and fast algorithms. In ECML-PKDD (pp. 245–260). Koutra, D., Ke, T.-Y., Kang, U., Chau, D., Pao, H.-K., & Faloutsos, C. (2011). Unifying guilt-by-association approaches: Theorems and fast algorithms. In ECML-PKDD (pp. 245–260).
Zurück zum Zitat Koutra, D., Shah, N., Vogelstein, J. T., Gallagher, B., & Faloutsos, C. (2016). Deltacon: A principled massive-graph similarity function with attribution. TKDD, 10(3), 28.CrossRef Koutra, D., Shah, N., Vogelstein, J. T., Gallagher, B., & Faloutsos, C. (2016). Deltacon: A principled massive-graph similarity function with attribution. TKDD, 10(3), 28.CrossRef
Zurück zum Zitat Kozareva, Z., & Hovy, E. (2010). A semi-supervised method to learn and construct taxonomies using the web. In Proceedings of the 2010 conference on empirical methods in natural language processing (pp. 1110–1118). Association for Computational Linguistics. Kozareva, Z., & Hovy, E. (2010). A semi-supervised method to learn and construct taxonomies using the web. In Proceedings of the 2010 conference on empirical methods in natural language processing (pp. 1110–1118). Association for Computational Linguistics.
Zurück zum Zitat Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P. N., et al. (2015). Dbpedia: A large-scale, multilingual knowledge base extracted from wikipedia. Semantic Web, 6(2), 167–195. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P. N., et al. (2015). Dbpedia: A large-scale, multilingual knowledge base extracted from wikipedia. Semantic Web, 6(2), 167–195.
Zurück zum Zitat Levin, R., Abassi, H., & Cohen, U. (2016). Guided walk: A scalable recommendation algorithm for complex heterogeneous social networks. In Recommender systems (pp. 293–300). Levin, R., Abassi, H., & Cohen, U. (2016). Guided walk: A scalable recommendation algorithm for complex heterogeneous social networks. In Recommender systems (pp. 293–300).
Zurück zum Zitat Liu, X., Song, Y., Liu, S., & Wang, H. (2012). Automatic taxonomy construction from keywords. In KDD (pp. 1433–1441). Liu, X., Song, Y., Liu, S., & Wang, H. (2012). Automatic taxonomy construction from keywords. In KDD (pp. 1433–1441).
Zurück zum Zitat Maedche, A., & Staab, S. (2002). Measuring similarity between ontologies. In EKAW (pp. 251–263). Maedche, A., & Staab, S. (2002). Measuring similarity between ontologies. In EKAW (pp. 251–263).
Zurück zum Zitat Malmi, E., Tatti, N., & Gionis, A. (2015). Beyond rankings: Comparing directed acyclic graphs. Data Mining and Knowledge Discovery, 29(5), 1233–1257.MathSciNetCrossRefMATH Malmi, E., Tatti, N., & Gionis, A. (2015). Beyond rankings: Comparing directed acyclic graphs. Data Mining and Knowledge Discovery, 29(5), 1233–1257.MathSciNetCrossRefMATH
Zurück zum Zitat McVicar, M., Sach, B., Mesnage, C., Lijffijt, J., Spyropoulou, E., & De Bie, T. (2016). Sumoted: An intuitive edit distance between rooted unordered uniquely-labelled trees. Pattern Recognition Letters, 79, 52–59.CrossRef McVicar, M., Sach, B., Mesnage, C., Lijffijt, J., Spyropoulou, E., & De Bie, T. (2016). Sumoted: An intuitive edit distance between rooted unordered uniquely-labelled trees. Pattern Recognition Letters, 79, 52–59.CrossRef
Zurück zum Zitat Ou, M., Cui, P., Pei, J., Zhang, Z., & Zhu, W. (2016). Asymmetric transitivity preserving graph embedding. In KDD (pp. 1105–1114). Ou, M., Cui, P., Pei, J., Zhang, Z., & Zhu, W. (2016). Asymmetric transitivity preserving graph embedding. In KDD (pp. 1105–1114).
Zurück zum Zitat Pesquita, C., Faria, D., Falco, A. O., Lord, P., & Couto, F. M. (2009). Semantic similarity in biomedical ontologies. PLOS Computational Biology, 5(7), 1–12.MathSciNetCrossRef Pesquita, C., Faria, D., Falco, A. O., Lord, P., & Couto, F. M. (2009). Semantic similarity in biomedical ontologies. PLOS Computational Biology, 5(7), 1–12.MathSciNetCrossRef
Zurück zum Zitat Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. JASA, 66(336), 846–850.CrossRef Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. JASA, 66(336), 846–850.CrossRef
Zurück zum Zitat Resnik, P. (1995). Using information content to evaluate semantic similarity in taxonomy. In IJCAI (pp. 448–453). Resnik, P. (1995). Using information content to evaluate semantic similarity in taxonomy. In IJCAI (pp. 448–453).
Zurück zum Zitat Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., & Borgwardt, K. (2009). Efficient graphlet kernels for large graph comparison. In AI and Statistics (pp. 488–495). Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., & Borgwardt, K. (2009). Efficient graphlet kernels for large graph comparison. In AI and Statistics (pp. 488–495).
Zurück zum Zitat Suchanek, F. M., Kasneci, G., & Weikum, G. (2008). YAGO: A large ontology from wikipedia and wordnet. Journal of Web Semantics, 6(3), 203–217.CrossRef Suchanek, F. M., Kasneci, G., & Weikum, G. (2008). YAGO: A large ontology from wikipedia and wordnet. Journal of Web Semantics, 6(3), 203–217.CrossRef
Zurück zum Zitat Sun, J., Ajwani, D., Nicholson, P. K., Sala, A., & Parthasarathy, S. (2017). Breaking cycles in noisy hierarchies. In WebSci (pp. 151–160). Sun, J., Ajwani, D., Nicholson, P. K., Sala, A., & Parthasarathy, S. (2017). Breaking cycles in noisy hierarchies. In WebSci (pp. 151–160).
Zurück zum Zitat Suominen, O., & Hyvönen, E. (2012). Improving the quality of SKOS vocabularies with skosify. In EKAW (pp. 383–397). Suominen, O., & Hyvönen, E. (2012). Improving the quality of SKOS vocabularies with skosify. In EKAW (pp. 383–397).
Zurück zum Zitat Vedula, N., Nicholson, P. K., Ajwani, D., Dutta, S., Sala, A., & Parthasarathy, S. (2018). Enriching taxonomies with functional domain knowledge. In SIGIR. Vedula, N., Nicholson, P. K., Ajwani, D., Dutta, S., Sala, A., & Parthasarathy, S. (2018). Enriching taxonomies with functional domain knowledge. In SIGIR.
Zurück zum Zitat Velardi, P., Navigli, R., Faralli, S., & María Ruiz-Martínez, J. (2012). A new method for evaluating automatically learned terminological taxonomies. In LREC (pp. 1498–1504). Velardi, P., Navigli, R., Faralli, S., & María Ruiz-Martínez, J. (2012). A new method for evaluating automatically learned terminological taxonomies. In LREC (pp. 1498–1504).
Zurück zum Zitat Velardi, P., Faralli, S., & Navigli, R. (2013). Ontolearn reloaded: A graph-based algorithm for taxonomy induction. Computational Linguistics, 39(3), 665–707.CrossRef Velardi, P., Faralli, S., & Navigli, R. (2013). Ontolearn reloaded: A graph-based algorithm for taxonomy induction. Computational Linguistics, 39(3), 665–707.CrossRef
Zurück zum Zitat Volker, J., Vrandecic, J., Sure, J., & Hotho, A. (2010). AEON: An approach to the automatic evaluation of ontologies. Journal of Applied Ontology, 3, 41–62. Volker, J., Vrandecic, J., Sure, J., & Hotho, A. (2010). AEON: An approach to the automatic evaluation of ontologies. Journal of Applied Ontology, 3, 41–62.
Zurück zum Zitat Wagner, S., & Wagner, D. (2007). Comparing clusterings: An overview. Analysis, 4769, 1–19. Wagner, S., & Wagner, D. (2007). Comparing clusterings: An overview. Analysis, 4769, 1–19.
Zurück zum Zitat Wu, Z., & Palmer, M. (1994). Verbs semantics and lexical selection. In ACL (pp. 133–138). Wu, Z., & Palmer, M. (1994). Verbs semantics and lexical selection. In ACL (pp. 133–138).
Zurück zum Zitat Zavitsanos, E., Paliouras, G., & Vouros, G. A. (2011). Gold standard evaluation of ontology learning methods through ontology transformation and alignment. TKDE, 23, 1635–1648.MATH Zavitsanos, E., Paliouras, G., & Vouros, G. A. (2011). Gold standard evaluation of ontology learning methods through ontology transformation and alignment. TKDE, 23, 1635–1648.MATH
Metadaten
Titel
Automated assessment of knowledge hierarchy evolution: comparing directed acyclic graphs
verfasst von
Guruprasad Nayak
Sourav Dutta
Deepak Ajwani
Patrick Nicholson
Alessandra Sala
Publikationsdatum
17.12.2018
Verlag
Springer Netherlands
Erschienen in
Discover Computing / Ausgabe 3-4/2019
Print ISSN: 2948-2984
Elektronische ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-018-9345-y

Weitere Artikel der Ausgabe 3-4/2019

Discover Computing 3-4/2019 Zur Ausgabe

Knowledge Graphs and Semantics in Text Analysis and Retrieval

Identifying and exploiting target entity type information for ad hoc entity retrieval

Knowledge Graphs and Semantics in Text Analysis and Retrieval

Neural architecture for question answering using a knowledge graph and web corpus

Knowledge Graphs and Semantics in Text Analysis and Retrieval

Neural variational entity set expansion for automatically populated knowledge graphs

Knowledge Graphs and Semantics in Text Analysis and Retrieval

Payoffs and pitfalls in using knowledge-bases for consumer health search

Knowledge Graphs and Semantics in Text Analysis and Retrieval

Overcoming low-utility facets for complex answer retrieval

Knowledge Graphs and Semantics in Text Analysis and Retrieval

Special issue on knowledge graphs and semantics in text analysis and retrieval