Published in: Journal of Intelligent Information Systems 1/2020

23-11-2019

A multi-aspect approach to ontology matching based on Bayesian cluster ensembles

Authors: Andre Ippolito, Jorge Rady de Almeida Junior



Abstract

With the progressive increase in the number of existing ontologies, ontology matching has become a challenging task. Ontology matching is a crucial step in the ontology integration process, and its goal is to find corresponding elements in heterogeneous ontologies. A trend of clustering-based solutions for ontology matching has evolved, based on a divide-and-conquer strategy that partitions the ontologies, clusters similar partitions and restricts the matching to ontology elements of similar partitions. Nevertheless, most of these solutions considered solely the terminological aspect, ignoring other ontology aspects that can contribute to the final matching results. In this work, we developed a novel solution for ontology matching based on a consensus clustering of multiple aspects of ontology partitions. We partitioned the ontologies by applying Community Detection techniques and applied Bayesian Cluster Ensembles (BCE) to find a consensus clustering among the terminological, topological and extensional aspects of the ontology partitions. The matching results of our experimental study indicated that a BCE-based solution with three clusters best captured the contributions of the aspects, in comparison to other consensus solutions. The results corroborated the benefits of combining the ontology aspects for ontology alignment. We also verified that the BCE-based solution with three clusters yielded higher matching scores than other state-of-the-art solutions. In addition, our proposed methods form a configurable framework, which allows other ontology aspects and other techniques to be added.
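
The divide-and-conquer idea described above can be illustrated with a minimal sketch. It is not the BCE model used in the paper: as a simpler stand-in, it builds a consensus by majority vote over a co-association matrix, and then restricts candidate matches to partitions of different ontologies that share a consensus cluster. The partition names, the three label vectors and the 0.5 threshold are all hypothetical values chosen for illustration.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of base clusterings that place items i and j in the same cluster."""
    n = len(labelings[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(labelings)
    return co


def consensus_clusters(labelings, threshold=0.5):
    """Merge items whose co-association exceeds the threshold (union-find)."""
    n = len(labelings[0])
    co = co_association(labelings)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel the component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def candidate_pairs(names, sources, consensus):
    """Keep only cross-ontology partition pairs that share a consensus cluster."""
    idx = range(len(names))
    return [(names[i], names[j]) for i, j in combinations(idx, 2)
            if sources[i] != sources[j] and consensus[i] == consensus[j]]


# Hypothetical partitions of two ontologies, A and B.
names = ["A1", "A2", "A3", "B1", "B2", "B3"]
sources = ["A", "A", "A", "B", "B", "B"]

# Hypothetical cluster labels, one vector per ontology aspect.
terminological = [0, 0, 1, 0, 1, 1]
topological = [0, 1, 1, 0, 0, 1]
extensional = [0, 0, 1, 0, 1, 1]

consensus = consensus_clusters([terminological, topological, extensional])
pairs = candidate_pairs(names, sources, consensus)
```

In this toy input, two of the three aspects agree on every grouping that survives, so the consensus follows the majority, and only four of the nine possible cross-ontology partition pairs remain as matching candidates.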

Metadata
Title
A multi-aspect approach to ontology matching based on Bayesian cluster ensembles
Authors
Andre Ippolito
Jorge Rady de Almeida Junior
Publication date
23-11-2019
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 1/2020
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-019-00583-8
