skip to main content
10.1145/2457317.2457352acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article

A similarity measure for approximate querying over RDF data

Published:18 March 2013Publication History

ABSTRACT

Approximate query answering relies on a similarity measure that evaluates the relevance, for a given query, of a set of data extracted from the underlying database. In the context of graph-modeled data, many methods (such as, subgraph isomorphism, graph edit distance, and maximum common subgraph) have been proposed to face this problem. Unfortunately, they are usually hard to compute and when they are used on RDF data, several drawbacks arise. In this paper, we propose a measure to evaluate the similarity between a (small) graph representing a query and a portion of a (large) graph representing an RDF data set. We show that this measure: (i) can be evaluated in linear time with respect to the size of the given graphs and, (ii) guarantees other interesting properties. In order to show the feasibility of our approach, we have used such similarity measure in a technique for approximate query answering. The technique has been implemented in a prototypical system and a number of experimental results obtained with this system confirm the effectiveness of the proposed measure.

References

  1. C. Bizer and A. Schultz. The berlin sparql benchmark. Int. J. Semantic Web Inf. Syst., 5(2):1--24, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  2. M. Bröcheler, A. Pugliese, and V. S. Subrahmanian. Dogma: A disk-oriented graph matching algorithm for rdf databases. In ISWC, pages 97--113, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. E. P. F. Chan and H. Lim. Optimization and evaluation of shortest path queries. VLDB J., 16(3):343--369, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD, pages 857--872, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. De Virgilio, F. Giunchiglia, and L. Tanca, editors. Semantic Web Information Management - A Model-Based Perspective. Springer Verlag, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. R. De Virgilio, F. Guerra, and Y. Velegrakis, editors. Semantic Search over the Web. Springer Verlag, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. De Virgilio, A. Maccioni, and R. Torlone. Approximate querying of rdf graphs via path alignment. Technical Report RT-DIA-200, http://www.dia.uniroma3.it/Plone/ricerca/technical-reports/2012, Dipartimento di Informatica e Automazione, November 2012 - {submitted}.Google ScholarGoogle Scholar
  8. R. De Virgilio, G. Orsi, L. Tanca, and R. Torlone. Nyaya: A system supporting the uniform management of large sets of semantic data. In ICDE, pages 1309--1312, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. W. Fan and P. Bohannon. Information preserving xml schema embedding. ACM Trans. Database Syst., 33(1), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. W. Fan, J. Li, S. Ma, N. Tang, Y. Wu, and Y. Wu. Graph pattern matching: From intractable to polynomial time. PVLDB, 3(1):264--275, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. Fellbaum, editor. WordNet An Electronic Lexical Database. The MIT Press, 1998.Google ScholarGoogle Scholar
  12. B. Gallagher. Matching structure and semantics: A survey on graph-based pattern matching. Artificial Intelligence, pages 45--53, 2006.Google ScholarGoogle Scholar
  13. Y. Guo, Z. Pan, and J. Heflin. Lubm: A benchmark for owl knowledge base systems. J. Web Sem., 3(2-3):158--182, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. O. Hassanzadeh and M. P. Consens. Linked Movie Data Base (Triplification Challenge Report). In I-SEMANTICS, pages 194--196, 2008.Google ScholarGoogle Scholar
  15. W. Hu, N. Jian, Y. Qu, and Y. Wang. Gmo: A graph matching for ontologies. In Integrating Ontologies, 2005.Google ScholarGoogle Scholar
  16. R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In SIGMOD, pages 813--826, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Justice and A. O. Hero. A binary linear programming formulation of the graph edit distance. IEEE Trans. Pattern Anal. Mach. Intell., 28(8):1200--1214, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. A. Khan, N. Li, X. Yan, Z. Guan, S. Chakraborty, and S. Tao. Neighborhood based fast graph search in large networks. In SIGMOD Conference, pages 901--912, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L. Ma, Y. Yang, Z. Qiu, G. T. Xie, Y. Pan, and S. Liu. Towards a complete owl ontology benchmark. In ESWC, pages 125--139, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. T. Neumann and G. Weikum. x-rdf-3x: Fast querying, high update rates, and consistency for rdf databases. PVLDB, 3(1):256--263, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. A. Poulovassilis and P. T. Wood. Combining approximation and relaxation in semantic web path queries. In ISWC, pages 631--646, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Y. Tian and J. M. Patel. Tale: A tool for approximate large graph matching. In ICDE, pages 963--972, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. T. Tran, H. Wang, S. Rudolph, and P. Cimiano. Top-k exploration of query candidates for efficient keyword search on graph-shaped (rdf) data. In ICDE Conference, pages 405--416, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. P. T. Wood. Query languages for graph databases. SIGMOD Record, 41(1):50--60, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Z. Zeng, A. K. H. Tung, J. Wang, J. Feng, and L. Zhou. Comparing stars: On approximating graph edit distance. PVLDB, 2(1):25--36, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. Zhang, M. Hu, and J. Yang. Treepi: A novel graph indexing method. In ICDE, pages 966--975, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  28. S. Zhang, S. Li, and J. Yang. Gaddi: distance index based subgraph matching in biological networks. In EDBT, pages 192--203, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Zhang, J. Yang, and W. Jin. Sapper: Subgraph indexing and approximate matching in large graphs. PVLDB, 3(1):1185--1194, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. L. Zou, L. Chen, and M. T. Özsu. Distance-join: Pattern match query in a large graph database. PVLDB, 2(1):886--897, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A similarity measure for approximate querying over RDF data

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Other conferences
          EDBT '13: Proceedings of the Joint EDBT/ICDT 2013 Workshops
          March 2013
          423 pages
          ISBN:9781450315999
          DOI:10.1145/2457317

          Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 18 March 2013

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          EDBT '13 Paper Acceptance Rate7of10submissions,70%Overall Acceptance Rate7of10submissions,70%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader