ABSTRACT
Approximate query answering relies on a similarity measure that evaluates the relevance, for a given query, of a set of data extracted from the underlying database. In the context of graph-modeled data, many methods (such as, subgraph isomorphism, graph edit distance, and maximum common subgraph) have been proposed to face this problem. Unfortunately, they are usually hard to compute and when they are used on RDF data, several drawbacks arise. In this paper, we propose a measure to evaluate the similarity between a (small) graph representing a query and a portion of a (large) graph representing an RDF data set. We show that this measure: (i) can be evaluated in linear time with respect to the size of the given graphs and, (ii) guarantees other interesting properties. In order to show the feasibility of our approach, we have used such similarity measure in a technique for approximate query answering. The technique has been implemented in a prototypical system and a number of experimental results obtained with this system confirm the effectiveness of the proposed measure.
- C. Bizer and A. Schultz. The berlin sparql benchmark. Int. J. Semantic Web Inf. Syst., 5(2):1--24, 2009.Google ScholarCross Ref
- M. Bröcheler, A. Pugliese, and V. S. Subrahmanian. Dogma: A disk-oriented graph matching algorithm for rdf databases. In ISWC, pages 97--113, 2009. Google ScholarDigital Library
- E. P. F. Chan and H. Lim. Optimization and evaluation of shortest path queries. VLDB J., 16(3):343--369, 2007. Google ScholarDigital Library
- J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD, pages 857--872, 2007. Google ScholarDigital Library
- R. De Virgilio, F. Giunchiglia, and L. Tanca, editors. Semantic Web Information Management - A Model-Based Perspective. Springer Verlag, 2010. Google ScholarDigital Library
- R. De Virgilio, F. Guerra, and Y. Velegrakis, editors. Semantic Search over the Web. Springer Verlag, 2012. Google ScholarDigital Library
- R. De Virgilio, A. Maccioni, and R. Torlone. Approximate querying of rdf graphs via path alignment. Technical Report RT-DIA-200, http://www.dia.uniroma3.it/Plone/ricerca/technical-reports/2012, Dipartimento di Informatica e Automazione, November 2012 - {submitted}.Google Scholar
- R. De Virgilio, G. Orsi, L. Tanca, and R. Torlone. Nyaya: A system supporting the uniform management of large sets of semantic data. In ICDE, pages 1309--1312, 2012. Google ScholarDigital Library
- W. Fan and P. Bohannon. Information preserving xml schema embedding. ACM Trans. Database Syst., 33(1), 2008. Google ScholarDigital Library
- W. Fan, J. Li, S. Ma, N. Tang, Y. Wu, and Y. Wu. Graph pattern matching: From intractable to polynomial time. PVLDB, 3(1):264--275, 2010. Google ScholarDigital Library
- C. Fellbaum, editor. WordNet An Electronic Lexical Database. The MIT Press, 1998.Google Scholar
- B. Gallagher. Matching structure and semantics: A survey on graph-based pattern matching. Artificial Intelligence, pages 45--53, 2006.Google Scholar
- Y. Guo, Z. Pan, and J. Heflin. Lubm: A benchmark for owl knowledge base systems. J. Web Sem., 3(2-3):158--182, 2005. Google ScholarDigital Library
- O. Hassanzadeh and M. P. Consens. Linked Movie Data Base (Triplification Challenge Report). In I-SEMANTICS, pages 194--196, 2008.Google Scholar
- W. Hu, N. Jian, Y. Qu, and Y. Wang. Gmo: A graph matching for ontologies. In Integrating Ontologies, 2005.Google Scholar
- R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In SIGMOD, pages 813--826, 2009. Google ScholarDigital Library
- D. Justice and A. O. Hero. A binary linear programming formulation of the graph edit distance. IEEE Trans. Pattern Anal. Mach. Intell., 28(8):1200--1214, 2006. Google ScholarDigital Library
- A. Khan, N. Li, X. Yan, Z. Guan, S. Chakraborty, and S. Tao. Neighborhood based fast graph search in large networks. In SIGMOD Conference, pages 901--912, 2011. Google ScholarDigital Library
- L. Ma, Y. Yang, Z. Qiu, G. T. Xie, Y. Pan, and S. Liu. Towards a complete owl ontology benchmark. In ESWC, pages 125--139, 2006. Google ScholarDigital Library
- T. Neumann and G. Weikum. x-rdf-3x: Fast querying, high update rates, and consistency for rdf databases. PVLDB, 3(1):256--263, 2010. Google ScholarDigital Library
- A. Poulovassilis and P. T. Wood. Combining approximation and relaxation in semantic web path queries. In ISWC, pages 631--646, 2010. Google ScholarDigital Library
- Y. Tian and J. M. Patel. Tale: A tool for approximate large graph matching. In ICDE, pages 963--972, 2008. Google ScholarDigital Library
- T. Tran, H. Wang, S. Rudolph, and P. Cimiano. Top-k exploration of query candidates for efficient keyword search on graph-shaped (rdf) data. In ICDE Conference, pages 405--416, 2009. Google ScholarDigital Library
- P. T. Wood. Query languages for graph databases. SIGMOD Record, 41(1):50--60, 2012. Google ScholarDigital Library
- X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004. Google ScholarDigital Library
- Z. Zeng, A. K. H. Tung, J. Wang, J. Feng, and L. Zhou. Comparing stars: On approximating graph edit distance. PVLDB, 2(1):25--36, 2009. Google ScholarDigital Library
- S. Zhang, M. Hu, and J. Yang. Treepi: A novel graph indexing method. In ICDE, pages 966--975, 2007.Google ScholarCross Ref
- S. Zhang, S. Li, and J. Yang. Gaddi: distance index based subgraph matching in biological networks. In EDBT, pages 192--203, 2009. Google ScholarDigital Library
- S. Zhang, J. Yang, and W. Jin. Sapper: Subgraph indexing and approximate matching in large graphs. PVLDB, 3(1):1185--1194, 2010. Google ScholarDigital Library
- L. Zou, L. Chen, and M. T. Özsu. Distance-join: Pattern match query in a large graph database. PVLDB, 2(1):886--897, 2009. Google ScholarDigital Library
Index Terms
- A similarity measure for approximate querying over RDF data
Recommendations
RDF approximate queries based on semantic similarity
In this paper, we propose a query relaxation approach to handle the problem of empty or too little answers returned from RDF query. We apply RDF entailment to triple patterns in the original query to get more general answers. We propose the notion of ...
Approximate querying of RDF graphs via path alignment
A query over RDF data is usually expressed in terms of matching between a graph representing the target and a huge graph representing the source. Unfortunately, graph matching is typically performed in terms of subgraph isomorphism, which makes semantic ...
A Semantic Similarity-based Subgraph Matching Method for Improving Question Answering over RDF
WWW '20: Companion Proceedings of the Web Conference 2020RDF question/answering (Q/A) system can explore RDF data by translating natural language questions into SPARQL queries. In this poster, we design a generation-and-ranking approach to translate natural language questions into SPARQL queries based on ...
Comments