research-article

A similarity measure for approximate querying over RDF data

Authors:
Roberto De Virgilio

Università Roma Tre, Rome, Italy

Università Roma Tre, Rome, Italy
View Profile

,
Antonio Maccioni

Università Roma Tre, Rome, Italy

Università Roma Tre, Rome, Italy
View Profile

,
Riccardo Torlone

Università Roma Tre, Rome, Italy

Università Roma Tre, Rome, Italy
View Profile

EDBT '13: Proceedings of the Joint EDBT/ICDT 2013 WorkshopsMarch 2013Pages 205–213https://doi.org/10.1145/2457317.2457352

Published:18 March 2013Publication History

EDBT '13: Proceedings of the Joint EDBT/ICDT 2013 Workshops

Pages 205–213

ABSTRACT

Approximate query answering relies on a similarity measure that evaluates the relevance, for a given query, of a set of data extracted from the underlying database. In the context of graph-modeled data, many methods (such as, subgraph isomorphism, graph edit distance, and maximum common subgraph) have been proposed to face this problem. Unfortunately, they are usually hard to compute and when they are used on RDF data, several drawbacks arise. In this paper, we propose a measure to evaluate the similarity between a (small) graph representing a query and a portion of a (large) graph representing an RDF data set. We show that this measure: (i) can be evaluated in linear time with respect to the size of the given graphs and, (ii) guarantees other interesting properties. In order to show the feasibility of our approach, we have used such similarity measure in a technique for approximate query answering. The technique has been implemented in a prototypical system and a number of experimental results obtained with this system confirm the effectiveness of the proposed measure.

References

C. Bizer and A. Schultz. The berlin sparql benchmark. Int. J. Semantic Web Inf. Syst., 5(2):1--24, 2009.Google ScholarCross Ref
M. Bröcheler, A. Pugliese, and V. S. Subrahmanian. Dogma: A disk-oriented graph matching algorithm for rdf databases. In ISWC, pages 97--113, 2009. Google ScholarDigital Library
E. P. F. Chan and H. Lim. Optimization and evaluation of shortest path queries. VLDB J., 16(3):343--369, 2007. Google ScholarDigital Library
J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD, pages 857--872, 2007. Google ScholarDigital Library
R. De Virgilio, F. Giunchiglia, and L. Tanca, editors. Semantic Web Information Management - A Model-Based Perspective. Springer Verlag, 2010. Google ScholarDigital Library
R. De Virgilio, F. Guerra, and Y. Velegrakis, editors. Semantic Search over the Web. Springer Verlag, 2012. Google ScholarDigital Library
R. De Virgilio, A. Maccioni, and R. Torlone. Approximate querying of rdf graphs via path alignment. Technical Report RT-DIA-200, http://www.dia.uniroma3.it/Plone/ricerca/technical-reports/2012, Dipartimento di Informatica e Automazione, November 2012 - {submitted}.Google Scholar
R. De Virgilio, G. Orsi, L. Tanca, and R. Torlone. Nyaya: A system supporting the uniform management of large sets of semantic data. In ICDE, pages 1309--1312, 2012. Google ScholarDigital Library
W. Fan and P. Bohannon. Information preserving xml schema embedding. ACM Trans. Database Syst., 33(1), 2008. Google ScholarDigital Library
W. Fan, J. Li, S. Ma, N. Tang, Y. Wu, and Y. Wu. Graph pattern matching: From intractable to polynomial time. PVLDB, 3(1):264--275, 2010. Google ScholarDigital Library
C. Fellbaum, editor. WordNet An Electronic Lexical Database. The MIT Press, 1998.Google Scholar
B. Gallagher. Matching structure and semantics: A survey on graph-based pattern matching. Artificial Intelligence, pages 45--53, 2006.Google Scholar
Y. Guo, Z. Pan, and J. Heflin. Lubm: A benchmark for owl knowledge base systems. J. Web Sem., 3(2-3):158--182, 2005. Google ScholarDigital Library
O. Hassanzadeh and M. P. Consens. Linked Movie Data Base (Triplification Challenge Report). In I-SEMANTICS, pages 194--196, 2008.Google Scholar
W. Hu, N. Jian, Y. Qu, and Y. Wang. Gmo: A graph matching for ontologies. In Integrating Ontologies, 2005.Google Scholar
R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In SIGMOD, pages 813--826, 2009. Google ScholarDigital Library
D. Justice and A. O. Hero. A binary linear programming formulation of the graph edit distance. IEEE Trans. Pattern Anal. Mach. Intell., 28(8):1200--1214, 2006. Google ScholarDigital Library
A. Khan, N. Li, X. Yan, Z. Guan, S. Chakraborty, and S. Tao. Neighborhood based fast graph search in large networks. In SIGMOD Conference, pages 901--912, 2011. Google ScholarDigital Library
L. Ma, Y. Yang, Z. Qiu, G. T. Xie, Y. Pan, and S. Liu. Towards a complete owl ontology benchmark. In ESWC, pages 125--139, 2006. Google ScholarDigital Library
T. Neumann and G. Weikum. x-rdf-3x: Fast querying, high update rates, and consistency for rdf databases. PVLDB, 3(1):256--263, 2010. Google ScholarDigital Library
A. Poulovassilis and P. T. Wood. Combining approximation and relaxation in semantic web path queries. In ISWC, pages 631--646, 2010. Google ScholarDigital Library
Y. Tian and J. M. Patel. Tale: A tool for approximate large graph matching. In ICDE, pages 963--972, 2008. Google ScholarDigital Library
T. Tran, H. Wang, S. Rudolph, and P. Cimiano. Top-k exploration of query candidates for efficient keyword search on graph-shaped (rdf) data. In ICDE Conference, pages 405--416, 2009. Google ScholarDigital Library
P. T. Wood. Query languages for graph databases. SIGMOD Record, 41(1):50--60, 2012. Google ScholarDigital Library
X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004. Google ScholarDigital Library
Z. Zeng, A. K. H. Tung, J. Wang, J. Feng, and L. Zhou. Comparing stars: On approximating graph edit distance. PVLDB, 2(1):25--36, 2009. Google ScholarDigital Library
S. Zhang, M. Hu, and J. Yang. Treepi: A novel graph indexing method. In ICDE, pages 966--975, 2007.Google ScholarCross Ref
S. Zhang, S. Li, and J. Yang. Gaddi: distance index based subgraph matching in biological networks. In EDBT, pages 192--203, 2009. Google ScholarDigital Library
S. Zhang, J. Yang, and W. Jin. Sapper: Subgraph indexing and approximate matching in large graphs. PVLDB, 3(1):1185--1194, 2010. Google ScholarDigital Library
L. Zou, L. Chen, and M. T. Özsu. Distance-join: Pattern match query in a large graph database. PVLDB, 2(1):886--897, 2009. Google ScholarDigital Library

Index Terms

A similarity measure for approximate querying over RDF data
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
  2. Information retrieval
    1. Information retrieval query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Recommendations

RDF approximate queries based on semantic similarity

In this paper, we propose a query relaxation approach to handle the problem of empty or too little answers returned from RDF query. We apply RDF entailment to triple patterns in the original query to get more general answers. We propose the notion of ...
Read More
Approximate querying of RDF graphs via path alignment

A query over RDF data is usually expressed in terms of matching between a graph representing the target and a huge graph representing the source. Unfortunately, graph matching is typically performed in terms of subgraph isomorphism, which makes semantic ...
Read More
A Semantic Similarity-based Subgraph Matching Method for Improving Question Answering over RDF
WWW '20: Companion Proceedings of the Web Conference 2020

RDF question/answering (Q/A) system can explore RDF data by translating natural language questions into SPARQL queries. In this poster, we design a generation-and-ranking approach to translate natural language questions into SPARQL queries based on ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
EDBT '13: Proceedings of the Joint EDBT/ICDT 2013 Workshops
March 2013
423 pages
ISBN:9781450315999
DOI:10.1145/2457317
General Chair:
Giovanna Guerrini
Università di Genova, Italy
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 March 2013
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
RDF
graph matching
monotonicity
top-k
Qualifiers
- research-article
Conference

Acceptance Rates
EDBT '13 Paper Acceptance Rate7of10submissions,70%Overall Acceptance Rate7of10submissions,70%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 17
  Total Citations
  View Citations
- 234
  Total Downloads
- Downloads (Last 12 months)7
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A similarity measure for approximate querying over RDF data

EDBT '13: Proceedings of the Joint EDBT/ICDT 2013 Workshops

ABSTRACT

References

Cited By

Index Terms

Recommendations

RDF approximate queries based on semantic similarity

Approximate querying of RDF graphs via path alignment

A Semantic Similarity-based Subgraph Matching Method for Improving Question Answering over RDF