research-article

Feedback-driven result ranking and query refinement for exploring semi-structured data collections

Authors:
Huiping Cao (Arizona State Univ., Tempe, AZ)
Yan Qi (Arizona State Univ., Tempe, AZ)
K. Selçuk Candan (Arizona State Univ., Tempe, AZ)
Maria Luisa Sapino (Univ. di Torino, Torino, Italy)

EDBT '10: Proceedings of the 13th International Conference on Extending Database Technology, March 2010, Pages 3–14, https://doi.org/10.1145/1739041.1739046

Published: 22 March 2010


ABSTRACT

Feedback has been used extensively in document-centric applications, such as text retrieval and multimedia retrieval. Recently, there have also been efforts to apply feedback to semi-structured XML document collections. In this paper, we note that feedback can also be an effective tool for exploring large semi-structured data collections, through result ranking and query refinement. In particular, in large-scale data sharing and curation environments, where the user may not know the structure of the data, initial queries may be overly vague. Given a path query and the set of results the system identifies for this query over the data, we consider two types of feedback. Soft feedback captures the user's preference for some features over others. Hard feedback, on the other hand, expresses the user's assertions regarding whether certain features should be further enforced or, in contrast, avoided. Both soft and hard feedback can be "positive" or "negative". For soft feedback, we develop a probabilistic feature significance measure and describe how to use it for ranking results in the presence of dependencies between the path features. To deal with hard feedback efficiently (i.e., fast enough for interactive exploration), we present finite-automata-based query refinement solutions. In particular, we present a novel LazyDFA⁺ algorithm for managing hard feedback. We also describe optimizations that leverage the inherently iterative nature of the feedback process. We bring these techniques together in AXP, a system for adaptive and exploratory path retrieval. Experimental results show the effectiveness of the proposed techniques.
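The abstract describes both feedback mechanisms only at a high level, so the following Python code is a minimal sketch of the general ideas under stated assumptions. It is not the paper's probabilistic measure and not its LazyDFA⁺ algorithm. All of the following are hypothetical: a result is represented by the labels on its root-to-node path; a path query is a list of `(axis, label)` steps with axis `"/"` (child) or `"//"` (descendant); hard feedback is modelled as sets of labels that must (`require`) or must not (`avoid`) appear, folded into an automaton that is determinized lazily, building each DFA state only when it is first reached; soft feedback is scored with a smoothed log-likelihood ratio per label that treats labels as independent. That independence assumption is a deliberate simplification, since the paper's measure explicitly handles dependencies between path features. The names `LazyPathDFA`, `feature_weights`, and `rank` are illustrative.

```python
import math
from collections import Counter

# Hypothetical representation (not from the paper): a result is the list of
# labels on its root-to-node path; a query is a list of (axis, label) steps,
# where axis is "/" (child) or "//" (descendant) and "*" matches any label.


class LazyPathDFA:
    """Lazily determinized automaton for a linear path query with hard feedback.

    Assumption: hard feedback is a set of labels that must appear on the path
    (`require`) and a set that must not (`avoid`). This is an illustrative
    stand-in, not the paper's LazyDFA+ algorithm.
    """

    def __init__(self, steps, require=(), avoid=()):
        self.steps = list(steps)
        self.require = tuple(require)
        self.avoid = frozenset(avoid)
        self.cache = {}  # (dfa_state, label) -> dfa_state, filled on demand

    def start(self):
        # DFA state = (set of NFA states, bitmask of required labels seen);
        # NFA state i means "the first i query steps have been matched".
        return (frozenset({0}), 0)

    def _nfa_step(self, states, label):
        nxt = set()
        for i in states:
            if i == len(self.steps):
                continue  # accepting NFA state has no outgoing transitions
            axis, want = self.steps[i]
            if axis == "//":
                nxt.add(i)  # descendant axis: this label may be skipped
            if want in ("*", label):
                nxt.add(i + 1)
        return frozenset(nxt)

    def step(self, state, label):
        key = (state, label)
        if key not in self.cache:  # build the successor state only when first needed
            nfa, mask = state
            if label in self.avoid:
                nfa = frozenset()  # dead state: negative hard feedback violated
            else:
                nfa = self._nfa_step(nfa, label)
            for j, req in enumerate(self.require):
                if req == label:
                    mask |= 1 << j
            self.cache[key] = (nfa, mask)
        return self.cache[key]

    def accepts(self, path):
        state = self.start()
        for label in path:
            state = self.step(state, label)
            if not state[0]:
                return False  # no NFA state alive: the path can never match
        nfa, mask = state
        all_required = (1 << len(self.require)) - 1
        return len(self.steps) in nfa and mask == all_required


def feature_weights(positive, negative):
    """Per-label weight from soft feedback: smoothed log-likelihood ratio.

    Simplification: labels are treated as independent features, whereas the
    paper's measure accounts for dependencies between path features.
    """
    pos = Counter(f for p in positive for f in set(p))
    neg = Counter(f for p in negative for f in set(p))
    n_pos, n_neg = len(positive), len(negative)
    return {
        f: math.log(((pos[f] + 0.5) / (n_pos + 1)) / ((neg[f] + 0.5) / (n_neg + 1)))
        for f in set(pos) | set(neg)
    }


def rank(paths, weights):
    # Score each path by summing the weights of its distinct labels; unseen labels count as 0.
    return sorted(paths, key=lambda p: sum(weights.get(f, 0.0) for f in set(p)), reverse=True)


# Usage: hard feedback filters the results, soft feedback re-ranks the survivors.
results = [["lib", "book", "title"], ["lib", "journal", "title"],
           ["lib", "book", "chapter", "title"], ["lib", "draft", "title"]]
dfa = LazyPathDFA([("/", "lib"), ("//", "title")], avoid={"draft"})
kept = [p for p in results if dfa.accepts(p)]
weights = feature_weights(positive=[["lib", "book", "title"]],
                          negative=[["lib", "journal", "title"]])
ranked = rank(kept, weights)
```

Caching transitions keyed by (DFA state, label) means that later feedback rounds over the same result set reuse states that were already built. This is one plausible way to exploit the iterative nature of feedback; the optimizations described in the paper may differ.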

Index Terms
Information systems > Information retrieval

Recommendations

Exploring path query results through relevance feedback
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management

Feedback driven data exploration schemes have been implemented for non-structured data (such as text) and document-centric XML collections where formulating precise queries is often impossible. In this paper, we study the problem of enabling exploratory ...

Flexible query facilities for heterogeneous semi-structured data

Query refinement suggestion in multimodal image retrieval with relevance feedback
ICMI '11: Proceedings of the 13th international conference on multimodal interfaces

In the literature, it has been shown that relevance feedback is a good strategy for the system to interact with the user and provide better results in a content-based image retrieval (CBIR) system. On the other hand, there are many retrieval systems ...

Published in
EDBT '10: Proceedings of the 13th International Conference on Extending Database Technology
March 2010
741 pages
ISBN: 9781605589459
DOI: 10.1145/1739041
Editors:
Ioana Manolescu (INRIA, France)
Stefano Spaccapietra (EPFL, Switzerland)
Jens Teubner (ETH Zurich, Switzerland)
Masaru Kitsuregawa (Tokyo University, Japan)
Alain Leger (Orange - France Telecom R&D, France)
Felix Naumann (Hasso Plattner Institute, Germany)
Anastasia Ailamaki (EPFL, Switzerland)
Fatma Ozcan (IBM Almaden Research Center)
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data-centric XML
feature cover
inter-dependent structural feature
relevance feedback

Acceptance Rates
Overall Acceptance Rate: 7 of 10 submissions, 70%

Article Metrics
- Total Citations: 18
- Total Downloads: 324
- Downloads (last 12 months): 3
- Downloads (last 6 weeks): 0