ABSTRACT
Feedback has been used extensively in document-centric applications, such as text retrieval and multimedia retrieval. Recently, there have been efforts to apply feedback to semi-structured XML document collections as well. In this paper, we note that feedback can also be an effective tool for exploring large semi-structured data collections through result ranking and query refinement. In particular, in large-scale data sharing and curation environments, where the user may not know the structure of the data, queries may initially be overly vague. Given a path query and the set of results the system returns for this query over the data, we consider two types of feedback. Soft feedback captures the user's preference for some features over others. Hard feedback, on the other hand, expresses the user's assertions about whether certain features should be further enforced or, in contrast, avoided. Both soft and hard feedback can be "positive" or "negative". For soft feedback, we develop a probabilistic feature significance measure and describe how to use it for ranking results in the presence of dependencies between the path features. To handle hard feedback efficiently (i.e., fast enough for interactive exploration), we present finite-automata-based query refinement solutions. In particular, we present a novel LazyDFA+ algorithm for managing hard feedback. We also describe optimizations that leverage the inherently iterative nature of the feedback process. We bring these techniques together in AXP, a system for adaptive and exploratory path retrieval. Experimental results show the effectiveness of the proposed techniques.
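To make the distinction between the two feedback types concrete, the following is a minimal illustrative sketch and not the method of the paper. It assumes that a path's features are simply its individual labels, replaces the paper's probabilistic feature significance measure with a smoothed log-odds weight, ignores dependencies between features, and replaces the automata-based refinement (LazyDFA+) with a plain filter. All function names and the sample paths are hypothetical.

```python
import math
from collections import Counter


def path_features(path):
    """Features of a path: its individual labels (a simplifying assumption)."""
    return set(path.split("/"))


def soft_weights(liked, disliked):
    """Smoothed log-odds of each feature occurring in liked vs. disliked paths.

    This stands in for a feature significance measure; it is an assumption,
    not the measure defined in the paper.
    """
    pos = Counter(f for p in liked for f in path_features(p))
    neg = Counter(f for p in disliked for f in path_features(p))
    weights = {}
    for f in set(pos) | set(neg):
        p_pos = (pos[f] + 1) / (len(liked) + 2)
        p_neg = (neg[f] + 1) / (len(disliked) + 2)
        weights[f] = math.log(p_pos / p_neg)
    return weights


def rank(paths, weights):
    """Soft feedback: reorder results, but never drop any of them."""
    def score(p):
        return sum(weights.get(f, 0.0) for f in path_features(p))
    return sorted(paths, key=score, reverse=True)


def hard_filter(paths, require=(), avoid=()):
    """Hard feedback: keep only paths with all required and no avoided features."""
    kept = []
    for p in paths:
        feats = path_features(p)
        if all(f in feats for f in require) and not any(f in feats for f in avoid):
            kept.append(p)
    return kept


# Hypothetical result set for a vague path query.
results = [
    "lib/journal/editor/name",
    "lib/journal/title",
    "lib/book/author/name",
    "lib/book/title",
]

weights = soft_weights(liked=["lib/book/title"],
                       disliked=["lib/journal/editor/name"])
ranked = rank(results, weights)
refined = hard_filter(results, require=["title"], avoid=["journal"])
print(ranked)
print(refined)
```

In this sketch, soft feedback only changes the order of the results, whereas hard feedback changes which results are returned at all, which mirrors the ranking versus refinement split described in the abstract.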
Index Terms
- Feedback-driven result ranking and query refinement for exploring semi-structured data collections