DOI: 10.1145/1739041.1739046
research-article

Feedback-driven result ranking and query refinement for exploring semi-structured data collections

Published: 22 March 2010

ABSTRACT

Feedback has been used extensively in document-centric applications, such as text retrieval and multimedia retrieval. Recently, there have also been efforts to apply feedback to semi-structured XML document collections. In this paper, we note that feedback can also be an effective tool for exploring large semi-structured data collections, through result ranking and query refinement. In particular, in large-scale data sharing and curation environments, where the user may not know the structure of the data, queries may initially be overly vague. Given a path query and a set of results that the system has identified for this query over the data, we consider two types of feedback. Soft feedback captures the user's preference for some features over others. Hard feedback, on the other hand, expresses the user's assertions about whether certain features should be further enforced or, in contrast, avoided. Both soft and hard feedback can be "positive" or "negative". For soft feedback, we develop a probabilistic feature significance measure and describe how to use it for ranking results in the presence of dependencies between path features. To handle hard feedback efficiently (i.e., fast enough for interactive exploration), we present query refinement solutions based on finite automata. In particular, we present a novel LazyDFA+ algorithm for managing hard feedback. We also describe optimizations that leverage the inherently iterative nature of the feedback process. We bring these techniques together in AXP, a system for adaptive and exploratory path retrieval. Experimental results show the effectiveness of the proposed techniques.
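
The abstract combines path queries, finite automata, and hard feedback that enforces or avoids features. As a rough, hypothetical illustration only (this is not the paper's LazyDFA+ algorithm, and the probabilistic ranking used for soft feedback is not sketched), the Python below runs a linear path query with child (`/`) and descendant (`//`) steps as a DFA whose transitions are built lazily, on first use, in the spirit of the lazy DFA approach to XML stream processing. It assumes that a "feature" is simply an element label on a result's root-to-node path, and it models positive and negative hard feedback as a product with a small automaton that requires or forbids labels. `LazyPathDFA`, `refine_and_filter`, and the sample paths are invented for this sketch.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence, Tuple

# A step is (axis, label): axis "/" = child, "//" = descendant; label "*" = any.
Step = Tuple[str, str]
DState = FrozenSet[int]


class LazyPathDFA:
    """Linear path query (e.g. //book/title) run as a lazily built DFA.

    NFA state i means "the first i steps are matched". A DFA state is a
    frozenset of NFA states; each DFA transition is computed and cached the
    first time it is needed, instead of running a full subset construction.
    """

    def __init__(self, steps: Sequence[Step]):
        self.steps = list(steps)
        self.start: DState = frozenset({0})
        self.table: Dict[Tuple[DState, str], DState] = {}

    def _nfa_targets(self, i: int, label: str) -> List[int]:
        targets = []
        if i < len(self.steps):
            axis, want = self.steps[i]
            if axis == "//":
                targets.append(i)  # skip an intermediate ancestor element
            if want in ("*", label):
                targets.append(i + 1)  # this element matches step i
        return targets

    def move(self, state: DState, label: str) -> DState:
        key = (state, label)
        if key not in self.table:  # materialize this transition on first use
            self.table[key] = frozenset(
                j for i in state for j in self._nfa_targets(i, label))
        return self.table[key]

    def is_final(self, state: DState) -> bool:
        return len(self.steps) in state

    def accepts(self, path: Iterable[str]) -> bool:
        state = self.start
        for label in path:
            state = self.move(state, label)
        return self.is_final(state)


def refine_and_filter(dfa: LazyPathDFA, paths: Iterable[List[str]],
                      required: FrozenSet[str] = frozenset(),
                      forbidden: FrozenSet[str] = frozenset()) -> List[List[str]]:
    """Hypothetical hard feedback: keep paths the query accepts that contain
    every `required` label and no `forbidden` label. This runs the product of
    the query DFA with a small automaton that records which required labels
    have been seen; any forbidden label leads to a dead state."""
    kept = []
    for path in paths:
        state, seen, dead = dfa.start, frozenset(), False
        for label in path:
            if label in forbidden:
                dead = True
                break
            state = dfa.move(state, label)
            if label in required:
                seen = seen | {label}
        if not dead and seen == required and dfa.is_final(state):
            kept.append(path)
    return kept


query = LazyPathDFA([("//", "book"), ("/", "title")])  # //book/title
paths = [["lib", "book", "title"],
         ["lib", "archive", "book", "title"],
         ["lib", "book", "chapter", "title"],
         ["store", "book", "title"]]
print([query.accepts(p) for p in paths])  # [True, True, False, True]
# Negative hard feedback on "archive", positive hard feedback on "lib":
print(refine_and_filter(query, paths, frozenset({"lib"}), frozenset({"archive"})))
# [['lib', 'book', 'title']]
```

Because transitions live in the `table` cache, a later feedback round over the same query can reuse them rather than rebuild the automaton; whether and how AXP exploits this across iterations is not shown by this sketch.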

    Published in

      ACM Other conferences
      EDBT '10: Proceedings of the 13th International Conference on Extending Database Technology
      March 2010
      741 pages
      ISBN:9781605589459
      DOI:10.1145/1739041

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 7 of 10 submissions, 70%