Published in: Discover Computing 5/2010

01-10-2010 | Focused Retrieval and Result Aggr.

Expected reading effort in focused retrieval evaluation

Authors: Paavo Arvola, Jaana Kekäläinen, Marko Junkkari



Abstract

This study introduces a novel framework for evaluating passage and XML retrieval. The framework focuses on the user's effort to locate relevant content in a result document. Effort is measured on the basis of a system-guided reading order of documents and is calculated as the quantity of text the user is expected to browse through. More specifically, this study seeks evaluation metrics for retrieval methods that follow a specific fetch-and-browse approach. In the fetch phase, documents are ranked in decreasing order of their document score, as in document retrieval. In the browse phase, a set of non-overlapping passages representing the relevant text within the document is retrieved for each retrieved document. In other words, the passages of the document are reorganized so that the best-matching passages are read first, in sequential order. We introduce an application scenario motivating the framework and propose sample metrics based on it. These metrics provide a basis for comparing the effectiveness of traditional document retrieval with that of passage/XML retrieval, and they illuminate the benefit of passage/XML retrieval.
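
The idea of measuring effort as the amount of text browsed under a system-guided reading order can be illustrated with a minimal Python sketch. It is not the metric proposed in the article, since the abstract gives no formulas. It rests on these assumptions: text is measured in characters, the user reads the retrieved passages in the order the system returns them and then the remaining text in document order, and reading stops at the first relevant character. The names `reading_order` and `chars_read_until_relevant` are hypothetical.

```python
from typing import List, Optional, Tuple

# Half-open character range [start, end) within one document.
Span = Tuple[int, int]


def reading_order(doc_length: int, passages: List[Span]) -> List[Span]:
    """Return the spans in the order the user is assumed to read them.

    Assumption: the retrieved, non-overlapping passages come first, in the
    order given by the system; the text not covered by any passage follows
    in document order.
    """
    order = list(passages)
    cursor = 0
    for start, end in sorted(passages):
        if cursor < start:
            order.append((cursor, start))  # gap before this passage
        cursor = max(cursor, end)
    if cursor < doc_length:
        order.append((cursor, doc_length))  # tail of the document
    return order


def chars_read_until_relevant(order: List[Span],
                              relevant: List[Span]) -> Optional[int]:
    """Count the characters browsed before the first relevant character.

    Returns None when the reading order never reaches relevant text.
    """
    read = 0
    for start, end in order:
        # Entry points of relevant spans that overlap the current span.
        hits = [max(start, r_start)
                for r_start, r_end in relevant
                if r_start < end and r_end > start]
        if hits:
            return read + (min(hits) - start)
        read += end - start
    return None


# Hypothetical document of 1000 characters with relevant text at 600..700.
relevant_text = [(600, 700)]
focused = reading_order(1000, [(550, 750)])   # system retrieved one passage
sequential = reading_order(1000, [])          # plain document retrieval

print(chars_read_until_relevant(focused, relevant_text))     # 50
print(chars_read_until_relevant(sequential, relevant_text))  # 600
```

In this toy case the passage-guided order reaches relevant text after 50 characters, against 600 for reading the document from the top, which is the kind of comparison between document retrieval and passage/XML retrieval the abstract describes.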


Metadata
Title: Expected reading effort in focused retrieval evaluation
Authors: Paavo Arvola, Jaana Kekäläinen, Marko Junkkari
Publication date: 01-10-2010
Publisher: Springer Netherlands
Published in: Discover Computing / Issue 5/2010
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-010-9133-9
