SIGIR '13 short paper
DOI: 10.1145/2484028.2484139

Building a web test collection using social media

Published: 28 July 2013

ABSTRACT

Community Question Answering (CQA) platforms contain large numbers of questions and associated answers. Answerers sometimes include URLs in their answers to provide further information. This paper describes a novel way of building a test collection for web search by exploiting the link information in this type of social media data. We propose building the test collection by treating CQA questions as queries and the web pages linked from their answers as relevant documents. To evaluate this approach, we collect approximately ten thousand CQA queries whose answers contain links to ClueWeb09 documents that survive spam filtering. Experimental results on this collection show that the relative effectiveness of different retrieval models on the ClueWeb-CQA query set is consistent with that on the TREC Web Track query sets, confirming the reliability of our test collection. Further analysis shows that the large number of queries generated by this approach compensates for the sparse relevance judgments when determining significant differences.
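The abstract implies a simple extraction pipeline, which can be pictured roughly as follows. This is a minimal sketch, not the authors' code: the answer-record fields, the precomputed URL-to-docid mapping for ClueWeb09, and the percentile-style spam scores (higher meaning less likely spam) are all assumptions made for illustration.

    # Illustrative sketch of the pipeline the abstract describes: CQA questions
    # become query topics, and ClueWeb09 pages linked from their answers become
    # the relevant documents. All input formats here are assumed, not specified
    # by the paper.
    import re
    from urllib.parse import urlsplit

    URL_PATTERN = re.compile(r"https?://\S+")

    def normalize_url(url):
        # Crude normalization so links in answer text can match crawl URLs.
        parts = urlsplit(url.rstrip(").,]>"))
        host = parts.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        return host + (parts.path.rstrip("/") or "/")

    def build_collection(answer_records, url_to_docid, spam_score, threshold=70):
        # answer_records: iterable of dicts {"qid", "question", "answer"} (assumed layout)
        # url_to_docid:   normalized URL -> ClueWeb09 document id (assumed precomputed)
        # spam_score:     document id -> percentile; higher = less likely spam
        topics, qrels = {}, []
        for rec in answer_records:
            docids = set()
            for url in URL_PATTERN.findall(rec["answer"]):
                docid = url_to_docid.get(normalize_url(url))
                # Keep only links that resolve to non-spam ClueWeb09 documents.
                if docid is not None and spam_score.get(docid, 0) >= threshold:
                    docids.add(docid)
            if docids:  # keep only questions that yield at least one judged document
                topics[rec["qid"]] = rec["question"]
                qrels += [(rec["qid"], d) for d in docids]
        return topics, qrels

The resulting judgments can then be written out in the usual TREC qrels format, one line per pair: "<qid> 0 <docid> 1".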


Published in

SIGIR '13: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2013, 1188 pages
ISBN: 9781450320344
DOI: 10.1145/2484028
Copyright © 2013 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SIGIR '13 paper acceptance rate: 73 of 366 submissions, 20%. Overall acceptance rate: 792 of 3,983 submissions, 20%.
