short-paper

Building a web test collection using social media

Authors:
Chia-Jung Lee

University of Massachusetts Amherst, Amherst, MA, USA

W. Bruce Croft

University of Massachusetts Amherst, Amherst, MA, USA


SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, July 2013, Pages 757–760, https://doi.org/10.1145/2484028.2484139

Published: 28 July 2013

ABSTRACT

Community Question Answering (CQA) platforms contain a large number of questions and associated answers. Answerers sometimes include URLs as part of the answers to provide further information. This paper describes a novel way of building a test collection for web search by exploiting the link information from this type of social media data. We propose to build the test collection by regarding CQA questions as queries and the associated linked web pages as relevant documents. To evaluate this approach, we collect approximately ten thousand CQA queries, whose answers contained links to ClueWeb09 documents after spam filtering. Experimental results using this collection show that the relative effectiveness between different retrieval models on the ClueWeb-CQA query set is consistent with that on the TREC Web Track query sets, confirming the reliability of our test collection. Further analysis shows that the large number of queries generated through this approach compensates for the sparse relevance judgments in determining significant differences.
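The construction described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `url_to_doc` lookup (URL to collection document ID), the `spam_percentile` scores, and the `min_percentile` threshold are all hypothetical stand-ins, and the URL matching is deliberately naive.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Naive URL matcher; real CQA data would need more careful normalization.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(text: str) -> List[str]:
    """Return URLs found in an answer, trimming trailing punctuation."""
    return [u.rstrip(".,;:!?)]\"'") for u in URL_PATTERN.findall(text)]


def build_collection(
    threads: Iterable[Tuple[str, str, List[str]]],
    url_to_doc: Dict[str, str],
    spam_percentile: Dict[str, int],
    min_percentile: int,
) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Treat each question as a query and linked pages as relevant documents.

    threads: (question_id, question_text, answer_texts) triples.
    url_to_doc: hypothetical lookup from URL to a collection document ID.
    spam_percentile: hypothetical per-document score (higher = less spammy).
    min_percentile: assumed cutoff; documents scoring below it are dropped.
    """
    queries: Dict[str, str] = {}
    qrels: Dict[str, Set[str]] = {}
    for qid, question, answers in threads:
        relevant: Set[str] = set()
        for answer in answers:
            for url in extract_urls(answer):
                doc_id = url_to_doc.get(url)
                if doc_id is None:
                    continue  # link points outside the document collection
                if spam_percentile.get(doc_id, 0) < min_percentile:
                    continue  # filtered out as likely spam
                relevant.add(doc_id)
        if relevant:
            # Keep only questions with at least one surviving linked document.
            queries[qid] = question
            qrels[qid] = relevant
    return queries, qrels
```

Questions whose answers contain no usable link are dropped, which is one way to read "queries whose answers contained links to ClueWeb09 documents after spam filtering"; the resulting judgments are sparse, typically a handful of documents per query.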

Index Terms

1. Information systems
  1. Information retrieval
  2. Information storage systems

Published in
SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
July 2013
1188 pages
ISBN:9781450320344
DOI:10.1145/2484028
General Chairs:
Gareth J.F. Jones, Dublin City University, Ireland
Páraic Sheridan, Dublin City University, Ireland
Program Chairs:
Diane Kelly, University of North Carolina, Chapel Hill, USA
Maarten de Rijke, University of Amsterdam, The Netherlands
Tetsuya Sakai, Microsoft Research Asia, China
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
community question answering
social media
test collection
Qualifiers
- short-paper
Acceptance Rates
SIGIR '13 paper acceptance rate: 73 of 366 submissions, 20%. Overall acceptance rate: 792 of 3,983 submissions, 20%.