
Barbara Made the News: Mining the Behavior of Crowds for Time-Aware Learning to Rank

Authors:
Flávio Martins, Universidade NOVA de Lisboa, Caparica, Portugal
João Magalhães, Universidade NOVA de Lisboa, Caparica, Portugal
Jamie Callan, Carnegie Mellon University, Pittsburgh, PA, USA

WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, February 2016, Pages 667–676. https://doi.org/10.1145/2835776.2835825

Published: 8 February 2016

ABSTRACT

On Twitter and other microblogging services, the crowd's generation of new content is often biased towards immediacy: what is happening now. Prompted by commentary and information propagating through multiple media, Web users interact with and post about newsworthy topics, giving rise to trending topics. This paper proposes to leverage the behavioral dynamics of users to estimate the most relevant time periods for a topic. Our hypothesis stems from the observation that a real-world event usually has peak times on the Web: a higher volume of tweets, new visits and edits to related Wikipedia articles, and news published about the event.

In this paper, we propose a novel time-aware ranking model that leverages multiple sources of crowd signals. Our approach makes two main contributions. First, a unifying approach that, given a query q, mines and represents temporal evidence from multiple sources of crowd signals, which allows us to predict the temporal relevance of documents for q. Second, a principled retrieval model that integrates temporal signals in a learning-to-rank framework to rank results according to the predicted temporal relevance. Evaluation on the TREC 2013 and 2014 Microblog track datasets demonstrates that the proposed model achieves a relative improvement of 13.2% over lexical retrieval models and 6.2% over a learning-to-rank baseline.
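The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' model: the abstract does not specify how temporal evidence is represented or combined, so the Gaussian kernel density estimate, the fixed bandwidth, the hand-set linear weights, and all names and data below (`gaussian_kde`, `temporal_features`, `rank`, `SIGNALS`, `DOCS`, `WEIGHTS`) are assumptions made for illustration. In a real learning-to-rank setup the weights would be learned from relevance judgments rather than set by hand.

```python
import math
from typing import Callable, Dict, List, Sequence


def gaussian_kde(timestamps: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return a function estimating the activity density at a time t.

    Assumption: a fixed-bandwidth Gaussian kernel; the paper may use
    a different estimator.
    """
    if not timestamps:
        return lambda t: 0.0
    norm = 1.0 / (len(timestamps) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(t: float) -> float:
        return norm * sum(
            math.exp(-0.5 * ((t - x) / bandwidth) ** 2) for x in timestamps
        )

    return density


def temporal_features(doc_time: float,
                      signals: Dict[str, Sequence[float]],
                      bandwidth: float = 1.0) -> List[float]:
    """One feature per crowd source: that source's density at doc_time."""
    return [gaussian_kde(signals[name], bandwidth)(doc_time)
            for name in sorted(signals)]


def score(doc: dict, signals: Dict[str, Sequence[float]],
          weights: Dict[str, float], bandwidth: float = 1.0) -> float:
    """Linear combination of a lexical score and the temporal features."""
    total = weights["lexical"] * doc["lexical"]
    feats = temporal_features(doc["time"], signals, bandwidth)
    for name, value in zip(sorted(signals), feats):
        total += weights.get(name, 0.0) * value
    return total


def rank(docs: List[dict], signals: Dict[str, Sequence[float]],
         weights: Dict[str, float], bandwidth: float = 1.0) -> List[dict]:
    """Sort documents by descending combined score."""
    return sorted(docs, key=lambda d: score(d, signals, weights, bandwidth),
                  reverse=True)


# Hypothetical data: timestamps are days since an arbitrary origin.
SIGNALS = {
    "tweets": [10.0, 10.5, 11.0, 11.2],
    "wikipedia_edits": [10.8, 11.1],
    "news": [11.0],
}
DOCS = [
    {"id": "d_old", "time": 2.0, "lexical": 1.0},
    {"id": "d_burst", "time": 11.0, "lexical": 1.0},
]
# Hand-set weights for illustration only.
WEIGHTS = {"lexical": 1.0, "tweets": 1.0, "wikipedia_edits": 1.0, "news": 1.0}

if __name__ == "__main__":
    for doc in rank(DOCS, SIGNALS, WEIGHTS):
        print(doc["id"], round(score(doc, SIGNALS, WEIGHTS), 4))
```

With equal lexical scores, the document posted during the burst of crowd activity is ranked above the one posted days earlier, which is the behavior the hypothesis predicts.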

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval models and ranking

Published in
WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
February 2016
746 pages
ISBN:9781450337168
DOI:10.1145/2835776
General Chairs: Paul N. Bennett (Microsoft Research), Vanja Josifovski (Pinterest)
Program Chairs: Jennifer Neville (Purdue University), Filip Radlinski (Microsoft)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
learning to rank
microblog search
social media
temporal information retrieval
time-aware ranking models
twitter
Qualifiers
- research-article
Acceptance Rates
WSDM '16 paper acceptance rate: 67 of 368 submissions, 18%
Overall acceptance rate: 498 of 2,863 submissions, 17%