ABSTRACT
On Twitter and other microblogging services, the content that crowds generate is often biased towards immediacy: what is happening now. Prompted by commentary and information propagating through multiple media, users on the Web interact with and post about newsworthy topics, giving rise to trending topics. This paper proposes leveraging the behavioral dynamics of users to estimate the most relevant time periods for a topic. Our hypothesis stems from the observation that when a real-world event occurs, it usually has peak times on the Web: a higher volume of tweets, new visits and edits to related Wikipedia articles, and news published about the event.
In this paper, we propose a novel time-aware ranking model that leverages multiple sources of crowd signals. Our approach builds on two major novelties. First, a unifying approach that, given a query q, mines and represents temporal evidence from multiple sources of crowd signals, which allows us to predict the temporal relevance of documents for q. Second, a principled retrieval model that integrates temporal signals in a learning-to-rank framework to rank results according to the predicted temporal relevance. Evaluation on the TREC 2013 and 2014 Microblog track datasets demonstrates that the proposed model achieves a relative improvement of 13.2% over lexical retrieval models and 6.2% over a learning-to-rank baseline.
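The two steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how temporal evidence is represented, so a Gaussian kernel density estimate over each source's activity timestamps is swapped in as one plausible choice, and a fixed linear combination stands in for the learned ranker. The names `gaussian_kde`, `temporal_features`, and `score`, the bandwidth, the weights, and all data are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence


def gaussian_kde(timestamps: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return a function t -> estimated density of crowd activity at time t."""
    n = len(timestamps)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)

    def density(t: float) -> float:
        if n == 0:
            return 0.0  # no evidence from this source
        total = sum(math.exp(-0.5 * ((t - x) / bandwidth) ** 2) for x in timestamps)
        return total / norm

    return density


def temporal_features(doc_time: float,
                      signals: Dict[str, Sequence[float]],
                      bandwidth: float = 1.0) -> List[float]:
    """One feature per source: activity density at the document's timestamp."""
    return [gaussian_kde(signals[name], bandwidth)(doc_time) for name in sorted(signals)]


def score(lexical_score: float, features: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed linear ranker; a real system would learn the weights from judgments."""
    return lexical_score + sum(w * f for w, f in zip(weights, features))


# Hypothetical activity timestamps (days since collection start) for one query.
signals = {
    "news": [5.5, 6.0, 6.5],
    "tweets": [4.5, 5.0, 5.0, 5.5, 6.0],
    "wikipedia": [5.0, 5.5, 6.0],
}
weights = [0.5, 1.0, 0.5]  # illustrative only, ordered as sorted(signals)

on_peak = score(1.0, temporal_features(5.0, signals), weights)
off_peak = score(1.0, temporal_features(20.0, signals), weights)
print(on_peak > off_peak)
```

Both documents here have the same lexical score, so the ordering is decided by how much crowd activity surrounds each document's timestamp. In a learning-to-rank setting, the per-source densities would be passed as features next to the lexical ones rather than combined with hand-set weights.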
Index Terms
- Barbara Made the News: Mining the Behavior of Crowds for Time-Aware Learning to Rank