ABSTRACT
Microblog services such as Twitter have become part of daily life for many users, with thousands of documents published each second. Microblog documents are often very short, heavy in informal language, and hard to understand because they lack contextual clues. These properties, together with the massive scale of the data, make retrieving relevant documents from microblogs challenging. In particular, microblog retrieval models suffer from a vocabulary mismatch problem that degrades their performance. In this paper, we address these limitations by proposing a pseudo-relevance feedback model that uses discriminative expansion to better match user interests. Experimental results on the TREC 2011 and 2012 microblog datasets show that our model achieves significant improvements over the baseline models.
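As a rough illustration of the general pseudo-relevance feedback idea mentioned above, the following Python sketch expands a query with terms drawn from the top-ranked documents of a first-pass retrieval. It is not the model proposed in the paper: the overlap-based ranking, the "frequent in feedback documents, rare elsewhere" term score, and the names `tokenize`, `score`, `expand_query`, `top_k`, and `num_terms` are all assumptions made for this example.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def score(query_terms, doc_terms):
    """Count how many document tokens match a query term."""
    query_set = set(query_terms)
    return sum(1 for term in doc_terms if term in query_set)


def expand_query(query, docs, top_k=2, num_terms=2):
    """Return the query terms plus expansion terms from the top_k documents.

    Hypothetical scoring: a candidate term gains one point for every
    feedback document containing it and loses one point for every
    non-feedback document containing it.
    """
    query_terms = tokenize(query)
    tokenized = [tokenize(d) for d in docs]

    # First-pass retrieval: rank documents by the overlap score.
    ranked = sorted(
        range(len(docs)),
        key=lambda i: score(query_terms, tokenized[i]),
        reverse=True,
    )
    feedback, rest = ranked[:top_k], ranked[top_k:]

    # Document frequencies inside and outside the feedback set.
    fb_counts = Counter(t for i in feedback for t in set(tokenized[i]))
    rest_counts = Counter(t for i in rest for t in set(tokenized[i]))

    # Prefer terms that are common in feedback documents and rare elsewhere.
    candidates = {
        t: fb_counts[t] - rest_counts[t]
        for t in fb_counts
        if t not in query_terms
    }
    best = sorted(candidates, key=lambda t: (-candidates[t], t))[:num_terms]
    return query_terms + best
```

In this sketch, a short query such as "japan earthquake" can pick up a related term like "tsunami" from the top-ranked documents, which is one way expansion can reduce vocabulary mismatch between short queries and short documents.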
Index Terms
- Effective pseudo-relevance for Microblog retrieval