DOI: 10.1145/3014812.3014865
research-article

Effective pseudo-relevance for Microblog retrieval

Published: 31 January 2017

ABSTRACT

Microblog services such as Twitter have become part of daily life for many users, with thousands of documents published each second. Microblog documents are often very short, heavy in informal language, and hard to understand because they lack contextual clues. Retrieving relevant documents from microblogs is challenging because of these characteristics and the massive scale of the data. In particular, microblog retrieval models suffer from a vocabulary mismatch problem that limits their performance. In this paper, we address these limitations by proposing a pseudo-relevance feedback model. Our model uses discriminative expansion to meet user interests. Experimental results on the TREC 2011 and 2012 microblog datasets show that our model achieves significant improvements over the baseline models.


        • Published in

          ACSW '17: Proceedings of the Australasian Computer Science Week Multiconference
          January 2017
          615 pages
          ISBN: 9781450347686
          DOI: 10.1145/3014812

          Copyright © 2017 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          ACSW '17 paper acceptance rate: 78 of 156 submissions, 50%. Overall acceptance rate: 204 of 424 submissions, 48%.
