DOI: 10.1145/2908131.2908168
short-paper

Finding diverse needles in a haystack of comments: social media exploration for news

Published: 22 May 2016

ABSTRACT

The use of social media platforms to express opinions and discuss various topics has become increasingly popular. Consequently, users generate a huge volume of data across these platforms; for example, they comment on a variety of content items such as news articles, videos, and images. These comments are often noisy and sparse, so identifying sub-topics within them to support exploration of social media is a challenge. In this paper, we develop an effective way to distill sub-topics from all the comments related to a textual query, and we apply two different diversification techniques to select comments. We validate our approach experimentally, using seven years of Reddit comments as the collection and news events from the Wikipedia Current Events Portal as queries.
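The abstract does not name the two diversification techniques used, so the following is only a hedged illustration of what "diversification to select comments" can look like, not the paper's method. It sketches Maximal Marginal Relevance (MMR; Carbonell and Goldstein, 1998), a standard greedy diversification technique, assuming each comment already has a query-relevance score and a sub-topic distribution (for example, from a topic model). The names `mmr_select`, `cosine`, and `lam`, and the toy data, are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(
    relevance: Dict[str, float],
    topic_vectors: Dict[str, List[float]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick up to k comment ids by Maximal Marginal Relevance (sketch).

    relevance:     hypothetical query-relevance score per comment id.
    topic_vectors: hypothetical sub-topic distribution per comment id.
    lam:           1.0 ranks by relevance only; 0.0 by novelty only.
    """
    candidates = set(relevance)
    selected: List[str] = []
    while candidates and len(selected) < k:
        scores: Dict[str, float] = {}
        for cid in sorted(candidates):
            # Redundancy: highest sub-topic similarity to any already-selected comment.
            redundancy = max(
                (cosine(topic_vectors[cid], topic_vectors[s]) for s in selected),
                default=0.0,
            )
            scores[cid] = lam * relevance[cid] - (1.0 - lam) * redundancy
        # scores was filled in sorted order, so ties resolve to the smallest id.
        best = max(scores, key=lambda c: scores[c])
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: c1 and c2 share a sub-topic; c3 covers a different one.
    relevance = {"c1": 0.9, "c2": 0.85, "c3": 0.6}
    topic_vectors = {"c1": [1.0, 0.0], "c2": [1.0, 0.0], "c3": [0.0, 1.0]}
    print(mmr_select(relevance, topic_vectors, k=2))           # ['c1', 'c3']
    print(mmr_select(relevance, topic_vectors, k=2, lam=1.0))  # ['c1', 'c2']
```

With `lam=1.0` the selection is a plain relevance ranking and returns two comments on the same sub-topic; with `lam=0.5` the near-duplicate `c2` is penalized and the less relevant but novel `c3` is chosen instead.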


        Published in

        WebSci '16: Proceedings of the 8th ACM Conference on Web Science
        May 2016, 392 pages
        ISBN: 9781450342087
        DOI: 10.1145/2908131

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

        Publisher

        Association for Computing Machinery

        New York, NY, United States




        Acceptance Rates

        WebSci '16 Paper Acceptance Rate: 13 of 70 submissions, 19%
        Overall Acceptance Rate: 218 of 875 submissions, 25%
