ABSTRACT
The use of social media platforms to express opinions and discuss various topics has become increasingly popular. Consequently, users generate a huge volume of data across these platforms, for example by commenting on content items such as news articles, videos, and images. These comments are often noisy and sparse, so identifying sub-topics within them in order to explore social media is a challenge. In this paper, we develop an effective way to distill sub-topics from all the comments related to a textual query and apply two different diversification techniques to select comments. We conduct experiments to validate our idea using seven years of Reddit comments, with news events from the Wikipedia Current Events Portal as queries.
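The abstract does not name the two diversification techniques. As an illustration only, the sketch below shows one widely used generic approach, greedy Maximal Marginal Relevance (MMR), applied to comment selection. The function names `cosine` and `mmr_select`, the whitespace tokenizer, the bag-of-words cosine similarity, and the trade-off parameter `lam` are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query: str, comments: list[str], k: int, lam: float = 0.5) -> list[str]:
    """Greedily pick up to k comments, trading relevance against redundancy.

    lam = 1.0 ranks purely by similarity to the query; lower values
    penalize comments that resemble ones already selected.
    """
    # Hypothetical preprocessing: lowercase and split on whitespace.
    q = Counter(query.lower().split())
    vecs = [Counter(c.lower().split()) for c in comments]

    selected: list[int] = []
    candidates = list(range(len(comments)))

    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(vecs[i], q)
            # Redundancy is the highest similarity to any already selected comment.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return [comments[i] for i in selected]


if __name__ == "__main__":
    sample = [
        "election results announced today",
        "election results announced today again",
        "protest after election",
    ]
    for comment in mmr_select("election results", sample, k=2):
        print(comment)
```

With `lam=0.5`, the near-duplicate second comment is skipped in favor of the one that covers a different aspect of the query; with `lam=1.0`, selection falls back to plain relevance ranking.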
Index Terms
- Finding diverse needles in a haystack of comments: social media exploration for news