ABSTRACT
Several online daily newspapers offer readers the opportunity to comment directly on articles. In the Netherlands this feature is used quite often, and the quality of the comments, in both grammar and content, is surprisingly high. We develop techniques to collect, store, enrich and analyze these comments. After giving a high-level overview of the Dutch 'commentosphere', we zoom in on extracting the discussion structure found in flat comment threads: people comment not only on the news article but also heavily on other comments, so that the threads resemble discussion fora. We show how techniques from information retrieval, natural language processing and machine learning can be used to extract the 'reacts-on' relation between comments with high precision and recall.
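To make the 'reacts-on' relation concrete, the following is a minimal sketch, not the method of the paper. It assumes that a reply often mentions the name of the commenter it responds to, and it links each comment to the most recent earlier comment whose author name approximately matches a word in the text. The function names, the threshold of 0.85 and the example thread are hypothetical; the approximate matching uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def guess_reacts_on(comments, threshold=0.85):
    """Guess, for each comment in a flat thread, which earlier comment it reacts on.

    `comments` is a list of (author, text) pairs in posting order.
    Returns a list with one entry per comment: the index of the earlier
    comment it most likely reacts on, or None when no earlier author is
    mentioned (interpreted here as a reaction to the article itself).
    The threshold is an illustrative assumption, not a tuned value.
    """
    links = []
    for i, (author, text) in enumerate(comments):
        # Crude tokenisation: split on whitespace, strip common punctuation.
        tokens = [t.strip(".,:;!?@()\"'") for t in text.split()]
        best_target, best_score = None, 0.0
        # Walk backwards so that, on ties, the most recent comment wins.
        for j in range(i - 1, -1, -1):
            earlier_author = comments[j][0]
            if earlier_author == author:
                continue  # ignore mentions of the commenter's own name
            score = max(
                (name_similarity(tok, earlier_author) for tok in tokens if tok),
                default=0.0,
            )
            if score >= threshold and score > best_score:
                best_target, best_score = j, score
        links.append(best_target)
    return links


# Hypothetical flat thread, in posting order.
example_thread = [
    ("Jan", "Good article, I agree with the author."),
    ("Petra", "@Jan: I do not think the article is good at all."),
    ("Kees", "Petra, you are missing the point."),
    ("Jan", "Nothing more to add."),
]

print(guess_reacts_on(example_thread))
```

This heuristic only handles single-word author names and explicit mentions. A fuller approach in the spirit of the abstract would combine several such signals, for example textual overlap between comments, as features for a learned classifier.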
Index Terms
- Extracting the discussion structure in comments on news-articles