research-article

Extracting the discussion structure in comments on news-articles

Authors:
Anne Schuth, Universiteit van Amsterdam, Amsterdam, Netherlands
Maarten Marx, Universiteit van Amsterdam, Amsterdam, Netherlands
Maarten de Rijke, Universiteit van Amsterdam, Amsterdam, Netherlands

WIDM '07: Proceedings of the 9th annual ACM international workshop on Web information and data management, November 2007, Pages 97–104
https://doi.org/10.1145/1316902.1316919

Published: 09 November 2007

ABSTRACT

Several online daily newspapers offer readers the opportunity to comment directly on articles. In the Netherlands this feature is used quite often, and the quality of the comments (both grammatically and content-wise) is surprisingly high. We develop techniques to collect, store, enrich and analyze these comments. After giving a high-level overview of the Dutch 'commentosphere', we zoom in on extracting the discussion structure found in flat comment threads: people not only comment on the news article, they also comment heavily on other comments, so that the threads resemble discussion fora. We show how techniques from information retrieval, natural language processing and machine learning can be used to extract the 'reacts-on' relation between comments with high precision and recall.

Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation

Published in
WIDM '07: Proceedings of the 9th annual ACM international workshop on Web information and data management
November 2007
168 pages
ISBN:9781595938299
DOI:10.1145/1316902
Program Chairs:
Irini Fundulaki, University of Edinburgh, UK
Neoklis Polyzotis, University of California-Santa Cruz, USA
Copyright © 2007 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
web data extraction
web mining