Temporal corpus summarization using submodular word coverage

Authors:
Ruben Sipos, Cornell University, Ithaca, NY, USA
Adith Swaminathan, Cornell University, Ithaca, NY, USA
Pannaga Shivaswamy, Cornell University, Ithaca, NY, USA
Thorsten Joachims, Cornell University, Ithaca, NY, USA

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management, October 2012, Pages 754–763
https://doi.org/10.1145/2396761.2396857

Published: 29 October 2012

ABSTRACT

In many areas of life, we now have almost complete electronic archives reaching back for well over two decades. This includes, for example, the body of research papers in computer science, all news articles written in the US, and most people's personal email. However, we have only rather limited methods for analyzing and understanding these collections. While keyword-based retrieval systems allow efficient access to individual documents in archives, we still lack methods for understanding a corpus as a whole. In this paper, we explore methods that provide a temporal summary of such corpora in terms of landmark documents, authors, and topics. In particular, we explicitly model the temporal nature of influence between documents and re-interpret summarization as a coverage problem over words anchored in time. The resulting models provide monotone sub-modular objectives for computing informative and non-redundant summaries over time, which can be efficiently optimized with greedy algorithms. Our empirical study shows the effectiveness of our approach over several baselines.
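
The abstract describes summarization as a coverage problem with a monotone submodular objective that a greedy algorithm can optimize. The following is a minimal sketch of that general idea, not the authors' model: it uses plain weighted coverage, and the function names, the `(word, year)` item encoding, the default weight of 1.0, and the toy documents are all assumptions made for illustration. For monotone submodular objectives under a cardinality budget, the classical result of Nemhauser, Wolsey, and Fisher (1978) shows that this kind of greedy selection is within a factor of (1 - 1/e) of optimal.

```python
from typing import Dict, Hashable, List, Set


def coverage(selected: List[str],
             docs: Dict[str, Set[Hashable]],
             weight: Dict[Hashable, float]) -> float:
    """Total weight of the distinct items covered by the selected documents.

    Items missing from `weight` count as 1.0 (an assumption of this sketch).
    """
    covered: Set[Hashable] = set()
    for doc_id in selected:
        covered |= docs[doc_id]
    return sum(weight.get(item, 1.0) for item in covered)


def greedy_summary(docs: Dict[str, Set[Hashable]],
                   weight: Dict[Hashable, float],
                   k: int) -> List[str]:
    """Pick up to k documents, each time adding the largest marginal gain."""
    selected: List[str] = []
    covered: Set[Hashable] = set()
    for _ in range(min(k, len(docs))):
        best_doc, best_gain = None, 0.0
        for doc_id in sorted(docs):  # sorted only for deterministic ties
            if doc_id in selected:
                continue
            # Marginal gain: weight of items this document newly covers.
            gain = sum(weight.get(item, 1.0) for item in docs[doc_id] - covered)
            if gain > best_gain:
                best_doc, best_gain = doc_id, gain
        if best_doc is None:  # nothing adds new coverage; stop early
            break
        selected.append(best_doc)
        covered |= docs[best_doc]
    return selected


# Hypothetical toy corpus: each document is a set of (word, year) items.
example_docs = {
    "paper_a": {("svm", 1998), ("kernel", 1998), ("margin", 1998)},
    "paper_b": {("svm", 1998), ("kernel", 1998)},
    "paper_c": {("lda", 2003), ("topic", 2003)},
}

print(greedy_summary(example_docs, {}, 2))  # ['paper_a', 'paper_c']
```

In this toy run, `paper_b` is skipped because everything it covers is already covered by `paper_a`, which is how a coverage objective discourages redundant selections.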

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN: 9781450311564
DOI: 10.1145/2396761
General Chair: Xuewen Chen (Wayne State University, USA)
Program Chairs: Guy Lebanon (Georgia Institute of Technology), Haixun Wang (Microsoft Research Asia), Mohammed J. Zaki (Rensselaer Polytechnic Institute)
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
submodular
summarization
temporal
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
Article Metrics
- Total Citations: 38
- Total Downloads: 411
- Downloads (Last 12 months): 7
- Downloads (Last 6 weeks): 1