ABSTRACT
A hot news event is usually covered by many news articles written in different languages, and the articles in each language are often written in different ways to reflect different standpoints. For example, Chinese and Western news agencies published many articles, in Chinese and English respectively, reporting the same news of "Liu Xiaobo's Nobel Prize". The Chinese and English articles share common facts about the event, but they focus on different aspects in order to reflect different standpoints. In this paper, we investigate the task of multilingual news summarization with the goal of finding and summarizing the major differences between Chinese and English news articles about the same event. We propose a novel constrained co-ranking (C-CoRank) method for this task. C-CoRank adds constraints between the difference score and the common score of each sentence to the co-ranking process. Evaluation results on a manually labeled test set of 15 news topics show the effectiveness of the proposed method: constrained co-ranking outperforms several baselines as well as the typical co-ranking method.
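The abstract does not give the update rules or the exact form of the constraint. The Python sketch below is therefore a hypothetical illustration of the general idea, not the authors' C-CoRank. It assumes that sentences from both languages already share one similarity matrix (for instance after translating the Chinese sentences), that common scores propagate over cross-language edges while difference scores propagate over within-language edges, and that the constraint simply forces each sentence's common and difference scores to sum to one. The function `constrained_co_rank`, its parameters, and the toy similarity values are all invented for this example.

```python
from typing import List, Sequence, Tuple


def _row_normalize(m: List[List[float]]) -> List[List[float]]:
    """Scale each row to sum to 1; rows with no edges stay all-zero."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s for v in row] if s > 0 else [0.0] * len(row))
    return out


def constrained_co_rank(
    sim: Sequence[Sequence[float]],
    lang: Sequence[str],
    damping: float = 0.85,  # must be < 1 so every score keeps a positive base
    max_iter: int = 100,
    tol: float = 1e-8,
) -> Tuple[List[float], List[float]]:
    """Return (common, diff) scores for each sentence.

    HYPOTHETICAL sketch: common scores flow over cross-language edges,
    difference scores over within-language edges, and after every sweep
    each sentence's pair is rescaled so that common[i] + diff[i] == 1.
    """
    n = len(lang)
    # Split the similarity graph into cross-language and within-language parts.
    cross = [[sim[i][j] if i != j and lang[i] != lang[j] else 0.0
              for j in range(n)] for i in range(n)]
    within = [[sim[i][j] if i != j and lang[i] == lang[j] else 0.0
               for j in range(n)] for i in range(n)]
    cross_n, within_n = _row_normalize(cross), _row_normalize(within)

    common = [0.5] * n
    diff = [0.5] * n
    base = (1.0 - damping) / n
    for _ in range(max_iter):
        # PageRank-style propagation: sentence i collects score from neighbours j.
        new_c = [base + damping * sum(cross_n[j][i] * common[j] for j in range(n))
                 for i in range(n)]
        new_d = [base + damping * sum(within_n[j][i] * diff[j] for j in range(n))
                 for i in range(n)]
        # Assumed constraint: the two scores of a sentence sum to one,
        # so raising one score necessarily lowers the other.
        totals = [c + d for c, d in zip(new_c, new_d)]
        new_c = [c / t for c, t in zip(new_c, totals)]
        new_d = [1.0 - c for c in new_c]
        delta = max(abs(a - b) for a, b in zip(new_c, common))
        common, diff = new_c, new_d
        if delta < tol:
            break
    return common, diff


# Toy input: two (translated) Chinese and two English sentences; only
# sentences 0 and 2 report the same fact across the two languages.
sim = [
    [0.0, 0.5, 0.9, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.0],
]
lang = ["zh", "zh", "en", "en"]
common, diff = constrained_co_rank(sim, lang)
print([round(c, 3) for c in common], [round(d, 3) for d in diff])
```

Under these assumptions, sentence 1, which has no cross-language counterpart, ends with a higher difference score than sentence 0, whose content is mirrored in the English set. The per-sentence rescaling is only one plausible way to couple the two scores; the paper's actual constraint may take a different form.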
Index Terms
- Summarizing the differences in multilingual news