Published in: Discover Computing 1/2008

01.02.2008

Using only cross-document relationships for both generic and topic-focused multi-document summarizations

Written by: Xiaojun Wan

Abstract

In recent years, graph-ranking-based algorithms have been proposed for single-document summarization and generic multi-document summarization. These algorithms use the “votes” or “recommendations” between sentences to evaluate the importance of each sentence in the documents. This study differentiates the cross-document and within-document relationships between sentences for generic multi-document summarization and adapts the graph-ranking-based algorithm for topic-focused summarization. The contributions of this study are twofold: (1) For generic multi-document summarization, we apply the graph-based ranking algorithm to each kind of sentence relationship and explore their relative importance for summarization performance. (2) For topic-focused multi-document summarization, we propose integrating the relevance of the sentences to the specified topic into the graph-ranking-based method. Each kind of sentence relationship is also differentiated and investigated in this algorithm. Experimental results on the DUC 2002–DUC 2005 data demonstrate the great importance of the cross-document relationships between sentences for both generic and topic-focused multi-document summarization. Even the approach based only on the cross-document relationships performs better than, or at least as well as, the approaches based on both kinds of relationships between sentences.
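The abstract does not state the ranking formula, so the following is only a minimal sketch of the idea, assuming a PageRank-style iteration over a sentence graph. The function names, the bag-of-words cosine similarity, the damping value 0.85, and the way topic relevance is used as a prior are all assumptions made for illustration and are not taken from the paper. Edges are kept only between sentences from different documents, and an optional topic string biases the ranking toward relevant sentences.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, topic=None, d=0.85, iterations=100):
    """Score sentences using cross-document edges only (illustrative sketch).

    sentences: list of (doc_id, text) pairs.
    topic: optional topic description; when given, it biases the ranking.
    d: damping factor (0.85 is an assumed value, not the paper's setting).
    """
    n = len(sentences)
    bags = [Counter(text.lower().split()) for _, text in sentences]

    # Affinity matrix: keep an edge only when the two sentences come from
    # different documents (the cross-document relationship).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and sentences[i][0] != sentences[j][0]:
                w[i][j] = cosine(bags[i], bags[j])

    # Row-normalize so each connected sentence distributes one unit of "vote".
    for row in w:
        total = sum(row)
        if total:
            for j in range(n):
                row[j] /= total

    # Prior: uniform for generic summarization, topic relevance otherwise.
    prior = [1.0 / n] * n
    if topic is not None:
        topic_bag = Counter(topic.lower().split())
        rel = [cosine(bag, topic_bag) for bag in bags]
        total = sum(rel)
        if total:
            prior = [r / total for r in rel]

    # Fixed number of power-iteration steps.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            d * sum(scores[j] * w[j][i] for j in range(n)) + (1 - d) * prior[i]
            for i in range(n)
        ]
    return scores


if __name__ == "__main__":
    example = [
        ("d1", "the earthquake struck the coastal city"),
        ("d1", "officials plan press briefing tomorrow"),
        ("d2", "a strong earthquake struck the city on monday"),
        ("d3", "the earthquake damaged the coastal city"),
    ]
    for (doc, text), score in zip(example, rank_sentences(example)):
        print(f"{score:.4f}  [{doc}] {text}")
```

In this toy input, the second sentence shares no words with any sentence from another document, so it receives no cross-document votes and keeps only its share of the prior. Sentences with no outgoing edges simply drop their vote mass in this sketch, so the scores are not guaranteed to sum to one.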

Footnotes
5
The damping factor d is set without tuning. Different values of d might influence the summarization performance; this is not the focus of this paper and will be investigated in our future work.
 
6
At first, there were 60 document clusters, but document cluster D088 was withdrawn by NIST due to differences in the documents used by systems and NIST summarizers.
 
7
We used ROUGEeval-1.4.2, downloaded from http://www.haydn.isi.edu/ROUGE/
 
8
This option is necessary for a fair comparison because a longer summary will usually increase ROUGE evaluation values.
 
9
Tables 3–6 show that the coverage baseline always achieves much better performance than the lead baseline. Intuitively, the lead baseline produces the summary locally from only one document, while the coverage baseline produces the summary globally from a number of documents and thus better meets the needs of multi-document summarization. This result validates the aim of multi-document summarization stated here.
 
Metadata
Title
Using only cross-document relationships for both generic and topic-focused multi-document summarizations
Written by
Xiaojun Wan
Publication date
01.02.2008
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 1/2008
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-007-9037-5