2018 | Original Paper | Book Chapter

Multi-document Summarization via LDA and Density Peaks Based Sentence-Level Clustering

Authors: Baoyan Wang, Yuexian Zou, Jian Zhang, Jun Jiang, Yi Liu

Published in: Computational Intelligence and Intelligent Systems

Publisher: Springer Singapore

Abstract

In this paper, we present a novel unsupervised extractive multi-document summarization method that ranks sentences with an integrated sentence scoring method. Cluster-based methods tend to ignore the informativeness of words, while Latent Dirichlet Allocation (LDA) based methods tend to extract overly long sentences and cannot remove redundancy directly. Both kinds of methods select the highest-scoring sentences to generate summaries, which does not necessarily yield the optimal summary. Our method takes four key aspects of sentences into account concurrently: it applies LDA to compute term weights and evaluate the informativeness of sentences, and then applies Density Peaks Clustering (DPC) to assess the relevance and diversity of sentences simultaneously. Our method achieves the best performance on the DUC2004 dataset, outperforming state-of-the-art methods such as DUC2004 Best, R2N2_ILP [3], and WCS [13].
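The abstract does not give the scoring formulas, so the following is only a rough sketch of the general density-peaks idea [17] applied to sentences, not the authors' method. It assumes plain bag-of-words vectors, cosine distance, and a hypothetical `cutoff` parameter; the function names and the tie-breaking rule are illustrative choices. For each sentence it computes a local density `rho` (how many other sentences are close to it, a proxy for relevance) and a separation `delta` (distance to the nearest denser sentence, a proxy for diversity).

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def density_peaks_scores(sentences, cutoff=0.7):
    """Return (rho, delta), one value per sentence.

    rho[i]   : number of other sentences whose cosine distance to i is below `cutoff`.
    delta[i] : distance to the nearest sentence that is denser than i; if no
               sentence is denser, the largest distance from i to any sentence.
    Assumption: density ties are broken by position (earlier sentence wins).
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(vecs)
    dist = [[1.0 - cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]

    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]

    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta


# Usage: sentences that are both dense and well separated make good candidates.
docs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "stock markets fell sharply today",
    "the cat lay on the mat",
]
rho, delta = density_peaks_scores(docs)
best = max(range(len(docs)), key=lambda i: rho[i] * delta[i])
print(docs[best], rho, delta)
```

Ranking by the product `rho * delta` is one simple way to combine the two quantities; an integrated scoring method such as the one described in the abstract would also weigh informativeness (for example via LDA term weights) and sentence length, which this sketch leaves out.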

References
1. Cao, Z., Wei, F., Dong, L., Li, S., Zhou, M.: Ranking with recursive neural networks and its application to multi-document summarization. In: AAAI, pp. 2153–2159 (2015)
2. Li, L., Zhou, K., Xue, G.-R., Zha, H., Yu, Y.: Enhancing diversity, coverage and balance for summarization through structure learning. In: Proceedings of the 18th International Conference on World Wide Web, pp. 71–80 (2009)
3. Ma, T., Wan, X.: Multi-document summarization using minimum distortion. In: 2010 IEEE International Conference on Data Mining. IEEE (2010)
4. Liu, H., Yu, H., Deng, Z.-H.: Multi-document summarization based on two-level sparse representation model. In: AAAI, pp. 196–202 (2015)
5. Mei, Q., Guo, J., Radev, D.: DivRank: the interplay of prestige and diversity in information networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1009–1018 (2010)
6. Wang, D., Li, T., Ding, C.: Weighted feature subset non-negative matrix factorization and its applications to document understanding. In: 2010 IEEE International Conference on Data Mining, pp. 541–550 (2010)
7. Li, J., Li, L., Li, T.: Multi-document summarization via submodularity. Appl. Intell. 37, 420–430 (2012)
8. Arora, R., Ravindran, B.: Latent Dirichlet allocation and singular value decomposition based multi-document summarization. In: 2008 Eighth IEEE International Conference on Data Mining, pp. 713–718 (2008)
9. Wang, D., et al.: Multi-document summarization using sentence-based topic models. In: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers. Association for Computational Linguistics (2009)
10. Takamura, H., Okumura, M.: Text summarization model based on maximum coverage problem and its variant. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 781–789 (2009)
11. Goldstein, J., Mittal, V., Carbonell, J., Kantrowitz, M.: Multi-document summarization by sentence extraction. In: Proceedings of the 2000 NAACL-ANLP Workshop on Automatic summarization, vol. 4, pp. 40–48 (2000)
12. Cai, X., Li, W.: Ranking through clustering: an integrated approach to multi-document summarization. IEEE Trans. Audio Speech Lang. Process. 21, 1424–1433 (2013)
13. Wan, X., Yang, J.: Multi-document summarization using cluster-based link analysis. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 299–306. ACM (2008)
14. Zhang, Y., et al.: Clustering sentences with density peaks for multi-document summarization. In: Proceedings of Human Language Technologies: The 2015 Annual Conference of the North American Chapter of the ACL (2015)
15. Wang, B., Zhang, J., Liu, Y., Zou, Y.: Density peaks clustering based integrate framework for multi-document summarization. CAAI Trans. Intell. Technol. 2(1), 26–30 (2017)
16. Lin, C.-Y.: Rouge: a package for automatic evaluation of summaries. In: Text Summarization Branches Out: Proceedings of the ACL-04 Workshop (2004)
17. Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344(6191), 1492–1496 (2014)
Metadata
Title: Multi-document Summarization via LDA and Density Peaks Based Sentence-Level Clustering
Authors: Baoyan Wang, Yuexian Zou, Jian Zhang, Jun Jiang, Yi Liu
Copyright year: 2018
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-13-1648-7_27