
2019 | OriginalPaper | Chapter

5. Improving Sentence Extraction Through Rank Aggregation

Authors: Parth Mehta, Prasenjit Majumder

Published in: From Extractive to Abstractive Summarization: A Journey

Publisher: Springer Singapore


Abstract

A plethora of extractive summarisation techniques have been developed in the past decade, but very few studies have examined how these techniques differ from each other or what factors affect them. Such a comparison, if available, could be used to build a robust ensemble of these approaches that consistently outperforms each individual summarisation system. In this chapter we examine the roles of three principal components of an extractive summarisation technique: the sentence ranking algorithm, the sentence similarity metric and the text representation scheme. We show that using a combination of several different sentence similarity measures, rather than choosing any single measure, significantly improves the performance of the resultant meta-system. Even simple ensemble techniques, when used in an informed manner, prove very effective in improving the overall performance and consistency of summarisation systems. When multiple ranking algorithms or text similarity measures are aggregated, the improvement in ROUGE score is not always significant, but the resultant meta-systems are more robust than the candidate systems. The results suggest that, when proposing a sentence extraction technique, defining better sentence similarity metrics would have more impact than proposing a new ranking algorithm. Moreover, using multiple sentence similarity scores and ranking algorithms, instead of any one particular combination, always results in improved and more robust performance.
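The abstract does not specify which ranking algorithms, similarity measures or ensemble technique the chapter uses, so the following is only a hypothetical sketch of the general idea of combining several sentence similarity measures through rank aggregation. It assumes a PageRank-style graph ranker (in the spirit of LexRank and TextRank), two illustrative similarity measures (bag-of-words cosine and Jaccard word overlap), and Borda count as the "simple ensemble technique". All of these choices and every function name below are assumptions for illustration, not the authors' method.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: every measure, ranker and name here is an illustrative
# assumption, not the chapter's actual configuration.


def tokenize(sentence):
    # Text representation: lowercase alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine_sim(a, b):
    # Cosine similarity between term-frequency vectors of two token lists.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def jaccard_sim(a, b):
    # Overlap of word types: |A & B| / |A | B|.
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pagerank_scores(sim, damping=0.85, iters=100):
    # Power iteration of PageRank on a weighted sentence graph.
    n = len(sim)
    if n == 0:
        return []
    # Drop self-loops; each node spreads its score in proportion to edge weight.
    weights = [[sim[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    row_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / row_sums[j] * scores[j]
                            for j in range(n) if row_sums[j] > 0)
            for i in range(n)
        ]
    return scores


def rank_sentences(sentences, sim_fn):
    # One candidate system: a similarity measure plus the graph ranker.
    toks = [tokenize(s) for s in sentences]
    sim = [[sim_fn(a, b) for b in toks] for a in toks]
    scores = pagerank_scores(sim)
    # Sentence indices, most salient first; ties broken by position.
    return sorted(range(len(sentences)), key=lambda i: (-scores[i], i))


def borda_aggregate(rankings):
    # Borda count: in a ranking of n items, position p (0-based) earns n - p points.
    points = Counter()
    for ranking in rankings:
        n = len(ranking)
        for pos, idx in enumerate(ranking):
            points[idx] += n - pos
    return sorted(points, key=lambda i: (-points[i], i))


def summarize(sentences, k=2, sim_fns=(cosine_sim, jaccard_sim)):
    # Meta-system: aggregate the rankings produced by each similarity measure.
    rankings = [rank_sentences(sentences, f) for f in sim_fns]
    top = borda_aggregate(rankings)[:k]
    return [sentences[i] for i in sorted(top)]  # keep document order


if __name__ == "__main__":
    docs = [
        "Rank aggregation combines several sentence rankings.",
        "Sentence similarity metrics drive graph based sentence ranking.",
        "The weather was pleasant.",
        "Combining similarity metrics and rankings makes sentence extraction robust.",
    ]
    for line in summarize(docs, k=2):
        print(line)
```

Borda count is used here only because it is one of the simplest rank-aggregation schemes; other similarity measures or aggregation rules can be swapped in by passing different functions in `sim_fns` or replacing `borda_aggregate`.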

Metadata
Title
Improving Sentence Extraction Through Rank Aggregation
Authors
Parth Mehta
Prasenjit Majumder
Copyright Year
2019
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-13-8934-4_5