Published in: Knowledge and Information Systems 5/2021

28-01-2021 | Regular Paper

PE-MSC: partial entailment-based minimum set cover for text summarization

Authors: Anand Gupta, Manpreet Kaur, Sonaali Mittal, Swati Garg


Abstract

Textual Entailment (TE) is an established indicator of text connectedness that captures semantic relationships between texts. Recently, it has been used successfully to determine sentence salience in many text summarization methods. However, previous work has reported that standard textual entailment is not ideal for measuring sentence salience, because textual entailment relationships between sentences are quite rare in real-world texts. Therefore, we suggest using partial TE to accomplish the task of recognizing standard TE. We present single-document summarization as an optimization problem that is solved using a weighted Minimum Set Cover (wMSC) algorithm. In this method, sentences are broken into fragments, and partial TE is used to form sets of fragments. Finally, wMSC is applied to these sets to obtain the minimum set cover, which corresponds to the summary of the document. The results achieved on the DUC 2002 dataset using ROUGE and other quality metrics show that the proposed method outperforms the state of the art.
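The set-cover step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the sets are built from partial TE, how the weights are defined, or which wMSC solver is used, so the following assumes the classic greedy approximation for weighted set cover. The function name `greedy_weighted_set_cover`, the fragment labels, the sentence labels, and the weights are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def greedy_weighted_set_cover(
    universe: Set[str],
    subsets: Dict[str, Set[str]],
    weights: Dict[str, float],
) -> List[str]:
    """Greedy approximation for weighted set cover.

    Repeatedly picks the subset with the lowest weight per newly
    covered element until every element of `universe` is covered.
    This is an assumed solver, not necessarily the paper's.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best_name = None
        best_ratio = float("inf")
        for name, members in subsets.items():
            gain = len(members & uncovered)
            if gain == 0:
                continue  # this subset adds nothing new
            ratio = weights[name] / gain
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("the given subsets cannot cover the universe")
        chosen.append(best_name)
        uncovered -= subsets[best_name]
    return chosen


# Hypothetical toy input: each sentence is mapped to the set of
# fragments it is assumed to (partially) entail.
fragments = {"f1", "f2", "f3", "f4", "f5"}
sentence_sets = {
    "S1": {"f1", "f2", "f3"},
    "S2": {"f3", "f4"},
    "S3": {"f4", "f5"},
    "S4": {"f1", "f2", "f3", "f4", "f5"},
}
# Hypothetical weights (for example, a cost such as sentence length).
sentence_weights = {"S1": 1.0, "S2": 1.0, "S3": 1.0, "S4": 4.0}

summary = greedy_weighted_set_cover(fragments, sentence_sets, sentence_weights)
print(summary)
```

In this toy input, the selected sentences together cover every fragment, which mirrors the idea that the cover corresponds to the summary. The greedy rule is an approximation and does not guarantee a minimum-weight cover in general.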


Footnotes
1
Levy et al. [19] introduced the term complete TE for standard TE. Standard TE is therefore referred to as complete TE in the rest of the paper.
 
2
As mentioned, that is a key difference between single- and multiple-document summarization.
 
5
ROUGE version (1.5.5) runs with the same parameters as mentioned on the DUC website (ROUGE-1.5.5.pl -n 2 -m -2 4 -u -c 95 -r 1000 -f A -p 0.5 -t 0 -l 100 -d).
 
Literature
1. Baralis E, Cagliero L, Fiori A, Jabeen S (2011) Pattexsum: A pattern-based text summarizer. In: Proceedings of the workshop on mining complex patterns, p 14
2. Barzilay R, Elhadad M (1999) Using lexical chains for text summarization. In: Mani I, Mark TM (eds) Advances in automatic text summarization. The MIT Press, London, pp 111–121
3. Broder AZ, Glassman SC, Manasse MS, Zweig G (1997) Syntactic clustering of the web. Comput Networks ISDN Syst 29(8):1157–1166
5. Cornuéjols G (2001) Combinatorial optimization: packing and covering. SIAM, Philadelphia
6. Dagan I, Dolan B, Magnini B, Roth D (2009) Recognizing textual entailment: rational, evaluation and approaches. Nat Lang Eng 15(4):1–17
7. Filatova E, Hatzivassiloglou V (2004) A formal model for information selection in multi-sentence text extraction. In: Proceedings of the 20th international conference on computational linguistics, association for computational linguistics, COLING ’04
8. Gupta A, Kathuria M, Singh A, Sachdeva A, Bhati S (2012) Analog textual entailment and spectral clustering (atesc) based summarization. In: International conference on big data analytics. Springer, pp 101–110
9. Gupta A, Kaur M, Singh A, Goel A, Mirkin S (2014) Text summarization through entailment-based minimum vertex cover. Lexical and Computational Semantics (* SEM 2014), p 75
10. He Z, Chen C, Bu J, Wang C, Zhang L, Cai D, He X (2012) Document summarization based on data reconstruction. In: AAAI
11. Hirao T, Sasaki Y, Isozaki H, Maeda E (2002) Ntt’s text summarization system for duc-2002. In: Proceedings of the document understanding conference 2002. Citeseer, pp 104–107
12. Hirao T, Yoshida Y, Nishino M, Yasuda N, Nagata M (2013) Single-document summarization as a tree knapsack problem. EMNLP 13:1515–1520
13.
14. Jones KS (2007) Automatic summarizing: the state of the art. Inf Process Manag 43:1449–1481
15. Jones KS, Galliers JR (1995) Evaluating natural language processing systems: an analysis and review, vol 1083. Springer, Berlin
16. Kikuchi Y, Hirao T, Takamura H, Okumura M, Nagata M (2014) Single document summarization based on nested tree structure. In: ACL (2), pp 315–320
17. Korte B, Vygen J (2002) Combinatorial optimization. Springer, Berlin
18. Lapata M, Barzilay R (2005) Automatic evaluation of text coherence: models and representations. IJCAI 5:1085–1090
19. Levy O, Zesch T, Dagan I, Gurevych I (2013) Recognizing partial textual entailment. In: ACL (2), pp 451–455
20. Lin CY (1995) Knowledge-based automatic topic identification. In: Proceedings of the 33rd annual meeting on Association for Computational Linguistics. Association for Computational Linguistics, pp 308–310
21. Lin CY (2004) Rouge: A package for automatic evaluation of summaries. In: Text summarization branches out: proceedings of the ACL-04 workshop, Barcelona, Spain, vol 8
22. Lin CY, Hovy E (2003) Automatic evaluation of summaries using n-gram co-occurrence statistics. In: Proceedings of the 2003 conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol 1, Edmonton, Canada, pp 71–78
23. Mani I (2001) Summarization evaluation: an overview
24. Marcu D (1999) Discourse trees are good indicators of importance in text. Advances in automatic text summarization, pp 123–136
25. Marcu D (2008) From discourse structure to text summaries. In: Proceedings of the ACL/EACL ’97, workshop on intelligent scalable text summarization, Madrid, Spain, pp 82–88
26. Martins AF, Smith NA (2009) Summarization with a joint model for sentence extraction and compression. In: Proceedings of the workshop on integer linear programming for natural language processing. Association for Computational Linguistics, pp 1–9
27. McDonald R (2007) A study of global inference algorithms in multi-document summarization. Springer, Berlin
28. Mihalcea R, Tarau P (2004) Textrank: Bringing order into texts. Proc EMNLP Barcelona Spain 4(4):275
29. Miller GA (1995) Wordnet: a lexical database for english. Commun ACM 38(11):39–41
30. Monz C, de Rijke M (2001) Light-weight entailment checking for computational semantics. In: Proceedings of the 3rd workshop on inference in computational semantics (ICoS-3)
31. Nenkova A, McKeown K et al (2011) Automatic summarization. Found Trends Inf Retriev 5(2–3):103–233
32. Nielsen RD, Ward W, Martin JH (2009) Recognizing entailment in intelligent tutoring systems. Nat Lang Eng 15(04):479–501
33. Nishino M, Yasuda N, Hirao T, Suzuki J, Nagata M (2013) Text summarization while maximizing multiple objectives with lagrangian relaxation. In: Advances in Information Retrieval. Springer, pp 772–775
34. Ono K, Sumita K, Miike S (1994) Abstract generation based on rhetorical structure extraction. In: Proceedings of the 15th conference on Computational linguistics, vol 1. Association for Computational Linguistics, pp 344–348
35. Parveen D, Mesgar M, Strube M (2016) Generating coherent summaries of scientific articles using coherence patterns. In: EMNLP, pp 772–783
36. Salton G, Singhal A, Mitra M, Buckley C (1997) Automatic text structuring and summarization. Inf Process Manag 33:193–207
37. Skorochod’ko EF (1971) Adaptive method of automatic abstracting and indexing. In: IFIP Congress (2), vol 71, pp 1179–1182
38. Steinberger J, Poesio M, Kabadjov MA, Ježek K (2007) Two uses of anaphora resolution in summarization. Inf Process Manag 43(6):1663–1680
39. Takamura H, Okumura M (2009) Text summarization model based on maximum coverage problem and its variant. In: Proceedings of the 12th conference of the european chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pp 781–789
40. Tatar D, Tamaianu-Morita E, Mihis A, Lupsa D (2008) Summarization by logic segmentation and text entailment. Adv Nat Lang Process Appl 15:26
41. Wan X (2010) Towards a unified approach to simultaneous single-document and multi-document summarizations. In: Proceedings of the 23rd international conference on computational linguistics. Association for Computational Linguistics, pp 1137–1145
42. Wt Yih, Goodman J, Vanderwende L, Suzuki H (2007) Multi-document summarization by maximizing informative content-words. IJCAI 7:1776–1782
43. Young NE (2008) Greedy set-cover algorithms. In: Encyclopedia of algorithms. Springer, pp 379–381
Metadata
Title
PE-MSC: partial entailment-based minimum set cover for text summarization
Authors
Anand Gupta
Manpreet Kaur
Sonaali Mittal
Swati Garg
Publication date
28-01-2021
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 5/2021
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-020-01537-1
