Published in: Soft Computing 2/2021

30.10.2020 | Methodologies and Application

Topic modeling combined with classification technique for extractive multi-document text summarization

Abstract

The human-readable reference summaries available in existing datasets are of limited quality, which makes it difficult to build an accurate model for text summarization. Although recent work has largely built on this issue and established a strong platform for further improvement, it still has many limitations. To address this, the paper proposes a novel methodology that summarizes a corpus of documents into a coherent summary using topic modeling and a classification technique. The objectives of the proposed work are highlighted below:
  • A novel heuristic approach is introduced to determine the actual number of topics in a corpus of documents, handling the stochastic nature of latent Dirichlet allocation.
  • A large corpus of documents is handled by reducing the huge set of sentences to a small set without losing the important ones, thus providing a concise and information-rich summary.
  • The sentences in the coherent summary are arranged according to their importance.
  • The experimental results are compared with state-of-the-art summarization systems.
The empirical results show that the proposed model is more promising than well-known text summarization models.
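
The abstract does not spell out the heuristic used to find the number of topics. As a purely illustrative sketch of one common way to dampen the run-to-run variation of a stochastic topic model, the code below scores each candidate topic count over several random seeds and keeps the count with the best average. The names `pick_topic_count`, `score_fn`, and `toy_score` are hypothetical, and `toy_score` is only a stand-in for "fit latent Dirichlet allocation with k topics and return a quality score"; none of this is taken from the paper.

```python
import random
from statistics import mean
from typing import Callable, Iterable


def pick_topic_count(
    score_fn: Callable[[int, int], float],
    candidates: Iterable[int],
    n_runs: int = 5,
) -> int:
    """Return the candidate k with the highest mean score over n_runs seeds.

    score_fn(k, seed) is assumed to fit a stochastic topic model with k topics
    using the given seed and return a quality score (higher is better).
    """
    averaged = {
        k: mean(score_fn(k, seed) for seed in range(n_runs))
        for k in candidates
    }
    return max(averaged, key=averaged.get)


def toy_score(k: int, seed: int) -> float:
    # Hypothetical stand-in for a real model fit: the score peaks at k = 4,
    # and seed-dependent noise mimics the randomness of a single run.
    rng = random.Random(seed * 1000 + k)
    return -abs(k - 4) + rng.uniform(-0.3, 0.3)


best_k = pick_topic_count(toy_score, range(2, 9))
print(best_k)
```

Averaging over seeds trades extra model fits for a steadier choice of k. In practice `score_fn` would wrap a real topic model and a quality measure of the reader's choosing.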

Metadata
Title
Topic modeling combined with classification technique for extractive multi-document text summarization
Publication date
30.10.2020
Published in
Soft Computing / Issue 2/2021
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-020-05207-w
