
22.09.2021 | Research Article

What is this article about? Generative summarization with the BERT model in the geosciences domain

Authors: Kai Ma, Miao Tian, Yongjian Tan, Xuejing Xie, Qinjun Qiu

Published in: Earth Science Informatics | Issue 1/2022


Abstract

In recent years, a large amount of data has accumulated in geological journals and report literature. These texts contain a wealth of information, but they have not been fully exploited or mined. Automatic information extraction offers an effective route to new discoveries and further analysis, and is of great value to users, researchers, and decision makers. In this paper, we fine-tune the bidirectional encoder representations from transformers (BERT) model and apply it to automatically generate a title for a given input summary, trained on a collection of published literature samples. The framework consists of an encoder module, a decoder module, and a training module. Summary generation combines the encoder and decoder modules, and a multi-stage function connects the modules, giving the text summarization model a multi-task learning architecture. Compared with baseline models, the proposed model achieves the best results on the constructed dataset. Building on the proposed model, we developed an automatic geological briefing generation platform, offered as an online service that supports the mining of key areas and a visual presentation and analysis of the literature.
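The paper itself does not include code, but the encoder-decoder fine-tuning setup the abstract describes can be sketched with the Hugging Face transformers library. The following is a minimal illustration, not the authors' implementation: the checkpoint name bert-base-chinese, the sequence-length limits, and the beam-search settings are all assumptions made for the sketch.

```python
# Sketch (not the authors' code): warm-start a BERT encoder and a BERT
# decoder, then fine-tune the pair to map an input summary to a title.
import torch
from transformers import BertTokenizer, EncoderDecoderModel

# Assumed checkpoint; any BERT variant matching the corpus language works.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

# The decoder copy gains cross-attention layers and is trained
# autoregressively against the reference titles.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-chinese", "bert-base-chinese"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size

def training_step(summary: str, title: str) -> torch.Tensor:
    """One supervised step: summary in, title out, cross-entropy loss."""
    inputs = tokenizer(summary, truncation=True, max_length=512,
                       return_tensors="pt")
    labels = tokenizer(title, truncation=True, max_length=64,
                       return_tensors="pt").input_ids
    outputs = model(input_ids=inputs.input_ids,
                    attention_mask=inputs.attention_mask,
                    labels=labels)
    return outputs.loss  # back-propagate with an optimizer such as Adam

def generate_title(summary: str) -> str:
    """Decode a candidate title for an unseen summary with beam search."""
    inputs = tokenizer(summary, truncation=True, max_length=512,
                       return_tensors="pt")
    ids = model.generate(inputs.input_ids, max_length=64,
                         num_beams=4, early_stopping=True)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```

Warm-starting both sides from the same pretrained checkpoint is a common way to adapt an encoder-only model like BERT to generation; the paper's multi-stage training and multi-task architecture would sit on top of a backbone of this kind.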


Metadata
Title
What is this article about? Generative summarization with the BERT model in the geosciences domain
Authors
Kai Ma
Miao Tian
Yongjian Tan
Xuejing Xie
Qinjun Qiu
Publication date
22.09.2021
Publisher
Springer Berlin Heidelberg
Published in
Earth Science Informatics / Issue 1/2022
Print ISSN: 1865-0473
Electronic ISSN: 1865-0481
DOI
https://doi.org/10.1007/s12145-021-00695-2
