2019 | OriginalPaper | Book chapter

An Analytical Study on a Benchmark Corpus Constructed for Related Work Generation

Authors: Pancheng Wang, Shasha Li, Haifang Zhou, Jintao Tang, Ting Wang

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing

Abstract

Automatic related work generation aims to produce a related work section for a given scientific paper. Because it could replace a labor-intensive manual process, demand for this task has increased substantially in recent years. Given the lack of an open, large-scale dataset for related work generation, we introduce NudtRwG (https://github.com/NudtRwG/NudtRwG-Dataset/), a collection of 2,084 document sets, each consisting of a target paper, its ground-truth related work section, and the corresponding reference papers. To our knowledge, NudtRwG is the first open, large-scale and high-quality dataset for related work generation. Beyond the dataset, this work makes two contributions: first, we describe the data collection procedure in detail and analyze the characteristics of the dataset; second, we conduct an analytical study of how the summative sections (abstract, introduction and conclusion) and the other sections of reference papers affect related work generation. Experiments reveal that the two parts are equally important, so the other sections should not be ignored: when generating a related work section, researchers should consider not only the summative sections but also the other sections of reference papers.
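To make the two input conditions compared in the study concrete, the following is a minimal Python sketch of one NudtRwG-style document set and of splitting reference-paper text into summative sections (abstract, introduction, conclusion) and all other sections. The class names (`DocumentSet`, `ReferencePaper`), field names, the `split_sections` helper and the heading-normalization rule are hypothetical assumptions for illustration only; the abstract does not specify the dataset's actual file format or how section headings are matched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumption: headings treated as "summative", following the abstract's list.
# Real headings in the dataset may differ (e.g. "Conclusions").
SUMMATIVE = {"abstract", "introduction", "conclusion"}


@dataclass
class ReferencePaper:
    # Hypothetical layout: section heading -> section text.
    sections: Dict[str, str]


@dataclass
class DocumentSet:
    target_paper: str   # text of the citing (target) paper
    related_work: str   # ground-truth related work section
    references: List[ReferencePaper] = field(default_factory=list)


def split_sections(doc: DocumentSet) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: group reference text into summative vs. other sections."""
    summative: List[str] = []
    other: List[str] = []
    for ref in doc.references:
        for heading, text in ref.sections.items():
            # Lowercase and drop leading section numbers, e.g. "1 Introduction" -> "introduction".
            key = heading.strip().lower().lstrip("0123456789. ").strip()
            (summative if key in SUMMATIVE else other).append(text)
    return summative, other


if __name__ == "__main__":
    doc = DocumentSet(
        target_paper="target text",
        related_work="gold related work",
        references=[ReferencePaper({"Abstract": "A", "1 Introduction": "B",
                                    "3 Method": "C", "5 Conclusion": "D"})],
    )
    summative, other = split_sections(doc)
    print(summative)  # ['A', 'B', 'D']
    print(other)      # ['C']
```

Either list can then be fed to a summarizer as input, which mirrors the comparison the abstract describes between summative and other sections; any real pipeline would need the dataset's actual section labels.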

Metadata
Title
An Analytical Study on a Benchmark Corpus Constructed for Related Work Generation
Authors
Pancheng Wang
Shasha Li
Haifang Zhou
Jintao Tang
Ting Wang
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-32233-5_33