
2022 | Original Paper | Book Chapter

Automatic Simplification of Scientific Texts: SimpleText Lab at CLEF-2022

Authors: Liana Ermakova, Patrice Bellot, Jaap Kamps, Diana Nurbakova, Irina Ovchinnikova, Eric SanJuan, Elise Mathurin, Sílvia Araújo, Radia Hannachi, Stéphane Huet, Nicolas Poinsu

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing

Abstract

The Web and social media have become the main source of information for citizens, with the risk that users rely on shallow information from sources that prioritize commercial or political incentives over correctness and informational value. Non-experts tend to avoid scientific literature because of its complex language or their lack of background knowledge. Text simplification promises to remove some of these barriers. The CLEF 2022 SimpleText track addresses the challenges of text simplification in the context of promoting access to scientific information, by providing appropriate data and benchmarks and by creating a community of NLP and IR researchers working together to resolve one of the greatest challenges of today. The track will use a corpus of scientific literature abstracts and popular science requests. It features three tasks. First, content selection (what is in, or out?) challenges systems to select passages to include in a simplified summary in response to a query. Second, complexity spotting (what is unclear?) asks systems, given a passage and a query, to rank the terms and concepts that must be explained (through definitions, context, or applications) for the passage to be understood. Third, text simplification (rewrite this!) asks systems, given a query, to simplify passages from scientific abstracts while preserving their main content.

Metadata
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-030-99739-7_46