Published in: International Journal on Digital Libraries 2-3/2018

13.04.2017

Automatic summarization of scientific publications using a feature selection approach

Written by: Hazem Al Saied, Nicolas Dugué, Jean-Charles Lamirel

Abstract

Feature Maximization is a feature selection method that deals efficiently with textual data: it makes it possible to design systems that are at once language-agnostic and parameter-free and that require no additional corpora to function. We propose to evaluate its use in text summarization, in particular in cases where documents are structured. We first experiment with this approach in a single-document summarization context. We evaluate it on the DUC AQUAINT corpus and show that, despite the unstructured nature of the corpus, our system performs above the baseline and produces encouraging results. We also observe that the produced summaries seem robust to redundancy. Next, we evaluate our method in the more appropriate context of the SciSumm challenge, which is dedicated to the summarization of research publications. These publications are structured in sections, so our class-based approach is relevant. More specifically, we focus on the task that aims to summarize papers using the papers that refer to them. We consider and evaluate several systems that use our approach with specific bags of words. In these systems, we also evaluate cosine and graph-based distances for sentence weighting and comparison. We show that our Feature Maximization based approach performs very well in the SciSumm 2016 context for the considered task, providing better results than those known so far and obtaining high recall. We thus demonstrate the flexibility and relevance of Feature Maximization in this context.


Footnotes
1
The 2nd Computational Linguistics Scientific Document Summarization Shared Task (CL-SciSumm 2016), http://wing.comp.nus.edu.sg/cl-scisumm2016/
 
2
In this paper, we always consider only one reference summary, although there may be several, created for example by distinct human annotators.
 
3
The choice of the weighting scheme is not really constrained by the approach, apart from the requirement that it produce positive values. Such a scheme is supposed to capture the significance (i.e., semantics and importance) of the feature for the data. Feature recall is a scale-independent measure, but feature predominance is not. We have, however, shown experimentally that the F-measure, which combines these two measures, is only weakly influenced by feature scaling. Nevertheless, to guarantee fully scale-independent behavior for this measure, the data may be standardized.
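The scale behavior described in this footnote can be illustrated with a small sketch. It assumes definitions that are not spelled out on this page: feature recall is taken to be the share of a feature's total weight that falls in a class, feature predominance the share of a class's total weight carried by that feature, and the F-measure their harmonic mean. The function name `feature_f_measures` and the toy data are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, int], Tuple[float, float, float]]


def feature_f_measures(weights: List[List[float]], labels: List[str]) -> Scores:
    """Return {(class, feature_index): (recall, predominance, f_measure)}.

    Assumed definitions (not taken from this page):
      recall       = weight of the feature in the class / weight of the feature overall
      predominance = weight of the feature in the class / weight of all features in the class
      f_measure    = harmonic mean of recall and predominance
    """
    n_features = len(weights[0])
    classes = sorted(set(labels))

    # class_sums[c][f]: total weight of feature f over the documents of class c
    class_sums = {c: [0.0] * n_features for c in classes}
    for row, c in zip(weights, labels):
        for f, w in enumerate(row):
            class_sums[c][f] += w

    result: Scores = {}
    for c in classes:
        class_total = sum(class_sums[c])
        for f in range(n_features):
            feature_total = sum(class_sums[k][f] for k in classes)
            recall = class_sums[c][f] / feature_total if feature_total else 0.0
            predominance = class_sums[c][f] / class_total if class_total else 0.0
            denom = recall + predominance
            f_measure = 2 * recall * predominance / denom if denom else 0.0
            result[(c, f)] = (recall, predominance, f_measure)
    return result


# Toy data: three documents (rows), three features (columns), two classes.
weights = [
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 3.0, 1.0],
]
labels = ["intro", "intro", "method"]
scores = feature_f_measures(weights, labels)

# Rescale feature 0 by a constant factor and recompute.
scaled = [[row[0] * 10.0, row[1], row[2]] for row in weights]
scaled_scores = feature_f_measures(scaled, labels)

recall, predominance, f_measure = scores[("intro", 0)]
s_recall, s_predominance, s_f_measure = scaled_scores[("intro", 0)]
print(recall, s_recall)              # recall is unaffected by the rescaling
print(predominance, s_predominance)  # predominance changes with the rescaling
```

Under these assumed definitions, rescaling one feature multiplies both the numerator and the denominator of its recall by the same factor, so recall is unchanged, whereas the class total in the predominance denominator mixes all features and therefore shifts.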
 
4
The Document Understanding Conference.
 
5
Query-focused summarization.
 
Metadata
Title
Automatic summarization of scientific publications using a feature selection approach
Written by
Hazem Al Saied
Nicolas Dugué
Jean-Charles Lamirel
Publication date
13.04.2017
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 2-3/2018
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-017-0214-x
