Published in: International Journal on Digital Libraries 2-3/2018

13-06-2017

Computational linguistics literature and citations oriented citation linkage, classification and summarization

Authors: Lei Li, Liyuan Mao, Yazhao Zhang, Junqi Chi, Taiwen Huang, Xiaoyue Cong, Heng Peng


Abstract

Scientific literature is currently the most important resource for scholars, and its citations give researchers a powerful latent way to analyze scientific trends, influences, and the relationships among works and authors. This paper focuses on automatic citation analysis and summarization for the scientific literature of computational linguistics, which are also the shared tasks of the 2nd Computational Linguistics Scientific Document Summarization workshop at BIRNDL 2016 (The Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). First, each citation linkage between a citation and the spans of text it refers to in the reference paper is recognized according to their content similarity, computed with various methods. Then each cited text span is classified into one of five pre-defined facets, i.e., Hypothesis, Implication, Aim, Results and Method, using lexicon-based and rule-based features with a Support Vector Machine and a voting method. Finally, a summary of the reference paper of at most 250 words is generated from the cited text spans. The hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling; it provides knowledge about sentence clustering (subtopics) and word distributions (abstractiveness) for summarization. We combine this hLDA knowledge with several other classical features, using different weights and proportions, to score the sentences in the reference paper. Our systems ranked first and second in the evaluation results published by BIRNDL 2016, which supports the effectiveness of our methods.
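
The linkage and length-limited summary steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not specify the similarity measures, features, or weights, so the sketch assumes plain bag-of-words cosine similarity for linkage and a greedy selection under the 250-word limit for the summary. The function names (`tokenize`, `cosine`, `link_citation`, `build_summary`) and the `top_k` parameter are hypothetical, and the sentence scores passed to `build_summary` stand in for whatever weighted feature combination a real system would compute.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_citation(citation, reference_sentences, top_k=2):
    """Return indices of up to top_k reference sentences most similar to the citation.

    Assumption: similarity is bag-of-words cosine; sentences with zero overlap are dropped.
    """
    cit_vec = Counter(tokenize(citation))
    scored = [
        (cosine(cit_vec, Counter(tokenize(s))), i)
        for i, s in enumerate(reference_sentences)
    ]
    # Highest similarity first; ties broken by original sentence order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored[:top_k] if score > 0]


def build_summary(sentences, scores, max_words=250):
    """Greedily pick the highest-scoring sentences that fit within max_words.

    The chosen sentences are returned in their original document order.
    """
    chosen, used = [], 0
    for i in sorted(range(len(sentences)), key=lambda i: -scores[i]):
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    return [sentences[i] for i in sorted(chosen)]
```

As a usage example, `link_citation("Their topic model improves summarization.", sentences, top_k=1)` returns the index of the reference sentence sharing the most vocabulary with the citation, and `build_summary(sentences, scores)` assembles a summary that never exceeds the word limit. A stronger similarity measure or a learned scorer can be swapped in without changing the overall flow.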


Metadata
Title
Computational linguistics literature and citations oriented citation linkage, classification and summarization
Authors
Lei Li
Liyuan Mao
Yazhao Zhang
Junqi Chi
Taiwen Huang
Xiaoyue Cong
Heng Peng
Publication date
13-06-2017
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 2-3/2018
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-017-0219-5
