2020 | OriginalPaper | Book chapter

A Large-Scale Analysis of Cross-lingual Citations in English Papers

Written by: Tarek Saier, Michael Färber

Published in: Digital Libraries at Times of Massive Societal Transition

Publisher: Springer International Publishing

Abstract

Citation data is an important source of insight into the scholarly discourse and the reception of publications. Outcomes of citation analyses and the applicability of citation-based machine learning approaches heavily depend on the completeness of citation data. One particular shortcoming of scholarly data nowadays is language coverage. That is, non-English publications are often not included in data sets, or language metadata is not available. While national citation indices exist, these are often not interconnected with other data sets. Because of this, citations between publications of differing languages (cross-lingual citations) have only been studied to a very limited degree. In this paper, we present an analysis of cross-lingual citations based on one million English papers, covering three scientific disciplines and a time span of 27 years. Our results unveil differences between languages and disciplines, show developments over time, and give insight into the impact of cross-lingual citations on scholarly data mining as well as on the publications that contain them. To facilitate further analyses, we make our collected data and code for analysis publicly available.


Footnotes
1
The selection of RQs is motivated by existing literature [18, 21] (1–3) as well as the intent to inform future endeavors in handling multilingual scholarly data (4–5).
 
3
Language information is given for the cited document by the “<Language>” part of the marker, and for the citing document by the fact that the marker is in English.
 
4
Identification of marked entries is detailed in Sect. 3.3. For the identification of non-English titles we used the reference string parser module of GROBID [24] and the Python module langdetect (see https://github.com/Mimino666/langdetect).
 
5
This is because the detection of untranslated non-English reference titles requires language identification on reference titles, which turned out to be unreliable for Latin script languages (e.g., many English titles were falsely identified as German).
 
8
Full evaluation details can be found at https://github.com/IllDepence/icadl2020.
 
10
As, for example, in reference [15] in arXiv:1503.05573: “[original-language reference text, rendered as images in the source], 2007. (English translation: Shafarevich I.R. Foundations of Algebraic Geometry MCCME, Moscow. 2007).”
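
Footnote 3 describes a marker whose “<Language>” part names the language of the cited document. As a minimal sketch only, and not the authors' procedure (which is detailed in Sect. 3.3 of the paper), such a marker could be picked out of a reference string with a regular expression. The pattern, the assumption that the marker sits at the end of the string, and the function name `cited_language` are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern: a trailing marker of the form "(in <Language>)",
# where <Language> is a single capitalized word, e.g. "(in Japanese)".
MARKER_RE = re.compile(r"\(\s*in\s+([A-Z][a-z]+)\s*\)\.?\s*$")


def cited_language(reference: str) -> Optional[str]:
    """Return the language named in a trailing "(in <Language>)" marker, or None."""
    match = MARKER_RE.search(reference)
    return match.group(1) if match else None


# Usage: a marked reference yields the language, an unmarked one yields None.
print(cited_language("Fukuda, S., et al.: ... pp. 539-542 (2012). (in Japanese)"))
print(cited_language("Hirsch, J.E.: An index to quantify ... 16569-16572 (2005)"))
```

Requiring a capitalized word keeps phrases such as “(in press)” from being read as a language, at the cost of missing markers written in lower case or with multi-word language names.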
 
References
1.
Abu-Jbara, A., Ezra, J., Radev, D.: Purpose and polarity of citation: towards NLP-based bibliometrics. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 596–606. Association for Computational Linguistics, Atlanta (2013)
3.
Cohan, A., Feldman, S., Beltagy, I., Downey, D., Weld, D.: SPECTER: document-level representation learning using citation-informed transformers. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 2270–2282. Association for Computational Linguistics (2020)
4.
Colavizza, G., Romanello, M.: Citation mining of humanities journals: the progress to date and the challenges ahead. J. Eur. Period. Stud. 4(1), 36–53 (2019)
5.
Eleta, I., Golbeck, J.: Bridging languages in social networks: how multilingual users of Twitter connect language communities? In: Proceedings of the American Society for Information Science and Technology, vol. 49, no. 1, pp. 1–4 (2012). https://doi.org/10.1002/meet.14504901327
6.
Elkiss, A., Shen, S., Fader, A., Erkan, G., States, D., Radev, D.: Blind men and elephants: what do citation summaries tell us about a research article? J. Am. Soc. Inf. Sci. Technol. 59(1), 51–62 (2008)
7.
Färber, M., Jatowt, A.: Citation recommendation: approaches and datasets. Int. J. Digit. Libraries (to appear)
8.
Fukuda, S., et al.: Construction of a CiNii database driven research trend analysis system. In: [venue name in Japanese script, rendered as images in the source] 18, pp. 539–542 (2012). (in Japanese)
9.
Gipp, B., Meuschke, N., Lipinski, M.: CITREC: an evaluation framework for citation-based similarity measures based on TREC genomics and PubMed central. In: iConference 2015 Proceedings. iSchools (2015)
10.
Hale, S.A.: Global connectivity and multilinguals in the Twitter network. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2014, pp. 833–842. Association for Computing Machinery, Toronto (2014). https://doi.org/10.1145/2556288.2557203
12.
Hirsch, J.E.: An index to quantify an individual’s scientific research output. Proc. Nat. Acad. Sci. 102(46), 16569–16572 (2005)
14.
Jauhiainen, T.S., Lui, M., Zampieri, M., Baldwin, T., Lindén, K.: Automatic language identification in texts: a survey. J. Artif. Intell. Res. 65, 675–782 (2019)
15.
Jiang, Z., Lu, Y., Liu, X.: Cross-language citation recommendation via publication content and citation representation fusion. In: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, JCDL 2018, pp. 347–348. Association for Computing Machinery, Fort Worth (2018). https://doi.org/10.1145/3197026.3203898
16.
Jiang, Z., Yin, Y., Gao, L., Lu, Y., Liu, X.: Cross-language citation recommendation via hierarchical representation learning on heterogeneous graph. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, pp. 635–644. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3209978.3210032
18.
Kellsey, C., Knievel, J.E.: Global English in the humanities? A longitudinal citation study of foreign-language use by humanities scholars. Coll. Res. Libr. 65(3), 194–204 (2004)
20.
22.
Liu, X., Chen, X.: CJK languages or English: languages used by academic journals in China, Japan, and Korea. J. Sch. Publish. 50(3), 201–214 (2019)
23.
Lo, K., Wang, L.L., Neumann, M., Kinney, R., Weld, D.: S2ORC: the semantic scholar open research corpus. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4969–4983. Association for Computational Linguistics (2020)
24.
27.
Moskaleva, O., Akoev, M.: Non-English language publications in Citation Indexes - quantity and quality. In: Proceedings 17th International Conference on Scientometrics & Informetrics, pp. 35–46. Edizioni Efesto, Italy (2019)
31.
Tang, X., Wan, X., Zhang, X.: Cross-language context-aware citation recommendation in scientific articles. In: Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2014, pp. 817–826. Association for Computing Machinery, New York (2014). https://doi.org/10.1145/2600428.2609564
Metadata
Title
A Large-Scale Analysis of Cross-lingual Citations in English Papers
Written by
Tarek Saier
Michael Färber
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-64452-9_11