2020 | Original Paper | Book Chapter

A Large-Scale Analysis of Cross-lingual Citations in English Papers

Written by: Tarek Saier, Michael Färber

Published in: Digital Libraries at Times of Massive Societal Transition

Publisher: Springer International Publishing

Abstract

Citation data is an important source of insight into the scholarly discourse and the reception of publications. Outcomes of citation analyses and the applicability of citation based machine learning approaches heavily depend on the completeness of citation data. One particular shortcoming of scholarly data nowadays is language coverage. That is, non-English publications are often not included in data sets, or language metadata is not available. While national citation indices exist, these are often not interconnected to other data sets. Because of this, citations between publications of differing languages (cross-lingual citations) have only been studied to a very limited degree. In this paper, we present an analysis of cross-lingual citations based on one million English papers, covering three scientific disciplines and a time span of 27 years. Our results unveil differences between languages and disciplines, show developments over time, and give insight into the impact of cross-lingual citations on scholarly data mining as well as the publications that contain them. To facilitate further analyses, we make our collected data and code for analysis publicly available.

The selection of RQs is motivated by existing literature [18, 21] (1–3) as well as the intent to inform future endeavors in handling multilingual scholarly data (4–5).

See https://github.com/IllDepence/icadl2020.

Language information is given for the cited document by the “<Language>” part of the marker, and for the citing document by the fact that the marker is in English.
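
As a minimal sketch of how such a marker could be read programmatically, the snippet below assumes the marker has the shape "(in <Language>)", as seen in the "(in Japanese)" entry of the reference list. The regular expression and the function name `cited_language` are illustrative assumptions, not the authors' published code (see the linked repository for that).

```python
import re

# Assumed marker shape: "(in <Language>)", e.g. "(in Japanese)".
# The capitalised language name keeps "(in press)" from matching.
MARKER_RE = re.compile(r"\(in ([A-Z][a-z]+)\)")


def cited_language(reference):
    """Return the language named in an "(in <Language>)" marker, or None."""
    match = MARKER_RE.search(reference)
    return match.group(1) if match else None


# Reference strings abbreviated from the reference list of this chapter.
references = [
    "Fukuda, S., et al.: Construction of a CiNii database driven research "
    "trend analysis system, pp. 539–542 (2012). (in Japanese)",
    "Hirsch, J.E.: An index to quantify an individual's scientific research "
    "output. Proc. Nat. Acad. Sci. 102(46), 16569–16572 (2005)",
]

for ref in references:
    print(cited_language(ref))
```

A reference without a marker yields `None`, so the citing side (English) is implied only for entries where a language is actually returned.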

Identification of marked entries is detailed in Sect. 3.3. For the identification of non-English titles we used the reference string parser module of GROBID [24] and the Python module langdetect (see https://github.com/Mimino666/langdetect).

This is because the detection of untranslated non-English reference titles requires language identification on reference titles, which turned out to be unreliable for Latin script languages (e.g., many English titles were falsely identified as German).
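
To illustrate why Latin-script titles are the hard case, the sketch below checks only the writing script of a title, which can be read off Unicode character names without any statistical model. It is an assumption-laden illustration, not the method used in the paper: the function name `has_non_latin_letters` and the example titles are hypothetical, and a script check cannot separate, say, German from English.

```python
import unicodedata


def has_non_latin_letters(title):
    """Return True if any alphabetic character is not a Latin-script letter."""
    for ch in title:
        # unicodedata.name() yields e.g. "LATIN SMALL LETTER A WITH DIAERESIS"
        # or "CYRILLIC CAPITAL LETTER O"; the empty default covers unnamed code points.
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return True
    return False


# Hypothetical example titles, chosen only to cover different scripts.
print(has_non_latin_letters("Пример заголовка"))                   # Cyrillic
print(has_non_latin_letters("Ein Beispieltitel über Zitationen"))  # German, Latin script
print(has_non_latin_letters("An example title about citations"))   # English, Latin script
```

The German and English titles are indistinguishable to this check, which is exactly where language identification on short titles has to take over. Titles containing isolated Greek letters or similar symbols would also be flagged, so a real pipeline would likely need a threshold instead of a single-character test.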

See https://www.ncbi.nlm.nih.gov/pmc/about/faq/#q16.

See https://doaj.org/.

Full evaluation details can be found at https://github.com/IllDepence/icadl2020.

See http://hdl.handle.net/2433/172983, https://ci.nii.ac.jp/naid/10008827159/.

As, for example, in reference [15] in arXiv:1503.05573: “[original-language reference text, rendered as images in the source], 2007. (English translation: Shafarevich I.R. Foundations of Algebraic Geometry MCCME, Moscow. 2007).”

See https://www.ncbi.nlm.nih.gov/pmc/about/faq/#q16.

See https://support.nii.ac.jp/cia/cinii_db.

1. Abu-Jbara, A., Ezra, J., Radev, D.: Purpose and polarity of citation: towards NLP-based bibliometrics. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 596–606. Association for Computational Linguistics, Atlanta (2013)

2. Chen, C.: CiteSpace II: detecting and visualizing emerging trends and transient patterns in scientific literature. J. Am. Soc. Inf. Sci. Tech. 57(3), 359–377 (2006). https://doi.org/10.1002/asi.20317

3. Cohan, A., Feldman, S., Beltagy, I., Downey, D., Weld, D.: SPECTER: document-level representation learning using citation-informed transformers. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 2270–2282. Association for Computational Linguistics (2020)

4. Colavizza, G., Romanello, M.: Citation mining of humanities journals: the progress to date and the challenges ahead. J. Eur. Period. Stud. 4(1), 36–53 (2019)

5. Eleta, I., Golbeck, J.: Bridging languages in social networks: how multilingual users of Twitter connect language communities? In: Proceedings of the American Society for Information Science and Technology, vol. 49, no. 1, pp. 1–4 (2012). https://doi.org/10.1002/meet.14504901327

6. Elkiss, A., Shen, S., Fader, A., Erkan, G., States, D., Radev, D.: Blind men and elephants: what do citation summaries tell us about a research article? J. Am. Soc. Inf. Sci. Technol. 59(1), 51–62 (2008)

7. Färber, M., Jatowt, A.: Citation recommendation: approaches and datasets. Int. J. Digit. Libraries (to appear)

8. Fukuda, S., et al.: Construction of a CiNii [text rendered as images in the source] database driven research trend analysis system. In: [venue title rendered as images in the source], pp. 539–542 (2012). (in Japanese)

9. Gipp, B., Meuschke, N., Lipinski, M.: CITREC: an evaluation framework for citation-based similarity measures based on TREC genomics and PubMed central. In: iConference 2015 Proceedings. iSchools (2015)

10. Hale, S.A.: Global connectivity and multilinguals in the Twitter network. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2014, pp. 833–842. Association for Computing Machinery, Toronto (2014). https://doi.org/10.1145/2556288.2557203

11. Hale, S.A.: Net increase? Cross-lingual linking in the blogosphere. J. Comput. Mediated Commun. 17(2), 135–151 (2012). https://doi.org/10.1111/j.1083-6101.2011.01568.x

12. Hirsch, J.E.: An index to quantify an individual’s scientific research output. Proc. Nat. Acad. Sci. 102(46), 16569–16572 (2005)

13. Huh, S.: Journal article tag suite 1.0: national information standards organization standard of journal extensible markup language. Sci. Edit. 1(2), 99–104 (2014). https://doi.org/10.6087/kcse.2014.1.99

14. Jauhiainen, T.S., Lui, M., Zampieri, M., Baldwin, T., Lindén, K.: Automatic language identification in texts: a survey. J. Artif. Intell. Res. 65, 675–782 (2019)

15. Jiang, Z., Lu, Y., Liu, X.: Cross-language citation recommendation via publication content and citation representation fusion. In: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, JCDL 2018, pp. 347–348. Association for Computing Machinery, Fort Worth (2018). https://doi.org/10.1145/3197026.3203898

16. Jiang, Z., Yin, Y., Gao, L., Lu, Y., Liu, X.: Cross-language citation recommendation via hierarchical representation learning on heterogeneous graph. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, pp. 635–644. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3209978.3210032

17. Jin, H., Toyoda, M., Yoshinaga, N.: Can cross-lingual information cascades be predicted on Twitter? In: Ciampaglia, G.L., Mashhadi, A., Yasseri, T. (eds.) SocInfo 2017. LNCS, vol. 10539, pp. 457–472. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67217-5_28

18. Kellsey, C., Knievel, J.E.: Global English in the humanities? A longitudinal citation study of foreign-language use by humanities scholars. Coll. Res. Libr. 65(3), 194–204 (2004)

19. Khan, S., Liu, X., Shakil, K.A., Alam, M.: A survey on scholarly data: from big data perspective. Inf. Process. Manage. 53(4), 923–944 (2017). https://doi.org/10.1016/j.ipm.2017.03.006

20. Kirchik, O., Gingras, Y., Larivière, V.: Changes in publication languages and citation practices and their effect on the scientific impact of Russian science (1993–2010). J. Am. Soc. Inf. Sci. Technol. 63(7), 1411–1419 (2012). https://doi.org/10.1002/asi.22642

21. Lillis, T., Hewings, A., Vladimirou, D., Curry, M.J.: The geolinguistics of English as an academic lingua franca: citation practices across English-medium national and English-medium international journals. Int. J. Appl. Linguist. 20(1), 111–135 (2010). https://doi.org/10.1111/j.1473-4192.2009.00233.x

22. Liu, X., Chen, X.: CJK languages or English: languages used by academic journals in China, Japan, and Korea. J. Sch. Publish. 50(3), 201–214 (2019)

23. Lo, K., Wang, L.L., Neumann, M., Kinney, R., Weld, D.: S2ORC: the semantic scholar open research corpus. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4969–4983. Association for Computational Linguistics (2020)

24. Lopez, P.: GROBID: combining automatic bibliographic data recognition and term extraction for scholarship publications. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds.) ECDL 2009. LNCS, vol. 5714, pp. 473–474. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04346-8_62

25. Ma, S., Zhang, C., Liu, X.: A review of citation recommendation: from textual content to enriched context. Scientometrics 122(3), 1445–1472 (2020). https://doi.org/10.1007/s11192-019-03336-0

26. Moed, H.F., Markusova, V., Akoev, M.: Trends in Russian research output indexed in Scopus and Web of Science. Scientometrics 116(2), 1153–1180 (2018). https://doi.org/10.1007/s11192-018-2769-8

27. Moskaleva, O., Akoev, M.: Non-English language publications in Citation Indexes - quantity and quality. In: Proceedings 17th International Conference on Scientometrics & Informetrics, pp. 35–46. Edizioni Efesto, Italy (2019)

28. Saier, T., Färber, M.: unarXive: a large scholarly data set with publications’ full-text, annotated in-text citations, and links to metadata. Scientometrics (2), 1–24 (2020). https://doi.org/10.1007/s11192-020-03382-z

29. Schrader, B.: Cross-language Citation Analysis of Traditional and Open Access Journals (2019). https://doi.org/10.17615/djpr-1k06

30. Sinha, A., et al.: An overview of Microsoft Academic Service (MAS) and applications. In: Proceedings of the 24th International Conference on World Wide Web, WWW 2015 Companion, pp. 243–246. ACM (2015). https://doi.org/10.1145/2740908.2742839

31. Tang, X., Wan, X., Zhang, X.: Cross-language context-aware citation recommendation in scientific articles. In: Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2014, pp. 817–826. Association for Computing Machinery, New York (2014). https://doi.org/10.1145/2600428.2609564

32. Vera-Baceta, M.-A., Thelwall, M., Kousha, K.: Web of Science and Scopus language coverage. Scientometrics 121(3), 1803–1813 (2019). https://doi.org/10.1007/s11192-019-03264-z

33. Wang, K., et al.: A review of Microsoft academic services for science of science studies. Front. Big Data 2, 45 (2019). https://doi.org/10.3389/fdata.2019

34. Zuckerman, E.: Meet the bridgebloggers. Public Choice 134(1), 47–65 (2008). https://doi.org/10.1007/s11127-007-9200-y

Title: A Large-Scale Analysis of Cross-lingual Citations in English Papers
Authors: Tarek Saier, Michael Färber
Publisher: Springer International Publishing
Book: Digital Libraries at Times of Massive Societal Transition
Print ISBN: 978-3-030-64451-2
Electronic ISBN: 978-3-030-64452-9
Copyright year: 2020
DOI: https://doi.org/10.1007/978-3-030-64452-9_11
