2018 | OriginalPaper | Chapter

Tracing Language Variation for Romanian

Authors: Daniela Gîfu, Radu Simionescu

Published in: Computational Linguistics and Intelligent Text Processing

Publisher: Springer International Publishing

Abstract

This paper presents a pilot study on two collections of publications written, beginning in the middle of the 19th century, in two countries: Romania and the Republic of Moldavia. The corpus includes articles from the most important Romanian and Bessarabian publications, grouped into three periods: 1840–1917, 1918–1940, and 1941–1991. The research conducted on these resources focuses on the lexical evolution of words. We use a machine learning approach to explore the patterns that govern the lexical differences between two lexicons. The resulting model automatically correlates different forms of a word. The approach is suitable for bootstrapping, which increases the quantity and quality of the training data, and it is language independent. Using the contemporary language as a pivot, we analyze and compare the data from various perspectives.
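
The idea of learning patterns that relate two lexicons and then correlating word forms can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not specify it, so the sketch substitutes a simple count of character-level substitutions obtained with Python's standard `difflib.SequenceMatcher`. The function names (`edit_patterns`, `learn_patterns`, `correlate`), the seed pairs, and the small modern lexicon are all assumptions chosen for illustration and are not drawn from the paper's corpus.

```python
from collections import Counter
from difflib import SequenceMatcher


def edit_patterns(old, new):
    """Yield (old_substring, new_substring) for each span where the forms differ."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield old[i1:i2], new[j1:j2]


def learn_patterns(pairs):
    """Count how often each substitution occurs across known (old, modern) pairs."""
    counts = Counter()
    for old, new in pairs:
        counts.update(edit_patterns(old, new))
    return counts


def correlate(word, patterns, lexicon):
    """Return pivot-lexicon entries reachable by applying one learned substitution."""
    candidates = set()
    for (src, dst), _ in patterns.most_common():
        if src and src in word:
            candidates.add(word.replace(src, dst))
    return sorted(candidates & lexicon)


# Hypothetical seed data: older spellings paired with contemporary ones.
seed_pairs = [("sînt", "sunt"), ("romîn", "român"), ("cîmp", "câmp"), ("pîine", "pâine")]
patterns = learn_patterns(seed_pairs)

# Hypothetical contemporary lexicon acting as the pivot.
modern_lexicon = {"sunt", "român", "câmp", "pâine", "mână", "cuvânt"}

print(patterns.most_common(1))
print(correlate("mînă", patterns, modern_lexicon))
```

In a bootstrapping setting, pairs accepted by `correlate` could be appended to `seed_pairs` and the patterns re-learned, so that each round enlarges the training data. Nothing in the sketch depends on Romanian specifically, since it operates on character sequences only.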

Footnotes
1
Corpus resources covering diverse languages include ELDA (Evaluations and Language Resources Distribution Agency), TRACTOR (TELRI Research Archive of Computational Tools and Resources), OTA (Oxford Text Archive), LDC (Linguistic Data Consortium), and COROLA (COrpus of ROmanian Language).
2
The word was extracted from Foaia pentru minte, anima si literatura, 1854.

Metadata
Title: Tracing Language Variation for Romanian
Authors: Daniela Gîfu, Radu Simionescu
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-75487-1_45
