2018 | OriginalPaper | Book chapter

Tracing Language Variation for Romanian

Authors: Daniela Gîfu, Radu Simionescu

Published in: Computational Linguistics and Intelligent Text Processing

Publisher: Springer International Publishing

Abstract

This paper presents a pilot study of two collections of publications written from the middle of the 19th century onward in two countries, Romania and the Republic of Moldavia. The corpus includes articles from the most important Romanian and Bessarabian publications, grouped into three periods: 1840–1917, 1918–1940, and 1941–1991. The research conducted on these resources focuses on the lexical evolution of words. We use a machine learning approach to explore the patterns that govern the lexical differences between two lexicons. The resulting model is used to automatically correlate different forms of a word. The approach is suitable for bootstrapping, which increases the quantity and quality of the training data, and it is language independent. Using the contemporary language as a pivot, the data is analyzed and compared from various perspectives.
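The abstract does not specify the model, so the following is only a minimal sketch of the general idea, not the authors' method: character-level rewrite patterns are extracted from known (older form, contemporary form) pairs, applied to unlinked older forms, and a link is accepted when exactly one candidate appears in a contemporary lexicon used as the pivot. Accepted links are added to the training pairs, which is the bootstrapping step. All function names and the toy word pairs (the pre-1993 "î" versus current "â" spelling) are illustrative assumptions and are not taken from the paper's corpus.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_rules(old, new):
    """Return character-level substitutions (old_substring, new_substring) between two forms."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"  # insertions and deletions are ignored in this sketch
    ]


def learn_rules(pairs, min_count=1):
    """Keep substitutions seen at least `min_count` times in the training pairs."""
    counts = Counter(rule for old, new in pairs for rule in extract_rules(old, new))
    return [rule for rule, n in counts.items() if n >= min_count]


def candidates(word, rules):
    """Apply each rule once, at every position where its left side occurs."""
    out = set()
    for src, dst in rules:
        start = word.find(src)
        while start != -1:
            out.add(word[:start] + dst + word[start + len(src):])
            start = word.find(src, start + 1)
    return out


def bootstrap(seed_pairs, old_words, modern_lexicon, rounds=3):
    """Grow the set of linked pairs, using the contemporary lexicon as the pivot."""
    pairs = set(seed_pairs)
    linked = {old for old, _ in pairs}
    for _ in range(rounds):
        rules = learn_rules(pairs)
        new_pairs = set()
        for word in old_words:
            if word in linked:
                continue
            hits = candidates(word, rules) & modern_lexicon
            if len(hits) == 1:  # accept only unambiguous links
                new_pairs.add((word, hits.pop()))
        if not new_pairs:
            break
        pairs |= new_pairs
        linked |= {old for old, _ in new_pairs}
    return pairs


# Toy data (assumed for illustration only).
seed = [("cîmp", "câmp")]
older_forms = ["pîine", "cînd", "mînă"]
contemporary = {"câmp", "pâine", "când", "mână"}
print(sorted(bootstrap(seed, older_forms, contemporary)))
```

Requiring a unique hit in the contemporary lexicon is one simple way to keep bootstrapped pairs from degrading the training data; a statistical model would more plausibly rank candidates by score.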

Footnotes
1
Corpus resources that cover diverse languages include: ELDA (Evaluations and Language Resource Distribution Agency), TRACTOR (TELRI Research Archive of Computational Tools and Resources), OTA (Oxford Text Archive), LDC (Linguistic Data Consortium), and COROLA (COrpus of ROmanian Language).
 
2
The word was extracted from Foaia pentru minte, anima si literatura, 1854.
 
Metadata
Title
Tracing Language Variation for Romanian
Authors
Daniela Gîfu
Radu Simionescu
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-75487-1_45
