2020 | Original Paper | Book Chapter

A Method of Semantic Change Detection Using Diachronic Corpora Data

Written by: Vladimir Bochkarev, Anna Shevlyakova, Valery Solovyev

Published in: Analysis of Images, Social Networks and Texts

Publisher: Springer International Publishing

Abstract

The article proposes a method for detecting semantic change using diachronic corpora data. The method is based on the distributional hypothesis. The analysis uses frequencies of syntactic bigrams from the English and Russian sub-corpora of Google Books Ngram. To obtain the co-occurrence profile of a word in its new meaning, the syntactic bigrams that contributed most to the change in the word's distribution are selected and their time series are clustered. The method is tested on a group of English and Russian words that gained new meanings in the 20th century. The results show that the proposed method can detect semantic changes and determine when they occurred.
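
The selection-and-clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which distance measure or clustering algorithm the chapter uses, so the absolute change in relative frequency, the greedy Pearson-correlation grouping, the thresholds, and all function names here are assumptions made for illustration. The counts are invented toy numbers, not Google Books Ngram data.

```python
from math import sqrt

# Hypothetical toy data: context word -> counts of the syntactic bigram
# (target word "mouse", context) in four consecutive periods.
# The numbers are invented for illustration only.
COUNTS = {
    "field":    [40, 38, 35, 30],
    "cat":      [30, 32, 28, 25],
    "trap":     [20, 18, 17, 15],
    "click":    [0, 0, 10, 40],
    "computer": [0, 1, 15, 50],
}


def shares(counts, period):
    """Relative frequency of each context within one period."""
    total = sum(series[period] for series in counts.values())
    return {ctx: series[period] / total for ctx, series in counts.items()}


def top_contributors(counts, first, last, k):
    """Contexts whose share changed most between two periods (assumed measure)."""
    p, q = shares(counts, first), shares(counts, last)
    change = {ctx: abs(q[ctx] - p[ctx]) for ctx in counts}
    return sorted(change, key=change.get, reverse=True)[:k]


def pearson(x, y):
    """Pearson correlation of two equally long series (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cluster(counts, contexts, threshold=0.8):
    """Greedy grouping: a series joins the first cluster whose seed it correlates with."""
    clusters = []
    for ctx in contexts:
        for group in clusters:
            if pearson(counts[ctx], counts[group[0]]) >= threshold:
                group.append(ctx)
                break
        else:
            clusters.append([ctx])
    return clusters


def onset(counts, group, min_share=0.1):
    """First period in which the group's combined share reaches min_share."""
    periods = len(next(iter(counts.values())))
    for t in range(periods):
        p = shares(counts, t)
        if sum(p[ctx] for ctx in group) >= min_share:
            return t
    return None


selected = top_contributors(COUNTS, first=0, last=3, k=4)
groups = cluster(COUNTS, selected)
print(selected)
print(groups)
print(onset(COUNTS, groups[0]))
```

In this toy setting, contexts whose time series rise together form one group and can be read as the co-occurrence profile of a candidate new meaning, while the onset period gives a rough estimate of when it emerged. With real corpus data, the measure of contribution, the clustering method, and the thresholds would all need to be chosen and validated.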

Metadata
Title
A Method of Semantic Change Detection Using Diachronic Corpora Data
Written by
Vladimir Bochkarev
Anna Shevlyakova
Valery Solovyev
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-39575-9_10