Published in: International Journal on Digital Libraries 3/2020

20-05-2019

Improving semantic change analysis by combining word embeddings and word frequencies

Authors: Adrian Englhardt, Jens Willkomm, Martin Schäler, Klemens Böhm

Abstract

Language is constantly evolving. As part of diachronic linguistics, semantic change analysis examines how the meanings of words evolve over time. Such semantic awareness is important for retrieving content from digital libraries. Recent research on semantic change analysis that relies on word embeddings has yielded significant improvements over previous work. However, a recent and so far somewhat neglected observation is that the rate of semantic shift correlates negatively with word-usage frequency. In this article, we therefore propose SCAF, Semantic Change Analysis with Frequency. It abstracts from the concrete embeddings and includes word frequencies as an orthogonal feature. SCAF allows different combinations of embedding type, optimization algorithm and alignment method. Additionally, we leverage existing approaches to time series analysis by using change detection methods to identify semantic shifts. In an evaluation with a realistic setup, SCAF achieves a higher detection rate than prior approaches: 95% instead of 51%. On the Google Books Ngram data set, our approach detects both known and previously unknown shifts for popular words.
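
The idea of treating word frequency as a second feature next to the embedding, and then running change detection over the resulting time series, can be illustrated with a minimal Python sketch. This is not the SCAF implementation: the function names, the averaging of z-scores, the threshold value and the simple z-score rule that stands in for a change detection method are all assumptions made for illustration. The abstract does not specify these details. The sketch also assumes that the per-slice embeddings of a word have already been aligned to a common vector space.

```python
import math
from statistics import mean, pstdev


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def z_scores(values):
    """Standardize a series; a constant series maps to all zeros."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


def detect_shifts(embeddings, frequencies, threshold=1.5):
    """Flag time slices at which a word's usage may have shifted.

    embeddings:  one vector per time slice for a single word, assumed to be
                 already aligned to a common space (hypothetical input).
    frequencies: relative frequency of the word per time slice, all > 0.
    Returns 0-based slice indices t such that the transition from
    slice t-1 to slice t looks like a change.
    """
    # Feature 1: embedding displacement between consecutive slices.
    displacement = [
        cosine_distance(embeddings[t - 1], embeddings[t])
        for t in range(1, len(embeddings))
    ]
    # Feature 2: absolute change in log frequency between consecutive slices.
    freq_change = [
        abs(math.log(frequencies[t]) - math.log(frequencies[t - 1]))
        for t in range(1, len(frequencies))
    ]
    # Illustrative combination (an assumption, not the paper's method):
    # average the z-scores of both features and apply a fixed threshold.
    combined = [
        (d + f) / 2.0
        for d, f in zip(z_scores(displacement), z_scores(freq_change))
    ]
    return [i + 1 for i, score in enumerate(combined) if score > threshold]
```

Calling `detect_shifts` with a list of per-decade vectors and the matching list of relative frequencies returns the slices whose transition stands out on both features. A dedicated change point detection method could replace the threshold rule without changing the rest of the sketch.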


Metadata
Title
Improving semantic change analysis by combining word embeddings and word frequencies
Authors
Adrian Englhardt
Jens Willkomm
Martin Schäler
Klemens Böhm
Publication date
20-05-2019
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 3/2020
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-019-00271-6
