
2017 | OriginalPaper | Chapter

Using Time Series Analysis for Estimating the Time Stamp of a Text

Authors: Costin-Gabriel Chiru, Madalina Toia

Published in: Advances in Time Series Analysis and Forecasting

Publisher: Springer International Publishing

Abstract

Language is constantly changing, with words being created or disappearing over time. Moreover, the usage of individual words tends to fluctuate under influences from different fields, such as historical events, cultural movements or scientific discoveries. These changes are reflected in written texts, so by tracking them one can estimate when a text was written. In this paper, we present an application based on time series analysis, built on top of the Google Books N-gram corpus, that determines the time stamp of written texts. The application uses two heuristics: word fingerprinting, which finds the time interval in which a word was most probably used, and word importance for the given text, which weights the influence of each word's fingerprint on the estimated time stamp. Combining these two heuristics yields a time stamp for the text.
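
The abstract does not specify how the two heuristics are computed or combined, so the following is only a minimal illustrative sketch, not the authors' method. It assumes that a word's fingerprint is the year in which its frequency series peaks, that a word's importance is its relative frequency in the text, and that the two are combined as a weighted average. The names `TOY_SERIES`, `fingerprint`, `importance` and `estimate_year` are hypothetical, and the frequency values are invented toy numbers rather than Google Books N-gram data.

```python
from collections import Counter

# Hypothetical toy data: relative yearly frequencies per word.
# These values are invented for illustration, not taken from any corpus.
TOY_SERIES = {
    "telegraph": {1850: 0.2, 1900: 0.9, 1950: 0.4, 2000: 0.1},
    "internet": {1850: 0.0, 1900: 0.0, 1950: 0.0, 2000: 1.0},
    "railway": {1850: 0.5, 1900: 1.0, 1950: 0.6, 2000: 0.3},
}


def fingerprint(series):
    """Assumed fingerprint: the year at which the word's frequency peaks."""
    return max(series, key=series.get)


def importance(tokens):
    """Assumed importance: relative frequency of each known word in the text."""
    counts = Counter(t for t in tokens if t in TOY_SERIES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def estimate_year(text):
    """Weighted average of word fingerprints; None if no word is known."""
    tokens = text.lower().split()
    weights = importance(tokens)
    if not weights:
        return None
    return sum(w * fingerprint(TOY_SERIES[word]) for word, w in weights.items())


if __name__ == "__main__":
    print(estimate_year("internet telegraph"))
```

A weighted average is the simplest way to let important words dominate the estimate. A method that works with time intervals, as the abstract describes, would more plausibly combine whole distributions over years than single peak values.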

Metadata
Title
Using Time Series Analysis for Estimating the Time Stamp of a Text
Authors
Costin-Gabriel Chiru
Madalina Toia
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-55789-2_3