2019 | OriginalPaper | Chapter

A Study of English Neologisms Through Large-Scale Probabilistic Indexing of Bentham’s Manuscripts

Authors: Alejandro H. Toselli, Verónica Romero, Enrique Vidal, Joan Andreu Sánchez, Louise Seaward, Philip Schofield

Published in: New Trends in Image Analysis and Processing – ICIAP 2019

Publisher: Springer International Publishing

Abstract

Probabilistic indexes (PIs) are obtained from untranscribed handwritten text images by means of recently introduced lexicon-free, query-by-string, probabilistic keyword spotting techniques. PIs have proven to be a powerful tool that allows efficient, free-text searching in very large collections of handwritten historical documents. PIs convey uncertain information about the textual contents of the document images. However, this uncertainty is accurately modeled by the associated lexical probability distributions, which can be conveniently exploited in many applications. As an example of these applications, here we study the dating of a number of English neologisms in the large collection of Bentham’s manuscripts, which encompass 90,000 images. The statistical techniques used for neologism dating are theoretically motivated, and experiments on this collection are reported. Among other contributions, this study provides sound evidence that some commonly assumed neologism introduction dates need to be revised.

Metadata

Title: A Study of English Neologisms Through Large-Scale Probabilistic Indexing of Bentham’s Manuscripts
Authors: Alejandro H. Toselli, Verónica Romero, Enrique Vidal, Joan Andreu Sánchez, Louise Seaward, Philip Schofield
Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-30754-7_10