
2019 | OriginalPaper | Book chapter

A Study of English Neologisms Through Large-Scale Probabilistic Indexing of Bentham’s Manuscripts

Authors: Alejandro H. Toselli, Verónica Romero, Enrique Vidal, Joan Andreu Sánchez, Louise Seaward, Philip Schofield

Published in: New Trends in Image Analysis and Processing – ICIAP 2019

Publisher: Springer International Publishing


Abstract

Probabilistic indexes (PIs) are obtained from untranscribed handwritten text images by means of recently introduced lexicon-free, query-by-string, probabilistic keyword spotting techniques. PIs have proven to be a powerful tool that allows efficient free-text searching in very large collections of handwritten historical documents. PIs convey uncertain information about the textual contents of the document images. However, this uncertainty is accurately modeled by the associated lexical probability distributions, which can be conveniently exploited in many applications. As an example of such applications, here we study the dating of a number of English neologisms in the large collection of Bentham’s manuscripts, which encompasses 90,000 images. The statistical techniques used for neologism dating are theoretically motivated, and experiments on this collection are reported. Among its other contributions, this study provides sound evidence that some commonly assumed neologism introduction dates need to be revised.
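The abstract does not spell out the dating technique itself. As a rough illustration only, the sketch below assumes a simplified PI in which each entry is a tuple of image identifier, document year, word, and relevance probability. It also assumes that summing a word's relevance probabilities over the images of a given year gives an estimate of that word's expected number of occurrences in that year. The names `expected_counts_by_year` and `earliest_year`, the `threshold` parameter, the tuple layout, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical, simplified PI entry: (image_id, year, word, relevance probability).
# Real PIs store more information (e.g. word positions); the year is assumed to
# come from document metadata, not from the index itself.
PIEntry = Tuple[str, int, str, float]


def expected_counts_by_year(entries: Iterable[PIEntry], word: str) -> Dict[int, float]:
    """Sum the relevance probabilities of `word` per year (case-insensitive).

    Assumption: each probability is the chance that the word occurs in the
    indexed image region, so the per-year sum estimates the expected count.
    """
    counts: Dict[int, float] = defaultdict(float)
    target = word.lower()
    for _image_id, year, w, prob in entries:
        if w.lower() == target:
            counts[year] += prob
    return dict(counts)


def earliest_year(counts: Dict[int, float], threshold: float = 1.0) -> Optional[int]:
    """Return the first year whose expected count reaches `threshold`, else None.

    The fixed threshold is a hypothetical, deliberately naive decision rule.
    """
    for year in sorted(counts):
        if counts[year] >= threshold:
            return year
    return None


# Toy data invented for illustration; not taken from the Bentham collection.
pi = [
    ("img001", 1785, "codification", 0.25),
    ("img002", 1785, "Codification", 0.25),
    ("img003", 1790, "codification", 0.75),
    ("img004", 1790, "codification", 0.5),
    ("img005", 1790, "panopticon", 0.5),
]
counts = expected_counts_by_year(pi, "codification")
print(counts)                 # {1785: 0.5, 1790: 1.25}
print(earliest_year(counts))  # 1790
```

The fixed threshold keeps the example short; a theoretically motivated dating method of the kind the abstract mentions would presumably also take into account how many images are available for each year and how reliable the individual probability estimates are.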

Metadata
Copyright year: 2019
DOI: https://doi.org/10.1007/978-3-030-30754-7_10