Published in: International Journal on Document Analysis and Recognition (IJDAR), Issue 3/2018

14 February 2018 | Special Issue Paper

Attribute CNNs for word spotting in handwritten documents

Authors: Sebastian Sudholt, Gernot A. Fink

Abstract

Word spotting has attracted strong research interest in document image analysis in recent years. Recently, AttributeSVMs were proposed, which predict a binary attribute representation (Almazán et al. in IEEE Trans Pattern Anal Mach Intell 36(12):2552–2566, 2014). At the time, this influential method defined the state of the art in segmentation-based word spotting. In this work, we present an approach for learning attribute representations with convolutional neural networks (CNNs). By taking a probabilistic perspective on training CNNs, we derive two different loss functions for binary and real-valued word string embeddings. In addition, we propose two different CNN architectures specifically designed for word spotting. These architectures can be trained end-to-end. In a number of experiments, we investigate the influence of different word string embeddings and optimization strategies. We show that our attribute CNNs achieve state-of-the-art results for segmentation-based word spotting on a large variety of data sets.
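
The abstract does not name the two loss functions. As a minimal sketch for the binary case only, assume each attribute is modeled as an independent Bernoulli variable with a sigmoid output; the negative log-likelihood is then the binary cross-entropy. The function names, the toy vectors and the use of cosine distance for ranking are illustrative assumptions, not the authors' implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Map a raw network output (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli_nll(logits, targets, eps: float = 1e-12) -> float:
    """Negative log-likelihood of binary attributes under independent
    Bernoulli outputs, i.e. the binary cross-entropy summed over attributes.
    Assumption: one sigmoid output per attribute."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1.0 - p + eps)
    return total


def cosine_distance(a, b) -> float:
    """1 - cosine similarity between two attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_query(query_attrs, predicted):
    """Rank word images (hypothetical dict: name -> predicted attribute
    vector) by increasing cosine distance to the query attribute vector."""
    return sorted(predicted, key=lambda name: cosine_distance(query_attrs, predicted[name]))


if __name__ == "__main__":
    # Toy example with three attributes; the values are made up.
    query = [1, 0, 1]
    predictions = {"img_a": [0.9, 0.1, 0.8], "img_b": [0.1, 0.9, 0.2]}
    print(bernoulli_nll([2.0, -2.0, 2.0], query))
    print(rank_by_query(query, predictions))
```

In this sketch a query string is mapped to a binary attribute vector and word images are ranked by how close their predicted attribute vectors lie to it, which is the retrieval setting that segmentation-based word spotting refers to.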

References
1. Aggarwal, C.C., Hinneburg, A., Keim, D.A.: On the surprising behavior of distance metrics in high dimensional spaces. In: International Conference on Database Theory, pp. 420–434 (2001)
2. Aldavert, D., Rusinol, M., Toledo, R., Llados, J.: Integrating visual and textual cues for query-by-string word spotting. In: Proceedings of the International Conference on Document Analysis and Recognition, pp. 511–515 (2013)
3. Almazán, J., Gordo, A., Fornés, A., Valveny, E.: Word spotting and recognition with embedded attributes. IEEE Trans. Pattern Anal. Mach. Intell. 36(12), 2552–2566 (2014)
4. Balntas, V., Johns, E., Tang, L., Mikolajczyk, K.: PN-Net: conjoined triple deep network for learning local image descriptors. arXiv (2016)
5. Chollet, F.: Information-theoretical label embeddings for large-scale image classification. arXiv (2016)
7. Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78 (2012)
8. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
9. Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: Computer Vision and Pattern Recognition, pp. 1778–1785. Miami (2009)
10. Fischer, A., Keller, A., Frinken, V., Bunke, H.: HMM-based word spotting in handwritten documents using subword models. In: Proceedings of the International Conference on Pattern Recognition, pp. 3416–3419 (2010)
11. Frinken, V., Fischer, A., Manmatha, R., Bunke, H.: A novel word spotting method based on recurrent neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 34, 211–224 (2012)
12. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: Proceedings of the International Conference on Machine Learning, pp. 1050–1059. New York City (2016)
13. Giotis, A.P., Sfikas, G., Gatos, B., Nikou, C.: A survey of document image word spotting techniques. Pattern Recogn. 68, 310–332 (2017)
14. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the International Conference on Artificial Intelligence and Statistics, vol. 15, pp. 315–323 (2011)
15. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Proceedings of the European Conference on Computer Vision, pp. 346–361 (2014)
16. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the International Conference on Computer Vision, pp. 1026–1034 (2015)
17. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 770–778. Las Vegas (2016)
18. Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Synthetic data and artificial neural networks for natural scene text recognition. In: Neural Information Processing Systems. Montreal (2014)
19. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: ACM Conference on Multimedia, pp. 675–678. Orlando (2014)
20. Johnson, J., Karpathy, A., Fei-Fei, L.: DenseCap: fully convolutional localization networks for dense captioning. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 4565–4574. Las Vegas (2016)
21. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations. San Diego (2015)
22. Kleber, F., Fiel, S., Diem, M., Sablatnig, R.: CVL-database: an off-line database for writer retrieval, writer identification and word spotting. In: International Conference on Document Analysis and Recognition, pp. 560–564. Washington (2013)
23. Kołcz, A., Alspector, J., Augusteijn, M., Carlson, R., Viorel Popescu, G.: A line-oriented approach to word spotting in handwritten documents. Pattern Anal. Appl. 3(2), 154–168 (2000)
24. Krishnan, P., Dutta, K., Jawahar, C.: Deep feature embedding for accurate recognition and retrieval of handwritten text. In: Proceedings of the International Conference on Frontiers in Handwriting Recognition, pp. 289–294 (2016)
25. Krishnan, P., Jawahar, C.: Matching handwritten document images. In: European Conference on Computer Vision. Amsterdam (2016)
26. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105. Montreal (2012)
27. Lampert, C.H., Nickisch, H., Harmeling, S.: Learning to detect unseen object classes by between-class attribute transfer. In: Computer Vision and Pattern Recognition, pp. 951–958. Miami (2009)
28. Lampert, C.H., Nickisch, H., Harmeling, S.: Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Pattern Anal. Mach. Intell. 36(3), 453–465 (2014)
29. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Computer Vision and Pattern Recognition, vol. 2, pp. 2169–2178. New York City (2006)
30. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, pp. 396–404. Denver (1990)
31. Manmatha, R., Han, C., Riseman, E.: Word spotting: a new approach to indexing handwriting. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1–29 (1996)
32. Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recognit. 5(1), 39–46 (2002)
33. Nielsen, M.A.: Neural Networks and Deep Learning. Determination Press (2015)
34. Ojala, M., Garriga, G.C.: Permutation tests for studying classifier performance. J. Mach. Learn. Res. 11, 1833–1863 (2010)
35. Pechwitz, M., Maddouri, S., Märgner, V.: IFN/ENIT-database of handwritten Arabic words. Colloque International Francophone sur l’Ecrit et le Document, pp. 1–8 (2002)
36. Poznanski, A., Wolf, L.: CNN-N-Gram for handwriting word recognition. In: Computer Vision and Pattern Recognition, pp. 2305–2314. Las Vegas (2016)
37. Pratikakis, I., Zagoris, K., Gatos, B., Puigcerver, J., Toselli, A.H., Vidal, E.: ICFHR2016 handwritten keyword spotting competition (H-KWS 2016). In: International Conference on Frontiers in Handwriting Recognition, pp. 613–618. Shenzhen (2016)
38. Rath, T.M., Manmatha, R.: Word spotting for historical documents. Int. J. Doc. Anal. Recogn. 9, 139–152 (2007)
39. Retsinas, G., Sfikas, G., Gatos, B.: Transferable deep features for keyword spotting. In: Proceedings of the European Signal Processing Conference. Kos Island (2017)
40. Rodríguez-Serrano, J.A., Perronnin, F.: A model-based sequence similarity with application to handwritten word spotting. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2108–2120 (2012)
41. Rodriguez-Serrano, J.A., Perronnin, F.: Label embedding for text recognition. In: British Machine Vision Conference (2013)
42. Romero, V., Fornés, A., Serrano, N., Sánchez, J.A., Toselli, A.H., Frinken, V., Vidal, E., Lladós, J.: The ESPOSALLES database: an ancient marriage license corpus for off-line handwriting recognition. Pattern Recogn. 46(6), 1658–1669 (2013)
43. Rothacker, L., Fink, G.A.: Segmentation-free query-by-string word spotting with bag-of-features HMMs. In: International Conference on Document Analysis and Recognition, pp. 661–665. Nancy (2015)
44. Rothacker, L., Rusinol, M., Fink, G.A.: Bag-of-features HMMs for segmentation-free word spotting in handwritten documents. In: International Conference on Document Analysis and Recognition, pp. 1305–1309 (2013)
45. Rothacker, L., Sudholt, S., Rusakov, E., Kasperidus, M., Fink, G.A.: Word hypotheses for segmentation-free word spotting in historic document images. In: Proceedings of the International Conference on Document Analysis and Recognition. Kyoto (2017)
46. Rusiñol, M., Aldavert, D., Toledo, R., Lladós, J.: Browsing heterogeneous document collections by a segmentation-free word spotting method. In: International Conference on Document Analysis and Recognition, pp. 63–67. Beijing (2011)
47. Rusiñol, M., Aldavert, D., Toledo, R., Lladós, J.: Efficient segmentation-free keyword spotting in historical document collections. Pattern Recogn. 48(2), 545–555 (2015)
48. Rusiñol, M., Aldavert, D., Toledo, R., Lladós, J.: Towards query-by-speech handwritten keyword spotting. In: International Conference on Document Image Analysis, pp. 501–505. Nancy (2015)
49. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
50. Shalizi, C.R.: Advanced Data Analysis from an Elementary Point of View. Cambridge University Press, Cambridge (2013)
51. Sharma, A., Pramod, S.K.: Adapting off-the-shelf CNNs for word spotting & recognition. In: Proceedings of the International Conference on Document Analysis and Recognition, pp. 986–990 (2015)
52. Silberpfennig, A., Wolf, L., Dershowitz, N., Bhagesh, S., Chaudhuri, B.B.: Improving OCR for an under-resourced script using unsupervised word-spotting. In: International Conference on Document Analysis and Recognition, pp. 706–710. Nancy (2015)
53. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations (2015)
54. Smucker, M.D., Allan, J., Carterette, B.: A comparison of statistical significance tests for information retrieval evaluation. In: Conference on Information and Knowledge Management, pp. 623–632. Lisbon (2007)
55. Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. In: Proceedings of the International Conference on Learning Representations (2015)
56. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
57. Sudholt, S., Fink, G.A.: A modified isomap approach to manifold learning in word spotting. In: Proceedings of the German Conference on Pattern Recognition, pp. 529–539 (2015)
58. Sudholt, S., Fink, G.A.: PHOCNet: a deep convolutional neural network for word spotting in handwritten documents. In: Proceedings of the International Conference on Frontiers in Handwriting Recognition, pp. 277–282 (2016)
59. Sudholt, S., Fink, G.A.: Evaluating word string embeddings and loss functions for CNN-based word spotting. In: Proceedings of the International Conference on Document Analysis and Recognition (2017)
60. Sudholt, S., Gurjar, N., Fink, G.A.: Learning deep representations for word spotting under weak supervision. arXiv (2017)
61. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2014)
62. Tieleman, T., Hinton, G.: Lecture 6.5–RMSprop: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw. Mach. Learn. 4, 26–31 (2012)
63. Toselli, A.H., Vidal, E., Romero, V., Frinken, V.: HMM word graph based keyword spotting in handwritten document images. Inf. Sci. 370, 497–518 (2016)
64. Wilkinson, T., Brun, A.: Semantic and verbatim word spotting using deep neural networks. In: Proceedings of the International Conference on Frontiers in Handwriting Recognition, pp. 307–312 (2016)

Metadata
Title: Attribute CNNs for word spotting in handwritten documents
Authors: Sebastian Sudholt, Gernot A. Fink
Publication date: 14 February 2018
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR), Issue 3/2018
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-018-0295-0
