Published in: International Journal on Document Analysis and Recognition (IJDAR) 2/2014

1 June 2014 | Original Paper

Efficient multiscale Sauvola’s binarization

Written by: Guillaume Lazzara, Thierry Géraud

Abstract

This work focuses on the most commonly used binarization method: Sauvola’s. It performs relatively well on classical documents; however, three main defects remain: the window parameter of Sauvola’s formula does not adapt automatically to the contents, the method is not robust to low contrast, and it is not invariant with respect to contrast inversion. Thus, on documents such as magazines, the contents may not be retrieved correctly, even though correct retrieval is crucial for indexing purposes. In this paper, we describe an efficient multiscale implementation of Sauvola’s algorithm that guarantees good binarization for both small and large objects inside a single document, without manually adjusting the window size to the contents. We also describe, step by step, how to implement it efficiently. The algorithm remains notably fast compared to the original one. For fixed parameters, text recognition rates and binarization quality are equal to or better than those of other methods on text with low and medium x-height, and are significantly improved on text with large x-height. Pixel-based accuracy and OCR evaluations are performed on more than 120 documents. Compared to awarded methods in the latest binarization contests, Sauvola’s formula does not give the best results on historical documents. On the other hand, on clean magazines, it outperforms those methods. This implementation improves the robustness of Sauvola’s algorithm by making the results almost insensitive to the window size, whatever the object sizes. Its properties make it usable in full document analysis toolchains.
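
For orientation, the single-scale formula that the abstract builds on is usually written T = m · (1 + k · (s / R − 1)), where m and s are the mean and standard deviation of the gray levels in a window centred on the pixel. The following is a minimal pure-Python sketch of that classical, single-scale method, computed with integral images (summed-area tables). It is not the authors' multiscale implementation. The function names and the defaults `window=15`, `k=0.5` and `R=128` are assumptions chosen for illustration, and borders are handled by simply clipping the window.

```python
from math import sqrt


def integral_image(img):
    """Summed-area table with one extra zero row and column."""
    h, w = len(img), len(img[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def window_sum(sat, y0, x0, y1, x1):
    """Sum of the pixels in rows y0..y1-1 and columns x0..x1-1."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]


def sauvola(img, window=15, k=0.5, R=128.0):
    """Single-scale Sauvola thresholding of a 2D list of gray levels.

    Returns a 2D list of booleans, True meaning foreground (dark pixel).
    The effective window is (2 * (window // 2) + 1) pixels wide.
    Default parameter values are illustrative assumptions.
    """
    h, w = len(img), len(img[0])
    half = window // 2
    sat = integral_image(img)                                   # sums of v
    sat2 = integral_image([[v * v for v in row] for row in img])  # sums of v^2
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            mean = window_sum(sat, y0, x0, y1, x1) / n
            var = window_sum(sat2, y0, x0, y1, x1) / n - mean * mean
            std = sqrt(max(var, 0.0))  # guard against rounding below zero
            threshold = mean * (1.0 + k * (std / R - 1.0))
            out[y][x] = img[y][x] < threshold
    return out
```

Calling `sauvola(page, window=9)` on a small page with a dark stroke on a light background marks the stroke as foreground. On the inverted page (light stroke on dark background) the same stroke is not marked, which illustrates the lack of invariance to contrast inversion mentioned above. The fixed `window` argument is the parameter that has to be tuned by hand in the single-scale method.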

Metadata
Title
Efficient multiscale Sauvola’s binarization
Written by
Guillaume Lazzara
Thierry Géraud
Publication date
1 June 2014
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 2/2014
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-013-0209-0
