Published in: Pattern Analysis and Applications 4/2009

01-12-2009 | Theoretical Advances

Overlapping and multi-touching text-line segmentation by Block Covering analysis

Authors: Abderrazak Zahour, Bruno Taconet, Laurence Likforman-Sulem, Wafa Boussellaa

Abstract

This paper presents a new approach for text-line segmentation based on Block Covering which solves the problem of overlapping and multi-touching components. Block Covering is the core of a system which processes a set of ancient Arabic documents from historical archives. The system is designed for separating text-lines even if they are overlapping and multi-touching. We exploit the Block Covering technique in three steps: a new fractal analysis (Block Counting) for document classification, a statistical analysis of block heights for block classification and a neighboring analysis for building text-lines. The Block Counting fractal analysis, associated with a fuzzy C-means scheme, is performed on document images in order to classify them according to their complexity: tightly (closely) spaced documents (TSD) or widely spaced documents (WSD). An optimal Block Covering is applied on TSD documents which include overlapping and multi-touching lines. The large blocks generated by the covering are then segmented by relying on the statistical analysis of block heights. The final labeling into text-lines is based on a block neighboring analysis. Experimental results provided on images of the Tunisian Historical Archives reveal the feasibility of the Block Covering technique for segmenting ancient Arabic documents.
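
The document-classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of the authors' Block Counting analysis, so the code below substitutes the textbook box-counting estimate of fractal dimension and a plain two-cluster fuzzy C-means on scalar features. The function names, the box sizes, the fuzzifier m = 2 and the sample feature values are all assumptions made for illustration, not values from the paper.

```python
import math


def count_boxes(image, size):
    """Count size x size boxes holding at least one foreground (non-zero) pixel."""
    occupied = set()
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if pixel:
                occupied.add((r // size, c // size))
    return len(occupied)


def box_counting_dimension(image, sizes=(1, 2, 4, 8)):
    """Least-squares slope of log N(s) against log(1/s) (generic box counting)."""
    counts = [count_boxes(image, s) for s in sizes]
    if min(counts) == 0:
        raise ValueError("image has no foreground pixels")
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(n) for n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def fuzzy_c_means_1d(values, m=2.0, iterations=50):
    """Two-cluster fuzzy C-means on scalars; returns (centers, memberships)."""
    centers = [min(values), max(values)]  # assumed initialisation
    memberships = []
    for _ in range(iterations):
        memberships = []
        for v in values:
            dists = [abs(v - c) for c in centers]
            zeros = [d == 0.0 for d in dists]
            if any(zeros):
                # A point lying exactly on a center belongs fully to it.
                share = 1.0 / sum(zeros)
                u = [share if z else 0.0 for z in zeros]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                u = [w / total for w in inv]
            memberships.append(u)
        # Each center becomes the membership-weighted mean of the values.
        centers = [
            sum((u[k] ** m) * v for u, v in zip(memberships, values))
            / sum(u[k] ** m for u in memberships)
            for k in range(2)
        ]
    return centers, memberships


# Hypothetical per-document dimension estimates, invented for this example.
features = [1.50, 1.52, 1.49, 1.80, 1.82, 1.79]
centers, memberships = fuzzy_c_means_1d(features)
labels = [0 if u[0] >= u[1] else 1 for u in memberships]
```

In this sketch each document image would first be reduced to one scalar by `box_counting_dimension`, and the resulting values would then be grouped by `fuzzy_c_means_1d`. Which of the two clusters corresponds to TSD and which to WSD is not stated in the abstract and would have to be decided from the full paper or from labeled samples.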

Metadata

Title: Overlapping and multi-touching text-line segmentation by Block Covering analysis
Authors: Abderrazak Zahour, Bruno Taconet, Laurence Likforman-Sulem, Wafa Boussellaa
Publication date: 01-12-2009
Publisher: Springer-Verlag
Published in: Pattern Analysis and Applications, Issue 4/2009
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-008-0127-9
