Abstract
Contemporary business documents contain diverse, multi-layered mixtures of textual, graphical, and pictorial elements. Existing methods for document segmentation and classification do not cope well with the complexity and variety of their contents, geometric layouts, and element shapes. This paper proposes a novel document image classification approach that assigns each pixel to one of four fundamental classes (text, image, graphics, and background) using support vector machines. The approach relies on a novel low-dimensional feature descriptor based on textural properties: the feature vector is built from the sparseness of the document image's responses to a filter bank, measured on a multi-resolution and contextual basis. Qualitative and quantitative evaluations on business document images show the benefits of adopting a contextual and multi-resolution approach. In these evaluations the approach handles varied contents and complex document layouts without imposing constraints or making assumptions about the shape and spatial arrangement of document elements.
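The idea of a sparseness-based texture descriptor can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the sparseness measure, the filter bank, or the context sizes, so the sketch assumes Hoyer's sparseness measure, a precomputed stack of filter responses, and square neighbourhoods of increasing size as a stand-in for the multi-resolution, contextual analysis. The names `hoyer_sparseness`, `sparseness_features`, and `half_widths` are hypothetical.

```python
import numpy as np

def hoyer_sparseness(x):
    """Hoyer sparseness in [0, 1] (assumed measure, not confirmed by the abstract).

    Returns 0 when all magnitudes are equal and 1 when a single entry is nonzero.
    """
    x = np.abs(np.asarray(x, dtype=float).ravel())
    n = x.size
    l2 = np.sqrt(np.sum(x ** 2))
    if n < 2 or l2 == 0.0:
        return 0.0  # convention chosen here for empty or all-zero input
    return float((np.sqrt(n) - x.sum() / l2) / (np.sqrt(n) - 1.0))

def sparseness_features(responses, row, col, half_widths=(2, 4, 8)):
    """Per-pixel feature vector from filter-bank responses.

    responses   : array of shape (n_filters, H, W), one response map per filter
                  (the filter bank itself is assumed to be applied elsewhere).
    half_widths : hypothetical context sizes; each defines a square window
                  centred on (row, col), clipped at the image border.
    Returns an array of length n_filters * len(half_widths).
    """
    feats = []
    for h in half_widths:
        r0, r1 = max(row - h, 0), row + h + 1
        c0, c1 = max(col - h, 0), col + h + 1
        for resp in responses:
            feats.append(hoyer_sparseness(resp[r0:r1, c0:c1]))
    return np.array(feats)
```

Feature vectors of this kind, computed for labelled pixels, could then be used to train a multi-class support vector machine that predicts one of the four classes for each pixel.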
Additional information
This work was supported by the Natural Sciences and Engineering Research Council of Canada and SAP Canada through the Collaborative Research and Development Grants Program. Special thanks to Prof. Nicholas Journet for his help in implementing the comparison method of Sect. 5.4.
About this article
Cite this article
Cote, M., Branzan Albu, A. Texture sparseness for pixel classification of business document images. IJDAR 17, 257–273 (2014). https://doi.org/10.1007/s10032-014-0217-8
DOI: https://doi.org/10.1007/s10032-014-0217-8