
2018 | Original Paper | Book Chapter

Recognition System to Separate Text Graphics from Indian Newspaper

Written by: Shantanu Jana, Nibaran Das, Ram Sarkar, Mita Nasipuri

Published in: Operations Research and Optimization

Publisher: Springer Singapore


Abstract

Identifying graphics in newspaper pages and separating them from text is a challenging task, and very few works have been reported in this field. Newspapers are generally printed on low-quality paper that tends to change color over time, and this discoloration introduces noise that accumulates in the document as it ages. In this work we choose several features to distinguish graphics from text and also try to reduce the noise. First, the minimum bounding box around each object is identified by connected component analysis of the binary image. Each object is then cropped and passed through a geometric feature extraction system. Next, two different frequency analyses are performed on each object. In this way we collect both spatial and frequency domain features from the objects, which are used for training and testing with different classifiers. We applied the techniques to Indian newspapers written in Roman script and obtained satisfactory results.
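
The first two steps of the abstract (connected component analysis to find bounding boxes, then geometric feature extraction) can be illustrated with a minimal sketch. It is not the authors' implementation: the 8-connectivity, the function names `connected_components` and `geometric_features`, and the two features shown (aspect ratio and fill ratio) are assumptions chosen for illustration, since the abstract does not name the features used. The frequency domain features are not covered here.

```python
from collections import deque


def connected_components(binary):
    """Find 8-connected foreground objects (pixels equal to 1) in a binary image.

    Returns one dict per object with its minimum bounding box
    (top, left, bottom, right), inclusive, and its pixel count.
    8-connectivity is an assumption; the source does not specify it.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            objects.append({"box": (top, left, bottom, right), "area": area})
    return objects


def geometric_features(obj):
    """Compute simple, hypothetical geometric features from a bounding box."""
    top, left, bottom, right = obj["box"]
    height = bottom - top + 1
    width = right - left + 1
    return {
        "width": width,
        "height": height,
        "aspect_ratio": width / height,
        # Fraction of the bounding box covered by foreground pixels.
        "fill_ratio": obj["area"] / (width * height),
    }


# Toy binary page: a solid 2x2 block and a hollow 3x3 ring.
page = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 1],
]
objects = connected_components(page)
features = [geometric_features(o) for o in objects]
```

Each entry in `features` could then be combined with frequency domain descriptors to form the feature vector passed to a classifier. In practice, a library routine for connected component labeling would likely replace the hand-written flood fill.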


Metadata
Title
Recognition System to Separate Text Graphics from Indian Newspaper
Written by
Shantanu Jana
Nibaran Das
Ram Sarkar
Mita Nasipuri
Copyright Year
2018
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-7814-9_14
