2017 | Book

Optical Character Recognition Systems for Different Languages with Soft Computing

About this book

The book offers a comprehensive survey of soft-computing models for optical character recognition systems. The various techniques, including fuzzy and rough sets, artificial neural networks and genetic algorithms, are tested using real texts written in different languages, such as English, French, German, Latin, Hindi and Gujrati, which have been extracted from publicly available datasets. The simulation studies, which are reported in detail here, show that soft-computing based modeling of OCR systems performs consistently better than traditional models. Mainly intended as a state-of-the-art survey for postgraduates and researchers in pattern recognition, optical character recognition and soft computing, this book will also be useful for professionals in computer vision and image processing who deal with different issues related to optical character recognition.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
Optical character recognition (OCR) has been one of the most popular areas of research in pattern recognition [3, 25] for the past few decades. It is an actively studied topic in industry and academia [8, 15, 18, 24] because of its immense application potential. OCR was first studied in the early 1930s [23].
Arindam Chaudhuri, Krupa Mandaviya, Pratixa Badelia, Soumya K. Ghosh
Chapter 2. Optical Character Recognition Systems
Abstract
Optical character recognition (OCR) is the process of classifying optical patterns contained in a digital image. Character recognition is achieved through segmentation, feature extraction and classification. This chapter presents the basic ideas of OCR needed for a better understanding of the book. The chapter starts with a brief background and history of OCR systems. Then the different techniques of OCR systems are described, such as optical scanning, location segmentation, pre-processing, segmentation, representation, feature extraction, training and recognition, and post-processing. The different applications of OCR systems are highlighted next, followed by the current status of OCR systems. Finally, the future of OCR systems is presented.
Arindam Chaudhuri, Krupa Mandaviya, Pratixa Badelia, Soumya K. Ghosh
Chapter 3. Soft Computing Techniques for Optical Character Recognition Systems
Abstract
The continuous increase in demand for robust and low-cost optical character recognition (OCR) systems has prompted researchers to look for rigorous methods of character recognition. In the past, OCR systems have been built through traditional pattern recognition and machine learning approaches. There has always been a quest to develop the best OCR products that satisfy the user's needs. Over the past few decades, soft computing techniques have emerged as promising candidates for the development of cost-effective OCR systems. Some important soft computing techniques for OCR systems are presented in this chapter. They are the Hough transform for fuzzy feature extraction, genetic algorithms (GA) for feature selection, fuzzy multilayer perceptron (FMLP), rough fuzzy multilayer perceptron (RFMLP), fuzzy support vector machine (FSVM), fuzzy rough versions of support vector machine (FRSVM), hierarchical fuzzy bidirectional recurrent neural networks (HFBRNN) and fuzzy Markov random fields (FMRF). These techniques are used for developing OCR systems for different languages, viz. English, French, German, Latin, Hindi and Gujrati. The soft computing methods are used in the different steps of OCR systems discussed in Chap. 2. A comprehensive assessment of these methods is performed in Chaps. 4–9 for the stated languages. A thorough understanding of this chapter will help readers to appreciate the material presented in the abovementioned chapters.
Arindam Chaudhuri, Krupa Mandaviya, Pratixa Badelia, Soumya K. Ghosh
Chapter 4. Optical Character Recognition Systems for English Language
Abstract
The optical character recognition (OCR) systems for the English language were the most primitive ones and occupy a significant place in pattern recognition. English language OCR systems have been used successfully in a wide array of commercial applications. The different challenges involved in OCR systems for the English language are investigated in this chapter. Pre-processing activities such as binarization, noise removal, skew detection and correction, character segmentation and thinning are performed on the datasets considered. Feature extraction is performed through the discrete cosine transformation. Feature-based classification is performed through important soft computing techniques, viz. fuzzy multilayer perceptron (FMLP), rough fuzzy multilayer perceptron (RFMLP), fuzzy support vector machine (FSVM) and fuzzy rough support vector machine (FRSVM). The superiority of soft computing techniques is demonstrated through the experimental results.
Arindam Chaudhuri, Krupa Mandaviya, Pratixa Badelia, Soumya K. Ghosh
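The abstract above names the discrete cosine transformation as the feature extractor. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes a small square glyph given as a list of lists (for example a binarized character image) and keeps the low-frequency block of an orthonormal 2-D DCT-II as the feature vector. The function names `dct2` and `dct_features` and the parameter `k` are hypothetical.

```python
import math


def dct2(block):
    """Naive orthonormal 2-D DCT-II of a square list-of-lists (O(n^4))."""
    n = len(block)

    def alpha(k):
        # Scaling factors that make the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                cx = math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for y in range(n):
                    cy = math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    s += block[x][y] * cx * cy
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def dct_features(glyph, k=3):
    """Return the k*k lowest-frequency DCT coefficients as a flat list.

    Keeping only the top-left block is an assumption made for this
    sketch; the book may select coefficients differently.
    """
    coeffs = dct2(glyph)
    return [coeffs[u][v] for u in range(k) for v in range(k)]


# Usage: a 4x4 toy glyph (1 = ink, 0 = background).
glyph = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]
features = dct_features(glyph, k=2)
```

The resulting list would then be passed to a classifier; a production system would more likely use an optimized DCT routine than this quadruple loop.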
Chapter 5. Optical Character Recognition Systems for French Language
Abstract
The optical character recognition (OCR) systems for the French language were the most primitive ones and occupy a significant place in pattern recognition. French language OCR systems have been used successfully in a wide array of commercial applications. The different challenges involved in OCR systems for the French language are investigated in this chapter. Pre-processing activities such as text region extraction, skew detection and correction, binarization, noise removal, character segmentation and thinning are performed on the datasets considered. Feature extraction is performed through the fuzzy Hough transform. Feature-based classification is performed through important soft computing techniques, viz. rough fuzzy multilayer perceptron (RFMLP), fuzzy support vector machine (FSVM), fuzzy rough support vector machine (FRSVM) and hierarchical fuzzy bidirectional recurrent neural networks (HFBRNN). The superiority of soft computing techniques is demonstrated through the experimental results.
Arindam Chaudhuri, Krupa Mandaviya, Pratixa Badelia, Soumya K. Ghosh
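The abstract above relies on a fuzzy Hough transform for feature extraction. As a rough illustration of the underlying voting idea only, here is a sketch of the classical (crisp) Hough transform for straight lines; it is not the fuzzy variant used in the book. The function name `hough_lines`, the parameter `n_theta` and the choice of one-degree angle steps are assumptions made for this example.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180):
    """Accumulate votes in (theta_index, rho) space for foreground pixels.

    Each pixel (x, y) votes for every line rho = x*cos(theta) + y*sin(theta)
    that passes through it, with theta sampled in n_theta steps over [0, pi).
    """
    acc = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(t, rho)] += 1
    return acc


# Usage: five pixels of a vertical stroke at x = 3.
stroke = [(3, y) for y in range(5)]
votes = hough_lines(stroke)
strongest = max(votes.values())
```

Bins with many votes correspond to dominant strokes in the glyph, and their positions can serve as features. A fuzzy version would, roughly speaking, spread each vote over neighbouring bins with a membership weight instead of incrementing a single cell.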
Chapter 6. Optical Character Recognition Systems for German Language
Abstract
The optical character recognition (OCR) systems for the German language were the most primitive ones and occupy a significant place in pattern recognition. German language OCR systems have been used successfully in a wide array of commercial applications. The different challenges involved in OCR systems for the German language are investigated in this chapter. Pre-processing activities such as text region extraction, skew detection and correction, binarization, noise removal, character segmentation and thinning are performed on the datasets considered. Feature extraction is performed through fuzzy genetic algorithms (GA). Feature-based classification is performed through important soft computing techniques, viz. rough fuzzy multilayer perceptron (RFMLP), fuzzy support vector machine (FSVM), fuzzy rough support vector machine (FRSVM) and hierarchical fuzzy bidirectional recurrent neural networks (HFBRNN). The superiority of soft computing techniques is demonstrated through the experimental results.
Arindam Chaudhuri, Krupa Mandaviya, Pratixa Badelia, Soumya K. Ghosh
Chapter 7. Optical Character Recognition Systems for Latin Language
Abstract
The optical character recognition (OCR) systems for the Latin language were the most primitive ones and occupy a significant place in pattern recognition. Latin language OCR systems have been used successfully in a wide array of commercial applications. The different challenges involved in OCR systems for the Latin language are investigated in this chapter. Pre-processing activities such as text region extraction, skew detection and correction, binarization, noise removal, character segmentation and thinning are performed on the datasets considered. Feature extraction is performed through fuzzy genetic algorithms (GA). Feature-based classification is performed through important soft computing techniques, viz. rough fuzzy multilayer perceptron (RFMLP), fuzzy support vector machine (FSVM), fuzzy rough support vector machine (FRSVM) and hierarchical fuzzy bidirectional recurrent neural networks (HFBRNN). The superiority of soft computing techniques is demonstrated through the experimental results.
Arindam Chaudhuri, Krupa Mandaviya, Pratixa Badelia, Soumya K. Ghosh
Chapter 8. Optical Character Recognition Systems for Hindi Language
Abstract
The optical character recognition (OCR) systems for the Hindi language were the most primitive ones and occupy a significant place in pattern recognition. Hindi language OCR systems have been used successfully in a wide array of commercial applications. The different challenges involved in OCR systems for the Hindi language are investigated in this chapter. Pre-processing activities such as binarization, noise removal, skew detection, character segmentation and thinning are performed on the datasets considered. Feature extraction is performed through the fuzzy Hough transform. Feature-based classification is performed through important soft computing techniques, viz. rough fuzzy multilayer perceptron (RFMLP), fuzzy support vector machine (FSVM), fuzzy rough support vector machine (FRSVM) and fuzzy Markov random fields (FMRF). The superiority of soft computing techniques is demonstrated through the experimental results.
Arindam Chaudhuri, Krupa Mandaviya, Pratixa Badelia, Soumya K. Ghosh
Chapter 9. Optical Character Recognition Systems for Gujrati Language
Abstract
The optical character recognition (OCR) systems for the Gujrati language were the most primitive ones and occupy a significant place in pattern recognition. Gujrati language OCR systems have been used successfully in a wide array of commercial applications. The different challenges involved in OCR systems for the Gujrati language are investigated in this chapter. Pre-processing activities such as binarization, noise removal, skew detection, character segmentation and thinning are performed on the datasets considered. Feature extraction is performed through fuzzy genetic algorithms (GA). Feature-based classification is performed through important soft computing techniques, viz. rough fuzzy multilayer perceptron (RFMLP), fuzzy support vector machine (FSVM), fuzzy rough support vector machine (FRSVM) and fuzzy Markov random fields (FMRF). The superiority of soft computing techniques is demonstrated through the experimental results.
Arindam Chaudhuri, Krupa Mandaviya, Pratixa Badelia, Soumya K. Ghosh
Chapter 10. Summary and Future Research
Abstract
This research monograph is the outcome of the technical report "Some experiments on optical character recognition systems for different languages using soft computing techniques" [5], based on research work done at Birla Institute of Technology Mesra, Patna Campus, India.
Arindam Chaudhuri, Krupa Mandaviya, Pratixa Badelia, Soumya K. Ghosh
Backmatter
Metadata
Title
Optical Character Recognition Systems for Different Languages with Soft Computing
Authors
Arindam Chaudhuri
Krupa Mandaviya
Pratixa Badelia
Soumya K. Ghosh
Copyright Year
2017
Electronic ISBN
978-3-319-50252-6
Print ISBN
978-3-319-50251-9
DOI
https://doi.org/10.1007/978-3-319-50252-6
