Abstract
Script identification from complex, colorful images is an integral part of text recognition and classification systems. Such images pose two kinds of challenges: (1) camera-related challenges such as blurring, non-uniform illumination, and noisy backgrounds, and (2) text-related challenges such as shape, orientation, and size. Existing work in this area focuses mostly on non-Indian scripts, even though Gurumukhi, Hindi, and English scripts play a vital role in communication among Indians and foreigners. In this article, we address these challenges in script identification. We introduce a new dataset that contains Hindi, Gurumukhi, and English scripts from scene images collected from different sources. We also propose a CNN-based model that distinguishes between the scripts with good accuracy. The method is evaluated on our own dataset, NITJDATASET, and on other benchmark datasets available for Indian scripts, namely CVSI-2015 (Task-1 and Task-4) and ILST. This work extends earlier script identification from strictly text-only backgrounds.
Index Terms
- Word Level Script Identification Using Convolutional Neural Network Enhancement for Scenic Images