Published in: Neural Computing and Applications 8/2021

01-09-2020 | Original Article

HINDIA: a deep-learning-based model for spell-checking of Hindi language

Authors: Shashank Singh, Shailendra Singh



Abstract

A spelling error is a mistake made while typing a text document. Applications such as search engines, information retrieval systems, and email require users to type. In such applications, a good spell-checker is essential for correcting misspellings. Spell-checkers for Western languages such as English are very powerful and can handle any type of spelling error, whereas the spell-checkers available for Indian languages such as Hindi, Urdu, Bengali, Kannada, and Assamese are very basic. These spell-checkers are built with traditional statistical and rule-based methods. This article presents HINDIA, a novel model for handling spelling errors in Hindi, one of the most widely spoken languages in India. It uses deep learning for spelling error detection and correction. The proposed spell-checking model works in two phases: in the first phase it identifies the erroneous words in the input sample, and in the second phase it replaces them with the most probable correct words. HINDIA is built on an attention-based encoder–decoder bidirectional recurrent neural network (BiRNN) that uses long short-term memory (LSTM) cells. Several modifications were made to the BiRNN, and the network was fine-tuned to process Hindi spelling errors. The model is trained and tested on the publicly available ‘monolingual corpus’ dataset developed by IIT Mumbai. Its performance is evaluated in two scenarios. In the first scenario, where the testing dataset is generated using a split function, HINDIA achieves precision 0.86, recall 0.72, f-measure 0.78 and accuracy 0.80. In the second scenario, where the dataset is manually generated, it achieves precision 0.81, recall 0.72, f-measure 0.76 and accuracy 0.74.
HINDIA performs better than the deep-learning-based Malayalam spell-checker and some other deep-learning-based correction models in the literature.
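
As a quick consistency check on the reported figures, the sketch below recomputes the f-measure from the stated precision and recall for both scenarios. It assumes the f-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name f_measure is only illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1 score: harmonic mean of precision and recall.

    Assumption: the abstract's "f-measure" is F1 (beta = 1).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
split_f = f_measure(0.86, 0.72)   # scenario 1: test set from a split function
manual_f = f_measure(0.81, 0.72)  # scenario 2: manually generated test set

print(round(split_f, 2), round(manual_f, 2))
```

Under that assumption, both values round to the f-measures given in the abstract (0.78 and 0.76).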


Metadata
Title
HINDIA: a deep-learning-based model for spell-checking of Hindi language
Authors
Shashank Singh
Shailendra Singh
Publication date
01-09-2020
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 8/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05207-9
