Published in: Neural Computing and Applications 8/2021

1 September 2020 | Original Article

HINDIA: a deep-learning-based model for spell-checking of Hindi language

Authors: Shashank Singh, Shailendra Singh

Abstract

A spelling error is a mistake made while typing a text document. Applications such as search engines, information retrieval systems, and email require typed user input, and in such applications a good spell-checker is essential for correcting misspellings. Spell-checkers for Western languages such as English are very powerful and can handle any type of spelling error, whereas for Indian languages such as Hindi, Urdu, Bengali, Kannada, and Assamese the available spell-checkers are very basic. These spell-checkers are built with traditional statistical and rule-based methods. This article presents HINDIA, a novel model for handling spelling errors in Hindi, one of the most widely spoken languages in India. It uses a deep-learning method for spelling error detection and correction. The proposed model works in two phases: in the first phase it identifies the erroneous words in the input sample, and in the second phase it replaces them with the most probable correct words. HINDIA is built on an attention-based encoder–decoder bidirectional recurrent neural network (BiRNN) that uses long short-term memory cells. Several modifications were made to the BiRNN, and the network was fine-tuned to process Hindi spelling errors. The publicly available ‘monolingual corpus’ developed by IIT Bombay is used for training and testing. The performance of the model is evaluated in two scenarios. In the first scenario, where the testing dataset is generated using a split function, HINDIA achieves precision 0.86, recall 0.72, F-measure 0.78, and accuracy 0.80. In the second scenario, where the dataset is manually generated, it achieves precision 0.81, recall 0.72, F-measure 0.76, and accuracy 0.74. HINDIA performs better than the deep-learning-based Malayalam spell-checker and some other deep-learning-based correction models in the literature.
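The reported F-measure values can be cross-checked against the reported precision and recall. The sketch below assumes that "f-measure" means the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1 score: harmonic mean of precision and recall.

    Assumption: the abstract's "f-measure" is F1 (equal weighting).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) as stated in the abstract.
reported = {
    "split-function test set": (0.86, 0.72, 0.78),
    "manually generated test set": (0.81, 0.72, 0.76),
}

for name, (p, r, f_reported) in reported.items():
    f = f_measure(p, r)
    print(f"{name}: computed F1 = {f:.4f}, reported = {f_reported}")
```

Under this assumption, both computed values round to the figures given in the abstract, so the reported precision, recall, and F-measure are mutually consistent.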

Metadata
Title
HINDIA: a deep-learning-based model for spell-checking of Hindi language
Authors
Shashank Singh
Shailendra Singh
Publication date
1 September 2020
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 8/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05207-9
