
2023 | OriginalPaper | Chapter

Bengali POS Tagging Using Bi-LSTM with Word Embedding and Character-Level Embedding

Authors: Kaushik Bose, Kamal Sarkar

Published in: Proceedings of International Conference on Frontiers in Computing and Systems

Publisher: Springer Nature Singapore


Abstract

Part-of-speech (POS) tagging is a fundamental step in natural language processing (NLP). It is required as a preprocessing task in many NLP applications, such as named entity recognition (NER), word sense disambiguation, information extraction, machine translation, and sentiment analysis. In this paper, we propose a practical Bengali POS tagger that takes Bengali text as input and produces POS-tagged output. Bi-LSTM networks have recently proven effective for sequential data processing, but they have been tested far less on resource-poor, highly inflected languages such as Bengali. This paper addresses POS tagging for Bengali using a Bi-LSTM with transfer learning, applying pre-trained word embeddings. Because the proposed tagger can also handle out-of-vocabulary (OOV) words, its output can be used directly in other Bengali language processing applications. Our experiment shows that a Bi-LSTM with transfer learning is effective for tagging Bengali documents.
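The title names two input representations for the Bi-LSTM: word embeddings and character-level embeddings. The NumPy sketch below shows one common way to combine them. It is an illustration under stated assumptions, not the authors' implementation. A character-level Bi-LSTM builds a vector for each word from its characters. That vector is concatenated with the word's pre-trained embedding, and a word-level Bi-LSTM followed by a linear layer scores each tag. Several parts of the sketch are hypothetical: the toy vocabulary, the random vectors standing in for pre-trained embeddings, the tag set (PRP, NN, VM), the dimensions, and all class names. The weights are random and there is no training loop, so the predicted tags are arbitrary. The point is the data flow, in particular how an out-of-vocabulary word still receives a non-zero representation from its characters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTM:
    """One-direction LSTM, forward pass only (no training code)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # Stacked weights for the input, forget, candidate and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def run(self, xs):
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        outputs = []
        for x in xs:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)  # shape: (seq_len, hidden_dim)


class BiLSTM:
    """Runs one LSTM left-to-right and another right-to-left, then concatenates."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.fwd = LSTM(input_dim, hidden_dim, rng)
        self.bwd = LSTM(input_dim, hidden_dim, rng)

    def run(self, xs):
        forward = self.fwd.run(xs)
        backward = self.bwd.run(xs[::-1])[::-1]  # re-align to original order
        return np.concatenate([forward, backward], axis=1)  # (seq_len, 2 * hidden_dim)


class BiLSTMTagger:
    """Hypothetical tagger: [pre-trained word vector ; char Bi-LSTM vector] -> Bi-LSTM -> tags."""

    def __init__(self, pretrained, tags, char_dim=8, char_hidden=8, word_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.word_vecs = pretrained  # word -> vector; stands in for pre-trained embeddings
        self.word_dim = len(next(iter(pretrained.values())))
        chars = sorted({ch for word in pretrained for ch in word})
        self.char_index = {ch: i + 1 for i, ch in enumerate(chars)}  # 0 = unseen character
        self.char_emb = rng.normal(0.0, 0.1, (len(chars) + 1, char_dim))
        self.char_lstm = BiLSTM(char_dim, char_hidden, rng)
        self.word_lstm = BiLSTM(self.word_dim + 2 * char_hidden, word_hidden, rng)
        self.W_out = rng.normal(0.0, 0.1, (len(tags), 2 * word_hidden))
        self.tags = tags

    def char_repr(self, word):
        # Final forward state + final backward state of the character Bi-LSTM.
        ids = [self.char_index.get(ch, 0) for ch in word]
        states = self.char_lstm.run(self.char_emb[ids])
        half = states.shape[1] // 2
        return np.concatenate([states[-1, :half], states[0, half:]])

    def features(self, sentence):
        rows = []
        for word in sentence:
            # An OOV word gets a zero word vector, but its characters still contribute.
            wv = self.word_vecs.get(word, np.zeros(self.word_dim))
            rows.append(np.concatenate([wv, self.char_repr(word)]))
        return np.stack(rows)

    def tag(self, sentence):
        scores = self.word_lstm.run(self.features(sentence)) @ self.W_out.T
        return [self.tags[k] for k in scores.argmax(axis=1)]


# Hypothetical toy vocabulary: random vectors in place of real pre-trained embeddings.
rng = np.random.default_rng(1)
pretrained = {w: rng.normal(size=10) for w in ["আমি", "ভাত", "খাই"]}
tagger = BiLSTMTagger(pretrained, tags=["PRP", "NN", "VM"])  # hypothetical tag set
print(tagger.tag(["আমি", "মাছ", "খাই"]))  # "মাছ" is out of vocabulary
```

In a trained model of this kind, the pre-trained embedding table is the transferred component. The character encoder, the word-level Bi-LSTM and the output layer would typically be fitted with a cross-entropy loss over gold tags.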


DOI
https://doi.org/10.1007/978-981-19-0105-8_55