18-01-2022 | Regular Paper

Telugu named entity recognition using bert

Authors: SaiKiranmai Gorla, Sai Sharan Tangeda, Lalita Bhanu Murthy Neti, Aruna Malapati

Published in: International Journal of Data Science and Analytics | Issue 2/2022

Abstract

Named entity recognition (NER) is a fundamental step in many natural language processing tasks; it classifies words into a predefined set of named entities (NE). For high-resource languages such as English, many deep learning architectures have produced good results. However, NER has not yet seen much progress for Telugu, a low-resource language. This paper performs NER on the Telugu language using Word2Vec, GloVe, FastText, contextual string embeddings, and bidirectional encoder representations from transformers (BERT) embeddings generated from Telugu Wikipedia articles. These embeddings are used as input to deep learning models. We also investigate how concatenating handcrafted features with the word embeddings affects the performance of the deep learning models. Our experimental results show that BERT embeddings combined with handcrafted features outperform the other word embedding models, with an F1-score of 96.32%.
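The idea of concatenating handcrafted features with word embeddings, and of scoring a tagger by F1, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not list the paper's features, so the three features in `handcrafted_features` are hypothetical, the two-dimensional embeddings are toy values, and the token-level F1 in `f1_score` may differ from the evaluation protocol the authors used.

```python
from typing import Dict, List


def handcrafted_features(token: str, position: int) -> List[float]:
    """Hypothetical handcrafted features for one token (not the paper's set)."""
    return [
        float(position == 0),                      # token starts the sentence
        float(any(ch.isdigit() for ch in token)),  # token contains a digit
        len(token) / 10.0,                         # scaled token length
    ]


def concat_features(
    tokens: List[str], embeddings: Dict[str, List[float]], dim: int
) -> List[List[float]]:
    """Append the handcrafted features to each token's embedding vector."""
    rows = []
    for position, token in enumerate(tokens):
        # Unknown tokens fall back to a zero vector of the embedding size.
        vector = embeddings.get(token, [0.0] * dim)
        rows.append(list(vector) + handcrafted_features(token, position))
    return rows


def f1_score(gold: List[str], pred: List[str], outside: str = "O") -> float:
    """Token-level F1 over all tags except the 'outside' tag."""
    true_pos = sum(1 for g, p in zip(gold, pred) if g == p and g != outside)
    pred_pos = sum(1 for p in pred if p != outside)
    gold_pos = sum(1 for g in gold if g != outside)
    if true_pos == 0:
        return 0.0
    precision = true_pos / pred_pos
    recall = true_pos / gold_pos
    return 2 * precision * recall / (precision + recall)


# Toy sentence: a place name, a year, and a postposition.
tokens = ["హైదరాబాద్", "2022", "లో"]
toy_embeddings = {"హైదరాబాద్": [0.1, 0.2], "లో": [0.3, 0.4]}
model_input = concat_features(tokens, toy_embeddings, dim=2)
```

Each row of `model_input` is the vector a sequence model would receive for one token: the embedding followed by the extra features. In practice the embedding would come from a trained model such as BERT rather than a lookup table.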

https://dumps.wikimedia.org/tewiki/.

https://meta.wikimedia.org/wiki/List_of_Wikipedias.

http://fire.irsi.res.in/fire/2018/home.

Title: Telugu named entity recognition using bert
Authors: SaiKiranmai Gorla, Sai Sharan Tangeda, Lalita Bhanu Murthy Neti, Aruna Malapati
Publication date: 18-01-2022
Publisher: Springer International Publishing
Published in: International Journal of Data Science and Analytics / Issue 2/2022
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI: https://doi.org/10.1007/s41060-021-00305-w
