2018 | OriginalPaper | Chapter

Design and Development of a Dictionary Based Stemmer for Marathi Language

Authors: Harshali B. Patil, Neelima T. Mhaske, Ajay S. Patil

Published in: Smart and Innovative Trends in Next Generation Computing Technologies

Publisher: Springer Singapore

Abstract

Stemming is a term-conflation technique that reduces the morphological variants of a term to a single form called the “stem”. It is a significant pre-processing step in many natural language processing (NLP) and information retrieval (IR) applications, such as machine translation, named entity recognition, and automated document processing. In this paper, we focus on the development of an automated stemmer for the Marathi language, for which we adopted the dictionary lookup technique. The stemmer was tested on Marathi news articles comprising 4500 words. It achieved a maximum accuracy of 80.6% across nine different runs, and its over-stemming error rate is low. These satisfactory results encourage us to use the stemmer for information retrieval tasks.
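
The abstract does not show the authors' implementation, so the following is only a minimal Python sketch of the general dictionary lookup idea, under stated assumptions. The toy dictionary entries, the function names `stem` and `accuracy`, and the choice to return unknown words unchanged are all illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary mapping inflected Marathi forms to a stem.
# A real stemmer would load a much larger, curated lexicon.
STEM_DICTIONARY: Dict[str, str] = {
    "घरात": "घर",    # "in the house"
    "घराला": "घर",   # "to the house"
    "घरातून": "घर",  # "from the house"
}


def stem(word: str, dictionary: Dict[str, str] = STEM_DICTIONARY) -> str:
    """Return the stem listed for `word`, or `word` itself if it is not listed.

    Falling back to the unchanged word is an assumption of this sketch.
    """
    return dictionary.get(word, word)


def accuracy(
    gold_pairs: List[Tuple[str, str]],
    dictionary: Dict[str, str] = STEM_DICTIONARY,
) -> float:
    """Percentage of (word, expected_stem) pairs that the lookup gets right."""
    if not gold_pairs:
        return 0.0
    correct = sum(
        1 for word, expected in gold_pairs if stem(word, dictionary) == expected
    )
    return 100.0 * correct / len(gold_pairs)


if __name__ == "__main__":
    # Hypothetical gold-standard sample; "शाळेत" is missing from the toy
    # dictionary, so the lookup leaves it unstemmed and it counts as an error.
    sample = [("घरात", "घर"), ("घराला", "घर"), ("शाळेत", "शाळा"), ("घर", "घर")]
    for word, _ in sample:
        print(word, "->", stem(word))
    print("accuracy (%):", accuracy(sample))
```

In a sketch like this, accuracy is bounded by dictionary coverage: a word absent from the dictionary is never over-stemmed, but it also never gets conflated with its variants.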

Metadata
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-981-10-8657-1_60
