2016 | OriginalPaper | Chapter

Analogy Removal Stemmer Algorithm for Tamil Text Corpora

Authors: M. Thangarasu, H. Hannah Inbarani

Published in: Digital Connectivity – Social Impact

Publisher: Springer Nature Singapore

Abstract

Stemming is the process of deriving the root word from a given inflected word. Stemming Tamil is technically challenging because the language has richer morphological patterns than other languages. This research therefore proposes the Analogy Removal Stemmer (ARS), which finds the stem of a given inflected Tamil word in text corpora. The performance of the proposed approach is compared with the Light Stemmer (LS) and Improved Light Stemmer (ILS) algorithms on the basis of correctly and incorrectly predicted stem words. The experimental results show that ARS performs better than the LS and ILS algorithms on Tamil corpora.
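
The comparison described in the abstract, counting correctly and incorrectly predicted stems, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `evaluate_stemmer`, `toy_suffix_stripper`, and the three-word gold list are hypothetical and are not taken from the chapter. The toy stemmer only strips the plural suffix கள் and is not the ARS, LS, or ILS algorithm.

```python
from typing import Callable, Iterable, Tuple


def evaluate_stemmer(
    stem: Callable[[str], str],
    gold: Iterable[Tuple[str, str]],
) -> Tuple[int, int]:
    """Return (correct, incorrect) counts for a stemmer on (word, stem) pairs."""
    correct = 0
    incorrect = 0
    for word, expected in gold:
        if stem(word) == expected:
            correct += 1
        else:
            incorrect += 1
    return correct, incorrect


def toy_suffix_stripper(word: str) -> str:
    """Hypothetical stand-in stemmer: strips only the plural suffix கள்."""
    suffix = "கள்"
    if word.endswith(suffix) and len(word) > len(suffix):
        return word[: -len(suffix)]
    return word


# Hypothetical gold data: (inflected word, expected stem).
GOLD = [
    ("வீடுகள்", "வீடு"),   # houses -> house
    ("நாடுகள்", "நாடு"),   # countries -> country
    ("மரங்கள்", "மரம்"),   # trees -> tree (the stem-final consonant changes)
]

correct, incorrect = evaluate_stemmer(toy_suffix_stripper, GOLD)
# Accuracy is one possible summary of the two counts (an assumption here).
accuracy = correct / (correct + incorrect)
print(correct, incorrect, accuracy)
```

The third word shows why plain suffix removal falls short for Tamil: removing கள் from மரங்கள் leaves மரங், not the stem மரம், so the toy stemmer is counted as incorrect on that word.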

Metadata

Title: Analogy Removal Stemmer Algorithm for Tamil Text Corpora
Authors: M. Thangarasu, H. Hannah Inbarani
Copyright Year: 2016
Publisher: Springer Nature Singapore
DOI: https://doi.org/10.1007/978-981-10-3274-5_6
