Published in: Artificial Intelligence Review 8/2020

15 April 2020

Empirical evaluation and study of text stemming algorithms

Authors: Abdul Jabbar, Sajid Iqbal, Manzoor Ilahi Tamimy, Shafiq Hussain, Adnan Akhunzada

Abstract

Text stemming is one of the basic preprocessing steps in Natural Language Processing applications; it transforms different word forms into a standard root form. For languages written in Arabic script, adequate analysis of text by stemmers is challenging because of the large number of ambiguous structures in the language. The literature offers multiple performance evaluation metrics for stemmers, each describing performance from a particular aspect. In this work, we review and analyze text stemming evaluation methods in order to devise criteria for better measurement of stemmer performance. The role of different aspects of stemmer performance measurement, such as main features, merits and shortcomings, is discussed using a resource-scarce language, namely Urdu. From our experiments we conclude that current evaluation metrics can only measure the average conflation of words, regardless of the correctness of the stem. Moreover, some evaluation metrics favor only certain types of languages. None of the existing evaluation metrics can perfectly measure stemmer performance for all kinds of languages. This study will help researchers evaluate their stemmers using the right methods.
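The claim that conflation-based metrics ignore stem correctness can be illustrated with a small sketch. It is not taken from the paper's experiments: the two toy stemmers and the word list are hypothetical, and the two measures (mean words per conflation class and index compression factor) are assumed here in their usual form from the stemmer-strength literature, for example Frakes and Fox (2003). Both stemmers group the words identically, so both measures agree, even though one stemmer returns deliberately wrong stems.

```python
from collections import defaultdict


def conflation_classes(words, stem):
    """Group words by the stem that the given stemmer assigns to them."""
    classes = defaultdict(set)
    for word in words:
        classes[stem(word)].add(word)
    return dict(classes)


def mean_words_per_class(words, stem):
    """Average number of distinct words that share one stem."""
    unique = set(words)
    return len(unique) / len(conflation_classes(unique, stem))


def index_compression_factor(words, stem):
    """Fraction by which stemming shrinks the set of distinct terms."""
    unique = set(words)
    n_stems = len(conflation_classes(unique, stem))
    return (len(unique) - n_stems) / len(unique)


# Hypothetical toy stemmer (illustration only, not a real algorithm).
def strip_suffix(word):
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Same grouping as strip_suffix, but every stem is reversed, i.e. wrong.
def strip_suffix_garbled(word):
    return strip_suffix(word)[::-1]


words = ["walk", "walks", "walked", "walking", "talk", "talks"]

for stemmer in (strip_suffix, strip_suffix_garbled):
    print(
        stemmer.__name__,
        mean_words_per_class(words, stemmer),
        round(index_compression_factor(words, stemmer), 3),
    )
```

Both stemmers receive the same scores because the measures only count how many words fall into each class; they never compare a produced stem with a reference stem.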

Footnotes
1
Built into the SMART system.

2
Extensively modified version of Lovins (1968), included in the SMART system.

References
Ababneh M, Al-Shalabi R, Kanaan G, Al-Nobani A (2012) Building an effective rule-based light stemmer for Arabic language to improve search effectiveness. Int Arab J Inf Technol 9(4):368–372
Abainia K, Ouamour S, Sayoud H (2017) A novel robust Arabic light stemmer. J Exp Theor Artif Intell 29(3):557–573
Abu-Errub A, Odeh A, Shambour Q, Hassan OAH (2014) Arabic roots extraction using morphological analysis. Int J Comput Sci Issues (IJCSI) 11(2):128
Ali M, Khalid S, Aslam MH (2018) Pattern-based comprehensive Urdu stemmer and short text classification. IEEE Access 6:7374–7389
Ali M, Khalid S, Saleemi M (2019) Comprehensive stemmer for morphologically rich Urdu language. Int Arab J Inf Technol 16(1):138–147
Alotaibi FS, Gupta V (2018) A cognitive inspired unsupervised language-independent text stemmer for information retrieval. Cognit Syst Res 52:291–300
Al-Kabi MN, Kazakzeh SA, Ata BMA, Al-Rababah SA, Alsmadi IM (2015) A novel root based Arabic stemmer. J King Saud Univ-Comput Inf Sci 27(2):94–103
Al-Omari A, Abuata B (2014) Arabic light stemmer (ARS). J Eng Sci Technol 9(6):702–717
AlSerhan HM, Alqrainy S, Ayesh A (2008, November) Is Paice method suitable for evaluating Arabic stemming algorithms? In: International conference on computer engineering & systems, 2008 (ICCES 2008). IEEE, pp 131–135
Al-Shammari ET, Lin J (2008, October) Towards an error-free Arabic stemming. In: Proceedings of the 2nd ACM workshop on improving non English web searching. ACM, pp 9–16
Al-Sughaiyer IA, Al-Kharashi IA (2004) Arabic morphological analysis techniques: a comprehensive survey. J American Soc Inf Sci Tech 55(3):189–213
Alvares RV, Garcia AC, Ferraz I (2005, December) STEMBR: a stemming algorithm for the Brazilian Portuguese language. Portuguese conference on artificial intelligence. Springer, Berlin, pp 693–701
Aronoff M, Fudeman K (2011) What is morphology? vol. 8. Wiley, pp 2–3
Bimba A, Idris N, Khamis N, Noor NF (2016) Stemming Hausa text: using affix-stripping rules and reference look-up. Lang Resour Eval 50(3):687–703
Bölücü N, Can B (2019) Unsupervised joint PoS tagging and stemming for agglutinative languages. ACM Trans Asian Low-Resour Lang Inf Process 18(3):25. https://doi.org/10.1145/3292398
Boudchiche M, Mazroui A (2015, December) Evaluation of the ambiguity caused by the absence of diacritical marks in Arabic texts: statistical study. In: 2015 5th international conference on information and communication technology and accessibility (ICTA). IEEE, pp 1–6
Boukhalfa I, Mostefai S, Chekkai N (2018, March) A study of graph based stemmer in Arabic extrinsic plagiarism detection. In: Proceedings of the 2nd mediterranean conference on pattern recognition and artificial intelligence. ACM, pp 27–32
Brychcín T, Konopík M (2015) HPS: high precision stemmer. Inf Process Manag 51(1):68–91
Buckley C (1985) Implementation of the SMART information retrieval system. Technical report 85–686, Cornell University
Cambria E, White B (2014) Jumping NLP curves: a review of natural language processing research. IEEE Comput Intell Mag 9(2):48–57
Chintala DR, Reddy EM (2013) An approach to enhance the CPI using Porter stemming algorithm. Int J Adv Res Comput Sci Softw Eng 3(7):1148–1156
Dahab MY, Ibrahim A, Al-Mutawa R (2015) A comparative study on Arabic stemmers. Int J Comput Appl 125(8):38–47
Dang Q, Zhang J, Lu Y, Zhang K (2013) WordNet-based suffix tree clustering algorithm. In: International conference on information science and computer applications (ISCA 2013)
Dey A, Paul A, Purkayastha BS (2014) Named entity recognition for Nepali language: a semi hybrid approach. Int J Eng Innov Technol (IJEIT) 3:21–25
Dianati MH, Sadreddini MH, Hossein RA, Fakhrahmad SM, Taghi-Zadeh H (2014) Words stemming based on structural and semantic similarity. Comp Eng Appl J 3(2):89–99
de Oliveira RAN, Junior MC (2018) Experimental analysis of stemming on jurisprudential documents retrieval. Information 9(2):28
Dukes K, Habash N (2010) Morphological annotation of Quranic Arabic. In: LREC, pp 2530–2536
El-Defrawy M, El-Sonbaty Y, Belal NA (2016) A rule-based subject-correlated Arabic stemmer. Arab J Sci Eng 41(8):2883–2891
Fattah MA, Ren F, Kuroiwa S (2006) Stemming to improve translation lexicon creation form bitexts. Inf Process Manag 42(4):1003–1016
Flores FN, Moreira VP (2016) Assessing the impact of stemming accuracy on information retrieval–a multilingual perspective. Inf Process Manag 52(5):840–854
Frakes WB, Fox CJ (2003) Strength and similarity of affix removal stemming algorithms. In: ACM SIGIR forum, vol 37, no 1. ACM, pp 26–30
Gaidhane MS, Gondhale MD, Talole MP (2015) A comparative study of stemming algorithms for natural language processing. J Eng Educ Technol (ARDIJEET) 3(2):1–6
Giachanou A, Crestani F (2016) Like it or not: a survey of twitter sentiment analysis methods. ACM Comput Surv (CSUR) 49(2):28
Hassani K, Lee WS (2016) Visualizing natural language descriptions: a survey. ACM Comput Surv (CSUR) 49(1):17
Husain MS, Ahamad F, Khalid S (2013) A language independent approach to develop Urdu stemmer. Advances in computing and information technology. Springer, Berlin, pp 45–53
Hull DA (1996) Stemming algorithms: a case study for detailed evaluation. J Am Soc Inf Sci 47:70–84
Hussain Z, Iqbal S, Saba T, Almazyad AS, Rehman A (2017) Design and development of dictionary-based stemmer for the Urdu language. J Theor Appl Inf Technol 95(15):3560–3569
Ismailov A, Jalil MA, Abdullah Z, Rahim NA (2016) A comparative study of stemming algorithms for use with the Uzbek language. In: 3rd international conference on computer and information sciences (ICCOINS), 2016. IEEE, pp 7–12
Jaafar Y, Namly D, Bouzoubaa K, Yousfi A (2017) Enhancing Arabic stemming process using resources and benchmarking tools. J King Saud Univ-Comput Inf Sci 29(2):164–170
Jabbar A, Iqbal S, Khan MUG (2016a) Analysis and development of resources for Urdu text stemming. In: Proceedings of the 6th annual international conference on language and technology, KICS-CLE, UET Lahore
Jabbar A, Iqbal S, Khan MUG, Hussain S (2018b) A survey on Urdu and Urdu like language stemmers and stemming techniques. Artif Intell Rev 49(3):339–373
Jivani AG (2011) A comparative study of stemming algorithms. Int J Comp Tech Appl 2(6):1930–1938
Karaa WBA (2013) A new stemmer to improve information retrieval. Int J Netw Secur Appl 5(4):143
Karimi S, Wang C, Metke-Jimenez A, Gaire R, Paris C (2015) Text and data mining techniques in adverse drug reaction detection. ACM Comput Surv (CSUR) 47(4):56
Kastner I (2019) Templatic morphology as an emergent property. Nat Lang Linguist Theory 37(2):571–619
Khalid A, Hussain Z, Baig MA (2016) Arabic stemmer for search engines information retrieval. Int J Adv Comput Sci Appl 1(7):407–411
Khan S, Waqas A, Usama B, Xuan W (2015) Template based affix stemmer for a morphologically rich language. Int Arab J Inf Tech 12(2):146–154
Khoja S, Garside R (1999) Stemming Arabic text. Lancaster University, Lancaster, UK, Computing Department
Krovetz R (2000) Viewing morphology as an inference process. Artif Intell 118(1–2):277–294
Larkey LS, Ballesteros L, Connell ME (2007) Light stemming for Arabic information retrieval. Arabic computational morphology. Springer, Dordrecht, pp 221–243
Lennon M, Peirce DS, Tarry BD, Willett P (1981) An evaluation of some conflation algorithms for information retrieval. Inf Sci 3(4):177–183
Lovins JB (1968) Development of a stemming algorithm. Mech Transl Comput Linguist 11(1–2):22–31
Mateen A, Malik MK, Nawaz Z, Danish HM, Siddiqui MH, Abbas Q (2017) A hybrid stemmer of Punjabi Shahmukhi script. Int J Comput Sci Netw Secur 17(8):90–97
Mishra U, Prakash C (2012) MAULIK: an effective stemmer for Hindi language. Int J Comput Sci Eng 4(5):711–717
Mochizuki M, Aizawa K (2000) An affix acquisition order for EFL learners: an exploratory study. System 28(2):291–304
Moghadam FM, MohammadReza K (2015) Comparative study of various Persian stemmers in the field of information retrieval. J Inf Proc Syst 11(3):450–464
Momenipour F, Keyvanpour MR (2016) PHMM: stemming on Persian texts using statistical stemmer based on hidden Markov Model. Int J Inf Sci Manag 14(2):107–117
Mustafa AM, Rashid TA (2018) Kurdish stemmer pre-processing steps for improving information retrieval. J Inf Sci 44(1):15–27
Nguyen DT, Leveling J (2013) Exploring domain-sensitive features for extractive summarization in the medical domain. International conference on application of natural language to information systems. Springer, Berlin, pp 90–101
Nwesri AFA, Alyagoubi HAH (2015) Applying Arabic stemming using query expansion. In: 2015 26th international workshop on database and expert systems applications (DEXA). IEEE, pp 299–303
Orengo VM, Huyck C (2001) A stemming algorithm for the Portuguese language. In: SPIRE '01: Proceedings of eighth symposium on string processing and information retrieval, pp 186–193
Paice CD (1990) Another stemmer. SIGIR Forum 24(3):56–61
Paice CD (1996) Method for evaluation of stemming algorithms based on error counting. J Am Soc Inf Sci 47(8):632–649
Paice CD (1994) An evaluation method for stemming algorithms. In: Proceedings of the 17th annual international ACM SIGIR conference on research and development in information retrieval. Springer, New York, pp 42–50
Paik JH, Pal D, Parui SK (2011) A novel corpus-based stemming algorithm using co-occurrence statistics. In: Proceedings of the 34th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR’11). ACM, New York, pp 863–872
Patil CG, Patil SS (2013) Use of Porter stemming algorithm and SVM for emotion extraction from news headlines. Int J Electron Commun Soft Comput Sci Eng 2(7):9–13
Porter MF (1980) An algorithm for suffix stripping. Program 14(3):130–137
Qureshi AH, Hassan MU, Akhter S (2018) Towards description of derivation in Urdu: morphological perspective. Al-Qalam 23(2):96–100
Rani SPR, Ramesh B, Anusha M, Rani SJGR (2015) Evaluation of stemming techniques for text classification. Int J Comput Sci Mobile Comput 4(3):165–171
Rashid TA, Mohamad SO (2016) Enhancement of detecting wicked website through intelligent methods. International symposium on security in computing and communication. Springer, Singapore, pp 358–368
Rashidi A, Lighvan MZ (2014) HPS: a hierarchical Persian stemming method. arXiv preprint arXiv:1403.2837
Rehman Z, Anwar W, Bajwa UI, Xuan W, Chaoying Z (2013) Morpheme matching based text tokenization for a scarce resourced language. PLoS ONE 8(8):e68178
Saad MK, Ashour W (2010) Arabic morphological tools for text mining. Corpora 18:19
Saeed AM, Rashid TA, Mustafa AM, Al-Rashid Agha RA, Shamsaldin AS, Al-Salihi NK (2018a) An evaluation of Reber stemmer with longest match stemmer technique in Kurdish Sorani text classification. Iran J Comput Sci 1(2):99–107
Saeed AM, Rashid TA, Mustafa AM, Fattah P, Ismael B (2018b) Improving Kurdish web mining through tree data structure and Porter’s Stemmer algorithms. UKH J Sci Eng 2(1):48–54
Sarma B, Purkayastha BS (2013) An affix based word classification method of Assamese text. Int J Adv Res Comput Sci 4(9):213–216
Schofield A, Mimno D (2016) Comparing apples to apple: the effects of stemmers on topic models. Trans Assoc Comput Linguist 4:287–300
Setiawan R, Kurniawan A, Budiharto W, Kartowisastro IH, Prabowo H (2016) Flexible affix classification for stemming Indonesian Language. In: 2016 13th international conference on electrical engineering/electronics, computer, telecommunications and information technology (ECTI-CON). IEEE, pp 1–6
Singh J, Gupta V (2016) Text stemming: approaches, applications, and challenges. ACM Comput Surv (CSUR) 49(3):45
Singh J, Gupta V (2017) An efficient corpus-based stemmer. Cognit Comput 9(5):671–688
Sirsat SR, Chavan V, Mahalle HS (2013) Strength and accuracy analysis of affix removal stemming algorithms. Int J Comput Sci Inf Technol 4(2):265–269
Sulaiman S, Omar K, Omar N, Murah MZ, Abdul Rahman HD (2014) The effectiveness of a Jawi stemmer for retrieving relevant Malay documents in Jawi characters. ACM Trans Asian Lang Inf Process (TALIP) 13(2):6
Suryani AA, Widyantoro DW, Purwarianti A, Sudaryat Y (2018) The rule-based Sundanese stemmer. ACM Trans Asian Low-Resour Lang Inf Process (TALLIP) 17(4):27
Taghi-Zadeh H, Sadreddini MH, Diyanati MH, Rasekh AH (2015) A new hybrid stemming method for Persian language. Digit Scholarsh Humanit 32(1):209–221
Thangarasu M, Manavalan R (2013) Design and development of stemmer for Tamil language: cluster analysis. Int J Adv Res Comput Sci Softw Eng 3(7):812–818
Ounis I, Amati G, Plachouras V, He B, Macdonald C, Lioma C (2006) A high performance and scalable information retrieval platform. In: SIGIR workshop on open source information retrieval
Xerox (1994) Xerox linguistic database reference, English version 1.1.4 edn
Yadollahi A, Shahraki AG, Zaiane OR (2017) Current state of text sentiment analysis from opinion to emotion mining. ACM Comput Surv (CSUR) 50(2):25
Zhou D, Mark T, Brailsford T, Wade V, Ashman H (2012) Translation techniques in cross-language information retrieval. ACM Comput Surv (CSUR) 45(1):1
Metadata
Title
Empirical evaluation and study of text stemming algorithms
Authors
Abdul Jabbar
Sajid Iqbal
Manzoor Ilahi Tamimy
Shafiq Hussain
Adnan Akhunzada
Publication date
15 April 2020
Publisher
Springer Netherlands
Published in
Artificial Intelligence Review, Issue 8/2020
Print ISSN: 0269-2821
Electronic ISSN: 1573-7462
DOI
https://doi.org/10.1007/s10462-020-09828-3
