Skip to main content
Top
Published in: Knowledge and Information Systems 3/2015

01-03-2015 | Regular Paper

Novel harmony search-based algorithms for part-of-speech tagging

Authors: Rana Forsati, Mehrnoush Shamsfard

Published in: Knowledge and Information Systems | Issue 3/2015

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

As a fast and high-quality tagger algorithm is a crucial task in natural language processing, this paper presents novel language-independent algorithms based on harmony search (HS) optimization method for handling the part-of-speech (PoS) tagging problem. The first proposed algorithm is a framework for applying HS to PoS-tagging which is called HSTAGger. By modifying HS algorithm and proposing more efficient objective functions, two improved versions of the HSTAGger are also introduced. In addition, a novel class of problematic words called erroneous as well as a method of handling them is proposed for the first time to the best of our knowledge. To demonstrate the effectiveness of the proposed algorithms, we have applied them on standard annotated corpus and compare them with other evolutionary-based and classical PoS-tagging approaches. Experimental results indicate that the proposed algorithms outperform the other taggers previously presented in the literature in terms of average precision.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
1
We have used the values proposed by [6] for crossover and mutation parameters.
 
2
We have used our own implementations of the Random and LexProb algorithms for evaluation.
 
3
Java implementation of this system is freely available at ftp://​ftp.​cis.​upenn.​edu/​pub/​adwait/​jmx/​.
 
Literature
1.
go back to reference Attia M, Rashwan MAA, Al-Badrashiny MASAA (2009) Fassieh (R), a semi-automatic visual interactive tool for morphological, pos-tags, phonetic, and semantic annotation of arabic text corpora. IEEE Trans Audio Speech Lang Process 17:916–925CrossRef Attia M, Rashwan MAA, Al-Badrashiny MASAA (2009) Fassieh (R), a semi-automatic visual interactive tool for morphological, pos-tags, phonetic, and semantic annotation of arabic text corpora. IEEE Trans Audio Speech Lang Process 17:916–925CrossRef
2.
go back to reference Baeza-Yate R, Ribeiro BN (1999) Modern information retrieval. ACM Press; Addison-Wesley, New York Harlow, England Baeza-Yate R, Ribeiro BN (1999) Modern information retrieval. ACM Press; Addison-Wesley, New York Harlow, England
3.
go back to reference DeRose SJ (1988) Grammatical category disambiguation by statistical optimization. Comput Linguist 14:31–39 DeRose SJ (1988) Grammatical category disambiguation by statistical optimization. Comput Linguist 14:31–39
4.
go back to reference Francis WN, Kucera H (1979) Manual of information to accompany a standard corpus of present-day edited american english, for use with digital computers. Brown University, Providence Francis WN, Kucera H (1979) Manual of information to accompany a standard corpus of present-day edited american english, for use with digital computers. Brown University, Providence
5.
go back to reference Brants T (2000) TnT: a statistical part-of-speech tagger, presented at the Proceedings of the sixth conference on Applied natural language processing, Seattle, Washington Brants T (2000) TnT: a statistical part-of-speech tagger, presented at the Proceedings of the sixth conference on Applied natural language processing, Seattle, Washington
7.
go back to reference Geem ZW, Tseng CL, Park Y (2005) Proceedings Harmony search for generalized orienteering problem: best touring in china. Adv Nat Comput Pt 3 3612:741–750 Geem ZW, Tseng CL, Park Y (2005) Proceedings Harmony search for generalized orienteering problem: best touring in china. Adv Nat Comput Pt 3 3612:741–750
9.
go back to reference Geem ZW (2009) Music-inspired harmony search algorithm : theory and applications, 1st edn. Springer, New YorkCrossRef Geem ZW (2009) Music-inspired harmony search algorithm : theory and applications, 1st edn. Springer, New YorkCrossRef
10.
go back to reference Forsati R, Mahdavi M, Haghighat AT, Ghariniyat A (2008) An efficient algorithm for bandwidth-delay constrained least cost multicast routing. Can Conf Elect Comput Eng CCECE 2008:1641–1646 Forsati R, Mahdavi M, Haghighat AT, Ghariniyat A (2008) An efficient algorithm for bandwidth-delay constrained least cost multicast routing. Can Conf Elect Comput Eng CCECE 2008:1641–1646
12.
go back to reference Forsati R, Mahdavi M, Haghigaht AT (2008) Harmony search based algorithms for bandwidth-delay-constrained least-cost multicast routing. Comput Commun 31:2505–2519CrossRef Forsati R, Mahdavi M, Haghigaht AT (2008) Harmony search based algorithms for bandwidth-delay-constrained least-cost multicast routing. Comput Commun 31:2505–2519CrossRef
13.
go back to reference Mahdavi M, Haghir Chehreghani M, Abolhassani H, Forsati R (2008) Novel meta-heuristic algorithms for clustering web documents. Appl Math Comput 201:441–451CrossRefMATHMathSciNet Mahdavi M, Haghir Chehreghani M, Abolhassani H, Forsati R (2008) Novel meta-heuristic algorithms for clustering web documents. Appl Math Comput 201:441–451CrossRefMATHMathSciNet
14.
go back to reference Mirkhani M, Forsati R, Shahri M, Moayedikia A (2013) A novel efficient algorithm for mobile robot localization. Robot Auton Syst 61:920–931CrossRef Mirkhani M, Forsati R, Shahri M, Moayedikia A (2013) A novel efficient algorithm for mobile robot localization. Robot Auton Syst 61:920–931CrossRef
15.
go back to reference Lee KS, Geem ZW (2005) A new meta-heuristic algorithm for continuous engineering optimization: harmony search theory and practice. Comput Methods Appl Mech Eng 194:3902–3933CrossRefMATH Lee KS, Geem ZW (2005) A new meta-heuristic algorithm for continuous engineering optimization: harmony search theory and practice. Comput Methods Appl Mech Eng 194:3902–3933CrossRefMATH
16.
go back to reference Forsati R, Shamsfard M, Mojtahedpour P (2010) An Efficient meta heuristic algorithm for POS-tagging. 2010 Fifth International Multi-conference on Computing in the Global Information Technology, Spain Forsati R, Shamsfard M, Mojtahedpour P (2010) An Efficient meta heuristic algorithm for POS-tagging. 2010 Fifth International Multi-conference on Computing in the Global Information Technology, Spain
18.
go back to reference Brill E (1995) Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Comput Linguist 21:543–565 Brill E (1995) Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Comput Linguist 21:543–565
19.
go back to reference Araujo L (2003) Studying the advantages of a messy evolutionary algorithm for natural language tagging, presented at the Proceedings of the 2003 international conference on Genetic and evolutionary computation: PartII, USA. Araujo L (2003) Studying the advantages of a messy evolutionary algorithm for natural language tagging, presented at the Proceedings of the 2003 international conference on Genetic and evolutionary computation: PartII, USA.
20.
go back to reference Teller V (2000) Speech and language processing: An introduction to natural language processing, computational linguistics and speech recognition. Comput Linguistics 26:638–641CrossRef Teller V (2000) Speech and language processing: An introduction to natural language processing, computational linguistics and speech recognition. Comput Linguistics 26:638–641CrossRef
21.
go back to reference Araujo L (2004) Symbiosis of evolutionary techniques and statistical natural language processing. IEEE Trans Evol Comput 8:14–27CrossRef Araujo L (2004) Symbiosis of evolutionary techniques and statistical natural language processing. IEEE Trans Evol Comput 8:14–27CrossRef
22.
go back to reference Jurafsky D, Martin JH (2009) Speech and language processing : an introduction to natural language processing, computational linguistics, and speech recognition, 2nd edn. Pearson Prentice Hall, Upper Saddle River Jurafsky D, Martin JH (2009) Speech and language processing : an introduction to natural language processing, computational linguistics, and speech recognition, 2nd edn. Pearson Prentice Hall, Upper Saddle River
23.
go back to reference van Halteren H, Daelemans W, Zavrel J (2001) Improving accuracy in word class tagging through the combination of machine learning systems. Comput Linguist 27:199–229CrossRef van Halteren H, Daelemans W, Zavrel J (2001) Improving accuracy in word class tagging through the combination of machine learning systems. Comput Linguist 27:199–229CrossRef
24.
go back to reference Schütze H, Singer Y. (1994). Part-of-speech tagging using a variable memory markov model, presented at the Proceedings of the 32nd annual meeting on Association for Computational Linguistics, Las Cruces, New Mexico. Schütze H, Singer Y. (1994). Part-of-speech tagging using a variable memory markov model, presented at the Proceedings of the 32nd annual meeting on Association for Computational Linguistics, Las Cruces, New Mexico.
25.
go back to reference Merialdo B (1994) Tagging English text with a probabilistic model. Comput Linguist 20:155–171 Merialdo B (1994) Tagging English text with a probabilistic model. Comput Linguist 20:155–171
26.
go back to reference Araujo L, (2002). Part-of-Speech tagging with evolutionary algorithms, presented at the Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing. Araujo L, (2002). Part-of-Speech tagging with evolutionary algorithms, presented at the Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing.
27.
go back to reference Sarikaya R, Afify M, Deng Y, Erdogan H, Gao Y (2008) Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic. IEEE Trans Audio Speech Lang Process 16:1330–1339CrossRef Sarikaya R, Afify M, Deng Y, Erdogan H, Gao Y (2008) Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic. IEEE Trans Audio Speech Lang Process 16:1330–1339CrossRef
28.
go back to reference Schmid H. (1994). Part-of-speech tagging with neural networks, presented at the Proceedings of the 15th conference on Computational linguistics, vol. 1, Kyoto, Japan. Schmid H. (1994). Part-of-speech tagging with neural networks, presented at the Proceedings of the 15th conference on Computational linguistics, vol. 1, Kyoto, Japan.
29.
go back to reference Kudo T, Yamamoto K, Matsumoto Y. (2004). “Applying conditional random fields to Japanese morphological analysis”, presented at the In Proc. of EMNLP’04, Barcelona, Spain. Kudo T, Yamamoto K, Matsumoto Y. (2004). “Applying conditional random fields to Japanese morphological analysis”, presented at the In Proc. of EMNLP’04, Barcelona, Spain.
30.
go back to reference Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, CambridgeMATH Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, CambridgeMATH
31.
go back to reference Charniak E (1993) Statistical language learning. MIT Press, Cambridge Charniak E (1993) Statistical language learning. MIT Press, Cambridge
32.
go back to reference Halteren HV, Zavrel J, Daelemans W. (1998). Improving data driven wordclass tagging by system combination, presented at the Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, vol. 1, Montreal, Quebec, Canada. Halteren HV, Zavrel J, Daelemans W. (1998). Improving data driven wordclass tagging by system combination, presented at the Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, vol. 1, Montreal, Quebec, Canada.
33.
go back to reference Martin Volk GS. (1998). Comparing a statistical and a rule-based tagger for German, presented at the In Proceedings of KONVENS-98, Bonn. Martin Volk GS. (1998). Comparing a statistical and a rule-based tagger for German, presented at the In Proceedings of KONVENS-98, Bonn.
34.
go back to reference Brill E (1995) Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Comput Linguistics 21(4):543–565 Brill E (1995) Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Comput Linguistics 21(4):543–565
35.
go back to reference Lua KT (1996) Part of speech tagging of Chinese sentences using genetic algorithm, presented at the Conference on Chinese Computing. Lua KT (1996) Part of speech tagging of Chinese sentences using genetic algorithm, presented at the Conference on Chinese Computing.
36.
go back to reference Araujo L (2003) Studying the advantages of a messy evolutionary algorithm for natural language tagging. Genetic and Evolutionary Computation - Gecco 2003, Pt Ii, Proceedings 2724:1951–1962CrossRef Araujo L (2003) Studying the advantages of a messy evolutionary algorithm for natural language tagging. Genetic and Evolutionary Computation - Gecco 2003, Pt Ii, Proceedings 2724:1951–1962CrossRef
37.
go back to reference Jelinek F (1985) Markov source modeling of text generation. Skwirzynski JK (ed) The Impact of Processing Techniques on Communication. Nijhoff, Dordrecht, The, Netherlands Jelinek F (1985) Markov source modeling of text generation. Skwirzynski JK (ed) The Impact of Processing Techniques on Communication. Nijhoff, Dordrecht, The, Netherlands
38.
go back to reference Carlberger J, Kann V (1999) Implementing an efficient part-of-speech tagger. Softw. Pract. Exper. 29:815–832CrossRef Carlberger J, Kann V (1999) Implementing an efficient part-of-speech tagger. Softw. Pract. Exper. 29:815–832CrossRef
39.
go back to reference Lee GG, Cha J, Lee JH (2002) Syllable-pattern-based unknown-morpheme segmentation and estimation for hybrid part-of-speech tagging of Korean. Comput. Linguist. 28:53–70CrossRef Lee GG, Cha J, Lee JH (2002) Syllable-pattern-based unknown-morpheme segmentation and estimation for hybrid part-of-speech tagging of Korean. Comput. Linguist. 28:53–70CrossRef
40.
go back to reference Pan QK, Suganthan PN, Tasgetiren MF (2010) A local-best harmony search algorithm with dynamic subpopulations. Engineering Optimization 42:101–117CrossRef Pan QK, Suganthan PN, Tasgetiren MF (2010) A local-best harmony search algorithm with dynamic subpopulations. Engineering Optimization 42:101–117CrossRef
41.
go back to reference Goldberg DE, Deb K (1991) A comparative analysis of selection schemes used in genetic algorithms. Morgan Kaufmann Publishers Inc, Burlington Goldberg DE, Deb K (1991) A comparative analysis of selection schemes used in genetic algorithms. Morgan Kaufmann Publishers Inc, Burlington
43.
go back to reference Araujo L, Luque G, Alba E (2004) Metaheuristics for natural language tagging. In: Deb K et al (eds) Genetic and Evolutionary Computation Conference (GECCO-2004) Seattle, Washington, in: Lecture Notes in Computer Science, vol 3102., Springer, Berlin, pp 889–900 Araujo L, Luque G, Alba E (2004) Metaheuristics for natural language tagging. In: Deb K et al (eds) Genetic and Evolutionary Computation Conference (GECCO-2004) Seattle, Washington, in: Lecture Notes in Computer Science, vol 3102., Springer, Berlin, pp 889–900
44.
go back to reference Marcus MP, Santorini B, Marcinkiewicz MA (1994) Building a large annotated corpus of English: The Penn Treebank, Computational Linguistics 19:313–330 Marcus MP, Santorini B, Marcinkiewicz MA (1994) Building a large annotated corpus of English: The Penn Treebank, Computational Linguistics 19:313–330
45.
go back to reference Geem ZW (2006) Optimal cost design of water distribution networks using harmony search. Engineering Optimization 38:259–277CrossRef Geem ZW (2006) Optimal cost design of water distribution networks using harmony search. Engineering Optimization 38:259–277CrossRef
46.
go back to reference Kim JH, Geem ZW, Kim ES (2001) Parameter estimation of the nonlinear muskingum model using harmony search. J. Am. Water Resour. Assoc. 37:1131–1138CrossRef Kim JH, Geem ZW, Kim ES (2001) Parameter estimation of the nonlinear muskingum model using harmony search. J. Am. Water Resour. Assoc. 37:1131–1138CrossRef
47.
go back to reference Forsati R, Mahdavi M (2010) Web text mining using harmony search. Recent Advances In Harmony Search Algorithm 2010:51–64CrossRef Forsati R, Mahdavi M (2010) Web text mining using harmony search. Recent Advances In Harmony Search Algorithm 2010:51–64CrossRef
48.
go back to reference Rosenfeld R (1996) A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language 10:187–228CrossRef Rosenfeld R (1996) A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language 10:187–228CrossRef
49.
go back to reference Aone C, Hausman K. (1996). “Unsupervised learning of a rule-based Spanish part of speech tagger”, presented at the Proceedings of the 16th conference on Computational linguistics, vol. 1, Copenhagen, Denmark. Aone C, Hausman K. (1996). “Unsupervised learning of a rule-based Spanish part of speech tagger”, presented at the Proceedings of the 16th conference on Computational linguistics, vol. 1, Copenhagen, Denmark.
50.
go back to reference Daelemans W, Zavrel J, Berck P, Gillis S. (1996) “MBT: A memory-based part-of speech tagger generator”, Proceedings 4th Workshop on Very Large Corpora, pp. 14–27. Daelemans W, Zavrel J, Berck P, Gillis S. (1996) “MBT: A memory-based part-of speech tagger generator”, Proceedings 4th Workshop on Very Large Corpora, pp. 14–27.
51.
go back to reference Gao J, Johnson M. (2008) “A comparison of Bayesian estimators for unsupervised hidden Markov model POS taggers”, 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP, pp. 344–352. Gao J, Johnson M. (2008) “A comparison of Bayesian estimators for unsupervised hidden Markov model POS taggers”, 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP, pp. 344–352.
52.
go back to reference Gamback B, Olsson F, Argaw AA, Asker L. (2009). “Methods for amharic part-of-speech tagging”, Proceedings of the First Workshop on Language Technologies for African Languages (AfLaT 2009), Greece: Association for Computational Linguistics, pp. 104–111. Gamback B, Olsson F, Argaw AA, Asker L. (2009). “Methods for amharic part-of-speech tagging”, Proceedings of the First Workshop on Language Technologies for African Languages (AfLaT 2009), Greece: Association for Computational Linguistics, pp. 104–111.
53.
go back to reference Giménez J, Màrquez L. (2004). “SVMTool: A general POS tagger generator based on Support Vector Machines”, In Proceedings of the 4th International Conference on Language Resources and Evaluation, pp. 43–46, Lisbon, Portugal. Giménez J, Màrquez L. (2004). “SVMTool: A general POS tagger generator based on Support Vector Machines”, In Proceedings of the 4th International Conference on Language Resources and Evaluation, pp. 43–46, Lisbon, Portugal.
Metadata
Title
Novel harmony search-based algorithms for part-of-speech tagging
Authors
Rana Forsati
Mehrnoush Shamsfard
Publication date
01-03-2015
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2015
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-013-0719-6

Other articles of this Issue 3/2015

Knowledge and Information Systems 3/2015 Go to the issue

Premium Partner