Published in: Annals of Data Science 5/2023

14-11-2021

Part of Speech Tagging Using Part of Speech Sequence Graph

Authors: Pejman Gholami-Dastgerdi, Mohammad-Reza Feizi-Derakhshi



Abstract

Part-of-speech (POS) tagging, the task of assigning the most appropriate grammatical category to each word in a text, is one of the most fundamental requirements of intelligent text processing. The main goal of this article is therefore to provide a high-accuracy tagger for the Persian language. Numerous POS tagging methods have already been proposed, each applied in taggers to achieve high performance and accuracy. Statistical methods are a primary technique, and one of the most important problems in POS tagging systems is handling unknown words. This paper proposes a graph-based method that examines the tags assigned by Maximum Likelihood Estimation (MLE) to every word in the text, known or unknown, and corrects them. To do so, a graph of the part-of-speech sequences found in the sentences of the training corpus is built. Sentences tagged with MLE are then corrected by traversing this graph. Different methods for tagging with graphs have been proposed, implemented, and evaluated. After examining their pros and cons, a method is proposed that tags unknown words with an accuracy of 86.84% and known words with an accuracy of 97.54%. The overall accuracy of the method is 96.78%, an improvement over Maximum Likelihood Estimation alone; the graph method thus shows acceptable performance in part-of-speech tagging and is more reliable.
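The two-stage idea in the abstract (tag with Maximum Likelihood Estimation, then correct by consulting a graph of tag sequences) can be illustrated with a minimal sketch. The abstract does not specify how the graph is structured or traversed, so everything below is an assumption made for illustration: nodes are tags, edges count adjacent tag pairs in the training sentences, and the hypothetical `correct` function replaces a tag only when the edge from the previous tag was never observed. The toy English corpus and all function names are invented here; this is not the authors' implementation or data.

```python
from collections import Counter, defaultdict

START, END = "<S>", "</S>"  # assumed sentence-boundary nodes

def train(tagged_sentences):
    """Collect per-word tag counts, a default tag, and a tag-sequence graph."""
    word_tags = defaultdict(Counter)   # word -> Counter of tags
    tag_counts = Counter()             # overall tag frequencies
    graph = defaultdict(Counter)       # graph[prev_tag][next_tag] = edge count
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            word_tags[word][tag] += 1
            tag_counts[tag] += 1
            graph[prev][tag] += 1
            prev = tag
        graph[prev][END] += 1
    # MLE: each known word gets its most frequent training tag.
    mle = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    default_tag = tag_counts.most_common(1)[0][0]
    return mle, default_tag, graph, word_tags

def mle_tag(words, mle, default_tag):
    """Baseline tagger; unknown words receive the globally most frequent tag."""
    return [mle.get(w, default_tag) for w in words]

def correct(words, tags, graph, word_tags):
    """Assumed correction rule: if the edge prev -> tag was never seen,
    switch to the most frequent reachable tag among the candidates."""
    corrected, prev = [], START
    for word, tag in zip(words, tags):
        if graph[prev][tag] == 0:
            # Known word: tags seen with it. Unknown word: any successor of prev.
            candidates = word_tags.get(word) or graph[prev]
            reachable = {t: graph[prev][t] for t in candidates
                         if t != END and graph[prev][t] > 0}
            if reachable:
                tag = max(reachable, key=reachable.get)
        corrected.append(tag)
        prev = tag
    return corrected

# Toy training data (invented for this sketch).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
    [("the", "DET"), ("bark", "NOUN"), ("falls", "VERB")],
    [("they", "PRON"), ("bark", "VERB")],
]
mle, default_tag, graph, word_tags = train(corpus)

sentence = ["the", "bark", "falls"]
baseline = mle_tag(sentence, mle, default_tag)       # ['DET', 'VERB', 'VERB']
fixed = correct(sentence, baseline, graph, word_tags)  # ['DET', 'NOUN', 'VERB']
```

In this toy setup "bark" is most often a verb, so MLE mis-tags it after "the"; because the edge DET → VERB never occurs in the training graph, the sketch falls back to NOUN. The same rule assigns a tag to an unseen word by looking at the successors of the previous tag, which is one simple way a sequence graph could help with unknown words.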


Footnotes
1. Maximum Likelihood Estimation.
2. A popular language in India.

Metadata
Title: Part of Speech Tagging Using Part of Speech Sequence Graph
Authors: Pejman Gholami-Dastgerdi, Mohammad-Reza Feizi-Derakhshi
Publication date: 14-11-2021
Publisher: Springer Berlin Heidelberg
Published in: Annals of Data Science, Issue 5/2023
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI: https://doi.org/10.1007/s40745-021-00359-4
