Published in: Annals of Data Science 5/2023

14.11.2021

Part of Speech Tagging Using Part of Speech Sequence Graph

Authors: Pejman Gholami-Dastgerdi, Mohammad-Reza Feizi-Derakhshi


Abstract

Part of speech (POS) tagging, the task of assigning the most appropriate grammatical category to each word in a text, is one of the most fundamental needs of intelligent text processing. The main goal of this article is therefore to provide a high-accuracy tagger for the Persian language. Numerous POS tagging methods have already been presented, each applied in taggers to achieve high performance and accuracy. Statistical methods are a primary technique, and identifying unknown words is one of the most important issues in POS tagging systems. This paper proposes a graph-based method that examines and corrects the tags that the Maximum Likelihood Estimation method assigns to the words in a text, both known and unknown. To do so, a graph is built from the part of speech sequences of the sentences in the training corpus. Sentences tagged with Maximum Likelihood Estimation are then corrected by traversing the graph. Different methods have been proposed, implemented, and evaluated for tagging using graphs. After weighing their pros and cons, a method is proposed that tags unknown words with an accuracy of 86.84% and known words with an accuracy of 97.54%. The overall accuracy of the method is 96.78%, an improvement over the Maximum Likelihood Estimation method; the graph method therefore shows acceptable performance in part of speech tagging and is more reliable.
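
The abstract does not spell out how the graph is built or traversed, so the following Python sketch is only one possible reading, not the authors' method. It assumes that graph nodes are POS tags, that edge weights count how often one tag follows another in the training corpus, and that a tag is reconsidered when it sits on a transition never seen in training. All names (`train`, `mle_tag`, `edge`, `correct_with_graph`, `TOY_CORPUS`, the `<s>` and `</s>` boundary nodes) are hypothetical, and the toy English corpus stands in for the Persian data used in the paper.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary nodes

# Toy English corpus for illustration only; the paper works on Persian text.
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("chases", "VERB"), ("a", "DET"), ("mouse", "NOUN")],
    [("a", "DET"), ("bird", "NOUN"), ("sings", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
]


def train(tagged_sentences):
    """Count word/tag pairs and build a tag-to-tag sequence graph."""
    word_tags = defaultdict(Counter)  # word -> how often each tag was seen
    tag_counts = Counter()            # overall tag frequencies
    graph = defaultdict(Counter)      # tag -> Counter of tags that followed it
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            word_tags[word][tag] += 1
            tag_counts[tag] += 1
            graph[prev][tag] += 1
            prev = tag
        graph[prev][END] += 1
    return word_tags, tag_counts, graph


def mle_tag(words, word_tags, tag_counts):
    """Baseline: most frequent tag per known word, most frequent tag overall otherwise."""
    default = tag_counts.most_common(1)[0][0]
    return [
        word_tags[w].most_common(1)[0][0] if w in word_tags else default
        for w in words
    ]


def edge(graph, a, b):
    """Weight of edge a -> b, or 0 if that transition never occurred in training."""
    return graph[a][b] if a in graph else 0


def correct_with_graph(words, tags, word_tags, tag_counts, graph):
    """Assumed correction rule: retag a word when its tag sits on an unseen edge."""
    fixed = list(tags)
    for i, word in enumerate(words):
        prev = fixed[i - 1] if i > 0 else START
        nxt = fixed[i + 1] if i + 1 < len(words) else END
        if edge(graph, prev, fixed[i]) and edge(graph, fixed[i], nxt):
            continue  # both neighbouring transitions were seen; keep the tag
        # Known words may only take tags seen with them; unknown words may take any tag.
        candidates = list(word_tags[word]) if word in word_tags else list(tag_counts)
        best = max(candidates, key=lambda t: edge(graph, prev, t) * edge(graph, t, nxt))
        if edge(graph, prev, best) * edge(graph, best, nxt) > 0:
            fixed[i] = best
    return fixed


word_tags, tag_counts, graph = train(TOY_CORPUS)
words = ["the", "dog", "growls"]  # "growls" does not occur in the toy corpus
baseline = mle_tag(words, word_tags, tag_counts)
print(baseline)  # ['DET', 'NOUN', 'NOUN']
print(correct_with_graph(words, baseline, word_tags, tag_counts, graph))  # ['DET', 'NOUN', 'VERB']
```

In this toy run the baseline gives the unknown word the globally most frequent tag, NOUN, which creates a NOUN to NOUN transition that never appears in the training graph. The correction step replaces it with VERB, the only tag that connects the preceding NOUN to the end of the sentence through edges seen in training. A real system would need a more careful scoring and traversal strategy than this greedy left-to-right pass.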


Footnotes
1. Maximum Likelihood Estimation.
2. A popular language in India.
 
Metadata
Title: Part of Speech Tagging Using Part of Speech Sequence Graph
Authors: Pejman Gholami-Dastgerdi, Mohammad-Reza Feizi-Derakhshi
Publication date: 14.11.2021
Publisher: Springer Berlin Heidelberg
Published in: Annals of Data Science, Issue 5/2023
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI: https://doi.org/10.1007/s40745-021-00359-4
