
Part of Speech Tagging Using Part of Speech Sequence Graph

Authors: Pejman Gholami-Dastgerdi, Mohammad-Reza Feizi-Derakhshi

Published in: Annals of Data Science | Issue 5/2023

Publication date: 14 November 2021

Abstract

Part-of-speech (POS) tagging, the task of assigning the most appropriate grammatical category to each word in a text, is one of the most fundamental needs of intelligent text processing. The main goal of this article is therefore to provide a highly accurate tagger for the Persian language. Numerous POS tagging methods have already been presented, each applied in taggers to achieve high performance and accuracy. Statistical methods are the primary technique, and one of the most important issues in POS tagging systems is identifying unknown words. This paper proposes a graph-based method that examines and corrects the tags that Maximum Likelihood Estimation (MLE) assigns to the words in a text, both known and unknown. To do so, a graph of the part-of-speech sequences found in the sentences of the training corpus is created. Sentences tagged with MLE are then corrected by traversing the graph. Different graph-based tagging methods have been proposed, implemented, and evaluated. After weighing their pros and cons, a method is proposed that tags unknown words with an accuracy of 86.84% and known words with an accuracy of 97.54%. The overall accuracy of the method is 96.78%, an improvement over the Maximum Likelihood Estimation method alone. The graph method therefore shows acceptable performance in part-of-speech tagging and is more reliable.
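The two-stage idea in the abstract (tag with Maximum Likelihood Estimation, then correct by traversing a graph of tag sequences) can be illustrated with a small sketch. The abstract does not specify how the graph is traversed, so everything below is an assumption made for illustration: the function names `train`, `mle_tag`, and `correct`, the boundary nodes `<S>` and `</S>`, the left-to-right repair rule, and the toy English corpus are all hypothetical and are not the authors' implementation.

```python
from collections import Counter, defaultdict

START, END = "<S>", "</S>"  # hypothetical sentence-boundary nodes


def train(tagged_sentences):
    """Count word/tag pairs and build a directed graph of tag sequences."""
    word_tags = defaultdict(Counter)  # word -> Counter of tags seen with it
    tag_counts = Counter()            # tag -> overall frequency
    graph = defaultdict(Counter)      # graph[a][b] = times tag b followed tag a
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            word_tags[word][tag] += 1
            tag_counts[tag] += 1
            graph[prev][tag] += 1
            prev = tag
        graph[prev][END] += 1
    return word_tags, tag_counts, graph


def mle_tag(words, word_tags, tag_counts):
    """Baseline: most frequent tag per known word, most frequent tag overall otherwise."""
    default = tag_counts.most_common(1)[0][0]
    return [
        word_tags[w].most_common(1)[0][0] if w in word_tags else default
        for w in words
    ]


def correct(words, tags, word_tags, graph):
    """Assumed repair rule: replace a tag whose incoming edge never occurred in training."""
    fixed, prev = [], START
    for word, tag in zip(words, tags):
        successors = graph.get(prev, Counter())
        if successors[tag] == 0:
            # Known word: only tags observed with it; unknown word: any successor tag.
            pool = word_tags[word] if word in word_tags else successors
            reachable = [t for t in pool if t != END and successors[t] > 0]
            if reachable:
                tag = max(reachable, key=lambda t: successors[t])
        fixed.append(tag)
        prev = tag
    return fixed


# Toy training corpus (made up for this sketch).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
    [("the", "DET"), ("bark", "NOUN"), ("falls", "VERB")],
    [("bark", "NOUN"), ("falls", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
]
word_tags, tag_counts, graph = train(corpus)

words = ["dogs", "bark"]
baseline = mle_tag(words, word_tags, tag_counts)
print(baseline, correct(words, baseline, word_tags, graph))
# ['NOUN', 'NOUN'] ['NOUN', 'VERB']
```

In this toy run the baseline tags "bark" as a noun because that is its most frequent tag, and the graph pass changes it to a verb because no noun-to-noun edge was observed in training. The sketch looks only at the preceding tag; a fuller traversal would likely also weigh the following tag and the sentence-end edge.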

Title: Part of Speech Tagging Using Part of Speech Sequence Graph
Authors: Pejman Gholami-Dastgerdi, Mohammad-Reza Feizi-Derakhshi
Publication date: 14 November 2021
Publisher: Springer Berlin Heidelberg
Published in: Annals of Data Science / Issue 5/2023
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI: https://doi.org/10.1007/s40745-021-00359-4
