Introduction
Methodology
POS tagging approaches
Rule based
Artificial neural network
Hidden Markov Model
Maximum Entropy Markov Model
Artificial intelligence methods for POS tagging
Machine Learning Algorithms
Naive Bayes
Support vector machine
Conditional random field (CRF)
Hidden Markov model (HMM)
Deep learning algorithms
Multilayer perceptron (MLP)
Long short-term memory
Bidirectional long short-term memory
Gated recurrent unit
Feed-forward neural network
Recurrent neural network (RNN)
Deep neural network
Convolutional neural network
Evaluation metrics
-
Precision: The ratio of words correctly assigned a given part-of-speech tag (TP) to all words the tagger assigned that tag (TP + FP): $$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$
-
Recall: The ratio of words correctly assigned a given tag (TP) to all words that carry that tag in the expert annotation (TP + FN), also known as the detection rate: $$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$
-
False alarm rate: Also called the false positive rate, the ratio of words wrongly assigned a given tag (FP) to all words that do not carry that tag in the expert annotation (FP + TN): $$\text{False Alarm Rate} = \frac{\text{FP}}{\text{FP} + \text{TN}}$$
-
True negative rate: The ratio of words correctly not assigned a given tag (TN) to all words that do not carry that tag in the expert annotation (TN + FP): $$\text{True Negative Rate} = \frac{\text{TN}}{\text{TN} + \text{FP}}$$
-
Accuracy: The ratio of correctly tagged instances to the total number of instances, also known as detection accuracy: $$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}$$
-
F-Measure: The harmonic mean of precision and recall: $$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
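The metrics listed above can be illustrated with a minimal Python sketch. It assumes a one-vs-rest reading, in which one tag at a time is treated as the positive class and all other tags as negative; the function names (`confusion_counts`, `tag_metrics`) and the toy tag sequences are hypothetical and serve only as an illustration, not as the evaluation code of any surveyed study.

```python
def confusion_counts(gold, predicted, tag):
    """Count TP, FP, FN, TN for one tag, treating it as the positive class."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, predicted):
        if p == tag and g == tag:
            tp += 1  # tagger and expert agree on this tag
        elif p == tag:
            fp += 1  # tagger assigned the tag, expert did not
        elif g == tag:
            fn += 1  # expert assigned the tag, tagger missed it
        else:
            tn += 1  # neither assigned the tag
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is empty."""
    return numerator / denominator if denominator else 0.0


def tag_metrics(gold, predicted, tag):
    """Compute the six metrics defined above for a single tag."""
    tp, fp, fn, tn = confusion_counts(gold, predicted, tag)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "false_alarm_rate": safe_div(fp, fp + tn),
        "true_negative_rate": safe_div(tn, tn + fp),
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "f_measure": safe_div(2 * precision * recall, precision + recall),
    }


# Toy example (hypothetical data): expert tags versus tagger output.
gold = ["DET", "NOUN", "VERB", "DET", "NOUN", "NOUN", "ADJ", "VERB"]
predicted = ["DET", "NOUN", "NOUN", "DET", "NOUN", "NOUN", "ADJ", "VERB"]
noun_scores = tag_metrics(gold, predicted, "NOUN")
print(noun_scores)
```

In this toy example the tagger labels one verb as a noun, so for the NOUN tag the counts are TP = 3, FP = 1, FN = 0, TN = 4, giving a precision of 0.75 and a recall of 1.0. Overall scores for a tagger are usually obtained by averaging such per-tag values across the tagset.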
Remarks, challenges, and future trends
Observations and state of the art
Study | Strength | Weakness |
---|---|---|
Kumar et al. [72] | Propose a deep learning approach for POS tagging and compare deep learning sequential models to find a suitable method at the word level and the character level. The tagged corpus was evaluated with bidirectional LSTM (BLSTM), recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU) models. Experiments were conducted at both the character and word levels with different numbers of hidden states. BLSTM achieves the highest scores: 0.8748 precision, 0.8757 recall, 0.8739 F1-measure, and 0.8757 accuracy | The proposed model is tested on a small corpus. It misclassifies words when unwanted symbols are appended to them, in both word-level and character-level embeddings. Its performance is not compared with state-of-the-art works |
Mohammed [51] | Proposes an efficient statistical POS tagger for the Somali language using HMM, CRF, and neural network methods. The prepared corpus consists of 14,369 tokens in 1234 sentences, with a tagset of 24 tags. The taggers score an average accuracy of 87.51% under tenfold cross-validation | The corpus used for the experiment is not a standard corpus, and it is too small to train the algorithms. The accuracy of the tagger is also low compared to previous works |
Besharati et al. [54] | Propose multi-layer perceptron and long short-term memory neural network approaches, which generalize well, to assign appropriate tags to both out-of-vocabulary and in-vocabulary words. This hybrid model improves the prediction accuracy to 97.29% | Because the dataset is too small for training a neural network, the proposed approach did not achieve high accuracy in extracting word vectors |
Hirpssa and Lehal [39] | Propose a machine learning approach to develop an Amharic POS tagger. They compare an HMM-based Trigrams'n'Tags (TnT) tagger, a Conditional Random Field (CRF) tagger, and a Naive Bayes (NB) tagger. They use the existing ELRC corpus of 210 K tokens, incorporating a manually tagged corpus with 31 tags. The CRF-based Amharic POS tagger achieves an average accuracy of 94.08%, better than the others | Although the CRF-based tagger performed best, its performance is not significantly better than that of state-of-the-art CRF-based POS taggers. The amount and type of features are not enough to improve the performance of the tagger |
Anastasyev et al. [59] | A Feedforward neural network method was proposed for character-level word representation to provide better results in terms of speed and performance. And also deployed loss forces as a model to learn the dependencies to make the learning process. The proposed model shows an accuracy of 96.46%, 97.97%, and 95.64% on modern literature, news, and Vkontakte, respectively | The final results achieved by the proposed approach are not significantly better than previous works. And the model also has poorer performance than the best model on the deployed data set |
Mishra [66] | Proposes machine learning and neural network models to implement a statistical POS tagger for Kannada. The strength of this work is that it develops a generic POS tagger, compares the performance of various modeling techniques, and explores both character and word embeddings for Kannada POS tagging. The proposed model outperforms the previous Kannada POS tagger by 6% | The predictions show ambiguities, for example between finite verbs and common nouns and between common nouns and adverbs. These problems stem from inconsistent labeling of the training data. Although the model outperforms the state-of-the-art POS tagger for Kannada, its accuracy of 92.31% is much lower than that of other works in the POS tagging field |
Gashaw and Shashirekha [46] | Obtain significant performance differences compared to previous works by using morphological knowledge, a previously used dataset, similar feature extraction, and parameter tuning through grid search and tagging algorithms. They also experiment with different corpora. The proposed approach scores an average accuracy of 86.44 for the ELRC, 95.87 for the ELEC-Extended, and 92.27 for the ELRCQB tagsets. The results show that extending the tagset can increase accuracy by 9.43, a significant improvement | The developed tagsets are not verified by a linguistic expert, which affected the performance of the tagger. For instance, the tagger has problems identifying the names of people and places |
Khan et al. [33] | Develop an Urdu POS tagger using both machine learning and deep learning approaches with language-dependent feature sets on two datasets, and compare the effectiveness of both approaches. In the experiments, the CRF-based model performs better than RNNs, SVM, and n-gram techniques on the CLE dataset, whereas the DRNN approach outperforms the others on the BJ dataset | The researchers ran the models on labeled datasets and used simple feature sets, which work easily with the simplest algorithms |
Singh et al. [56] | They proposed deep learning approaches to develop a Hindi POS tagger. They have experimented with a large corpus consisting of 50,000 hind-tagged sentences. Based on the experiment, the proposed model achieved 97.05% average tagging accuracy | The study uses a manually annotated corpus for training and does not compare with previously proposed works |
Baig et al. [70] | Propose a statistical data-driven method to design and implement an Urdu POS tagging model using Urdu tweets. They combine the existing annotated tweet corpus with new tagsets constructed for POS tagging, and address the corpus shortage with a supervised bootstrapping technique. The new POS tagger achieves 93.8% precision, 92.9% recall, and 93.3% F-measure | The corpus used in the experiments is not a standard corpus and is drawn from Twitter only. In addition, the performance of the new model is not compared with the state of the art |
Bonchanoski and Zdravkova [71] | Propose an automatic POS tagger for the Macedonian language. One strength of the work is the use of a combined dataset of an available online lexicon and a self-created crowdsourced corpus. They implement and compare a TnT tagger, averaged perceptron, cyclic dependency network, and guided learning framework for tagging. They did not achieve a better tagging accuracy; the accuracy achieved, 96.37%, is comparable to results for more extensively studied languages | They compare only the proposed models; comparing against previously proposed works would be better. Also, because the corpus was created through crowdsourcing, the dataset needs to be checked by experts |
Sarbin et al. [61] | POS taggers for Nepali based on Long Short-Term Memory (LSTM), Bi-directional Long Short-Term Memory (BiLSTM), simple Recurrent Neural Network (RNN), and Gated Recurrent Unit (GRU) were implemented and compared. The algorithms are trained and tested on a Nepali tagset; Bi-directional LSTM performs better than the other three algorithms, with a testing accuracy of 97.27% | The researchers use small training and test sets compared to previous works, and the results are not compared with previous works |
Kumar et al. [69] | Propose DL-based POS tagging for Malayalam Twitter data using sequential deep learning methods: Bidirectional LSTM (BLSTM), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU). They train the models to tag tweets at both the word level and the character level. The models are also trained with varying numbers of hidden states, and tagger performance increases as the number of hidden states increases. Bidirectional LSTM achieves better tagging accuracy, 92.58% | The researchers use unrefined, rough tagsets taken from previous works. In addition, the corpus is too small to train deep learning algorithms, and the tagger is developed only on a Twitter corpus |
Kabir et al. [73] | Build a Bengali POS tagger using a deep learning approach, specifically a deep belief network. They create a word dictionary for POS tagging from the corpus; this dictionary can reduce the ambiguity of the tagging process. The deep learning-based Bengali POS tagger scores 93.33% accuracy on the corpus | The study uses a corpus prepared by Microsoft Research India as part of the Indian Language Part-of-Speech Tagset (IL-POST) project, based on IL-POST for Indian languages. The corpus is too small for the experiments. Because of class imbalance in the corpus, some classes have zero accuracy, and the proposed model is not compared with previous works |
Alharbi et al. [1] | Propose POS tagging for the Arabic Gulf dialect using Bi-LSTM. A Support Vector Machine (SVM) classifier and bi-directional Long Short-Term Memory (Bi-LSTM) are applied for sequence modeling. Using a Bi-LSTM, POS tagging accuracy for the Gulf dialect improved from 75% for the state-of-the-art tagger to over 91%. They also prepare a POS tagging dataset and multiple feature sets for testing the models | The models are tested on an existing dataset that is not suitable for the experiment, and the dataset was not verified by language experts. The feature sets are constructed without consulting language experts |
Meftah et al. [65] | Propose neural network-based POS tagging for social media content such as Facebook, tweets, and forums. They use transfer learning to alleviate the lack of annotated corpora built from social media content. The POS tagging model was developed for five languages: English, German, French, Italian, and Spanish. The model uses both word-level and character-level representations, combining pre-trained embeddings such as GloVe, Word2Vec, and FastText for the word-level representation. Cross-task transfer learning across these social media languages was efficient. The proposed approach achieves 91.03%, 90.33%, and 89.66% for Spanish, German, and Italian, respectively | Rough texts taken directly from social media might affect the performance of the tagger; thorough pre-processing of the texts would be better. Also, using a corpus in one language to develop, through transfer learning, a POS tagger for another language might not give the expected result |
Argaw [55] | Develops POS tagging for the Amharic language using a deep learning approach. Three algorithms are tested: bidirectional Long Short-Term Memory (Bi-LSTM), Long Short-Term Memory (LSTM), and recurrent neural networks (RNNs). Automatically generated neural word embeddings are used as features in place of hand-crafted features. The empirical result shows a 93.67% F-measure using the Bi-LSTM recurrent neural network | The study uses an existing corpus from previous related work on the Amharic language, a medium-sized corpus that is not large enough for deep learning approaches and is of lower quality. The model's performance is not compared with previous works that used hand-crafted features |
Deshmukh and Kiwelekar [14] | Propose a bidirectional long short-term memory (Bi-LSTM) model and a deep learning model for POS tagging of Marathi text, evaluated with three-fold validation. The Bi-LSTM and deep learning models achieved accuracies of 97% and 85%, respectively. The proposed models are also compared with machine learning techniques such as naïve Bayes, hidden Markov model, K-nearest neighbor (KNN), random forest, conditional random fields, and neural network on the same dataset | The experiments use 1500 sentences consisting of 10,115 words, which is quite small for deep learning and Bi-LSTM methods. The proposed models are not compared with state-of-the-art works in the same field |
Prabha et al. [60] | Develop a deep learning-based POS tagger for the Nepali language using Long Short-Term Memory networks (LSTM), Gated Recurrent Unit (GRU), Recurrent Neural Network (RNN), and their bidirectional variants, with word-level representations. The bidirectional versions achieved the highest performance scores, a significant improvement over previous POS taggers, with 99% tagging accuracy | The corpus used for this research is from the Center for Research in Urdu Language Processing (CRULP). The corpus used for developing the Nepali POS tagger is translated from English, i.e., from the PENN Treebank corpus. Using resources from one language to build POS tagger models for another may not be advisable because languages differ in nature |
Srivastava et al. [63] | Present unsupervised DL-based POS tagging for the Sanskrit language. Instead of traditional Word2Vec implementations, a character-level n-gram implementation is used. They use a BiLSTM autoencoder and achieve a POS tagging accuracy of 93.2% | They used a small annotated Sanskrit corpus of 115,000 words prepared by JNU, which is not sufficient for experimenting with unsupervised deep learning approaches |
Attia et al. [62] | Develop a part-of-speech tagger for the Awngi language using a Hidden Markov Model (HMM). They created a tagset of 23 hand-crafted tags and collected 94,000 sentences. Tenfold cross-validation was used to evaluate the performance of the Awngi HMM POS tagger. The empirical result shows that the uni-gram and bi-gram taggers achieve 93.64% and 94.77% tagging accuracy, respectively | The tagger is trained with only 23 hand-crafted tags. The corpus was the first manually annotated corpus and needs expert knowledge to yield better results. The POS tagger is not compared with previous related works that used HMM |
Patoary et al. [74] | Propose a DL-based POS tagging model for the Bengali language based mainly on the suffixes of the language. The experiment uses a labeled corpus of 2927 words. The model achieved an accuracy of 93.90%, better than previous models such as rule-based and global linear models. The model is also incorporated in Python for the open-source Bengali NLP toolkit | The corpus used for the experiment is too small for deep learning. The model is evaluated using accuracy only, so its performance may vary when tested with other metrics such as F-measure, recall, and precision |
Gopalakrishnan et al. [58] | Implement a deep neural network-based POS tagger for the biomedical domain. Experiments are conducted using LSTM, RNN, and GRU algorithms, as well as bidirectional LSTM, bidirectional RNN, and bidirectional GRU. The bidirectional models score better accuracy than the simple LSTM, RNN, and GRU models because they can access more context information from the dataset. The proposed model achieved 94.80% detection accuracy | All experiments are conducted on the same publicly available dataset. The proposed model is not compared with previous state-of-the-art works in a similar domain. Experimenting with other algorithms, which may achieve better results than the proposed model, would also be worthwhile |
Bahcevan et al. [57] | Propose deep neural network language models for Turkish to address the POS tagging problem. Experiments use Long Short-Term Memory (LSTM) and Recurrent Neural Network (RNN), and performance is compared with state-of-the-art methods. The results show that LSTM outperforms RNN, with an 88.7% F-measure | Although LSTM outperforms RNN on the F-measure, its performance is not sufficient. Experimenting with other methods and comparing them would be better |
Akhil et al. [75] | Propose a POS tagger for Malayalam using deep learning approaches. Experiments are conducted on a real dataset using Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Recurrent Neural Networks (RNN), and Bi-directional Long Short-Term Memory (BLSTM). The proposed model is compared with previous models and outperforms them, achieving 0.9878 precision, 0.9788 recall, and 0.9832 F-measure | The tagged corpus is too small for a deep learning-based tagger. The model is evaluated using precision, recall, and F-measure; evaluating with accuracy as well would be better |
Study | Algorithms | Methodology | |
---|---|---|---|
ML | DL | ||
Kumar et al. [69] | √ | Recurrent neural network, long short-term memory (LSTM), gated recurrent unit, and bidirectional LSTM | |
Mohammed [51] | √ | Hidden Markov model and CRF with neural network model | |
Besharati et al. [54] | √ | Single-layer and a two-layer MLP and LSTM neural network | |
Hirpssa and Lehal [39] | √ | Conditional Random Field, HMM-based, and Naive Bayes | |
Anastasyev et al. [59] | √ | Feedforward Neural Network | |
Mishra [66] | √ | CRF, SVM, structured perceptron and neural network | |
Gashaw and Shashirekha [46] | √ | Machine learning algorithms (Brill, TnT, and CRFSuite taggers) | |
Khan et al. [33] | √ | √ | CRF, support vector machine (SVM), deep recurrent neural network (DRNN), and n-gram Markov model (bigram hidden Markov model, HMM) |
Singh et al. [56] | √ | LSTM with RNN | |
Baig et al. [70] | √ | √ | TnT tagger, averaged perceptron, cyclic dependency network and guided learning framework for bidirectional sequence classification |
Bonchanoski and Zdravkova [71] | √ | Simple Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bi-directional Long Short-Term Memory (BiLSTM) | |
Sarbin et al. [61] | √ | Sequential deep learning methods (Recurrent Neural Network (RNN), Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM), Bidirectional LSTM (BLSTM)) | |
Kumar et al. [72] | √ | Deep Neural Network | |
Alharbi et al. [1] | √ | √ | SVM and Bi-LSTM |
Kabir et al. [73] | √ | Convolutional Neural Network | |
Meftah et al. [65] | √ | Long Short-Term Memory (LSTM) recurrent neural networks and their bidirectional versions (Bi-LSTM RNNs) | |
Deshmukh and Kiwelekar [14] | √ | Bidirectional long short-term memory (Bi-LSTM) and deep learning | |
Argaw [55] | √ | Recurrent Neural Network (RNN), Long Short-term Memory Networks (LSTM), Gated Recurrent Unit (GRU) | |
Prabha et al. [60] | √ | Bidirectional LSTM and auto encoder | |
Srivastava et al. [63] | √ | Deep Learning (BiLSTM autoencoder) | |
Attia et al. [62] | √ | Hidden Markov Model | |
Patoary et al. [74] | √ | Deep Learning | |
Gopalakrishnan et al. [58] | √ | Deep learning (RNN, LSTM, and GRU) | |
Bahcevan et al. [57] | √ | Long Short-term Memory (LSTM) and Recurrent Neural Network (RNN) | |
Akhil et al. [75] | √ | Deep neural network (RNN, GRU, LSTM and BLSTM) |
Study | Evaluation Metrics | ||||||
---|---|---|---|---|---|---|---|
ACC | F-M | REC | PRE | ROC | FP | TP | |
Kumar et al. [72] | √ | √ | √ | √ | |||
Mohammed [51] | √ | √ | √ | ||||
Besharati et al. [54] | √ | ||||||
Hirpssa and Lehal [39] | √ | ||||||
Anastasyev et al. [59] | √ | ||||||
Mishra [66] | √ | √ | √ | ||||
Gashaw and Shashirekha [46] | √ | ||||||
Khan et al. [33] | √ | ||||||
Singh et al. [56] | √ | ||||||
Baig et al. [70] | √ | √ | √ | ||||
Bonchanoski and Zdravkova [71] | √ | ||||||
Sarbin et al. [61] | √ | ||||||
Kumar et al. [69] | √ | √ | √ | √ | |||
Kabir et al. [73] | √ | √ | √ | √ | √ | ||
Alharbi et al. [1] | √ | ||||||
Meftah et al. [65] | √ | ||||||
Argaw [55] | √ | √ | √ | √ | |||
Deshmukh and Kiwelekar [14] | √ | √ | √ | √ | |||
Prabha et al. [60] | √ | √ | √ | √ | |||
Srivastava et al. [63] | √ | ||||||
Attia et al. [62] | √ | ||||||
Patoary et al. [74] | √ | ||||||
Gopalakrishnan et al. [58] | √ | √ | √ | √ | |||
Bahcevan et al. [57] | √ | ||||||
Akhil et al. [75] | √ | √ | √ |
Research challenges
-
Lack of sufficient standard datasets: Most recent studies point to the lack of a sufficiently large standard corpus for building better POS taggers for a particular language. The proposed methodologies had difficulty obtaining balanced coverage of some parts of speech within the corpus. A better POS tagger needs to be trained and tested on a balanced and verified corpus. A balanced corpus with a large number of tokens enables DL- and ML-based POS taggers to learn more patterns and thus label words with the appropriate part of speech. But preparing a suitable language corpus is a tedious process that needs plenty of language resources and verification by language experts. Therefore, the research challenge for developing an efficient POS tagging model is preparing a standard corpus with enough tokens and a near-balanced distribution across parts of speech. The corpus should be released publicly to help reduce the resource scarcity facing the research community.
-
Lower detection accuracy: Most of the proposed POS tagging methodologies show low detection accuracy, for the POS tagging model as a whole and for some part-of-speech tags in particular. This problem arises from the imbalanced nature of the corpus: an ML/DL-based POS tagger gives lower detection accuracy for tags with few training instances than for tags with many. Overcoming these problems requires a balanced corpus and class-balancing techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) and RandomOverSampler. These techniques increase the number of instances of minority part-of-speech tags to produce a balanced corpus. But there is still a research gap in improving accuracy, which demands more research effort in this area.
-
Resource requirement: Most recently proposed POS tagging methodologies are based on very complex models that need substantial computing resources and processing time. This can be addressed by using multi-core, high-performance GPUs to speed up computation, but at a high monetary cost. Deploying these complex models may incur extra processing overhead that affects the performance of the POS tagger. To alleviate the overhead on processing units and computation, the most important features should be selected with an efficient feature selection algorithm to speed up processing. Although various research works have sought the best feature selection algorithm, there is still room for improvement in this direction.
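The class-balancing idea raised under the lower detection accuracy challenge can be sketched in a few lines of Python. This is a simplified illustration of random oversampling only, written with the standard library; it is not the imbalanced-learn implementation of RandomOverSampler, and it does not reproduce SMOTE, which synthesizes new minority instances by interpolating between feature vectors rather than duplicating existing ones. The function name `random_oversample`, the `(features, tag)` instance format, and the toy data are assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def random_oversample(instances, seed=0):
    """Duplicate minority-tag instances until every tag matches the majority count.

    `instances` is a list of (features, tag) pairs, one per token
    (an assumed format; real taggers may balance whole sentences instead).
    """
    if not instances:
        return []
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_tag = defaultdict(list)
    for features, tag in instances:
        by_tag[tag].append((features, tag))
    target = max(len(group) for group in by_tag.values())
    balanced = []
    for group in by_tag.values():
        balanced.extend(group)  # keep every original instance
        shortfall = target - len(group)
        # Sample with replacement from the same tag to fill the gap.
        balanced.extend(rng.choice(group) for _ in range(shortfall))
    rng.shuffle(balanced)
    return balanced


# Toy token-level instances (hypothetical): 4 nouns, 2 verbs, 1 adverb.
instances = [
    ({"word": "dog"}, "NOUN"),
    ({"word": "cat"}, "NOUN"),
    ({"word": "house"}, "NOUN"),
    ({"word": "tree"}, "NOUN"),
    ({"word": "runs"}, "VERB"),
    ({"word": "sleeps"}, "VERB"),
    ({"word": "quickly"}, "ADV"),
]
balanced = random_oversample(instances)
counts = Counter(tag for _, tag in balanced)
print(counts)
```

After oversampling, each tag has as many instances as the most frequent tag. Because duplicated instances add no new information, oversampling should be applied to the training split only, so that test scores are not inflated by repeated examples.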