In recent years, artificial intelligence (AI) has brought about significant changes in information technology and other disciplines, including intelligent transportation systems, virtual personal assistants, robotic surgery, and, most significantly, natural language processing (NLP) applications [1]. The world is changing rapidly in technological terms, and the digital world brings both benefits and drawbacks. One of its drawbacks is fake news and hate speech, which are remarkably easy to spread. Fake news is defined as intentionally and verifiably false news [2–4]. Individuals, governments, freedom of speech, news systems, and society at large are all becoming increasingly vulnerable to it. The rising use of social media and knowledge sharing has benefited humanity considerably, and social media platforms now have a significant impact on people's daily lives [5]. At the same time, platforms such as Facebook and Twitter have aided the spread of rumors, conspiracy theories, hatred, xenophobia, racism, and prejudice [6].
While technology has many advantages, it can also influence public opinion and religious views all over the world. It can be used, directly or indirectly, to target people based on race, caste, ethnic origin, religion, nationality, sex, gender identity, sexual orientation, disability, or illness.
Governments, the technology industry, and individual researchers have all sought ways to mitigate the negative impacts of fake news and hate speech, and some governments have attempted to suppress them through legislation. For example, Ethiopia's government has enacted the Hate Speech and Disinformation Prevention and Suppression Proclamation No. 1185/2020 [13]. Ethiopia's cabinet approved a notice to combat fake news and hate speech, which includes expanding Facebook's third-party fact-checking to Ethiopia and other African countries [14, 15]. According to [16], Article 19 has raised concerns about the wording and application of Ethiopia's hate speech and disinformation laws against those who oppose the government's policies. The proclamation, which went into effect on March 23, 2020, is extremely problematic from the standpoint of human rights and free speech and should be revised immediately. While it remains in effect, it must not be abused, and the government must not misuse its power under the guise of dealing with the public health crisis. Ethiopians now have unprecedented civil and political liberties under the country's new government. When the press and broadcast media were censored in previous years, social media gave Ethiopians, like many others around the world, the freedom to speak, organize, mobilize, and challenge the government's narrative. Despite these changes, one thing has remained constant: authorities continue to challenge the relative "freedom" that social media platforms have enabled. While the previous administration surveilled, blocked, and punished dissenting voices online, Prime Minister Abiy's administration has enacted the Hate Speech and Disinformation Prevention and Suppression Proclamation, which gives the government the authority to fine and imprison citizens for their social media activities [17].
Such legislation, however, appears to be of limited use, since creators of fake news and hate speech conceal their work, leaving no record for the law. Facebook, Google, Twitter, and YouTube have also tried to take technological precautions using various methods.
In recent years, scientific interest in detecting and combating fake news and hate speech has grown, driven by the spread of hatred and other negative content on social media platforms. Fake news classification and detection for the Amharic language on social media has been developed using a machine learning (ML) approach [5]. The author proposed an AI method to address fake news on the internet, attempting to explicitly design, implement, and evaluate AI and text feature extraction techniques for fake news detection in the Amharic language, and discussed current online media services for detecting fake news.
The authors of [11] investigated fake news identification in the Amharic language using deep learning (DL) approaches and news content, and developed several computational linguistic tools for these "low-resource" African languages. DL approaches and word embeddings were employed to build automatic fake news detection mechanisms. Their contributions include a general-purpose Amharic corpus (GPAC), a novel Amharic fake news detection dataset (ETH FAKE), and an Amharic fastText word embedding (AMFTWE). The Amharic fake news detection model, evaluated on the ETH FAKE dataset, performed exceptionally well with both the cc-am-300 and AMFTWE embeddings: with the 300- and 200-dimension embeddings, the model reached a validation accuracy above 99%, with an accuracy of 99.36%, precision of 99.30%, recall of 99.41%, and an f1-score of 99.35%. Finally, they suggested exploring other word embedding approaches, such as Bidirectional Encoder Representations from Transformers (BERT), which could yield embeddings better than AMFTWE if BERT's data-hungry nature were satisfied, even though creating an Amharic fake news dataset and obtaining large Amharic corpora would be difficult.
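fastText, the basis of the AMFTWE embedding above, represents each word as the sum of vectors for its character n-grams, delimited with boundary markers `<` and `>`. The sketch below illustrates only the subword extraction step; the default n-gram range (3–6) matches fastText's defaults, and the function name is illustrative rather than taken from any cited implementation.

```python
def subword_ngrams(word, n_min=3, n_max=6):
    """Extract the character n-grams fastText would use for `word`.
    Boundary markers < and > distinguish prefixes and suffixes from
    word-internal n-grams."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

# A rare or inflected word still shares subwords with known words, which
# is one reason subword embeddings suit morphologically rich languages
# such as Amharic.
print(subword_ngrams("news", n_min=3, n_max=4))
# → ['<ne', 'new', 'ews', 'ws>', '<new', 'news', 'ews>']
```

Because out-of-vocabulary words can still be composed from their subword vectors, this scheme degrades more gracefully on small corpora than whole-word embeddings.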
The work in [27] outlined that developing hate speech detection for Afan Oromo social media is essential to mitigate the risk hate speech poses to social welfare. The authors conducted six experiments applying ML approaches such as the support vector machine (SVM), multinomial Naïve Bayes (MNB), linear support vector machine (LSVM), logistic regression (LR), and random forest (RF) classifiers to build hate speech detection prototypes for the Facebook and Twitter platforms. Although they developed the Afan Oromo hate speech detection model by collecting data from Facebook and Twitter, the study investigated only posts and comments in textual form; posts and comments in the form of images, audio, or video were not considered. Performance was evaluated using accuracy, precision, recall, and f1-score, and ML feature selection approaches such as bigrams and term frequency-inverse document frequency (TF-IDF) were used. According to the findings, the LSVM achieved a precision of 66%, a recall of 66%, and an f1-score of 64%; the MNB reached a precision of 60%, a recall of 65%, and an f1-score of 62%; the RF classifier achieved a precision of 64%, a recall of 64%, and an f1-score of 63%; the LR classifier reached a precision of 65%, a recall of 64%, and an f1-score of 61%; and the SVM achieved a precision of 66%, a recall of 65%, and an f1-score of 63%. Since the LSVM had the highest precision, recall, and f1-score (66%, 66%, and 64%, respectively), the researchers chose it to deploy the Afan Oromo hate speech detection model.
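The TF-IDF weighting used for feature selection in this and several later studies can be sketched in a few lines. This is a generic smoothed-IDF variant for illustration only, not the exact formulation of the cited work; the toy documents are invented.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.
    Uses normalized term frequency and a smoothed inverse document
    frequency, log((1 + N) / (1 + df))."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: count / len(doc) * math.log((1 + n_docs) / (1 + df[term]))
            for term, count in tf.items()
        })
    return weights

docs = [["fake", "news", "spreads"], ["news", "update"], ["fake", "claim"]]
w = tf_idf(docs)
# "news" occurs in two of three documents, so it is down-weighted
# relative to "spreads", which occurs in only one.
assert w[0]["news"] < w[0]["spreads"]
```

Bigram features, as used in the study above, would simply treat adjacent word pairs as additional "terms" before the same weighting is applied.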
The most important limitation of this study lies in applying conventional ML approaches, which require manual labeling of the dataset, and in the small scale of the experiments conducted on the data. The authors recommended that future research collect data from other social media platforms and, in addition, consider other modalities of data collection for further investigation.
According to those researchers, moving beyond conventional ML approaches is also a direction for future study. Another research work [25] proposed an Afan Oromo fake news detection system. The proposed system includes preprocessing (tokenization, normalization, stop word removal, and abbreviation resolution); feature extraction using term frequency-inverse document frequency (TF-IDF), term frequency (TF), and hashing to determine word importance in the news and the corpus; and N-grams, a powerful NLP technique for capturing semantic and syntactic information. All conceivable combinations of feature extraction techniques and NLP approaches were applied with a passive-aggressive (PA) classification system. According to the study, the PA classifier outperforms ensemble methods such as gradient boosting and random forest, as well as linear classifiers such as MNB, achieving 97.2% with an error of 2.8%. Finally, the model was deployed as a web application using Python's Django framework with TF-IDF unigram feature extraction and the PA classification approach. Despite the dataset's shortcomings, the linear PA with TF-IDF vectors and a unigram model outperforms the alternatives with a precision of 97.2%, a recall of 97.9%, and an f1-score of 97.5%, evaluated alongside the area under the receiver operating characteristic curve (ROC AUC).
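The passive-aggressive algorithm that performed best in this study updates a linear model only when the hinge loss is nonzero, with a step size proportional to the loss. A minimal PA-I-style sketch over dense vectors with binary labels in {-1, +1}; the aggressiveness parameter C and the toy data are illustrative, not from the cited work.

```python
def pa_train(X, y, C=1.0, epochs=5):
    """Train a linear classifier with the Passive-Aggressive (PA-I) rule.
    X: list of feature vectors; y: labels in {-1, +1}."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            loss = max(0.0, 1.0 - margin)          # hinge loss
            if loss > 0.0:                          # "aggressive" step
                norm_sq = sum(xi * xi for xi in x)
                tau = min(C, loss / norm_sq)        # PA-I step size
                w = [wi + tau * label * xi for wi, xi in zip(w, x)]
            # otherwise: "passive" -- leave w unchanged
    return w

# Toy linearly separable data: the sign of the first feature decides the class.
X = [[1.0, 0.2], [0.9, -0.1], [-1.0, 0.3], [-0.8, -0.2]]
y = [1, 1, -1, -1]
w = pa_train(X, y)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1 for x in X]
assert preds == y
```

The online, margin-driven update is what makes PA classifiers fast on high-dimensional sparse features such as TF-IDF vectors.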
Using a DL system, the work in [28] aimed to detect Amharic-language fake news. The authors built a new dataset, since no resources previously existed in the area they wanted to investigate. They used the Graph application programming interface (API) to collect data from the Facebook platform, and two journalists annotated the dataset. To ensure uniform annotation across annotators, guidelines from the News Literacy Project were used, resulting in an annotated dataset of 12,000 stories with a binary class. They used equal-sized classes, 6,000 instances each for the fake and genuine classes, to avoid class imbalance and to make the classification reports dependable. With an accuracy of 93.92%, a precision of 93%, a recall of 95% (smaller than the bidirectional long short-term memory's (Bi-LSTM's) 96%), and an f1-score of 94%, the convolutional neural network (CNN) model outperformed all other models. The impact of morphological normalization on Amharic fake news identification was investigated using the top two performing models, and the results demonstrated that normalization harms classification performance, lowering both models' f1-scores from 94% to 92%. Finally, CNN was shown to be the most effective model in the investigation.
Furthermore, contrary to their expectations, the attention mechanism used in the sequential models performed worse than the baseline model. Another finding was that, on the Amharic-language fake news dataset, morphological normalization was not always helpful for improving model performance. The study suggests evaluating approaches from other disciplines, such as capsule networks (CNs), which perform well in computer vision; applying their strengths to NLP challenges could help improve Amharic-language fake news and hate speech detection models. The authors also recommend that researchers in this field train their embeddings on domain-specific data to obtain semantically stronger embedding models, which could lead to better detection. According to [29], DL approaches have recently gained considerable attention and have improved the state of the art on many problems that AI and ML approaches have long struggled with. The goal of that research was to provide a method for detecting fake news on social media using a DL approach for Afan Oromo news text. To predict and classify Afan Oromo news text, the model must be preprocessed and trained on the sample dataset. The researchers used one-hot encoding to map categories to integers and combined it with word embeddings, training a Bi-LSTM together with a cosine similarity measure, supplied as input features to the neural network (NN). After training, a 0.5 threshold was applied to the classifier's output score to decide whether an item was true or fake; as a statistical analysis, a confusion matrix was used to compare performance across different thresholds. The suggested model required a large amount of data.
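Applying a 0.5 threshold to a classifier's output score and tabulating a confusion matrix, as described above, is straightforward to sketch; the scores and gold labels below are invented for illustration.

```python
def confusion_matrix(scores, labels, threshold=0.5):
    """Binarize scores at `threshold` (1 = fake, 0 = true) and count
    true/false positives and negatives against the gold labels."""
    tp = fp = tn = fn = 0
    for score, gold in zip(scores, labels):
        pred = 1 if score >= threshold else 0
        if pred == 1 and gold == 1:
            tp += 1
        elif pred == 1 and gold == 0:
            fp += 1
        elif pred == 0 and gold == 0:
            tn += 1
        else:
            fn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

# Comparing matrices across thresholds exposes the precision/recall trade-off.
scores = [0.9, 0.7, 0.4, 0.2, 0.6]
labels = [1, 1, 1, 0, 0]
print(confusion_matrix(scores, labels, 0.5))
# → {'tp': 2, 'fp': 1, 'tn': 1, 'fn': 1}
```

Lowering the threshold converts false negatives into true positives (raising recall) at the cost of more false positives (lowering precision), which is exactly the comparison the cited study performed.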
However, compared with the datasets created for the English language, the Afan Oromo dataset is a major concern; the model was trained on very little data. Improving the consistency of performance by adding data to the news dataset would increase user trust in the system. On a benchmark dataset, the Bi-LSTM model predicts with an accuracy of 90%, a precision of 90%, a recall of 89%, and an f1-score of 89%, outperforming the current state of the art. The authors concluded that the Bi-LSTM system prototype can serve as a foundation for future work with Afan Oromo news text datasets and other Ethiopian local languages. Another work on Afan Oromo text content-based fake news detection using MNB [19] found that the best-performing model was an MNB classifier with word-frequency feature extraction and unigrams, which achieved a classification accuracy of 96%. The model was tested using a 0.7 threshold, which may not be the most reliable choice for models with poorly calibrated probability scores. Term frequency performs better, yet frequent but non-crucial terms affect the outcome. These obstacles limited the scope of the study and prevented it from being more broadly applicable. They used TF and TF-IDF with unigrams and bigrams and discovered that the term frequency of unigrams identifies fake news sources with 96% accuracy, with only minor effects on recall. For real news, the confusion matrix yielded a precision, recall, and f1-score of 98.6%, 94%, and 96.2%, respectively, and for fake news, 91%, 97.8%, and 94%, respectively. It was therefore decided that these difficulties, as well as slang phrases, would be addressed in future work. According to [
26], the rapid growth and expansion of social media platforms have filled the information-sharing gap in everyday life. An Amharic-language fake news dataset was created from verified news sources and social media pages, and six ML approaches were applied: Naïve Bayes (NB), SVM, LR, stochastic gradient descent (SGD), RF, and the PA classifier. The experimental results show a precision of 100% with RF for both TF-IDF and Count Vectorizer (CV) features, a recall of 95% with the PA classifier on TF-IDF, and an f1-score of 100% with the NB and LR classifiers on the TF-IDF vectorizer. The research has made a substantial contribution to slowing the spread of misinformation in vernacular languages. The work in [30] sought to create, implement, and analyze hate speech detection systems for the Amharic language using ML approaches and text feature extraction. According to the study, it was critical to understand and define hate and offensive speech on social media, investigate existing techniques for addressing the issues, and understand the Amharic language in depth, as well as the various methods used to design and implement models capable of detecting hate speech. The approaches used include collecting posts and comments for the dataset, defining annotation rules, preprocessing, feature extraction using N-grams, TF-IDF, and word2vec, model training using SVM, NB, and RF, and model testing. Using two datasets, the experiments produced twenty-one (21) binary and ternary models per dataset. Binary models using RF with word2vec outperformed both SVM and NB. The SVM with word2vec, on the other hand, outperformed the NB and RF models in classification with a 73% f1-score, and the ternary SVM model using word2vec produced a 53% f1-score, better than the NB and RF models. Overall, on both datasets, models based on SVM with word2vec performed marginally better than the NB and RF models. The work in [31] used LSTM and gated recurrent unit (GRU) networks with word N-grams for feature extraction, and word2vec to represent each unique word as a vector, to construct recurrent neural network (RNN) models for automated identification of hate speech in Amharic-language posts and comments on Facebook. To train the models and identify the optimal hyper-parameter combination, experiments were conducted on the two models using 80% of the dataset for training and 10% for validation; the remaining 10% was used to test the trained models. Training for 100 epochs, an LSTM-based RNN with a batch size of 128, a learning rate of 0.1%, the RMSProp optimizer, and 0.5 dropout achieved an accuracy of 97.9% in classifying posts as hate speech or hate-free. This was confirmed by performance tests and inference on user-generated data. With this dataset and different parameters on the GRU- and LSTM-based RNN models with word2vec feature representation, the RNN-LSTM model produced an improved test accuracy of 97.9%. The authors found that DL models for Amharic-language text analysis allowed them to detect hate speech posts on the Facebook platform, with LSTM outperforming GRU on their dataset; the accuracy of the DL approach is affected by changes in the neural network hyperparameters. The research work [21] examined the first Ethiopic Twitter dataset for the Amharic language, aimed at detecting abusive speech. The researchers evaluated the distribution and trend of abusive speech material over time, compared abusive speech content from Twitter against a general-reference Amharic-language corpus, and gathered 144 abusive speech keywords from five native speakers of the language, classifying them as hate or offensive speech.
The research work [32] created an Apache Spark (AS) model to categorize Amharic-language Facebook posts and comments into hate and non-hate categories. The authors used RF and NB for learning, and word2vec and TF-IDF for feature selection. For Amharic-language posts and comments on the Facebook social network, the NB classifier with the word2vec feature model performed best, with an accuracy of 79.83%, a ROC score of 83.05%, and an area under precision and recall of 85.34%; with the TF-IDF feature model, the NB achieved 73.02%, 80.53%, and 79.93%, respectively. The RF with the word2vec feature outperformed TF-IDF, with an accuracy, ROC score, and area under precision and recall of 65.34%, 70.97%, and 73.07%, respectively; TF-IDF followed with 63.55%, 68.44%, and 69.96%, respectively. In [33], a model for detecting hate speech and identifying vulnerable communities in Amharic texts on the Facebook platform was developed. The authors gathered Amharic-language postings and comments from questionable public profiles of organizations and individuals on social media. To obtain a clean corpus, the necessary preprocessing was done according to the language's requirements. A word embedding (word2vec) model was then trained, and human annotators were chosen to label texts using the provided standards and norms. Then, in the AS environment, feature extraction approaches were applied using word2vec embeddings weighted by TF-IDF, TF-IDF alone, and word N-grams. In their trials, the RNN-LSTM and RNN-GRU DL approaches were compared with the standard gradient-boosted tree (GBT) and RF approaches. The best performance was achieved using word2vec embeddings with RNN-GRU, which had an AUC of 97.85% and an accuracy of 92.56% in the hate speech detection experiments. Finally, they suggest that other inherent problems of the RNN can be solved with a more powerful architecture (one that can handle negation and use information throughout the posts and comments), such as tree-LSTM, which can learn meanings from characters and parts of words rather than word tokens themselves. An automatic hate and offensive speech detection framework for social media has been implemented for the Afan Oromo language [34]. The overall goal of that study was to create a framework for categorizing hate and neutral speech. The researchers used an SVM with TF-IDF, N-gram, and word2vec feature extraction to build a binary classifier for detecting hate speech in the Afan Oromo language. To create the dataset, they used Facepager and the ScrapeStorm API to scrape data from Facebook posts and comments, then divided the collected content into two categories: hateful and neutral. When comparing the outcomes of several ML approaches, accuracy, f1-score, recall, and precision measurements were used to evaluate the experiments.
In all evaluation measures, the framework based on the SVM with the N-gram combination and TF-IDF achieved a performance of 96% (accuracy, f1-score, precision, and recall). A summary of the related work used in this review is presented in Table 1.
Table 1
Summary of related works for Ethiopian languages
| Approach | Contributions | Results |
| --- | --- | --- |
| DL and word embedding | ✓ They collected and arranged a sizable Amharic corpus for general use. ✓ They developed an Amharic fastText word embedding. ✓ They created a brand-new dataset for detecting fake news in Amharic. ✓ They developed a DL strategy for detecting fake news in Amharic. ✓ They ran a series of tests to see how well the word embedding and fake news detection models worked. | ✓ Using both word embeddings, cc-am-300 and AMFTWE, the fake news detection model performed exceptionally well. ✓ When using the 300- and 200-dimension embeddings, the model had a validation accuracy above 99%. ✓ The experimental results utilizing the cc-am-300 and AMFTWE embeddings were an accuracy of 99.36%, a precision of 99.30%, a recall of 99.41%, and an f1-score of 99.35%. |
| ML (i.e., SVM, MNB, LSVM, LR, DT, and RF) | ✓ They built a model that detects Afan Oromo hate speech on social media using a combination of N-gram and TF-IDF feature extraction methodologies. ✓ They collected 13,600 comments and posts on the respective public pages using Facepager (https://facepager.software.informer.com/3.6/), of which 7,000 and 6,600 items were acquired from Twitter and Facebook, respectively, between September 2019 and 2020. | ✓ They analyzed its performance and discovered that the LSVM classifier has the highest precision, recall, and f1-score values of 66%, 66%, and 64%, respectively. |
| NLP and PA | ✓ They created an Afan Oromo news corpus. ✓ The general architecture of Afan Oromo fake news detection based on text content is provided. ✓ The article addresses the fundamental obstacles in building text content-based fake news detection approaches, as well as potential solutions. ✓ The study compares supervised ML methodologies by taking linguistic features and feature extraction methods into account. ✓ The research opens the door for the development of fake news identification in Afan Oromo, which would boost user confidence. | ✓ Despite the dataset's shortcomings, the linear PA with TF-IDF vectors and a unigram model outperforms the alternatives with a precision of 97.2%, a recall of 97.9%, and an f1-score of 97.5% (ROC AUC). |
| DL (including Bi-GRU, CNN, and attention-based models) | ✓ They collected and tagged a dataset of 12,000 news stories to create an automated method for detecting fake news. | ✓ With an accuracy of 93.92%, a precision of 93%, a recall of 95% (which is smaller than the Bi-LSTM's 96%), and an f1-score of 94%, the CNN model outperforms all other models. ✓ The impact of morphological normalization on Amharic fake news identification was investigated using the top two performing models, and the results demonstrated that normalization harms classification performance, lowering both models' f1-scores from 94% to 92%. |
| DL (including RNN, Bi-LSTM) | ✓ They implemented DL models and classified them into pre-defined fine-grained categories to resolve social media fake news for the Afan Oromo language. | ✓ On a benchmark dataset, the model can predict with an accuracy of 90%, precision of 90%, recall of 89%, and an f1-score of 89%, outperforming the current state of the art utilizing the Bi-LSTM model. |
| MNB classification approach | ✓ To best exhibit unambiguous distinctions, the researchers gathered news datasets and accurately categorized them as real and fake news on similar topics. | ✓ They used TF and TF-IDF with unigrams and bigrams, and discovered that the TF of unigrams in this model identifies fake news sources with 96% accuracy, with only minor effects on recall. ✓ For real news, the confusion matrix yielded a precision, recall, and f1-score of 98.6%, 94%, and 96.2%, respectively, and for fake news, 91%, 97.8%, and 94%, respectively. |
| ML classifiers (including NB, SVM, LR, SGD, RF, and the PA classifier) | ✓ The research has made a substantial contribution to slowing the spread of misinformation in vernacular languages. | ✓ The experimental results show a precision of 100% with RF for both TF-IDF and Count Vectorizer features, a recall of 95% with the PA classifier on TF-IDF, and an f1-score of 100% with the NB and LR classifiers on the TF-IDF vectorizer. |
| ML (including SVM, NB, and RF) and text mining feature extraction techniques | ✓ They gathered posts and comments from Facebook using Facepager's content retrieval techniques to create the dataset for this investigation. | ✓ The experiment produced 21 binary and ternary models for each dataset, utilizing two datasets. ✓ Both SVM and NB were outperformed by binary models that used RF with word2vec. ✓ The SVM with word2vec, on the other hand, outperforms the NB and RF models in classification with a 73% f1-score, a precision of 76%, and a recall of 75%. ✓ In addition, the ternary SVM model using word2vec produced a 53% f1-score, which is better than the NB and RF models. ✓ Finally, on both datasets utilized in this study, models based on SVM with word2vec performed marginally better than the NB and RF models. |
| RNN (using LSTM and GRU with word N-grams for feature extraction and word2vec to represent each unique word as a vector) | ✓ Researchers created a tagged, massive Amharic dataset by gathering posts and comments from activists who actively participated on Facebook pages. | ✓ The RNN-LSTM model produced an improved test accuracy of 97.9% across all metrics when used with this dataset and different parameters on the GRU- and LSTM-based RNN models with word2vec feature representation. |
| Spark ML | ✓ Thousands of Amharic posts and comments on suspected social network pages of organizations and individual people's public pages were crawled as a dataset to execute the various experiments. | ✓ The NB approach with the word2vec feature model performed best on the Facebook social network for Amharic-language posts and comments, with an accuracy of 79.83%, a ROC score of 83.05%, and an area under precision and recall of 85.34%. ✓ For the TF-IDF feature model, the NB achieved 73.02%, 80.53%, and 79.93% for accuracy, ROC score, and area under precision and recall, respectively. ✓ The RF with the word2vec feature outperformed TF-IDF, with an accuracy, ROC score, and area under precision and recall of 65.34%, 70.97%, and 73.07%, respectively. ✓ TF-IDF followed with 63.55%, 68.44%, and 69.96%, respectively. |
| Classical GBT, RF, DL (RNN-LSTM, RNN-GRU), and a word embedding (word2vec) model | ✓ The suggested method looks into how hate speech detection might be applied to identifying susceptible communities. ✓ Using the example of Amharic text data on Facebook, they were able to identify a potentially vulnerable community in terms of social media hatred. ✓ They gathered and annotated Amharic data to detect hate speech in multicultural Ethiopian society. ✓ Since social media data is very noisy and huge, they used the Apache Spark distributed platform for data pre-processing and feature extraction. | ✓ word2vec embedding with RNN-GRU had the best performance in the hate speech detection experiments, with an AUC of 97.85%, an accuracy of 92.56%, a recall of 97.85%, and an f1-score of 98.42%. |
| ML (including SVM with TF-IDF, N-gram, and word2vec feature extraction) | ✓ They created a tagged hate speech dataset from social media for the Afan Oromo language. ✓ They created standard Afan Oromo stop word lists, as well as a brief word-expansion dictionary. ✓ They created an SVM model for Afan Oromo hate speech texts. ✓ They tested their new model on hate speech identification, where it performed best. | ✓ Accuracy, f1-score, recall, and precision measurements were used to evaluate the experiment. ✓ In all evaluation measures, the framework based on the SVM with the N-gram combination and TF-IDF achieved 96% (accuracy, f1-score, precision, and recall). |
According to the summary of the relevant related works in Table 1, DL approaches are currently preferred by researchers over ML approaches because of their efficiency in learning from large-scale corpora of unlabeled text.
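The accuracy, precision, recall, and f1-score figures compared throughout this section all derive from the same four confusion-matrix counts. A minimal sketch follows; the counts below are invented for illustration.

```python
def metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        # f1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }

m = metrics(tp=90, fp=10, tn=85, fn=15)
print({k: round(v, 3) for k, v in m.items()})
# → {'accuracy': 0.875, 'precision': 0.9, 'recall': 0.857, 'f1': 0.878}
```

Because the studies surveyed here use balanced or near-balanced classes, accuracy and f1-score tend to track each other; on imbalanced data, the f1-score is the more informative of the two.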