Introduction

Fear has emerged as a new social stigma for people and places associated with the COVID-19 outbreak. The outbreak has been accompanied by an overwhelming amount of information in the news, which the WHO termed an “infodemic”. The recognition of emotions and sentiments is fundamental to human interactions (Cowen et al., 2019). Uncertainty and fear can have dire consequences for mental health (Wells, 2006). This matters for many brain functions, especially when the challenge is chronic in nature. Fear mongering has been employed as an instrument of political and psychological warfare for hundreds of years. Most importantly, uncertainty and fear thrive in the presence of sensationalized half-truths, an unfortunate role often played by the media (Friedman et al., 2012). Amidst these psychological threats, the world is currently facing one of the biggest challenges of the decade, i.e., the coronavirus epidemic. The outbreak has not only taken many lives but also poses a continuing threat because of the lack of success in identifying a cure. As of June 5, 2020, this virus has caused 391,001 fatalities and 6,661,985 confirmed cases in 215 countries (Footnote 1).

The third coronavirus outbreak of the last two decades, currently termed coronavirus disease (COVID-19), was officially announced by Chinese authorities on January 7, 2020, when the virus was identified as the causative agent of a syndrome often resulting in fever, cough, and shortness of breath. Two previous outbreaks by similar viruses have occurred. In 2003, SARS-CoV cases were first reported in Guangdong province, with early infections likely occurring in Guangdong, Hong Kong, Hanoi, Toronto, Singapore and elsewhere (Tsang et al., 2003; Lee et al., 2003; Hsu et al., 2003). That outbreak infected 8439 people and caused 812 deaths (Liang et al., 2004). In 2012, the Middle East respiratory syndrome coronavirus (MERS-CoV) was first detected in Saudi Arabia (Zaki et al., 2012; Wang et al., 2020), resulting in 850 deaths and over 2400 infections globally. The mortality rate for that disease was approximately 35% (Killerby et al., 2020). Fig. 1 presents the timeline from December 8, 2019, when the first seven patients were reported (two of whom were later diagnosed with COVID-19), to June 5, 2020, by which time the virus had taken more than 391,001 lives and caused 6,661,985 confirmed cases in 215 countries. It serves as a sequential tracker of the major events that occurred during the first half year of the outbreak.

Fig. 1 Timeline of coronavirus outbreak.

The news headlines about the initial outbreak of this new strain of coronavirus seem to have sparked fear amongst the general public. Since late January 2020, the virus has continued to dominate the news headlines in many countries. This has certainly put the virus at the forefront of much of the mainstream media in the Organization for Economic Cooperation and Development (OECD) countries, which might help to encourage health promotion measures, e.g., good hand hygiene. However, does too much media attention result in a disproportionate response of fear and anxiety from the public? There are, after all, several other health and broader social issues that also merit such coverage from the press. Moreover, the death rate from the COVID-19 virus remains proportionately low when compared with other viral infections, such as influenza (flu) or HIV. This does not seem to be reflected in the amount of media attention dedicated to the coronavirus. Secondly, the language used to describe the virus is also adding to the fear in many countries with confirmed cases. Phrases such as “deadly virus”, “public health emergency”, and “outbreak” are evoking negative sentiments and emotions among many members of the general public.

Sentiment analysis is one of the primary fields of natural language processing (NLP) and helps to classify sentiments in opinions and reviews (Liu, 2012). However, little work has been done on how sentiment and emotion analysis relates to medical matters (Zeng-Treitler et al., 2008). Empirical evidence supports the importance of feelings and emotions in health-related fields (Sokolova and Bobicev, 2013). One of the methods used to classify text units (e.g., words, sentences, paragraphs) into sentiment categories is the use of affective lexicons (Taboada et al., 2011). In the lexicon method recommended by Saif (Mohammad and Turney, 2013), words are associated with emotion, sentiment and opinion categories (Wilson et al., 2005; Strapparava and Mihalcea, 2008; Strapparava et al., 2006). Sentiment analysis can be helpful in determining the positive or negative polarity of words, phrases, or documents, whereby positive polarity indicates favorable sentiments and negative polarity indicates unfavorable sentiments towards specific events (Turney and Littman, 2003; Pang and Lee, 2008). The emotions induced by a phrase, word or document do not necessarily reflect the emotions it actually conveys, because the same text can induce different emotions in different contexts. The application of sentiment analysis to the real-time web has a number of challenges (Bermingham and Smeaton, 2010). Due to the dynamic nature of the real-time web, topics of interest are constantly evolving. Lexicon-based approaches involve calculating the orientation of a document from the semantic orientation of its words or phrases (Turney, 2002). The classification approach, by contrast, involves building classifiers from labeled instances of texts or sentences (Pang et al., 2002).

Presently, the available information is being utilized and assessed to draw a psychological perspective on the coronavirus disease (COVID-19) outbreak. Sentiment analysis is conducted to highlight the emotional valence of the epidemic. This study provides empirical evidence of the emotional consequences of the COVID-19 outbreak while urging interventions on the emotional wellbeing front. Because the media serves as a paradigm for communication across different networks (Stieglitz and Dang-Xuan, 2013), it is important to understand the interaction between information and emotional wellbeing. To this end, keywords that may evoke such sentiments and emotions have been extracted and discussed.

Material and methods

Data sources and preliminary analysis

The data are available at the COVID-19 repository (https://systems.jhu.edu/research/public-health/ncov/) operated by the Johns Hopkins University (JHU) Center for Systems Science and Engineering, supported by the Esri Living Atlas Team and the Applied Physics Lab of JHU. A live news dashboard is available at https://visualizenow.org/corona-news. For the analysis, a total of 141,208 news headlines, updated on the dashboard until June 3, 2020, were used. These news headlines were published by top news sources including Reuters, BBC, Yahoo News, South China Morning Post, National Post, Daily Mail UK, CNBC, The Guardian, CNN etc. (the complete list of headlines and websites is available in the supplementary information file).

Prior to conducting sentiment analysis, text analytics was used to take a preliminary look at the news data. Text analytics applies analytic tools to learn from collections of text data, such as political news, government documents, annual reports, social media, books, newspapers and emails. The goal of text analytics is similar to that of human learning: using automated algorithms, we can learn far more from massive amounts of text than through human reading alone. Text analytics summarizes the main themes and compares them.

The following data processing and data cleaning tasks were performed:

  • Conversion of the news headlines into text files.

  • Corpus building: The primary task was to build a corpus of news headlines on which analysis was performed by using R package ‘tm’ (Feinerer et al., 2011).

  • Conversion of the entire document to lower case.

  • Removing punctuation marks. Punctuation such as periods, commas and hyphens can provide grammatical context that supports understanding, but for this analysis we ignore punctuation.

  • Removing stopwords. Stopwords are common words found in a language; for estimation, they are not very helpful, as we would expect them to be evenly distributed across the different texts. To increase computational performance, 174 words such as “and”, “or”, “in”, “is” and “for” were removed. The list of English stopwords is available in the package ‘tm’.

  • Removing numbers (Numbers are not relevant to analyses).

  • Removing extra whitespace.

  • Stemming. Stemming uses an algorithm that removes common word endings for English words, such as ‘es’, ‘ed’ and ‘s’. In other words, it reduces the terms in documents to their stems, thus combining words that have the same root.

  • Creating a Document Term Matrix: This is a matrix with documents as the rows, and terms as the columns and a count of the frequency of words as the cells of the matrix.
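The cleaning steps listed above can be sketched in a few lines. The paper performs them with the R package ‘tm’; the Python below is only an illustrative stand-in, and the short stopword list, the `crude_stem` rule and the function names are assumptions made for this sketch (they are not the 174-word ‘tm’ list or its stemming algorithm).

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); the paper uses the
# 174-word English list shipped with the R package 'tm'.
STOPWORDS = {"and", "or", "in", "is", "for", "the", "a", "of", "to", "as"}


def crude_stem(word):
    """Strip a few common English endings; a rough stand-in for a real stemmer."""
    if len(word) > 4 and word.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2]
    if len(word) > 4 and word.endswith("ed"):
        return word[:-2]
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def preprocess(headline):
    """Lower-case, drop punctuation and numbers, remove stopwords, then stem."""
    text = headline.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation marks
    text = re.sub(r"\d+", " ", text)      # remove numbers
    tokens = text.split()                 # split() also collapses extra whitespace
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def document_term_matrix(headlines):
    """Return (vocabulary, matrix) with documents as rows and terms as columns."""
    docs = [Counter(preprocess(h)) for h in headlines]
    vocab = sorted(set().union(*docs)) if docs else []
    matrix = [[doc[term] for term in vocab] for doc in docs]
    return vocab, matrix


vocab, matrix = document_term_matrix(
    ["Lockdown extended as cases rise in 3 cities!", "Deaths rise; lockdown eased"]
)
```

Each row of `matrix` holds the term counts of one headline, so column sums give the overall term frequencies used in the preliminary analysis.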

In raw form, the news headlines consist of 1,619,987 words, 9,391,485 characters without spaces and 11,019,980 characters with spaces. Table 1 shows the distribution of term frequencies after text processing, which includes the removal of punctuation, numbers, whitespace and English stopwords. The clean data set consists of 6,488,545 words, with 31,130 unique words in the coronavirus headlines. There are 13,208 words that appear only once, while 17,922 words are repeated at least once. Cumulatively, 22,074 words have a frequency of up to five, while 24,683 words have a maximum frequency of ten. Likewise, 6636 words appear fewer than twenty times in the news headlines. On the highest-frequency side, there is only one word with a frequency greater than 2500 and between 1500 and 2000. Only two words appear more than 1000 times but fewer than 1500 times in the news headlines. Likewise, a concentration of words is found between the frequencies of 500 and 1000, with 16 words. A total of 155 words appear in the headlines more than 1000 times and are among the most frequent terms. Finally, 3 words have a frequency between 4000 and 5000, and only 9 words have a frequency greater than 5000.
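A “frequency of frequency” table of the kind summarized above can be derived from the cleaned tokens in two counting passes. The following is a minimal Python sketch, not the authors' code; the function names and the toy token list are hypothetical.

```python
from collections import Counter


def frequency_of_frequencies(tokens):
    """Map each term frequency f to the number of distinct terms occurring f times."""
    term_freq = Counter(tokens)
    return Counter(term_freq.values())


def terms_with_frequency_up_to(freq_of_freq, max_freq):
    """Cumulative number of distinct terms whose frequency is at most max_freq."""
    return sum(n for f, n in freq_of_freq.items() if f <= max_freq)


# Toy token list (assumption), standing in for the cleaned headline corpus.
tokens = ["covid", "covid", "covid", "lockdown", "lockdown", "case", "test"]
fof = frequency_of_frequencies(tokens)
once = terms_with_frequency_up_to(fof, 1)  # distinct terms that appear only once
```

The number of unique words is `sum(fof.values())`, which is how counts such as the 13,208 single-occurrence words and the 17,922 repeated words add up to the vocabulary size.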

Table 1 Frequency of frequency.

The most common terms in coronavirus headlines are presented in Fig. 2, led by covid (N = 13,388) and followed by lockdown (N = 9133), case (N = 8817), trump (N = 8516), death (N = 8047), test (N = 7537), pandem (N = 6818), China (N = 6493), outbreak (N = 4907) and virus (N = 4773). Other keywords include report (N = 4607), home (N = 3908), crisis (N = 3885), fear (N = 3106), die (N = 3786), health (N = 3592), hospital (N = 3211) and spread (N = 3011). The colors and fonts in the word cloud highlight the more frequent words in coronavirus news headlines (Fig. 3): the larger the font size, the higher the frequency. It can be observed that the main focus of the news headlines is on lockdowns, the outbreak of COVID-19, Trump, hospitals and health crises.

Fig. 2 Most common words in coronavirus news headlines.

Fig. 3 Word cloud based on coronavirus news headlines.

Extraction scheme of sentiments and emotions

For informative results, we calculated and aggregated the text polarity sentiment at the level of each coronavirus headline, and for particular words within the phrases, by using the R package “sentiment”. Sentiment analysis tools rely on lists of words and phrases with positive and negative connotations. Valence shifters (i.e., negators, amplifiers (intensifiers), de-amplifiers (downtoners), and adversative conjunctions) were taken into account because they affect the polarized words. Here, we used 4 words before and 2 words after each polarized word to extract valence shifters. A simple dictionary lookup that ignores valence shifters may not model the sentiment appropriately in the case of negators and adversative conjunctions, as the entire sentiment of the clause may be reversed or overruled. The equation used by the algorithm to assign value to polarity of each sentence fist utilizes a sentiment dictionary to tag polarized words (Jockers, 2017). The combination of the two words “sentence fist” was used by the author (Jockers, 2017) to tag polarized words. The same terminology, “sentence fist”, is mentioned in the R package “sentiments”, which is used to calculate the polarity score (Footnote 2).

Each paragraph P_i is composed of sentences, as in Eq. (1):

$$\left( {P_i = \left\{ {S_1,\,S_2, \ldots ,S_n} \right\}} \right)$$
(1)

Every sentence S_i is broken into words, as in Eq. (2):

$$\left( {S_{i,j} = \left\{ {W_1,W_2, \ldots ,W_n} \right\}} \right)$$
(2)

where W = words within sentences

The unbounded polarity score for each sentence is calculated using Eq. (3):

$$\delta = \frac{{C^\prime _{i,j}}}{{\sqrt {w_{i,jn}} }}$$
(3)

where

$$C^\prime_{i,j} = \sum \left( \left( 1 + W_{\mathrm{amp}} + W_{\mathrm{deamp}} \right) \cdot w_{i,j,k}^{p} \left( -1 \right)^{2 + W_{\mathrm{neg}}} \right)$$
$$W_{\mathrm{amp}} = \left( W_b > 1 \right) + \sum \left( W_{\mathrm{neg}} \cdot \left( z \cdot w_{i,j,k}^{p} \right) \right)$$
$$W_{\mathrm{deamp}} = \max \left( W_{\mathrm{deamp}^\prime}, -1 \right)$$
$$W_{\mathrm{deamp}^\prime} = \left( W_b < 1 \right) + \sum \left( z \left( -W_{\mathrm{neg}} \cdot w_{i,j,k}^{a} + w_{i,j,k}^{d} \right) \right)$$
$$W_b = 1 + z_2 \cdot w_{b^\prime}$$
$$w_{b^\prime} = \sum \left( \left| w_{\mathrm{adversative\ conjunction}} \right|, \ldots, w_{i,j,k}^{p}, w_{i,j,k}^{p}, \ldots, \left| w_{\mathrm{adversative\ conjunction}} \right| \ast -1 \right)$$
$$W_{\mathrm{neg}} = \left( \sum w_{i,j,k}^{n} \right) \bmod 2$$

*w_{adversative conjunction} represents the number of adversative conjunctions.

The words can be expressed in i,j,k notation as w_{i,j,k} for the kth word of the jth sentence of the ith paragraph. The words in each sentence (w_{i,j,k}) are searched and compared to a dictionary of polarized words (Footnote 3).
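To make the role of the valence shifters concrete, the sketch below scores a sentence with a deliberately simplified reading of Eq. (3). It is written in Python rather than R and is not the package's implementation. Several things are assumptions made for illustration: the toy lexicons, the weight z = 0.8, the rule that an amplifier acts as a de-amplifier when its context holds an odd number of negators, and the omission of adversative conjunctions (equivalent to fixing W_b = 1). The context window of 4 words before and 2 words after follows the text.

```python
import math
import re

# Toy lexicons, assumed for illustration only (not the package dictionaries).
POLARITY = {"good": 1.0, "relief": 0.75, "fear": -0.75, "deadly": -1.0, "crisis": -0.75}
NEGATORS = {"not", "no", "never"}
AMPLIFIERS = {"very", "extremely"}
DEAMPLIFIERS = {"slightly", "barely"}


def polarity_score(sentence, n_before=4, n_after=2, z=0.8):
    """Simplified unbounded polarity: weighted polarized words / sqrt(word count)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    total = 0.0
    for k, word in enumerate(words):
        if word not in POLARITY:
            continue
        # Context cluster: n_before words before and n_after words after.
        context = words[max(0, k - n_before):k] + words[k + 1:k + 1 + n_after]
        w_neg = sum(w in NEGATORS for w in context) % 2  # odd count flips the sign
        n_amp = sum(w in AMPLIFIERS for w in context)
        n_deamp = sum(w in DEAMPLIFIERS for w in context)
        # Assumption: negated amplifiers are counted as de-amplifiers.
        w_amp = (1 - w_neg) * z * n_amp
        w_deamp = max(-z * (w_neg * n_amp + n_deamp), -1.0)  # floored at -1
        total += (1 + w_amp + w_deamp) * POLARITY[word] * (-1) ** (2 + w_neg)
    return total / math.sqrt(len(words))


plain = polarity_score("This is good")        # positive
negated = polarity_score("This is not good")  # sign reversed by the negator
```

The division by the square root of the sentence length is what keeps long headlines from dominating purely because they contain more words.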

Results

The polarity score of each headline from January 15, 2020 to June 3, 2020 is presented in Fig. 4. In Fig. 5, the headlines are grouped into three sentiment categories: positive, negative and neutral. Red represents negative news, i.e., headlines with a sentiment score of less than zero; dark green shows positive headlines, i.e., those with a sentiment score greater than zero; and blue shows news with a sentiment score of 0. It can be seen that the major portion of the coronavirus headlines falls into the negative sentiment category. In agreement with that, the box plot (left) and histogram (right) of the sentiment scores of all news headlines show scores ranging from a minimum of −1.85 to a maximum of 1.54, with an average score of −0.08 (Fig. 6). The histogram shows the shape of the distribution of sentiment scores, which is severely weighted towards the negative side. The frequency distribution of headlines with respect to sentiment category is presented in Fig. 7. A major portion (51.66%) of the news headlines generate negative sentiments. In comparison, a smaller portion (30.46%) of the news headlines evoke positive sentiments, while the remaining 17.87% are categorized as neutral news.
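The three-way grouping and the percentage shares reported above follow directly from the sign of each headline's score. A minimal Python sketch, with hypothetical function names and made-up scores rather than the study's data:

```python
from collections import Counter


def sentiment_category(score):
    """Negative below zero, positive above zero, neutral at exactly zero."""
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


def category_shares(scores):
    """Percentage of headlines falling into each sentiment category."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(sentiment_category(s) for s in scores)
    return {c: 100.0 * counts[c] / len(scores) for c in ("negative", "neutral", "positive")}


# Made-up scores for illustration only.
shares = category_shares([-0.5, -0.1, 0.0, 0.3])
```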

Fig. 4 Polarity score of coronavirus news headlines.

Fig. 5 Classification of sentiments of coronavirus news headlines (red: negative, blue: neutral, green: positive).

Fig. 6 Box plot (left) and histogram (right) of sentiment score of all news headlines.

Fig. 7 Percentage (%) frequency distribution of sentiments.

The overall trajectory of sentiments over time, from January 15, 2020 to June 3, 2020, is given in Fig. 8. The x-axis represents the passage of time from the beginning to the end of the text, and the y-axis shows the emotional valence score. The news headlines begin in the negative region, gain strength over time and show some decline in the middle. However, the emotional valence scores never enter the positive or neutral region. Finally, the negativity increases sharply, and the peak negativity can be observed in the most recent news headlines. This is indicative of a stronger emotional leaning towards negative sentiments. The strength of the emotions suggests that much of the content of the news headlines is emotionally negative.

Fig. 8 Total sentiment trajectory score over time.

By drawing a comparison plot, we further explored the words that contribute to the positive and negative sentiments (Fig. 9). Overall, 3833 negative terms contribute to negative sentiments, the most common being “pandemic”, “trump”, “outbreak”, “virus”, “death”, “crisis”, “fear”, “fight”, “government”, “warn”, “die”, “emergency”, “police”, “risk”, “die”, “symptoms”, “hospital”, “isolation”, “infected”, and “ban”. In comparison, 2135 terms generate positive sentiment, the most common being “positive”, “care”, “global”, “work”, “relief”, “aid”, “food”, “free”, “working”, “markets”, “study”, “patient”, “league”, “support”, “star”, “big”, “extend”, “expert”, “protect”, and “fans”.

Fig. 9 Comparison plot with respect to sentiments; negative and positive.

The different emotion annotations for a target term were consolidated by determining the majority class of emotion intensities. The NRC emotion lexicon was used to calculate the presence of eight basic emotions (“anger”, “fear”, “anticipation”, “trust”, “surprise”, “sadness”, “joy”, and “disgust”) and their corresponding valence in coronavirus news headlines (Mohammad and Turney, 2010). The emotion score ranges between 0 (no emotional words used) and 1 (all words used were emotional). The frequency distribution of the eight emotions evoked by the news headlines is shown in Fig. 10. The news headlines mainly evoke the emotions of “fear” (20%), “anticipation” (15%), “sadness” (14%) and “anger” (11%), which collectively cover about 61% of the total headlines. However, 17% of the news headlines evoke the emotion of “trust” and 9% evoke “surprise” as well. The emotion of fear, which is induced by perceived danger or threat and causes physiological and behavioral changes, is evoked by headlines containing the key terms “death”, “quarantine”, “hospital”, “fight”, “epidemic”, “fear”, “infection”, “disease”, “battle” and “threat”. Even though coronavirus news is creating “fear” and “sadness”, a few terms such as “mother”, “save”, “holiday”, “closure”, “ministry”, “good”, “food”, and “daughter” evoke pleasant sentiments. Alongside joy, the news evokes sadness as well. Keywords in headlines including “death”, “isolation”, “fatality”, “disease”, “isolation”, “deport”, “case”, “emergency”, “hospital”, “leave”, “deadly”, “epidemic”, “shortage”, and “victim” evoke sad emotions. Surprise is an emotion that a person might feel if something unexpected happens. Keywords such as “death”, “trump”, “emergency”, “surge” and “scare” are the main contributors to the surprise emotion.
Finally, some news evokes the emotion of trust, with keywords such as “hospital”, “united”, “confirmed”, “economy”, “medical”, “formula”, “advisor”, “treat”, “trade”, “official”, “top”, “save”, “school” and “good”.
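The lexicon lookup behind these emotion counts can be illustrated with a short Python sketch. It is not the authors' R workflow. The `TOY_LEXICON` below is a hypothetical five-word stand-in whose word-emotion pairs are taken from the keyword lists in the paragraph above, not from the NRC lexicon file itself, and the function names are assumptions.

```python
from collections import Counter

# Hypothetical miniature lexicon; pairs follow the keyword lists in the text.
TOY_LEXICON = {
    "death": {"fear", "sadness", "surprise"},
    "hospital": {"fear", "sadness", "trust"},
    "emergency": {"sadness", "surprise"},
    "good": {"joy", "trust"},
    "save": {"joy", "trust"},
}


def emotion_counts(tokens):
    """Count how often each emotion is triggered by the tokens of one headline."""
    counts = Counter()
    for token in tokens:
        counts.update(TOY_LEXICON.get(token, ()))
    return counts


def emotion_score(tokens):
    """Share of tokens that are emotional: 0 (none) to 1 (all)."""
    if not tokens:
        return 0.0
    return sum(t in TOY_LEXICON for t in tokens) / len(tokens)


def emotion_shares(headlines):
    """Percentage distribution of emotions over a list of tokenized headlines."""
    totals = Counter()
    for tokens in headlines:
        totals.update(emotion_counts(tokens))
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {emotion: 100.0 * n / grand_total for emotion, n in totals.items()}


score = emotion_score(["hospital", "death", "toll", "rises"])
```

Because a single word can carry several emotions, the shares are computed over emotion hits rather than over headlines, which is one plausible way to obtain a distribution like the one in Fig. 10.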

Fig. 10 Percentage (%) frequency distribution of emotions.

Discussion

The findings of this study can be woven together into important implications for emotional wellbeing and for the economy. The results reveal that the news headlines have a high emotion score with negative polarity. The chronic nature of the coronavirus outbreak and the lack of success in treatment and cure are creating an environment that is critical for mental wellbeing. The fear associated with the deaths is itself a pandemic, one that has created emergency and panic not only in Wuhan but beyond the borders of China. The epidemic has not only caused fatalities but has also been a reason for the rise of xenophobia. The deadly disease is responsible for massive evacuations and scares. An important feature of the present findings is the emergence of the same emotions across all the headlines.

This implies that the prevalence of negative emotions, the death toll and prolonged fatalities are likely to lead to chronic stress, exacerbating disorders such as anxiety, bipolar disorder and depression, as well as personality changes and cognitive (thinking) problems. Such uncertainty leads to extreme thinking patterns, e.g., the all-or-none phenomenon. It is important to consider that proximity to the incident will be associated with a higher frequency of negative emotions and panic. Similarly, mass quarantine and reports of increases in the number of cases may result in high community anxiety, which will deepen the dilemma of isolation. Mass quarantine appears to foster feelings of loss of control and of being trapped, which were prominent in the results.

In addition to the epidemic, the effects of the rumor mill need to be considered. According to media reports, the need for facts will escalate, and a lack of clear messages will heighten fear and push people to obtain information from less reliable sources. In the previous incident, following the identification of SARS cases, patients, staff and visitors were quickly and forcibly restricted in their movement for a two-week quarantine period. In their account of the chaos that followed, Barbisch et al. (2015) describe how the confinement instigated a sense of collective hysteria, driving the staff to desperate measures. At present, high anxiety may have dire implications for other health measures. Although hospitals in Wuhan present an overwhelming picture with high levels of disease activity, the large majority of patients were not found to have the disease. Surges of low-risk patients, or the worried, are frequently triggered by high levels of anxiety, catastrophizing, and seeking help for symptoms that would otherwise prompt little concern.

Another deleterious effect concerns how those outside the cordon come to view those residing inside the infected area. The attached stigma can be rampant, with reports stating that residents of the affected areas were socially shunned, discriminated against in their workplaces and had their property attacked. Vigilante-imposed isolation can further exacerbate the official quarantine. The potential exists for anger at official reactions, intensified by the effects of the deadly outbreak on various sectors of the economy and by social disruption that might persist for years. Given the uncertain epidemiological nature of the outbreak, this new mass quarantine weighs heavily in terms of psychological costs.

In 2003, China and Hong Kong were the worst affected areas, exhibiting the greatest losses due to SARS in terms of investment, retail sales, and tourism. Further, media coverage of SARS was often found to be inaccurate, excessive, and sensationalist (Smith, 2006), which remains the biggest challenge at present. However, these economies were able to recover the loss within a short span, though the psychological impact was scarcely estimated empirically.

The sentiments and emotions evoked by news headlines, as identified and discussed in this study, can be woven together into important implications for emotional wellbeing and for the economy. The effects of the sentiments and emotions evoked by COVID-19 news headlines are quite explicit and in line with previous epidemic outbreaks. For example, during the Ebola outbreak, anxiety, economic hardship, social isolation, and other similar fears were seen too (Akroyd et al., 2020). The effect of these emotions on the economy is also well noted, as conventional economics accepts that counterfactual emotions (e.g., regret) emerge from outcomes and affect decision making (Shiv and Fedorikhin, 1999). As a historical example of such large effects, the 2008 financial crisis was a global disaster in which millions of people lost their jobs, homes, savings, businesses etc. (Hout et al., 2011). Similar fears and negative emotions are noted in the case of COVID-19 and are found in this study as well. Optimistic or upbeat sentiments encourage consumers to buy and borrow; businesses, in turn, are spurred to plan and invest. For the COVID-19 crisis, the IMF predicted the worst global economic crisis, with a contraction of −3% in 2020 (Währungsfonds, 2020), and a cumulative loss to GDP of 9 trillion dollars over 2020 and 2021 (Gopinath, 2020). Global financial markets fell to astonishingly low points in a matter of days and hours, declines that typically take years. For instance, on March 16, 2020, the Dow Jones Industrial Average (DJIA) dropped by 12.9% and the S&P 500 index lost nearly 12% in a single day. It was the worst percentage drop since the infamous “Black Monday” of 1987 (Aslam et al., 2020).
In a similar context, the impact on other sectors is notable too. For example, the Oil Market Report for April 2020 by the International Energy Agency (IEA) described the sudden fall of oil prices below zero (with May futures for West Texas Intermediate oil closing at −$37.63) with the words, “Never before has the oil industry come this close to testing its logistics capacity to the limit” (Footnote 4). The disruption of two-thirds of world trade (Dollar et al., 2019) is likely to push toward deglobalization and a decline in world merchandise trade of 13% to 32% in 2020 (Sarkis et al., 2020). Together with the hard hit to the travel and tourism sector and lower consumer and firm confidence with decreased consumer spending (Lucas, 2020), this is creating fear of global poverty, which is expected to be at its highest since 1990 (Sumner et al., 2020). In the current environment of ambiguity and uncertainty, investors search for safe havens to avoid financial losses and are reluctant to trade, which affects the financial markets adversely (Mukerji and Tallon, 2001; Epstein and Wang, 2004; Levy and Galili, 2006).

This fear and suspicion have become a force of disintegration (Rincón-Aznar et al., 2020) and are affecting social wellbeing as well. Quicker action by governments can reduce the likelihood of a crisis caused by this pandemic, as compared to the financial crisis. The 1994 outbreak of plague in India resulted in very few reported cases, but fear led 20% of the city’s population to flee back to their homes. The pandemic serves as a breeding ground for direct psychological consequences, leaving people with lives full of health threats, anxiety, and stress. This is especially true for novel threats; the sense of direct threat compels people to take preventive or defensive actions, e.g., hoarding food or other items. The results of a meta-analysis revealed that personal risk was the most powerful factor ensuring preventive measures, followed by social pressure (Slovic, 1987). The situation is worsening, and current sentiments can lead to lower demand for products and reduced investment, which will ultimately lead to business closures and job losses, as heightened uncertainty prevails among nations. Modern communication, especially through electronic and print media, enables a more intimate experience of a threat that may not be entirely real but is of intense emotional magnitude. In a survey, researchers found that the most vulnerable people were the least able to tolerate uncertainty, experienced high anxiety, and perceived themselves as helpless in protecting themselves during a pandemic (Taha et al., 2014). This suggests a failure to initiate a problem-solving approach when our mental faculties are overwhelmed by challenges beyond our resources. Mental processing is caught up in emotional coping, where decisions are made on the basis of perceived emotional threats rather than logical reasoning, and the role of the news is second to none in creating such situations.