Just over a year ago, South Africa was in the middle of its first and most stringent COVID-19 lockdown (#level4lockdown), a period characterised by uncertainty and negativity. In May 2020, Twitter sentiment was negative as users voiced concerns about the impact of the lockdown on the economy. A year later, the uncertainty remains as South Africa faces a third wave and a slow vaccine roll-out. To understand South Africans’ sentiment towards the vaccine, we turn to Twitter again, using an approach similar to the one we used during the first lockdown.

 

Methodology

For this analysis, we scraped 4,100 tweets posted between 1 September 2020 and 22 June 2021 that used the hashtags #VaccineforSouthAfrica, #Covid19SA, #VaccineRolloutSouthAfrica or #LockdownSA, or the phrase ‘Vaccine South Africa’. We cleaned the text, removed stop words and scored the tweets using the Bing sentiment lexicon. The Bing lexicon labels individual words as either positive or negative, and the overall sentiment is the balance between the two. Applied to these tweets as is, on the assumption that the algorithm is perfect, the Bing method returns a negative sentiment towards vaccines; when we assume it is not and choose to ‘help’ it to some degree, the result is positive.
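The cleaning and scoring steps can be sketched in a few lines of Python. This is a minimal illustration, not our production pipeline: the stop-word list and the seven-word lexicon below are hypothetical stand-ins for a full stop-word list and the full Bing lexicon (roughly 6,800 words), and the example tweets are made up.

```python
import re
from collections import Counter

# Hypothetical stand-ins: a real pipeline would use a full stop-word list
# and the complete Bing lexicon rather than these tiny samples.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are",
              "for", "we", "us", "will", "our", "has", "have"}
BING_SUBSET = {
    "hope": "positive",
    "safe": "positive",
    "patient": "positive",
    "death": "negative",
    "risk": "negative",
    "crisis": "negative",
    "decline": "negative",
}

def clean_tokens(tweet: str) -> list[str]:
    """Lower-case a tweet, strip URLs and @mentions, drop '#', remove stop words."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # remove links and mentions
    text = text.replace("#", " ")                    # keep hashtag words, drop the symbol
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOP_WORDS]

def bing_counts(tweets: list[str]) -> Counter:
    """Count positive and negative lexicon matches across all tweets."""
    counts = Counter()
    for tweet in tweets:
        for token in clean_tokens(tweet):
            label = BING_SUBSET.get(token)
            if label:
                counts[label] += 1
    return counts

tweets = [
    "Vaccines give us hope! #VaccineforSouthAfrica",
    "Another death reported today, the risk is real @someone",
    "Covid cases decline as rollout continues https://example.com",
]
counts = bing_counts(tweets)
print(counts["positive"] - counts["negative"])  # prints -2
```

The third made-up tweet counts as negative only because of ‘decline’, which is exactly the kind of context problem that follows.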

The importance of context

As the figure below shows, the tweets carry a variety of sentiments about the vaccine. In sentiment analysis, context matters: the same word can carry different sentiments depending on the context, which machine learning algorithms alone may not capture. We see this in some of the tweets. For example, the word ‘patient’ (a homonym) is classified as positive, although in these vaccine tweets it mostly refers to an ill person, not the virtue.

The word ‘toll’ (also a homonym) is likewise classified as negative, although in context it refers to the number of deaths rather than to the effect of the virus. Even the word ‘decline’ was classified as negative, although in most tweets it described a decline in Covid cases, which is clearly positive.
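Checking context by hand usually starts with a keyword-in-context view, which lists each occurrence of a flagged word together with its neighbours. Below is a minimal Python sketch; the function name `keyword_in_context`, the three-word window and the example tweets are illustrative assumptions, not part of our pipeline.

```python
import re

def keyword_in_context(tweets, keyword, window=3):
    """Return each occurrence of `keyword` with up to `window` words on either side."""
    hits = []
    for tweet in tweets:
        words = re.findall(r"[a-z']+", tweet.lower())
        for i, word in enumerate(words):
            if word == keyword:
                left = " ".join(words[max(0, i - window):i])
                right = " ".join(words[i + 1:i + 1 + window])
                hits.append(f"{left} [{word}] {right}".strip())
    return hits

# Made-up tweets showing two senses of 'patient' and a positive 'decline'.
tweets = [
    "Every Covid patient in the ward needs oxygen",
    "Be patient, the vaccine is coming",
    "New infections continue to decline in Gauteng",
]
for kw in ("patient", "decline"):
    for line in keyword_in_context(tweets, kw):
        print(line)
```

Scanning the output makes the two senses of ‘patient’ easy to tell apart, something a word-level lexicon cannot do on its own.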

 

Further misrepresentation of the data 

Digging further into the data revealed that the negative result was driven mostly by the word ‘death’, which appeared in many tweets. The Bing lexicon labels each word simply as positive or negative, so every occurrence adds one unit to one side, and a word’s contribution to the overall result grows with how often it appears. ‘Death’ contributed about 600 negativity units, compared with just 50 to 100 units for other negative words such as ‘risk’, ‘outbreak’ and ‘crisis’. Context matters here too. In a tweet like “vaccines will help us overcome death”, the words ‘overcome’ and ‘death’ are scored separately, and the method cannot recognise that death is what is being overcome. At best, the positive and negative words cancel out and the tweet reads as neutral, even though its meaning is positive; across the whole sample, the many mentions of ‘death’ outweigh the positive words around them.
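To see how a single frequent word can dominate, here is a minimal sketch assuming a binary lexicon in the style of Bing, where every match counts as +1 or -1. The mini-lexicon and the already-cleaned tweets are hypothetical, and treating ‘overcome’ as positive is an assumption for illustration. Because word-level scoring cannot recognise that ‘death’ is what is being overcome, the example tweet scores no higher than neutral.

```python
from collections import Counter

# Hypothetical mini-lexicon in the binary style of Bing: +1 positive, -1 negative.
# Treating 'overcome' as positive is an assumption for illustration.
LEXICON = {"overcome": 1, "hope": 1, "death": -1, "toll": -1,
           "risk": -1, "crisis": -1, "outbreak": -1}

def word_contributions(token_lists):
    """Sum each lexicon word's +1/-1 over all tweets; with binary labels a
    word's contribution is simply its frequency times its sign."""
    contrib = Counter()
    for tokens in token_lists:
        for token in tokens:
            if token in LEXICON:
                contrib[token] += LEXICON[token]
    return contrib

def tweet_score(tokens):
    """Net score of one tweet: positive matches minus negative matches."""
    return sum(LEXICON.get(token, 0) for token in tokens)

# Made-up, already-cleaned tweets.
token_lists = [
    ["vaccines", "help", "overcome", "death"],
    ["death", "toll", "rises"],
    ["death", "risk", "crisis"],
    ["hope", "vaccine", "rollout"],
]

contrib = word_contributions(token_lists)
# List words by the size of their contribution, largest first ('death' leads).
for word, value in sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{word:>10} {value:+d}")
print(tweet_score(token_lists[0]))  # 0: 'overcome' and 'death' cancel out
```

Ranking words by the absolute size of their contribution is a quick way to spot a term such as ‘death’ that is pulling the aggregate result in one direction.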

When we removed some words that are mostly misclassified in context (‘death’, ‘toll’, ‘numb’ and ‘patient’), we observed an overall positive sentiment towards vaccines, as shown by the graphs below. However, this is still problematic, because we have also removed those words from tweets where they were used in the right context. We also recognise that handpicking specific words to ‘correct’ still biases our results; yet combing through each tweet individually to confirm or reject the algorithm’s classification is akin to not using the algorithm at all. So we revert to using the algorithm as is.
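The word-removal experiment amounts to a sensitivity check: score the same tweets with and without a chosen set of words and compare the results. Below is a minimal sketch under stated assumptions: the mini-lexicon and tokenised tweets are hypothetical, and the sign flip is built into the toy data to show the mechanism, not to reproduce our results. Note that the exclusion applies everywhere, so a correctly used ‘death’ is dropped along with the misread ones, which is the bias described above.

```python
# Hypothetical mini-lexicon (+1 positive, -1 negative); not the real Bing list.
LEXICON = {"death": -1, "toll": -1, "numb": -1, "patient": 1,
           "hope": 1, "safe": 1, "risk": -1}
# The words excluded in the experiment described in the text.
CONTEXT_PRONE = {"death", "toll", "numb", "patient"}

def net_sentiment(token_lists, lexicon, exclude=frozenset()):
    """Net score over all tweets, ignoring any word listed in `exclude`."""
    return sum(
        lexicon.get(token, 0)
        for tokens in token_lists
        for token in tokens
        if token not in exclude
    )

# Made-up, already-cleaned tweets.
token_lists = [
    ["death", "toll", "rises"],
    ["patient", "recovering"],
    ["vaccine", "hope", "safe"],
    ["numb", "arm", "after", "jab"],
    ["risk", "remains"],
]

print(net_sentiment(token_lists, LEXICON))                 # -1
print(net_sentiment(token_lists, LEXICON, CONTEXT_PRONE))  # 1
```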


Augmenting machine learning with economic and behavioural insights

 

Sentiment analysis can be a great tool for understanding how large events, such as the Covid pandemic and the uncertainty it has created, affect people and their decision making. However, for a more accurate measure, it is important to combine data analysis and machine learning with human insight.

The algorithm has its limitations, some of which are compounded by human choices. We chose hashtags and phrases that, from our own observation, were popular in the discourse about vaccines, but people may have used other hashtags and phrases that we did not include. Some people also tweeted about vaccines in other South African languages, but the lexicon covers only English words. In addition, the sample is most likely unrepresentative of the South African population, so the picture it gives is skewed. This illustrates the importance of understanding not only how machine learning algorithms use data, but also how their results fit into the wider context and what conclusions can and cannot be drawn from them.

At Predictive Insights, we are helping our clients deal with the added uncertainty brought on by the pandemic, with a number of tools that combine insights about the economy and people with machine learning. Our recent blog post on the impact of Covid-19 on demand forecasting explains some of this in more detail. Get in touch to find out more.
