1 Introduction
In recent years, football analysis has increasingly benefited from Big Data analysis and machine learning methods, in particular in attempts to understand tactical behaviour and identify success-enhancing strategies (Dick and Brefeld 2019; Grunz et al. 2012; Memmert and Raabe 2018; Rein and Memmert 2016). The present paper places Big Data analysis and machine learning in a slightly different context by incorporating Twitter data into the analysis. It focuses on in-play forecasts in football, examining whether information that becomes available during a match is valuable for forecasting the further course of events. This analysis helps to better understand football-related Twitter communication and to assess the role of randomness in football, and it is valuable for coaches, match analysts and broadcasters who want to better understand the influence of in-play events on the further course of a match.
The forecasting literature reflects two important aspects that researchers have faced when investigating predictive tasks in football. The first aspect is statistical and concerns developing team ratings and forecasting models that derive the best possible forecasts from obvious predictors such as prior match results. One of the most prominent approaches is to estimate offensive and defensive strength parameters for each team and use these as inputs to probability models, including Poisson models (Koopman and Lit 2015; Maher 1982), birth process models (Dixon and Robinson 1998) and Weibull count models (Boshnakov et al. 2017). Other researchers have used regression models based on one or several covariates, such as Hvattum and Arntzen (2010), who combine ELO ratings with an ordered logit regression model, or Goddard and Asimakopoulos (2004), who use various covariates in an ordered probit regression model. The present approach is primarily related to the second aspect of forecasting, which is data-based and attempts to identify and investigate further sources of information that prove useful in football forecasting. One obvious source of information is betting odds (Forrest et al. 2005), which can be interpreted as forecasts and serve as a standard benchmark. Further sources include human forecasts (Andersson et al. 2005), prediction markets (Spann and Skiera 2009), ranking systems such as the FIFA World Ranking (Lasek et al. 2013), market values (Peeters 2018) and sets of explanatory variables including match significance, involvement in cup competitions and geographical distance between teams (Goddard and Asimakopoulos 2004).
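To make the strength-parameter approach concrete, the following Python sketch assumes a simplified Maher-style model in which home and away goals are independent Poisson variables with means attack_home × defence_away × home_advantage and attack_away × defence_home. The function names and parameter values are hypothetical illustrations, not estimates from any of the cited studies; win/draw/loss probabilities are obtained by summing over a truncated grid of scores.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def match_outcome_probs(attack_home: float, defence_home: float,
                        attack_away: float, defence_away: float,
                        home_advantage: float = 1.0, max_goals: int = 15):
    """Hypothetical Maher-style model with independent Poisson goal counts.

    attack_* scales how many goals a team scores; defence_* scales how many
    it concedes (larger values mean a weaker defence). Returns
    (P(home win), P(draw), P(away win)) summed over all scores up to
    max_goals per team, so the three values sum to slightly less than 1.
    """
    lam_home = attack_home * defence_away * home_advantage
    lam_away = attack_away * defence_home
    p_home = p_draw = p_away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                p_home += p
            elif h == a:
                p_draw += p
            else:
                p_away += p
    return p_home, p_draw, p_away


# Hypothetical parameters: a slightly stronger home side with a home advantage.
print(match_outcome_probs(1.4, 0.9, 1.2, 1.0, home_advantage=1.1))
```

In the cited models the strength parameters are estimated from past results (e.g. by maximum likelihood); here they are simply fixed by hand to show how they map to outcome probabilities.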
In the literature, football forecasting is most prominently associated with forecasting the match result in terms of win, draw or loss. This seems somewhat one-dimensional in light of the wide range of events taking place during a football match. With regard to the common win/draw/loss forecast, Koopman and Lit (2019) introduced a categorization of methods into models based indirectly on the number of goals scored by both teams, models based indirectly on the goal difference, and models of the win/draw/loss result itself. In that sense, forecasting the number of goals is not an exotic task, as models in the first category, which are often based on Poisson distributions (Karlis and Ntzoufras 2003; Koopman and Lit 2015; Maher 1982), can easily be reused for goal forecasting. Boshnakov et al. (2017) pursue this strategy by using a Weibull count model to obtain forecasts for both the match result and the total number of goals. Wheatcroft (2020) uses ratings based on match statistics and logistic regression to forecast the number of goals and is, to the best of our knowledge, the only paper focusing specifically on this type of forecasting. Forecasting total goals can thus be considered a neglected aspect of the forecasting literature, presumably because match results have stronger emotional and financial consequences for fans and teams than the total number of goals.
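Why goal-count models transfer so easily to total-goals forecasting can be seen in a short sketch: if home and away goals are independent Poisson variables, their sum is again Poisson with mean λ_home + λ_away, so a single cumulative probability gives, for example, the chance of more than 2.5 goals. The independence assumption, the function names and the expected-goal values below are illustrative assumptions; bivariate models with correlated goal counts would not reduce to this simple form.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def prob_over(line: float, lam_home: float, lam_away: float) -> float:
    """P(total goals > line) for independent Poisson home and away goals.

    The sum of independent Poisson variables is Poisson with the summed
    mean, so only one cumulative probability is needed. `line` is a
    half-goal threshold such as 2.5.
    """
    lam_total = lam_home + lam_away
    return 1.0 - poisson_cdf(math.floor(line), lam_total)


# Hypothetical expected goals: 1.5 for the home side, 1.1 for the away side.
print(prob_over(2.5, 1.5, 1.1))
```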
Another research gap in football forecasting is the investigation of forecasts made during the course of a match. This comes as a surprise, as so-called in-play betting has gained significant importance for bookmakers (Killick and Griffiths 2019; Lopez-Gonzalez and Griffiths 2016). Moreover, coaches, match analysts and broadcasters are highly interested in analysing matches in-play. Some researchers have indeed examined the scoring process during the course of a match in more detail. Dixon and Robinson (1998) use a birth process model that allows scoring intensities to change during the match and to depend on the current score, in order to analyse deviations from constant scoring rates. Similarly, Heuer and Rubner (2012) use a model-free statistical analysis to investigate in which match situations scoring intensities deviate from a constant rate. Both approaches mainly focus on understanding the process of a football match and whether certain game situations influence scoring behaviour. Neither of these articles investigates in-play forecasts by quantifying the effect of scoring deviations on the accuracy of in-play forecasts. To the best of our knowledge, the only paper investigating the role of in-play information in forecasting football is the recent work of Zou et al. (2020), which, however, uses the number of goals as its only in-play information. While our paper is limited to football, in-play models and their relation to in-play betting odds have been investigated in other sports such as tennis (Easton and Uylangco 2010; Kovalchik and Reid 2019) and cricket (Akhtar and Scarf 2012; Asif and McHale 2016). Reasons for the limited effort made so far on in-play forecasting in football might be higher model complexity, lower availability of in-play betting odds as a benchmark compared with pre-game betting odds, and the greater effort required to gather and handle in-play data.
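The constant-scoring-rate benchmark mentioned above can be written as a simple in-play forecast. Assuming goals arrive as a homogeneous Poisson process whose pre-game expected total over 90 minutes is λ, the number of goals in the remaining 90 − t minutes is Poisson with mean λ(90 − t)/90, regardless of what has happened so far. The sketch below is a hypothetical illustration of this baseline (stoppage time is ignored), not the model of any cited study; score-dependent intensities, as in Dixon and Robinson (1998), would replace the constant rate.

```python
import math


def remaining_goals_forecast(lam_total: float, minute: float,
                             match_length: float = 90.0):
    """Hypothetical in-play baseline with a constant scoring rate.

    Goals are assumed to follow a homogeneous Poisson process whose expected
    total over `match_length` minutes is `lam_total` (e.g. derived from
    pre-game odds). Returns (expected remaining goals,
    P(at least one more goal)). Stoppage time is ignored.
    """
    if not 0.0 <= minute <= match_length:
        raise ValueError("minute must lie within the match")
    lam_remaining = lam_total * (match_length - minute) / match_length
    return lam_remaining, 1.0 - math.exp(-lam_remaining)


# At half-time, with a pre-game expectation of 2.6 goals:
print(remaining_goals_forecast(2.6, 45))
```

Because the process has independent increments, goals already scored do not enter this forecast at all; any in-play model has to beat this baseline to show that in-play information adds value.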
The difficulty of in-play forecasting of goals in football might be surprising because, intuitively, fans, experts and commentators commonly claim to have anticipated a goal: they saw it coming, or they explain it as the logical consequence of the course of play. This, however, could be a biased perception, and measuring the collective human perception of a football match and the collective anticipation of its further progress in an experimental approach would be quite costly. For that reason, we make use of an existing source of (big) data: short textual messages about a given football match from the microblogging platform Twitter, which can be considered an in-play reflection of the collective human perception of this match. While traditional datasets and probability models remain the predominant approach in football forecasting (Boshnakov et al. 2017; Koopman and Lit 2019; Wheatcroft 2020), researchers have also started to make use of Big Data (Brown et al. 2017) and machine learning (Berrar et al. 2019; Hubáček et al. 2019) in this domain. Twitter data have been used in various forecasting domains, including elections (Huberty 2015; Tumasjan et al. 2010) and stock prices (Bollen et al. 2011; Zhang et al. 2011), but these applications have been discussed controversially and critically (Gayo-Avello 2013; Huberty 2015; Jungherr et al. 2011). While Twitter certainly makes it possible to gather massive datasets, actually extracting relevant information is challenging, and attempts to use Twitter in football forecasting have reported mixed results (Brown et al. 2017; Godin et al. 2014; Schumaker et al. 2016). In economic and political settings, the theoretical mechanism is plausible, as Twitter may reflect users' opinions, and both election results and stock prices are directly influenced by public perception. In football, this mechanism is evidently absent, as a team will not win a match just because the public would like it to. In forecasting goals in-play, however, the following mechanism is conceivable: the course of the match influences the perception of fans, who share their opinions on Twitter. If the course of play is actually a predictor of upcoming goals, Twitter data might indeed have predictive value. Without considering predictive aspects, some researchers have analysed in-play Twitter data related to football matches. It has been reported that fans' sentiments reflect reactions to goals by their own or the opposing team (Yu and Wang 2015), that fans tend to show higher team identification when their team is leading than when it is trailing (Fan et al. 2020) and that communication on the video assistant referee (VAR) is strongly associated with negative sentiment (Kolbinger and Knopp 2020). In contrast to the present study, however, these analyses were based on highly limited samples of five or fewer matches (Fan et al. 2020; Yu and Wang 2015) or on a very specific type of in-match event, namely the VAR (Kolbinger and Knopp 2020).
The contributions of the present approach are threefold. First, a preliminary analysis sheds light on the general difficulty of in-play forecasting. Second, the topics discussed by Twitter users and their perception of the match, both over the course of football matches and in reaction to goals, are analysed by means of sentiment analysis techniques and further non-semantic tweet characteristics. Third, the possible informative value of Twitter data in in-play forecasting models is investigated.
4 Discussion
The results of the present study shed light on three different aspects of in-play forecasting with Twitter data, namely the difficulty of in-play forecasting in general, the characteristics of Twitter communication over the course of matches and the value of Twitter data for in-play forecasting in football.
The preliminary analysis suggests that in-play forecasting of goals is, in general, a difficult task. The results indicate that in-play information (i.e. goals scored) has limited value for forecasting the further course of a match compared with betting odds as pre-game information, a finding that football players, coaches, match analysts, broadcasters and fans would probably strongly dispute. Possible explanations are the high predictive quality of betting odds in football forecasting (Forrest et al. 2005; Hvattum and Arntzen 2010; Štrumbelj and Šikonja 2010) and the significant role of randomness in goal scoring in football (Brechot and Flepp 2020; Lames 2018; Wunderlich et al. 2021). Moreover, this result is in line with Wunderlich and Memmert (2018), who showed that the betting odds of prior matches possess more predictive value than the results of those matches themselves.
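Using pre-game betting odds as a benchmark requires turning them into probabilities. A common approach is to invert the decimal odds and normalise them so that they sum to one, which removes the bookmaker's margin (overround) proportionally. The sketch below applies this basic normalisation to a hypothetical over/under 2.5 goals market; the odds are invented, and this is only one of several margin-removal methods, not necessarily the procedure used in this study.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for mutually exclusive outcomes into probabilities.

    Inverse odds sum to more than one because of the bookmaker margin
    (overround); dividing by that sum rescales them proportionally to 1.
    """
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must exceed 1.0")
    inverse = [1.0 / o for o in decimal_odds]
    overround = sum(inverse)
    return [p / overround for p in inverse]


# Hypothetical over/under 2.5 goals odds: over at 1.90, under at 1.95.
p_over, p_under = implied_probabilities([1.90, 1.95])
print(round(p_over, 3), round(p_under, 3))
```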
The analysis of tweet intensity revealed that, both in-play and pre-game, tweet intensity is predominantly driven by the popularity of the two competing teams. Moreover, in-play tweet intensity is higher in matches with more total goals. The analyses of time dependence and of goals reveal how the reactions of Twitter users change over the course of matches and after goals are scored. Before kick-off, fewer but longer tweets are written than during the match, which can be explained by heightened interest and a faster sequence of events in-play. In terms of topics and words, pre-game tweets are strongly influenced by communication about how to follow broadcasts of the match and about which players are in the line-up. Differences between communication in the first and second half are very limited, whereas tweets directly following goals are naturally dominated by discussion of the score, the goal itself and its possible causes. The most striking result with regard to time dependence is a steadily increasing negativity and a steadily decreasing positivity as the match evolves, resulting in a clearly decreasing sentiment. It seems that fans (or at least those active on Twitter) tend to be disappointed by football matches, possibly owing to unjustifiably high expectations before and at the beginning of matches. The use of Twitter data and sentiment analysis techniques enables researchers to investigate the perception and psychological reactions of users during football matches. Further research with a psychological focus could investigate which mechanisms drive fans' disappointment during matches.
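Tweet intensity and tweet length are simple aggregates over time windows. As a hypothetical sketch, assuming each tweet is available as a pair of match minute and text (negative minutes for pre-game tweets), the function below groups tweets into 5-minute intervals and reports tweets per minute and mean length per interval. The data layout, the function name and the sample tweets are assumptions for illustration; the study's actual preprocessing is not described in this section.

```python
import math
from collections import defaultdict


def interval_features(tweets, width=5):
    """Aggregate (minute, text) pairs into fixed-width time intervals.

    Returns a dict mapping the start minute of each interval to a tuple of
    (tweets per minute, mean tweet length in characters). Negative minutes
    (pre-game tweets) are binned the same way.
    """
    bins = defaultdict(list)
    for minute, text in tweets:
        start = math.floor(minute / width) * width
        bins[start].append(len(text))
    return {
        start: (len(lengths) / width, sum(lengths) / len(lengths))
        for start, lengths in sorted(bins.items())
    }


# Hypothetical tweets: two pre-game, three shortly after kick-off.
sample = [
    (-3.0, "Where can I stream the match tonight? Any working links?"),
    (-1.5, "Starting line-ups are out, big changes in midfield"),
    (1.0, "Here we go!"),
    (2.5, "What a chance!"),
    (4.0, "Off the post!!"),
]
for start, (rate, mean_len) in interval_features(sample).items():
    print(start, rate, round(mean_len, 1))
```

In this invented sample, the pre-game interval has fewer but longer tweets than the first in-play interval, mirroring the pattern described above.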
The analysis of the minutes before and after goals reveals the reaction to goals, in particular a dramatic increase in tweet intensity, with significantly shorter tweets and, as a result, fewer hashtags and emoticons. The least intuitive and most difficult result to explain is the slightly lower negativity and positivity directly after goals. It is important to note that the tweets in our database are assigned to the match and not to a single team; thus, emotions of both teams' fans are included, which makes an unchanged overall sentiment understandable. However, even when including fans of the scoring team, fans of the conceding team and neutral observers, one would at least expect increased emotionality in reaction to a goal. One explanation could be neutral tweets that are descriptive rather than evaluative (e.g. "Penalty for The Red Devils. Rashford steps up and CONVERTS! Manchester United 1–0 Chelsea.") or tweets that were potentially written with a lot of emotion but do not include any words with a clear positive or negative connotation identifiable by a sentiment analysis algorithm (e.g. "GOOO[…]OOOL!!!! Rashford!!! 1–0 United!!!"). With regard to sentiment, although the method has been validated in football, textual data are highly domain-specific, and higher accuracy might be achievable with domain-specific methods such as football-specific word lexica. A more detailed analysis of what drives this unintuitive result, however, is beyond the scope of this study.
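Why descriptive or purely exclamatory goal tweets can score as neutral is easy to see with a deliberately simple lexicon-based scorer that counts words from small positive and negative word lists and ignores everything else. The lexicon below is a tiny hypothetical one invented for this example, not the lexicon or algorithm used in the study; real tools use much larger, weighted lexica and additional rules, but any word absent from the lexicon still contributes nothing.

```python
import re

# Tiny hypothetical lexicon, for illustration only.
POSITIVE = {"great", "brilliant", "love", "amazing", "win"}
NEGATIVE = {"terrible", "awful", "hate", "disappointing", "lose"}


def lexicon_sentiment(text: str) -> tuple[int, int]:
    """Count positive and negative lexicon words in a tweet.

    Tokens are lower-cased runs of ASCII letters; punctuation, numbers and
    unknown words (e.g. stretched spellings such as "goooool") are ignored.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return pos, neg


tweets = [
    "Penalty for The Red Devils. Rashford steps up and CONVERTS! Manchester United 1–0 Chelsea.",
    "GOOOOOOOL!!!! Rashford!!! 1–0 United!!!",
    "Brilliant finish, I love this team",
]
for t in tweets:
    print(lexicon_sentiment(t))
```

Both goal tweets receive no positive and no negative hits, although a human reader would clearly perceive the second one as highly emotional.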
While the Twitter data clearly react to goals scored, a main focus of our approach was to test Twitter data for possible predictive value. The present data clearly do not support the idea that in-play Twitter data have predictive value, as forecasts based on pre-game betting odds were outperformed neither by a logistic regression model nor by a random forest model including in-play Twitter information. The fact that random forest models did not outperform logistic regression and that hyperparameter tuning had very limited effects on accuracy suggests that this is attributable to the lack of informative value in the Twitter data rather than to the choice of methods. Put simply, we could not extract information from Twitter data that helps to forecast upcoming goals. Three aspects could explain this result. First, in-play predictability seems to be very limited in general, as demonstrated by the preliminary analysis. Further studies investigating in-play notational or positional data could shed more light on the question of to what degree in-play forecasting is possible at all. Second, Twitter data might not include information that is relevant for forecasting. In a way, this is surprising, as Twitter can be seen as a source of crowd wisdom, and such sources have been shown to be highly valuable in forecasting football (Forrest et al. 2005; Peeters 2018; Spann and Skiera 2009). On the other hand, unlike the betting market or prediction markets, Twitter is not a vehicle directly related to forecasting, and information is not easily extractable from it. Thus, the third possible aspect is that the information reflected in Twitter data might not have been extracted effectively. Textual data are highly unstructured, which makes the extraction of information difficult and limits the accuracy of sentiment analysis techniques (Wunderlich and Memmert 2020). Further progress in this domain can be expected, as sentiment analysis is a highly relevant topic in computer science (Mäntylä et al. 2018; Piryani et al. 2017); nevertheless, it will remain challenging to algorithmically reproduce human understanding of textual data. The problem of extracting relevant data might be aggravated by the short time intervals of 5 min, which yield limited tweet samples and more randomness in the features. To account for this issue, we repeated the analysis using data from the complete first half of a match to forecast the number of goals in the second half. Despite the larger time intervals, the results implied the same conclusions, which suggests that the limited in-play predictive value is not attributable to the short time intervals.
In terms of experimental research, the present results could be regarded as a null result, as they do not support the notion that in-play Twitter data have predictive value and they call into question the general value of in-play information, including goals. Still, this is surprising and valuable information for coaches, match analysts and broadcasters, who should carefully consider to what extent in-play information can be used at all to draw conclusions about the further course of a match.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.