
2020 | Book

Machine Learning and Data Mining for Sports Analytics

7th International Workshop, MLSA 2020, Co-located with ECML/PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings


About this book

This book constitutes the refereed post-conference proceedings of the 7th International Workshop on Machine Learning and Data Mining for Sports Analytics, MLSA 2020, co-located with ECML/PKDD 2020, in Ghent, Belgium, in September 2020. Due to the COVID-19 pandemic, the conference was held online.
The 11 papers presented were carefully reviewed and selected from 22 submissions. The papers present a variety of topics within the area of sports analytics, including tactical analysis, outcome predictions, data acquisition, performance optimization, and player evaluation.


Football

Routine Inspection: A Playbook for Corner Kicks
 We present a set of tools for identifying and studying the offensive and defensive strategies used by football teams in corner kick situations: their corner playbooks. Drawing from methods in topic modelling, our tools classify corners based on the runs made by the attacking players, enabling us to identify the distinct corner routines used by individual teams and search tracking data to find corners that exhibit specific features of interest. We use a supervised machine learning approach to identify whether individual defenders are marking man-to-man or zonally and study the positioning of zonal defenders over many matches. We demonstrate how our methods can be used for opposition analysis by highlighting the offensive and defensive corner strategies used by teams in our data over the course of a season.
Laurie Shaw, Sudarshan Gopaladesikan
How Data Availability Affects the Ability to Learn Good xG Models
Motivated by the fact that some shots are better than others, the expected goals (xG) metric attempts to quantify the quality of goal-scoring opportunities in soccer. The metric is becoming increasingly popular, making its way to TV analysts’ desks. Yet a vastly underexplored topic in the context of xG is how these models are affected by the data on which they are trained. In this paper, we explore several data-related questions that may affect the performance of an xG model. We show that the amount of data needed to train an accurate xG model depends on the complexity of the learner and the number of features, with up to 5 seasons of data needed to train a complex gradient boosted trees model. Although the style of play changes over time and varies between leagues, we did not find that using only recent data or league-specific models significantly improves accuracy. Hence, if limited data is available, training models on less recent data or on data from different leagues is a viable solution. Mixing data from multiple data sources, however, should be avoided.
Pieter Robberechts, Jesse Davis
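To make the xG notion concrete, here is a minimal sketch, assuming a plain logistic-regression xG model on two common shot features: the distance to goal and the angle subtended by the goalposts. The predicted scoring probability is a sigmoid of a weighted sum of these features. The coordinate convention, the function names, and the weights are hypothetical and hand-picked for illustration. They are not fitted values from the paper, which also studies more complex learners such as gradient boosted trees.

```python
import math

GOAL_WIDTH_M = 7.32  # distance between the posts of a regulation goal


def shot_features(x: float, y: float) -> tuple[float, float]:
    """Return (distance in m, opening angle in rad) for a shot taken at (x, y).

    Assumed convention (hypothetical): goal centre at (0, 0), x = distance from
    the goal line in metres, y = lateral offset from the goal centre in metres.
    """
    distance = math.hypot(x, y)
    half = GOAL_WIDTH_M / 2
    # Angle between the lines from the shot location to the two posts.
    angle = abs(math.atan2(y + half, x) - math.atan2(y - half, x))
    return distance, angle


def expected_goals(x: float, y: float, w0: float = -1.0,
                   w_dist: float = -0.1, w_angle: float = 1.5) -> float:
    """Logistic xG: sigmoid of a weighted feature sum (illustrative weights only)."""
    distance, angle = shot_features(x, y)
    z = w0 + w_dist * distance + w_angle * angle
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    print(f"penalty spot:     {expected_goals(11, 0):.3f}")
    print(f"long range, wide: {expected_goals(30, 10):.3f}")
```

In practice the weights would be learned from historical shots; the paper's finding is that the amount of such data required grows with the complexity of the learner and the number of features.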
Low-Cost Optical Tracking of Soccer Players
Sports analytics are on the rise in European football; however, due to the high cost, so far only the top-tier leagues and championships have had the privilege of collecting high-precision data to build upon. We believe that this opportunity should be available to everyone, especially youth teams, so that talent can be developed and recognized earlier. We therefore set the goal of creating a low-cost player tracking system that could be applied across a wide range of football clubs and pitches, which in turn would widen the reach of sports analytics and ultimately support the work of scouts and coaches. In this paper, we present a low-cost optical tracking solution based on cheap action cameras and cloud-deployed data processing. We build on existing methods for player detection (background-foreground separation) and tracking (Kalman filter), adapting these algorithms to sacrifice as little accuracy as possible while keeping costs low. The results are promising: our system yields significantly better accuracy than a standard deep-learning-based tracking model at a fraction of its cost. At a cost of $2.4 per match spent on cloud processing of videos for real-time results, all players can be tracked with an 11-meter precision on average.
Gabor Csanalosi, Gergely Dobreff, Alija Pasic, Marton Molnar, László Toka
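The tracking step described above relies on a Kalman filter. The following is a minimal sketch, assuming a constant-velocity motion model applied independently to each pitch coordinate and hypothetical noise parameters `q` and `r`. The paper's exact state model and tuning are not given in the abstract, so this is a generic textbook filter rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate with state (position, velocity)."""
    pos: float = 0.0
    vel: float = 0.0
    p_pp: float = 10.0  # variance of the position estimate
    p_pv: float = 0.0   # position-velocity covariance
    p_vv: float = 10.0  # variance of the velocity estimate
    q: float = 0.5      # process-noise intensity (hypothetical tuning)
    r: float = 1.0      # measurement-noise variance in m^2 (hypothetical tuning)

    def predict(self, dt: float) -> float:
        """Propagate dt seconds ahead: x <- F x, P <- F P F^T + Q."""
        a, b, c = self.p_pp, self.p_pv, self.p_vv
        self.pos += dt * self.vel
        self.p_pp = a + 2 * dt * b + dt * dt * c + self.q * dt**4 / 4
        self.p_pv = b + dt * c + self.q * dt**3 / 2
        self.p_vv = c + self.q * dt**2
        return self.pos

    def update(self, z: float) -> float:
        """Correct the prediction with a measured position z (e.g. a detection)."""
        a, b, c = self.p_pp, self.p_pv, self.p_vv
        s = a + self.r               # innovation variance
        k_pos, k_vel = a / s, b / s  # Kalman gain
        innovation = z - self.pos
        self.pos += k_pos * innovation
        self.vel += k_vel * innovation
        # Covariance update P <- (I - K H) P, written out for the 2x2 case.
        self.p_pp = (1 - k_pos) * a
        self.p_pv = (1 - k_pos) * b
        self.p_vv = c - k_vel * b
        return self.pos


def track(detections: list[tuple[float, float]], dt: float) -> list[tuple[float, float]]:
    """Filter a sequence of (x, y) detections with one independent filter per axis."""
    fx = ConstantVelocityKalman1D(pos=detections[0][0])
    fy = ConstantVelocityKalman1D(pos=detections[0][1])
    smoothed = [detections[0]]
    for x, y in detections[1:]:
        fx.predict(dt)
        fy.predict(dt)
        smoothed.append((fx.update(x), fy.update(y)))
    return smoothed
```

Filtering each axis separately keeps the arithmetic to scalars. A full tracker would also need data association (matching detections to players across frames), which this sketch omits.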
An Autoencoder Based Approach to Simulate Sports Games
Sports data has become widely available in recent years. With the improvement of machine learning techniques, there have been attempts to use sports data not only to analyze the outcome of individual games but also to improve insights and strategies. The outbreak of COVID-19 has interrupted sports leagues globally, giving rise to increasing questions and speculation about the outcome of this season’s leagues. What if the season had not been interrupted and had concluded normally? Which teams would end up winning trophies? Which players would perform best? Which teams would end their season on a high, and which would fail to keep up with the pressure? We aim to tackle this problem and develop a solution. In this paper, we propose UCLData, a dataset containing detailed information on UEFA Champions League games played over the past six years. We also propose a novel autoencoder-based machine learning pipeline that can generate a story of how the rest of the season would pan out.
Ashwin Vaswani, Rijul Ganguly, Het Shah, Sharan Ranjit S, Shrey Pandit, Samruddhi Bothara
Physical Performance Optimization in Football
Physical performance optimization is essential in any sport, and it is feasible in today’s data-driven world. In many sports, it is common practice to collect detailed information about an athlete’s performance and physiological attributes. The collected data make it possible to create a personalized training program that maximizes the athlete’s performance. Using the physiological attributes jointly with physical load measurements can provide a refined, comprehensive picture of athletes’ condition, in our case that of football players. We analyze a unique dataset that contains more than 600 key performance indicators and important physiological attributes, such as the Creatine Kinase enzyme level, an indicator of muscle damage; Heart Rate Variability, which shows how well the player’s heart can adapt to the exercises; and sleep quality data. We examine the relationship between the physiological factors and the physical performance of the players in training sessions and matches. We obtain the unique intervals of the relevant parameters within which performance can be maximized on matchdays. After determining these optimal intervals, we introduce the Minimum Number of Training Groups (MNTG) problem: creating the minimum number of training groups, i.e., sets of players, that can train together to maximize their performance on matchday. We find that 96% of the time, three or fewer training groups are required to optimize performance for matchday, instead of separate personalized training for every player.
Gergely Dobreff, Péter Revisnyei, Gábor Schuth, György Szigeti, László Toka, Alija Pašić
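As a hedged illustration of the MNTG idea, consider a simplified one-dimensional version. Each player has a single optimal interval for one hypothetical training-load parameter, and players can share a group if one load value lies inside all of their intervals. Under this assumption the problem reduces to interval stabbing, which a greedy sweep by right endpoint solves optimally. The paper's formulation may involve several parameters at once, in which case this greedy argument does not carry over directly. The function name and input format are illustrative.

```python
def min_training_groups(
    intervals: dict[str, tuple[float, float]],
) -> list[tuple[float, list[str]]]:
    """Partition players into the fewest groups that share a common load value.

    `intervals` maps each player to a hypothetical optimal interval [low, high]
    for a single training-load parameter. Returns (shared_value, players) pairs.
    Greedy interval stabbing: sweep players by right endpoint and open a new
    group only when a player's interval starts after the current shared value.
    """
    groups: list[tuple[float, list[str]]] = []
    for player, (low, high) in sorted(intervals.items(), key=lambda item: item[1][1]):
        if groups and low <= groups[-1][0]:
            # Interval ends at or after the shared value and starts before it.
            groups[-1][1].append(player)
        else:
            # Open a new group at this player's right endpoint.
            groups.append((high, [player]))
    return groups


if __name__ == "__main__":
    example = {"A": (3, 5), "B": (4, 7), "C": (6, 9), "D": (8, 10)}
    for value, players in min_training_groups(example):
        print(f"train at load {value}: {players}")
```

Choosing the right endpoint as the shared value is what makes the greedy choice safe: it covers every remaining interval that starts early enough, so no other choice could cover more.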
Predicting Player Trajectories in Shot Situations in Soccer
Player behaviors can have a significant impact on the outcome of individual events, as well as on the game itself. The increased availability of high-resolution spatio-temporal data has enabled the analysis of player behavior and game strategy. In this paper, we present the implementation and evaluation of an imitation learning method using recurrent neural networks, which allows us to learn individual player behaviors and perform rollouts of player movements on previously unseen play sequences. The method is evaluated using a 2019 dataset from the top-tier soccer league in Sweden (Allsvenskan). Our evaluation provides insights into, for example, how best to apply the method to movement traces in soccer, the relative accuracy of the method, and how well policies of one player role capture the relative behaviors of a different player role.
Per Lindström, Ludwig Jacobsson, Niklas Carlsson, Patrick Lambrix

Other Team Sports

Stats Aren’t Everything: Learning Strengths and Weaknesses of Cricket Players
Strengths and weaknesses of individual players are understood informally by the players themselves, coaches, and team management. However, there is no specific computational method for obtaining them. The objective of this work is to obtain rules describing the strengths and weaknesses of cricket players. Instead of looking at traditional statistics, which are nothing but raw counts of certain events in the game, we focus on cricket text commentaries: written narratives that give a detailed, minute-by-minute account of the game as it unfolds.
Swarup Ranjan Behera, Vijaya V. Saradhi
Prediction of Tiers in the Ranking of Ice Hockey Players
Many teams in the NHL use data analysis and employ data analysts. An important question for these analysts is which attributes and skills may help predict the success of individual players. This study uses detailed player statistics from four seasons, player rankings from EA’s NHL video games, and six machine learning algorithms to find predictive models that can be used to identify and predict players’ ranking tier (top 10%, 25%, and 50%). We also compare and contrast which attributes and skills best predict a player’s success, while accounting for differences in player positions (goalkeepers, defenders, and forwards). When comparing the resulting models, the Bayesian classifiers performed best and had the best sensitivity. The tree-based models had the highest specificity but had trouble classifying the top-10% tier players. In general, the models were best at classifying forwards, highlighting that many of the official metrics focus on offensive play and that it is harder to differentiate between top-tier players using official performance metrics alone.
Timmy Lehmus Persson, Haris Kozlica, Niklas Carlsson, Patrick Lambrix
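The model comparison above is phrased in terms of sensitivity and specificity. For a single tier treated as a binary question (is the player in, say, the top 10%?), both metrics can be computed as in the minimal sketch below. The function name and the boolean encoding of labels are assumptions made for illustration.

```python
def sensitivity_specificity(y_true: list[bool], y_pred: list[bool]) -> tuple[float, float]:
    """Sensitivity (true-positive rate) and specificity (true-negative rate).

    True means "the player belongs to the tier" (e.g. the top 10%).
    A rate whose denominator is zero is returned as NaN.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)          # tier players found
    fn = sum(t and not p for t, p in pairs)      # tier players missed
    tn = sum(not t and not p for t, p in pairs)  # others correctly excluded
    fp = sum(not t and p for t, p in pairs)      # others wrongly included
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    truth = [True, True, False, False, False]
    predicted = [True, False, False, True, False]
    print(sensitivity_specificity(truth, predicted))
```

A classifier that rarely assigns players to a small top tier keeps specificity high while losing sensitivity, which is one way a pattern like the one described for the tree-based models can arise.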

Individual Sports

A Machine Learning Approach for Road Cycling Race Performance Prediction
Predicting cycling race results has always been a task left to experts with a lot of domain knowledge. This is largely because the outcomes of cycling races can be rather surprising and depend on an extensive set of parameters, such as the preparedness of a rider, the weather, the team strategy, and mechanical failure. However, we believe that the availability of historical data (e.g., race results, GPX files, and weather data) and recent advances in machine learning make predicting the outcomes of cycling races feasible. In this paper, we present a framework for predicting future race outcomes using machine learning. We investigate whether data on past race performance is a good predictor. In particular, we focus on the Tour of Flanders as our proof of concept. We show, among other things, that it is possible to predict the outcome of a one-day race with similar or better accuracy than a human.
Leonid Kholkine, Tom De Schepper, Tim Verdonck, Steven Latré
Mining Marathon Training Data to Generate Useful User Profiles
In this work, we generate user profiles from the raw activity data of over 12,000 marathon runners. We demonstrate that these user profiles accurately represent a runner’s fitness and training, and show that they are comparable to current methods used to predict marathon performance, many of which require years of prior experience or expensive laboratory testing. We also briefly investigate how these user profiles can be combined with current recommender-system approaches to help marathon runners in their training and race preparation.
Jakim Berndsen, Barry Smyth, Aonghus Lawlor
Learning from Partially Labeled Sequences for Behavioral Signal Annotation
Here, we present a learning procedure for dealing with a partially labeled sequence dataset, i.e., one in which each sequence in the training dataset may contain labeled as well as unlabeled chunks. In our application, this occurs when motor activity has been manually annotated (based on recognition from video recordings) and independently registered by a high-precision measuring system (touch sensors): the human annotation misses some events that have been captured by the sensors. In the general setting, we aim to predict the labels of a new, fully unlabeled movement sequence, while training has been performed on the partially labeled dataset. For this purpose, we propose using a classical sequence model (a hidden Markov model) equipped with a constrained Viterbi algorithm, which gives quick access to a hard approximation of the correct labeling sequences. We demonstrate that this simple modification, the constrained Viterbi algorithm, allows the HMM to be trained on sparse data and overall yields a surprisingly high log-likelihood and accuracy when annotating the partially labeled behavioral sequences in climbing. At the same time, we show a way to obtain the correct labeling of the unannotated signal, which can be helpful in various sport science studies for the sequential prediction of movement patterns.
Anna Aniszewska-Stępień, Romain Hérault, Guillaume Hacques, Ludovic Seifert, Gilles Gasso
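The constrained Viterbi idea can be sketched as follows, under the assumption that "constrained" means clamping the hidden state at every time step where an annotation exists, while decoding the unannotated steps freely. The state and observation names (`hold`, `move`, `low`, `high`) and all probabilities are hypothetical. Only the decoding step is shown, not the re-estimation of HMM parameters from the resulting hard labelings.

```python
import math

NEG_INF = float("-inf")


def log(p: float) -> float:
    """Log-probability with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def constrained_viterbi(obs, states, start, trans, emit, known=None):
    """Most likely state path for `obs` that agrees with the known labels.

    start[s], trans[s][s2], emit[s][o] are probabilities; `known` maps a time
    index to its annotated state. Annotated steps may only take that state.
    """
    known = known or {}

    def allowed(t, s):
        return known.get(t, s) == s

    # delta[t][s]: best log-probability of a label-consistent path ending in s at t
    delta = [{s: log(start[s]) + log(emit[s][obs[0]]) if allowed(0, s) else NEG_INF
              for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        delta.append({})
        back.append({})
        for s in states:
            if not allowed(t, s):
                delta[t][s], back[t][s] = NEG_INF, None
                continue
            prev = max(states, key=lambda p: delta[t - 1][p] + log(trans[p][s]))
            delta[t][s] = delta[t - 1][prev] + log(trans[prev][s]) + log(emit[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state (assumes the labels are feasible).
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]


# Hypothetical two-state toy model of a climbing movement signal.
STATES = ["hold", "move"]
START = {"hold": 0.6, "move": 0.4}
TRANS = {"hold": {"hold": 0.7, "move": 0.3}, "move": {"hold": 0.4, "move": 0.6}}
EMIT = {"hold": {"low": 0.9, "high": 0.1}, "move": {"low": 0.2, "high": 0.8}}

if __name__ == "__main__":
    signal = ["low", "high", "high", "low"]
    print(constrained_viterbi(signal, STATES, START, TRANS, EMIT)[0])
    print(constrained_viterbi(signal, STATES, START, TRANS, EMIT, known={2: "hold"})[0])
```

In this toy model the unconstrained decoding labels step 2 as `move`; clamping it to the annotated `hold` changes the rest of the best path accordingly, which is how annotated chunks propagate information to their unannotated neighbors.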
Machine Learning and Data Mining for Sports Analytics
Edited by
Prof. Dr. Ulf Brefeld
Jesse Davis
Jan Van Haaren
Albrecht Zimmermann
