Understanding draws in Elo rating algorithm

Leszek Szczecinski; Aymen Djebbi

doi:10.1515/jqas-2019-0102

Published by De Gruyter May 4, 2020

From the journal Journal of Quantitative Analysis in Sports

Abstract

This work is concerned with the interpretation of the results produced by the well-known Elo algorithm, which is applied in various sport ratings. The interpretation consists in defining the probabilities of the game outcomes conditioned on the ratings of the players, and should be based on a probabilistic rating-outcome model. Such a model is known for binary games (win/loss), allowing us to interpret the rating results in terms of the win/loss probability. On the other hand, the model for ternary outcomes (win/loss/draw) has not yet been shown, even though the Elo algorithm has been used in ternary games from the very moment it was devised. Using the draw model proposed by Davidson in 1970, we derive a new Elo-Davidson algorithm and show that the Elo algorithm is its particular instance. The parameters of the Elo-Davidson algorithm are then related to the frequency of draws, which indicates that the Elo algorithm silently assumes games with 50% of draws. To remove this often unrealistic assumption, the Elo-Davidson algorithm should be used, as it improves the fit to the data. The behaviour of the algorithms is illustrated using results from the English Premier League.

Keywords: draws; Elo algorithm; rating in sports
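
The relationship described in the abstract can be illustrated with a minimal sketch. It assumes the standard form of the Davidson (1970) tie model, written with base-10 exponents as is usual for Elo ratings. The function names, the parameter names `kappa` (draw parameter) and `scale`, the step size `k`, and the update rule itself are illustrative assumptions, not the notation or the exact algorithm of the paper.

```python
import math


def davidson_probs(r_a, r_b, kappa=2.0, scale=200.0):
    """Return (p_win, p_draw, p_loss) for player A under the Davidson model.

    Assumed form: each outcome weight is 10**(+d), kappa, 10**(-d),
    with d = (r_a - r_b) / (2 * scale), normalised to sum to one.
    """
    d = (r_a - r_b) / (2.0 * scale)
    up, down = 10.0 ** d, 10.0 ** (-d)
    total = up + down + kappa
    return up / total, kappa / total, down / total


def expected_score(r_a, r_b, kappa=2.0, scale=200.0):
    """Expected score of A when a win counts 1 and a draw counts 1/2."""
    p_win, p_draw, _ = davidson_probs(r_a, r_b, kappa, scale)
    return p_win + 0.5 * p_draw


def elo_expected(r_a, r_b, scale=400.0):
    """Conventional Elo expected score (logistic curve in base 10)."""
    return 1.0 / (1.0 + 10.0 ** (-(r_a - r_b) / scale))


def update(r_a, r_b, score, k=20.0, kappa=2.0, scale=200.0):
    """One hypothetical rating update; score is 1, 0.5 or 0 for player A."""
    delta = k * (score - expected_score(r_a, r_b, kappa, scale))
    return r_a + delta, r_b - delta


if __name__ == "__main__":
    # With kappa = 2 the expected score matches the Elo curve of doubled scale.
    print(expected_score(1600, 1500, kappa=2.0, scale=200.0))
    print(elo_expected(1600, 1500, scale=400.0))
    # Draw probability between equally rated players is kappa / (2 + kappa).
    print(davidson_probs(1500, 1500, kappa=2.0)[1])
    print(davidson_probs(1500, 1500, kappa=1.0)[1])
```

Under these assumptions, setting `kappa=2` reproduces the conventional Elo expected score (with the scale doubled) and gives a draw probability of one half between equally rated players, which is consistent with the 50% figure stated in the abstract. Smaller values of `kappa` lower the implied draw frequency.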

Acknowledgement

Many thanks to J.-C. Gregoire (INRS, Canada) and E. V. Kuhn (Federal University of Santa Catarina, Brazil) for critical reading.

Funding: The work was supported by NSERC, Canada.

References

Aldous, D. 2017. “Elo Ratings and the Sports Model: A Neglected Topic in Applied Probability?” Statistical Science 32:616–629. https://doi.org/10.1214/17-STS628.

Bishop, C. 2006. Pattern Recognition and Machine Learning. Springer.

Bradley, R. A. and M. E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39:324–345. https://doi.org/10.1093/biomet/39.3-4.324.

Caron, F. and A. Doucet. 2012. “Efficient Bayesian Inference for Generalized Bradley–Terry Models.” Journal of Computational and Graphical Statistics 21:174–196. https://doi.org/10.1080/10618600.2012.638220.

Cattelan, M. 2012. “Models for Paired Comparison Data: A Review with Emphasis on Dependent Data.” Statistical Science 27:412–433. https://doi.org/10.1214/12-STS396.

David, H. 1963. The Method of Paired Comparisons. Charles Griffin & Co. Ltd.

Davidson, R. R. 1970. “On Extending the Bradley-Terry Model to Accommodate Ties in Paired Comparison Experiments.” Journal of the American Statistical Association 65:317–328. http://www.jstor.org/stable/2283595. https://doi.org/10.1080/01621459.1970.10481082.

Davidson, R. R. and R. J. Beaver. 1977. “On Extending the Bradley-Terry Model to Incorporate Within-Pair Order Effects.” Biometrics 33:693–702. https://doi.org/10.2307/2529467.

Elo, A. E. 2008. The Rating of Chess Players, Past and Present. Ishi Press International.

Fahrmeir, L. and G. Tutz. 1994. “Dynamic Stochastic Models for Time-Dependent Ordered Paired Comparison Systems.” Journal of the American Statistical Association 89:1438–1449. https://doi.org/10.1080/01621459.1994.10476882.

FIFA. 2018. “Fédération Internationale de Football Association: Men’s Ranking Procedure.” https://www.fifa.com/fifa-world-ranking/procedure/.

Football-data.co.uk. 2019. “Historical Football Results and Betting Odds Data.” https://www.football-data.co.uk/data.php.

Gelman, A., J. Hwang, and A. Vehtari. 2014. “Understanding Predictive Information Criteria for Bayesian Models.” Statistics and Computing 24:997–1016. https://doi.org/10.1007/s11222-013-9416-2.

Glickman, M. E. 1999. “Parameter Estimation in Large Dynamic Paired Comparison Experiments.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 48:377–394. http://dx.doi.org/10.1111/1467-9876.00159.

Glickman, M. 2018. “Paired Comparison Models with Tie Probabilities and Order Effects as a Function of Strength.” http://www.fields.utoronto.ca/talks/Paired-Comparison-Models-Tie-Probabilities-and-Order-Effects-Function-Strength.

Herbrich, R. and T. Graepel. 2006. “TrueSkill(TM): A Bayesian Skill Rating System.” Technical Report. https://www.microsoft.com/en-us/research/publication/trueskilltm-a-bayesian-skill-rating-system-2/. https://doi.org/10.7551/mitpress/7503.003.0076.

Joe, H. 1990. “Extended Use of Paired Comparison Models, with Application to Chess Rankings.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 39:85–93. http://www.jstor.org/stable/2347814. https://doi.org/10.2307/2347814.

Király, F. J. and Z. Qian. 2017. “Modelling Competitive Sports: Bradley-Terry-Elo Models for Supervised and On-Line Learning of Paired Competition Outcomes.” arXiv e-prints arXiv:1701.08055.

Koning, R. H. 2000. “Balance in Competition in Dutch Soccer.” Journal of the Royal Statistical Society: Series D (The Statistician) 49:419–431. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1467-9884.00244. https://doi.org/10.1111/1467-9884.00244.

Langville, A. N. and C. D. Meyer. 2012. Who’s #1? The Science of Rating and Ranking. Princeton University Press. https://doi.org/10.1515/9781400841677.

Lasek, J., Z. Szlávik, and S. Bhulai. 2013. “The Predictive Power of Ranking Systems in Association Football.” International Journal of Applied Pattern Recognition 1:27–46. https://www.inderscienceonline.com/doi/abs/10.1504/IJAPR.2013.052339. https://doi.org/10.1504/IJAPR.2013.052339.

Rao, P. V. and L. L. Kupper. 1967. “Ties in Paired-Comparison Experiments: A Generalization of the Bradley-Terry Model.” Journal of the American Statistical Association 62:194–204. https://amstat.tandfonline.com/doi/abs/10.1080/01621459.1967.10482901. https://doi.org/10.1080/01621459.1967.10482901.

Thurstone, L. L. 1927. “A Law of Comparative Judgment.” Psychological Review 34:273–286. https://doi.org/10.1037/h0070288.

Wikipedia contributors. 2019. “Wikipedia: Elo Rating System.” https://en.wikipedia.org/wiki/Elo_rating_system.

Published Online: 2020-05-04

Published in Print: 2020-09-25
