2010 | Book

Sports Data Mining

Authors: Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen

Publisher: Springer US

Book Series: Integrated Series in Information Systems

About this book

Data mining is the process of extracting hidden patterns from data, and it’s commonly used in business, bioinformatics, counter-terrorism, and, increasingly, in professional sports. First popularized in Michael Lewis’ best-selling Moneyball: The Art of Winning An Unfair Game, it has become an intrinsic part of all professional sports the world over, from baseball to cricket to soccer. While an industry has developed based on statistical analysis services for any given sport, or even for betting behavior analysis on these sports, no research-level book has considered the subject in any detail until now.

Sports Data Mining brings together in one place the state of the art as it concerns an international array of sports: baseball, football, basketball, soccer, and greyhound racing are all covered, and the authors (including Hsinchun Chen, one of the most esteemed and well-known experts in data mining in the world) present the latest research, developments, available software, and applications for each sport. They even examine the hidden patterns in gaming and wagering, along with the most common systems for wager analysis.

Table of Contents

Frontmatter
Chapter 1. Sports Data Mining: The Field
Abstract
Incredible amounts of data exist across all domains of sports. This data can come in the form of individual player performance, coaching or managerial decisions, game-based events and/or how well the team functions together. The task is not how to collect the data, but what data should be collected and how to make the best use of it. By finding the right ways to make sense of data and turning it into actionable knowledge, sports organizations have the potential to secure a competitive advantage over their peers. This knowledge-seeking approach can be applied throughout the entire organization. From players improving their game-time performance using video analysis techniques, to scouts using statistical analysis and projection techniques to identify which talent will provide the biggest impact, data mining is quickly becoming an integral part of the sports decision-making landscape, where managers and coaches using machine learning and simulation techniques can find optimal strategies for an entire upcoming season.
Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen
Chapter 2. Sports Data Mining Methodology
Abstract
Data Mining involves procedures for uncovering hidden trends and developing new data and information from data sources. These sources can include well-structured and defined databases, such as statistical compilations, or unstructured data in the form of multimedia sources such as video broadcasts and play-by-play narration.
Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen
Chapter 3. Data Sources for Sports
Abstract
Data, the life-blood of modern sport analysis, has undergone its own revolution. It used to be that data was simply viewed as a record of the game’s events that was kept either by the organizations or the responsible leagues for historical purposes. That data became transformed into a condensed form to provide a brief recap of the game’s events through a newspaper boxscore. It wasn’t until many years later that publishing of data became cheap enough to fill a growing niche of interest. Game data was then expanded upon with comparisons made across different sets. This activity led to refinement as new ideas were introduced of what data should be captured. Then with the advent of the Internet revolution, data rose to the height of accessibility, where sport-related data could be found easily and quickly, oftentimes in searchable form.
Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen
Chapter 4. Research in Sports Statistics
Abstract
This chapter investigates the role that statistics plays in knowledge creation. While many of these techniques have stood the test of time, some have undergone intense scrutiny while others have experienced transformative processes. All the while we must ask ourselves, are we really measuring what we think we are measuring? This chapter will help to make that distinction.
Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen
Chapter 5. Tools and Systems for Sports Data Analysis
Abstract
This chapter investigates some of the data mining and scouting tools available for sports analysis. In particular, we analyze the role of these tools and how they can help an organization. Tools such as Advanced Scout, which maintains play-by-play data in an easy-to-query environment, and Inside Edge, which provides pictorial descriptions of player tendencies, will be investigated. Sports fraud detection is another interesting area, where sport-related data can be analyzed against historical patterns to identify potential instances of fraud by players, corrupt officials or even suspicious bettors.
Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen
Chapter 6. Predictive Modeling for Sports and Gaming
Abstract
Predictive modeling has long been the goal of many individuals and organizations. This science has many techniques, with simulation and machine learning at its heart. Simulations such as basketball’s BBall can model an entire season and can deduce optimal substitution patterns and scoring potential of players. Should unforeseen events occur, such as an unexpected trade or long-term injury, additional simulations can be performed to assess new courses of action. Aside from the potential of simulations, machine learning techniques can uncover hidden data trends. Greyhound racing is one such area that has been explored with many different machine learners. While the choice of algorithms used in each study may differ, the studies all shared one result: they beat the choices human track experts made and were able to use the data to create arbitrage opportunities.
Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen
Chapter 7. Multimedia and Video Analysis for Sports
Abstract
Sports information and footage are rapidly becoming available in digital form. With many of the tools previously outfitted for textual searching now adapted to the task, video and multimedia searching and retrieval are becoming more commonplace in sports. Automated methods to watch and listen to games are being used to parse video and render it in searchable form.
Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen
Chapter 8. Web Sports Data Extraction and Visualization
Abstract
How is it that we value data? Is a simple repository of data all that we need? It used to be that carrying a copy of Total Baseball was all that was ever needed, as it provided a historical perspective of player data that was adequate for our needs only a decade ago. Then as sabermetrics began to awaken the sporting world’s desire for more data and consequently new ways of analyzing that data, data itself began to evolve. Data first moved from static pages of written form to online resources. While this step was simply a change of venue, data was still data, but it soon began to become more. Web applications began to sort this data into leaderboards on a whole host of different statistics, thus entered information. From there, the applications evolved further, exploring the graphical realms of presentation, pushing that information into knowledge. It is amazing to think how quaint our memories of carrying a printed copy of Total Baseball are by today’s standards.
Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen
Chapter 9. Open Source Data Mining Tools for Sports
Abstract
Open source development has become more prominent in recent years in a multitude of software areas. In the domain of data mining tools, several solutions, such as Weka and RapidMiner, have gained significant acceptance. Both tools share the same underlying learning algorithms; however, their approaches to displaying results are very different.
Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen
Chapter 10. Greyhound Racing Using Neural Networks: A Case Study
Abstract
Uncertainty is inevitable in problem solving and decision making. One way to reduce it is by seeking the advice of an expert. When we use computers to reduce uncertainty, the computer itself can become an “expert” in a specific field through a variety of methods. One such method is machine learning, which involves using a computer algorithm to capture hidden knowledge from data. Machine learning usually encompasses different types of solutions, such as decision trees, production rules, and neural networks.
Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen
Chapter 11. Greyhound Racing Using Support Vector Machines: A Case Study
Abstract
In this chapter we investigate the role of machine learning within the domain of Greyhound Racing. We test a Support Vector Regression (SVR) algorithm on 1,953 races across 31 different dog tracks and explore the role of a simple betting engine on a wide range of wager types.
Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen
Chapter 12. Betting and Gaming
Abstract
How is it that sports and gambling co-exist so easily together, yet can cause so many problems? We explore the relationship between sports and gambling from a historical perspective and describe the ways that some organizations are trying to keep a safe distance between the two.
Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen
Chapter 13. Conclusions
Abstract
Over the next several years, sports data mining practices will be faced with several challenges and obstacles. The most obvious is overcoming the years of resistance by members of sporting organizations who would rather stick with a traditional way of doing things. Aside from the challenges that are faced, sports data mining currently sits at a pivotal juncture in history with many opportunities just waiting to be grabbed. Some avenues of opportunity will be pursued quickly, while others may take years or decades to become fruitful. In any event, sports data mining today is still in its infancy. While some first steps were made by pioneers such as Dean Oliver and Bill James, the next few years will be a transition period as the technology begins to mature within the sporting community and becomes more commonplace. New metrics, algorithms and ways of thinking will begin circulating as the field enters puberty and begins to mature. The coming decades will be fascinating to watch.
Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen
Backmatter
Metadata
Title
Sports Data Mining
Authors
Robert P. Schumaker
Osama K. Solieman
Hsinchun Chen
Copyright Year
2010
Publisher
Springer US
Electronic ISBN
978-1-4419-6730-5
Print ISBN
978-1-4419-6729-9
DOI
https://doi.org/10.1007/978-1-4419-6730-5
