2017 | Book

Predictive Data Mining Models

About this book

This book reviews data mining models for forecasting, from basic tools for stable data, through causal models, to more advanced models using trends and cycles. These models are demonstrated on business-related data, including stock indices, crude oil prices, and the price of gold. The book's approach is primarily descriptive, seeking to explain concretely how the methods work; as such, it includes selected citations but does not go into deep scholarly reference. The data sets and software reviewed were selected because they are widely available to any reader with internet access.

Table of Contents

Frontmatter
Chapter 1. Knowledge Management
Abstract
Knowledge management is an overarching term referring to the ability to identify, store, and retrieve knowledge. Identification requires gathering the information needed and analyzing available data to make effective decisions regarding whatever the organization does. This includes research, digging through records, or gathering data from wherever it can be found. Storage and retrieval of data involve database management, using many tools developed by computer science. Thus knowledge management involves understanding what knowledge is important to the organization, understanding the systems important to organizational decision making, database management, and the analytic tools of data mining.
David L. Olson, Desheng Wu
Chapter 2. Data Sets
Abstract
Data comes in many forms. The current age of big data floods us with numbers accessible from the Web. We have trading data available in real time (which caused some problems with automatic trading algorithms, so some trading sites impose a delay of 20 minutes or so to make this data less real-time). Wal-Mart has real-time data from its many cash registers, enabling it to automate intelligent decisions to manage its many inventories. Currently a wrist device called a Fitbit is very popular, enabling personal monitoring of individual health numbers, which can be shared in real time with physicians or ostensibly EMT providers. The point is that there is an explosion of data in our world.
David L. Olson, Desheng Wu
Chapter 3. Basic Forecasting Tools
Abstract
We will present two fundamental time series forecasting tools. Moving average is a very simple approach, presented because it is a component of ARIMA models to be covered in a future chapter. Regression is a basic statistical tool. In data mining, it is one of the basic tools for analysis, used in classification applications through logistic regression and discriminant analysis, as well as prediction of continuous data through ordinary least squares (OLS) and other forms. As such, regression is often taught in one (or more) three-hour courses.
David L. Olson, Desheng Wu
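The two tools named in the Chapter 3 abstract can be illustrated with a minimal Python sketch. It assumes a three-period window and a small made-up price series; the function names `moving_average_forecast` and `ols_trend` are hypothetical and are not taken from the book or its software.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window


def ols_trend(ys):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = list(range(n))
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    sxx = sum((t - mean_t) ** 2 for t in ts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept, slope


# Illustrative values only (assumed, not data from the book).
prices = [10.0, 12.0, 11.0, 13.0, 14.0]

ma_forecast = moving_average_forecast(prices, window=3)

intercept, slope = ols_trend(prices)
trend_forecast = intercept + slope * len(prices)  # extrapolate one period ahead

print(ma_forecast, trend_forecast)
```

The moving average ignores any trend and simply smooths recent values, while the regression line extrapolates the trend; on stable data the two forecasts tend to be close.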
Chapter 4. Multiple Regression
Abstract
Regression models allow you to include as many independent variables as you want. In traditional regression analysis, there are good reasons to limit the number of variables. The spirit of exploratory data mining, however, encourages examining a large number of independent variables. Here we are presenting very small models for demonstration purposes. In data mining applications, the assumption is that you have very many observations, so that there is no technical limit on the number of independent variables.
David L. Olson, Desheng Wu
Chapter 5. Regression Tree Models
Abstract
Decision trees are models that split data at strategic points, dividing it into groups with high probabilities of one outcome or another. They are especially effective on data with categorical outcomes, but can also be applied to continuous data, such as the time series we have been considering. Decision trees consist of nodes, which are splits in the data defined as particular cutoffs for a particular independent variable, and leaves, which are the outcomes. For categorical data, the outcome is a class. For continuous data, the outcome is a continuous number, usually some average measure of the dependent variable.
David L. Olson, Desheng Wu
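The node-and-leaf structure described in the Chapter 5 abstract can be sketched as a one-split regression tree. This is a simplified illustration, assuming a single independent variable, a squared-error splitting criterion, and made-up data; the names `best_split` and `predict` are hypothetical, and the book's software may choose splits differently.

```python
def best_split(xs, ys):
    """Return (cutoff, left_mean, right_mean) for the single split on x
    that minimizes the total squared error of the two leaf means."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cutoff can separate equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        cutoff = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        if best is None or sse < best[0]:
            best = (sse, cutoff, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def predict(x, cutoff, left_mean, right_mean):
    """Each leaf predicts the average of the dependent variable in its group."""
    return left_mean if x <= cutoff else right_mean


# Illustrative values only (assumed, not data from the book).
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]

cutoff, left_mean, right_mean = best_split(xs, ys)
print(cutoff, left_mean, right_mean)
print(predict(2, cutoff, left_mean, right_mean))
```

A full tree would apply the same search recursively within each group and across all independent variables; the sketch stops at one node to keep the cutoff and leaf averages visible.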
Chapter 6. Autoregressive Models
Abstract
Autoregressive models take advantage of the correlation between errors across time periods. Basic linear regression views this autocorrelation as a negative statistical property, a bias in the error terms. Such bias often arises in cyclical data, where if the stock market price was high yesterday, it will likely be high today, as opposed to a random-walk characteristic, where knowing the error of the last forecast should say nothing about the next error. Traditional regression analysis sought to wash out the bias from autocorrelation. Autoregressive models, by contrast, seek to use this information to make better forecasts. This doesn't always work, but where there is a high degree of autocorrelation, autoregressive models can provide better forecasts.
David L. Olson, Desheng Wu
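The idea of using correlation across time periods, as described in the Chapter 6 abstract, can be sketched with a first-order autoregressive model fitted by least squares. This is a minimal illustration, assuming one lag and a made-up series; the names `fit_ar1` and `forecast_ar1` are hypothetical, and the chapter's own models may use more lags or a different estimation method.

```python
def fit_ar1(series):
    """Estimate y[t] = const + phi * y[t-1] by regressing each value on the previous one."""
    x = series[:-1]  # lagged values
    y = series[1:]   # current values
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("lagged values are constant; phi is not identified")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    const = mean_y - phi * mean_x
    return const, phi


def forecast_ar1(series, const, phi):
    """One-step-ahead forecast from the last observed value."""
    return const + phi * series[-1]


# Illustrative series built so that each value is 2 + 0.5 * the previous one
# (assumed values, not data from the book).
series = [10.0, 7.0, 5.5, 4.75, 4.375]

const, phi = fit_ar1(series)
next_value = forecast_ar1(series, const, phi)
print(const, phi, next_value)
```

An estimated `phi` near zero suggests the previous period carries little information, in which case the autoregressive forecast offers no advantage over simpler tools.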
Chapter 7. Classification Tools
Abstract
Data mining uses a variety of modeling tools for a variety of purposes. Various authors have viewed these purposes along with available tools (see Table 7.1). There are many other specific methods used as well.
David L. Olson, Desheng Wu
Chapter 8. Predictive Models and Big Data
Abstract
Data mining has proven valuable in almost every academic discipline. Understanding business applications of data mining is necessary to expose business college students to current analytic information technology.
David L. Olson, Desheng Wu
Backmatter
Metadata
Title: Predictive Data Mining Models
Authors: David L. Olson, Desheng Wu
Copyright Year: 2017
Publisher: Springer Singapore
Electronic ISBN: 978-981-10-2543-3
Print ISBN: 978-981-10-2542-6
DOI: https://doi.org/10.1007/978-981-10-2543-3