
2013 | Book

Robust Data Mining

Authors: Petros Xanthopoulos, Panos M. Pardalos, Theodore B. Trafalis

Publisher: Springer New York

Book Series: SpringerBriefs in Optimization


About this book

Data uncertainty is a concept closely related to most real-life applications that involve data collection and interpretation. Examples can be found in data acquired with biomedical instruments or other experimental techniques. The integration of robust optimization into existing data mining techniques aims to create new algorithms that are resilient to error and noise.

This work surveys the latest applications of robust optimization in data mining. The brief gives an overview of the rapidly growing field of robust data mining research and presents the best-known machine learning algorithms, their robust counterpart formulations, and algorithms for solving these problems.

This brief will appeal to theoreticians and data miners working in this field.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
Data mining (DM) is, conceptually, a very general term that encompasses a large number of methods, algorithms, and technologies. What they have in common is the ability to extract useful patterns and associations from data, usually stored in large databases. DM techniques thus aim to provide knowledge and meaningful interpretations of what are often vast amounts of data. This task is especially crucial today because of the new needs and capabilities that technological progress creates. In this monograph we examine some of the best-known data mining algorithms from an optimization perspective and study how robust optimization (RO) can be applied to them. This combination is essential for addressing the unavoidable problem of data uncertainty that arises in almost every realistic problem involving data analysis. In this chapter we provide some historical perspective on data mining and its foundations, introduce the basic concepts of robust optimization, and discuss how it differs from stochastic programming.
Petros Xanthopoulos, Panos M. Pardalos, Theodore B. Trafalis
Chapter 2. Least Squares Problems
Abstract
In this chapter we provide an overview of the classical least squares problem and its variations. We present their robust formulations as they have been proposed in the literature so far, show the analytical solution of each variation, and conclude the chapter with some numerical techniques for computing these solutions efficiently.
Petros Xanthopoulos, Panos M. Pardalos, Theodore B. Trafalis
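The robust least squares idea can be illustrated with a minimal plain-Python sketch. It assumes one particular uncertainty model: perturbations ΔA of the data matrix bounded in Frobenius norm by ρ, the setting studied by El Ghaoui and Lebret. The abstract does not say which uncertainty sets the chapter covers. For this model the worst-case residual of a fixed x is ‖Ax − b‖ + ρ‖x‖, and a rank-one perturbation attains it. The helper names (ols_line, worst_case_residual, adversarial_perturbation) are hypothetical. The ordinary fit is restricted to a straight line so that the normal equations can be solved by hand.

```python
import math


def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in v))


def ols_line(ts, ys):
    """Closed-form OLS fit of y ~ c0 + c1 * t via the 2x2 normal equations."""
    n = len(ts)
    st, sy = sum(ts), sum(ys)
    stt = sum(t * t for t in ts)
    sty = sum(t * y for t, y in zip(ts, ys))
    det = n * stt - st * st
    c1 = (n * sty - st * sy) / det
    c0 = (sy - c1 * st) / n
    return [c0, c1]


def worst_case_residual(A, x, b, rho):
    """max over ||dA||_F <= rho of ||(A + dA) x - b||, in closed form."""
    r = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    return norm(r) + rho * norm(x)


def adversarial_perturbation(A, x, b, rho):
    """Rank-one dA = rho * u x^T / ||x|| with u = r / ||r||; it attains the bound.

    Assumes the nominal residual r = Ax - b and x are both nonzero."""
    r = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    nr, nx = norm(r), norm(x)
    return [[rho * (ri / nr) * xj / nx for xj in x] for ri in r]


# Usage: fit a line, then perturb the design matrix as badly as the budget allows.
ts = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.9, 5.1, 7.0]
A = [[1.0, t] for t in ts]          # design matrix with an intercept column
x = ols_line(ts, ys)
rho = 0.5
dA = adversarial_perturbation(A, x, ys, rho)
A_worst = [[a + d for a, d in zip(ra, rd)] for ra, rd in zip(A, dA)]
attained = norm([ax - bi for ax, bi in zip(matvec(A_worst, x), ys)])
print(x, worst_case_residual(A, x, ys, rho), attained)
```

A robust estimate minimizes ‖Ax − b‖ + ρ‖x‖ instead of ‖Ax − b‖. The extra ρ‖x‖ term is why robust least squares resembles a regularized fit.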
Chapter 3. Principal Component Analysis
Abstract
The principal component analysis (PCA) transformation is a very common and well-studied data analysis technique that aims to identify linear trends and simple patterns in a group of samples. It has applications in several areas of engineering. It is popular from a computational perspective because it requires only an eigendecomposition or a singular value decomposition. There are two alternative optimization approaches for obtaining the PCA solution: variance maximization and the minimum-error formulation. The two start from different objectives yet arrive at the same solution, and it is worth studying and understanding both. In the second part of this chapter we present the robust counterpart formulation of PCA and demonstrate how such a formulation can be used in practice to produce sparse solutions.
Petros Xanthopoulos, Panos M. Pardalos, Theodore B. Trafalis
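A small plain-Python sketch on 2-D data can check numerically that the two PCA formulations agree. The sample points and helper names are illustrative and are not taken from the book. For centered data and a unit direction w, the mean squared reconstruction error equals the total variance minus the variance captured along w. Maximizing one objective is therefore the same as minimizing the other. Both optima coincide with the leading eigenvector of the covariance matrix, which is computed here in closed form for the 2×2 case.

```python
import math


def center(points):
    """Subtract the sample mean from every 2-D point."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    return [(x - mx, y - my) for x, y in points]


def covariance(pts):
    """Entries (sxx, sxy, syy) of the covariance of centered points (divides by n)."""
    n = len(pts)
    sxx = sum(x * x for x, _ in pts) / n
    sxy = sum(x * y for x, y in pts) / n
    syy = sum(y * y for _, y in pts) / n
    return sxx, sxy, syy


def top_eigvec(sxx, sxy, syy):
    """Largest eigenvalue and unit eigenvector of [[sxx, sxy], [sxy, syy]]."""
    lam = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)
    if abs(sxy) > 1e-12:
        v = (sxy, lam - sxx)          # solves the first row of (C - lam I) v = 0
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    nv = math.hypot(v[0], v[1])
    return lam, (v[0] / nv, v[1] / nv)


def projected_variance(pts, w):
    """Variance-maximization objective: mean of (w . p)^2 for unit w."""
    return sum((x * w[0] + y * w[1]) ** 2 for x, y in pts) / len(pts)


def reconstruction_error(pts, w):
    """Minimum-error objective: mean squared distance from p to its projection s * w."""
    err = 0.0
    for x, y in pts:
        s = x * w[0] + y * w[1]
        err += (x - s * w[0]) ** 2 + (y - s * w[1]) ** 2
    return err / len(pts)


raw = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
       (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
pts = center(raw)
sxx, sxy, syy = covariance(pts)
lam, w = top_eigvec(sxx, sxy, syy)

# Brute-force both objectives over unit directions (cos t, sin t), t in [0, pi).
angles = [k * math.pi / 3600 for k in range(3600)]
t_var = max(angles, key=lambda t: projected_variance(pts, (math.cos(t), math.sin(t))))
t_err = min(angles, key=lambda t: reconstruction_error(pts, (math.cos(t), math.sin(t))))
print(w, lam, (math.cos(t_var), math.sin(t_var)), (math.cos(t_err), math.sin(t_err)))
```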
Chapter 4. Linear Discriminant Analysis
Abstract
In this chapter we discuss another popular data mining algorithm that can be used for supervised or unsupervised learning. Linear discriminant analysis (LDA) was proposed by R. A. Fisher in 1936. It consists of finding the projection hyperplane that minimizes the intraclass (within-class) variance and maximizes the distance between the projected means of the classes. Similarly to PCA, these two objectives can be optimized jointly by solving an eigenvalue problem, whose eigenvector defines the hyperplane of interest. This hyperplane can be used for classification, for dimensionality reduction, and for interpreting the importance of the given features. In the first part of the chapter we discuss the generic formulation of LDA, and in the second we present the robust counterpart scheme originally proposed by Kim and Boyd. We also discuss the nonlinear extension of LDA through the kernel transformation.
Petros Xanthopoulos, Panos M. Pardalos, Theodore B. Trafalis
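In the two-class case, the Fisher direction has a well-known closed form: w ∝ S_w⁻¹(m₁ − m₂), where S_w is the pooled within-class scatter matrix and m₁, m₂ are the class means. This vector is the leading solution of the eigenvalue problem described in the abstract. The plain-Python sketch computes it for 2-D points and checks it against a brute-force search over directions of the Fisher ratio J(w) = (wᵀ(m₁ − m₂))² / (wᵀ S_w w). The data, the function names, and the use of unnormalized scatter matrices are illustrative assumptions. The sketch does not show the robust scheme of Kim and Boyd.

```python
import math


def mean(pts):
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def scatter(pts, m):
    """Unnormalized scatter sum of (p - m)(p - m)^T, returned as (sxx, sxy, syy)."""
    sxx = sum((x - m[0]) ** 2 for x, _ in pts)
    sxy = sum((x - m[0]) * (y - m[1]) for x, y in pts)
    syy = sum((y - m[1]) ** 2 for _, y in pts)
    return sxx, sxy, syy


def fisher_direction(class1, class2):
    """Unit vector proportional to S_w^{-1} (m1 - m2) for two classes in 2-D."""
    m1, m2 = mean(class1), mean(class2)
    a1, b1, c1 = scatter(class1, m1)
    a2, b2, c2 = scatter(class2, m2)
    sw = (a1 + a2, b1 + b2, c1 + c2)        # pooled within-class scatter S_w
    a, b, c = sw
    det = a * c - b * b                      # assumed nonzero (S_w invertible)
    d = (m1[0] - m2[0], m1[1] - m2[1])
    # Closed-form inverse of the symmetric matrix [[a, b], [b, c]] applied to d.
    w = ((c * d[0] - b * d[1]) / det, (a * d[1] - b * d[0]) / det)
    nw = math.hypot(w[0], w[1])
    return (w[0] / nw, w[1] / nw), sw, d


def fisher_ratio(w, sw, d):
    """J(w) = (w^T (m1 - m2))^2 / (w^T S_w w)."""
    a, b, c = sw
    num = (w[0] * d[0] + w[1] * d[1]) ** 2
    den = a * w[0] ** 2 + 2 * b * w[0] * w[1] + c * w[1] ** 2
    return num / den


class1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class2 = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 7.0)]
w, sw, d = fisher_direction(class1, class2)

# Brute-force check: no grid direction should beat the closed-form w.
angles = [k * math.pi / 3600 for k in range(3600)]
t_best = max(angles, key=lambda t: fisher_ratio((math.cos(t), math.sin(t)), sw, d))
print(w, (math.cos(t_best), math.sin(t_best)))
```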
Chapter 5. Support Vector Machines
Abstract
In this chapter we describe one of the most successful supervised learning algorithms, namely support vector machines (SVMs). The SVM is conceptually one of the simplest algorithms and, at the same time, one of the best performing, especially for binary classification. Here we present the mathematical formulation of the SVM together with its robust counterparts for the most common uncertainty sets.
Petros Xanthopoulos, Panos M. Pardalos, Theodore B. Trafalis
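One common uncertainty set replaces each training point x_i with a ball of radius r around it. Requiring the soft-margin constraint y_i(wᵀ(x_i + δ) + b) ≥ 1 − ξ_i for every ‖δ‖ ≤ r is equivalent to requiring y_i(wᵀx_i + b) − r‖w‖ ≥ 1 − ξ_i, because the worst-case δ is −r y_i w/‖w‖. The plain-Python sketch trains a linear SVM on the resulting robust hinge loss by full-batch subgradient descent. It is not the book's algorithm. The spherical set, the solver, the step size and regularization values, the toy data, and all function names are illustrative assumptions. Setting r = 0 recovers the nominal soft-margin SVM.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def worst_case_margin(w, b, x, y, r):
    """min over ||delta|| <= r of y * (w . (x + delta) + b) = y * (w . x + b) - r * ||w||."""
    return y * (dot(w, x) + b) - r * norm(w)


def train_robust_svm(X, Y, r=0.0, lam=0.01, eta=0.05, epochs=500):
    """Full-batch subgradient descent on
    (lam / 2) * ||w||^2 + (1 / n) * sum_i max(0, 1 - worst_case_margin_i)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]          # gradient of the regularizer
        gb = 0.0
        nw = norm(w)
        for x, y in zip(X, Y):
            if 1.0 - worst_case_margin(w, b, x, y, r) > 0.0:   # hinge term is active
                for j in range(d):
                    # subgradient of -y * (w . x) + r * ||w||; use 0 for ||w|| at w = 0
                    gw[j] += (-y * x[j] + (r * w[j] / nw if nw > 0 else 0.0)) / n
                gb += -y / n
        w = [wj - eta * gj for wj, gj in zip(w, gw)]
        b -= eta * gb
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


# Toy data: each positive point followed by its mirror image as a negative point.
pos = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0)]
X, Y = [], []
for p in pos:
    X += [p, (-p[0], -p[1])]
    Y += [1, -1]
w, b = train_robust_svm(X, Y, r=0.5)
print(w, b, [predict(w, b, x) for x in X])
```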
Chapter 6. Conclusion
Abstract
In this work, we presented some of the major recent advances of robust optimization in data mining. Throughout this monograph, we examined most of the data mining methods from the perspective of uncertainty handling, the only exception being the principal component analysis (PCA) transformation. More broadly, uncertainty can be seen as a special case of prior knowledge. In prior knowledge classification, for example, we are given some additional information about the input space together with the training sets. Another type of prior knowledge, other than uncertainty, is so-called expert knowledge. An example is a binary rule of the type “if feature a is more than M₁ and feature b is less than M₂, then the sample belongs to class x.” There has been a significant amount of research in the area of prior knowledge classification [33, 49], but there has been no significant study of robust optimization in this direction.
Petros Xanthopoulos, Panos M. Pardalos, Theodore B. Trafalis
Backmatter
Metadata
Title
Robust Data Mining
Authors
Petros Xanthopoulos
Panos M. Pardalos
Theodore B. Trafalis
Copyright Year
2013
Publisher
Springer New York
Electronic ISBN
978-1-4419-9878-1
Print ISBN
978-1-4419-9877-4
DOI
https://doi.org/10.1007/978-1-4419-9878-1