main-content

## Über dieses Buch

This book studies mathematical theories of machine learning. The first part of the book explores the optimality and adaptivity of choosing step sizes of gradient descent for escaping strict saddle points in non-convex optimization problems. In the second part, the authors propose algorithms to find local minima in nonconvex optimization and to obtain global minima in some degree from the Newton Second Law without friction. In the third part, the authors study the problem of subspace clustering with noisy and missing data, which is a problem well-motivated by practical applications data subject to stochastic Gaussian noise and/or incomplete data with uniformly missing entries. In the last part, the authors introduce an novel VAR model with Elastic-Net regularization and its equivalent Bayesian model allowing for both a stable sparsity and a group selection.

## Inhaltsverzeichnis

### Chapter 1. Introduction

Abstract
Learning has various definitions based on the context in which it is used and the various entities involved in the learning process. The need for machines to learn and thus adapt to the changes in its surroundings led to the rise of the field aptly called “machine learning.” A machine is expected to learn and predict the future outcomes based on the changes that it notices in the external structure, data/inputs fed that would have to be responded to, and the program/function it was built to perform. This forms the basis for the various complex computational capabilities that any modern artificially intelligent based (AI-based) system would require and includes computations dealing with recognition of patterns, diagnosis of data, controlling the system or environment, planning activities, etc.
Bin Shi, S. S. Iyengar

### Chapter 2. General Framework of Mathematics

Abstract
With the explosive growth of data nowadays, a young and interdisciplinary field, data science , has emerged, which uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms, both structured and unstructured. This data science field is becoming popular and needs to be developed urgently so that it can serve and guide for the industry of the society. Rigorously, applied data science is a “concept to unify statistics, data analysis, machine learning and their related methods” in order to “understand and analyze actual phenomena” with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science.
Bin Shi, S. S. Iyengar

### Chapter 3. Optimization Formulation

Abstract
Based on the description on the statistics model in the previous section, we formulate the problems that we need to solve from two angles. One is from the field of optimization, the other is from samples of probability distribution. Practically, from the view of efficient algorithms in computers, the representation of the first one is the expectation–maximization (EM) algorithm . The EM algorithm is used to find (local) maximum likelihood parameters of a statistical model in scenarios wherein the equations cannot be solved directly. These models use latent variables along with unknown parameters and known data observations, i.e., either there is a possibility of finding missing values among the data or the model can be formulated in more simple terms by assuming the existence of unobserved data points. A mixture model can be described in simplistic terms with an assumption that each of the observed data points have a corresponding unobserved data point, or latent variable that specifies the mixture component to which each of the data points belong.
Bin Shi, S. S. Iyengar

### Chapter 4. Development of Novel Techniques of CoCoSSC Method

Abstract
This chapter provides an introduction to our main contributions concerning the development of the novel methods of CoCoSSC.
Bin Shi, S. S. Iyengar

### Chapter 5. Necessary Notations of the Proposed Method

Abstract
We define necessary notations and review important definitions that will be used later in our analysis. Let $$C^{2}(\mathbb {R}^{n})$$ be the vector space of real-valued twice-continuously differentiable functions. Let ∇ be the gradient operator and ∇2 be the Hessian operator. Let ∥⋅∥2 be the Euclidean norm in $$\mathbb {R}^{n}$$. Let μ be the Lebesgue measure in $$\mathbb {R}^n$$.
Bin Shi, S. S. Iyengar

### Chapter 6. Related Work on Geometry of Non-Convex Programs

Abstract
Over the past few years, there have been increasing interest in understanding the geometry of non-convex programs that naturally arise from machine learning problems. It is particularly interesting to study additional properties of the considered non-convex objective such that popular optimization methods (such as gradient descent) escape saddle points and converge to a local minimum. The strict saddle property (Definition 5.​6) is one such property, which was also shown to hold in a broad range of applications.
Bin Shi, S. S. Iyengar

### Chapter 7. Gradient Descent Converges to Minimizers: Optimal and Adaptive Step-Size Rules

Abstract
As mentioned in Chap. 3, gradient descent (GD) and its variants provide the core optimization methodology in machine learning problems. Given a C 1 or C 2 function $$f: \mathbb {R}^{n} \rightarrow \mathbb {R}$$ with unconstrained variable $$x \in \mathbb {R}^{n}$$, GD uses the following update rule:
$$\displaystyle x_{t+1} = x_{t} - h_t \nabla f\left (x_t\right ),$$
where h t are step size, which may be either fixed or vary across iterations. When f is convex, $$h_t < \frac {2}{L}$$ is a necessary and sufficient condition to guarantee the (worst-case) convergence of GD, where L is the Lipschitz constant of the gradient of the function f. On the other hand, there is far less understanding of GD for non-convex problems. For general smooth non-convex problems, GD is only known to converge to a stationary point (i.e., a point with zero gradient).
Bin Shi, S. S. Iyengar

### Chapter 8. A Conservation Law Method Based on Optimization

Abstract
This chapter is organized as follows: In Sect. 8.1, we warm up with an analytical solution for simple 1-D quadratic function. In Sect. 8.2, we propose the artificially dissipating energy algorithm, energy conservation algorithm, and the combined algorithm based on the symplectic Euler scheme, and remark a second-order scheme—the Störmer–Verlet scheme. In Sect. 8.3, we propose the locally theoretical analysis for high-speed convergence. Section 8.4 proposes the experimental demonstration. In Sect. 8.4, we propose the experimental result for the proposed algorithms on strongly convex, non-strongly convex, and non-convex functions in high dimension. Finally, we propose some perspective view for the proposed algorithms and two adventurous ideas based on the evolution of Newton’s second law—fluid and quantum.
Bin Shi, S. S. Iyengar

### Chapter 9. Improved Sample Complexity in Sparse Subspace Clustering with Noisy and Missing Observations

Abstract
In this chapter, we show the results of the new CoCoSSC algorithm. The content is organized as follows: The main results concerning CoCoSSC algorithm are shown in Sect. 9.1. Following Sect. 9.1, we show the full proofs in Sect. 9.2. In Sect. 9.3, we show the performance for CoCoSSC algorithm and some related algorithms numerically. Finally, we conclude this work with some future directions.
Bin Shi, S. S. Iyengar

### Chapter 10. Online Discovery for Stable and Grouping Causalities in Multivariate Time Series

Abstract
The content of this chapter is organized as follows: The problem formulation is presented in Sect. 10.1. Section 10.2 introduces the details about our proposed approach and its equivalent Bayesian model. A solution capable of online inference with particle learning is given in Sect. 10.3. Extensive empirical evaluation is demonstrated in Sect. 10.4. Finally, we conclude our work and discuss the future work.
Bin Shi, S. S. Iyengar

### Chapter 11. Conclusion

Abstract
Now more than ever machine learning and embedded AI will be essential in maintaining information assurance for all aspects of our nation’s security and defense, as well as every transaction we make in government and commercial or private business operations. Information is the lifeblood of every business transaction and managing risks related to the use, processing, storage, analysis, and transmission of this information, as well as the enormous “big data” analytics systems upon which we now rely are the vital parts allowing us to process and sustain the flow of information for our modern world. As we increase the numbers of devices and interconnected networks, especially as the Internet of things begins to blossom, enormous risks will continue to emerge. Any disruption to our systems and processes could cause economic collapse for a business, as well as our nation.
Bin Shi, S. S. Iyengar

### Backmatter

Weitere Informationen