A sparse grid based method for generative dimensionality reduction of high-dimensional data

https://doi.org/10.1016/j.jcp.2015.12.033

Abstract

Generative dimensionality reduction methods play an important role in machine learning applications because they construct an explicit mapping from a low-dimensional space to the high-dimensional data space. We discuss a general framework to describe generative dimensionality reduction methods, where the main focus lies on a regularized principal manifold learning variant. Since most generative dimensionality reduction algorithms exploit the representer theorem for reproducing kernel Hilbert spaces, their computational costs grow at least quadratically in the number n of data points. Instead, we introduce a grid-based discretization approach whose costs scale only linearly in n. To circumvent the curse of dimensionality of full tensor product grids, we use the concept of sparse grids.

Furthermore, in real-world applications some embedding directions are usually more important than others, and it is reasonable to refine the underlying discretization space only in these directions. To this end, we employ a dimension-adaptive algorithm which is based on the ANOVA (analysis of variance) decomposition of a function. In particular, the reconstruction error is used to measure the quality of an embedding. As an application, we study large simulation data sets from an engineering application in the automotive industry (car crash simulation).

Introduction

In real-world applications, nominally high-dimensional data often resides on a lower-dimensional manifold. Due to this observation, one of the main topics in mathematical learning theory, see e.g. [1], [2], [3], and machine learning algorithms, see e.g. [4], [5], is the dimensionality reduction of a high-dimensional data set. To this end, it is assumed that there exists an m-dimensional manifold M embedded in R^d on which the data resides. The task of dimensionality reduction is to build an approximate representation of the low-dimensional manifold M from the available data in R^d and thereby to obtain a description of the data in the new coordinate system T ⊂ R^m governed by the manifold.

While there is a vast number of algorithms to achieve this, the class of generative algorithms in this context is quite small, see [5], [6]. Most algorithms result in a mapping P : R^d → T which allows one to describe the available data in terms of the manifold coordinates. A generative approach, however, provides two maps P : R^d → T and f : T → R^d. This way, given a generative algorithm, we can project new data points from M into T by means of P, and we can interpolate between two data points in the low-dimensional representation of the manifold and thus generate new, meaningful data on M by means of f.
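To make this distinction concrete, the following minimal Python sketch shows the two maps as an abstract interface; the names GenerativeModel, project and generate are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

# Hypothetical interface sketch (names not from the paper): a generative
# model provides both directions, P : R^d -> T and f : T -> R^d.
class GenerativeModel:
    def project(self, x: np.ndarray) -> np.ndarray:
        """P: map a data point x in R^d to manifold coordinates t in T."""
        raise NotImplementedError

    def generate(self, t: np.ndarray) -> np.ndarray:
        """f: map manifold coordinates t in T back to a point in R^d."""
        raise NotImplementedError

    def interpolate(self, x0: np.ndarray, x1: np.ndarray, s: float) -> np.ndarray:
        """Generate a new, meaningful point on M by interpolating in T."""
        t0, t1 = self.project(x0), self.project(x1)
        return self.generate((1.0 - s) * t0 + s * t1)
```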

While principal component analysis (PCA) is the method of choice for linear manifolds M, the nonlinear case is more relevant for practical applications but also more involved. Here, the explicit determination of general manifolds M is quite complicated. To this end, generative methods usually assume that there exists a bijective map ζ : [0,1]^m → M, i.e. the manifold can be described by one chart. The generative algorithm then takes T = [0,1]^m and approximates ζ by f. A generative model is of special interest when the high-dimensional data space R^d represents simulation results, for instance the coefficient vector of a finite element solution of a partial differential equation. Here, the comparison of original data and reconstructed surrogates of the d-dimensional simulations is essential for the quality control of the resulting mapping f.
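In the linear case the two maps have a simple closed form: P projects onto the leading principal directions and f reconstructs from them. A minimal sketch, assuming the data points are the rows of a matrix X (the function name pca_fit is hypothetical):

```python
import numpy as np

# Minimal sketch: PCA as a generative method for a linear manifold.
# P(x) = W^T (x - mean) and f(t) = mean + W t, where the columns of W are
# the m leading principal directions.
def pca_fit(X: np.ndarray, m: int):
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:m].T                        # d x m basis of the linear manifold
    P = lambda x: W.T @ (x - mean)      # R^d -> T = R^m
    f = lambda t: mean + W @ t          # T -> R^d
    return P, f

# Usage: reconstruct a sample through its low-dimensional coordinates.
X = np.random.rand(100, 5)
P, f = pca_fit(X, m=2)
x_rec = f(P(X[0]))                      # surrogate of X[0] on the PCA manifold
```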

Two well-known examples of nonlinear generative methods are the generative topographic mapping (GTM), see [7], and the regularized principal manifold learning (PML) algorithm, see [8]. Here, the question arises of how the mappings f and P are computed. In the original GTM [7], f is discretized by a full tensor product grid approach. This naturally suffers from the curse of dimensionality, see [9], i.e. the case m > 3 cannot be treated computationally. In the PML algorithm [8], a kernel-based ansatz centered at the data points is chosen for f. This relies on the representer theorem for reproducing kernel Hilbert spaces, see e.g. [4]. While this approach is well suited if the number n of data points is moderate (up to several hundred), a grid-based approach is more favorable for large n. However, as for the GTM, a full tensor product grid does not work in the case m > 3.

To circumvent the curse of dimensionality at least to some extent, sparse grid approaches have been introduced for both the GTM, see [10], [11], and the PML, see [12]. With a sparse grid discretization it is possible to exploit higher regularity of ζ: if ζ has Sobolev regularity of bounded mixed smoothness, we obtain almost the same approximation quality as with the full tensor product grid approach but at significantly lower computational complexity, see [13].
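The complexity gain can be illustrated by counting grid points. A regular sparse grid keeps only the hierarchical subgrids whose level multi-index l satisfies |l|_1 ≤ L + m − 1, instead of all subgrids with max_k l_k ≤ L. The following sketch counts interior points under this simplified construction (boundary treatment omitted, an illustrative assumption rather than the paper's exact setup):

```python
from itertools import product

# Minimal sketch: number of grid points of a regular sparse grid versus a
# full tensor product grid. A 1D hierarchical level l contributes 2^(l-1)
# interior points; the sparse grid keeps only level multi-indices with
# |l|_1 <= L + m - 1.
def grid_size(m: int, L: int, sparse: bool) -> int:
    total = 0
    for l in product(range(1, L + 1), repeat=m):
        if sparse and sum(l) > L + m - 1:
            continue  # level combination lies outside the sparse grid index set
        pts = 1
        for lk in l:
            pts *= 2 ** (lk - 1)  # points contributed by 1D level lk
        total += pts
    return total

# m = 4 latent directions, level L = 6:
print(grid_size(4, 6, sparse=False))  # 15,752,961 points (= (2^6 - 1)^4)
print(grid_size(4, 6, sparse=True))   # 2,561 points
```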

In this paper we will introduce an adaptive sparse grid PML approach which is an enhancement of [12]. To this end, we will present an alternating minimization scheme to solve the PML optimization problem over a generalized sparse grid to obtain f : [0,1]^m → R^d. We suggest error indicators which rely on the so-called hierarchical surplus of a sparse grid basis function. These error indicators are then used to refine the underlying sparse grid space in the spatial directions in which the current approximation of f varies the most. The alternating minimization and the refinement of the underlying sparse grid space are then iterated until a given computational complexity threshold or a maximum refinement level is reached.
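In outline, the iteration alternates between fitting f on the current grid and re-embedding the data, then refines the grid where the hierarchical surpluses are largest. A high-level Python sketch; all callables are hypothetical placeholders for the individual steps, not the paper's actual implementation:

```python
import numpy as np
from typing import Callable

# High-level sketch of the alternating minimization with dimension-adaptive
# refinement outlined above. All callables are hypothetical placeholders.
def adaptive_pml(
    X: np.ndarray,            # n x d data matrix
    t0: np.ndarray,           # n x m initial embedding (e.g. from PCA)
    fit_f: Callable,          # t fixed: solve for the sparse grid coefficients of f
    project: Callable,        # f fixed: t_i = argmin_t ||x_i - f(t)||^2
    surpluses: Callable,      # hierarchical surpluses of f as error indicators
    refine: Callable,         # enlarge the grid in the indicated directions
    max_level: int,
    n_alt: int = 5,
):
    t = t0
    for _ in range(max_level):
        for _ in range(n_alt):        # alternating minimization on the current grid
            f = fit_f(t, X)           # minimize over f with the embedding fixed
            t = project(f, X)         # minimize over the embedding with f fixed
        refine(surpluses(f))          # dimension-adaptive grid refinement
    return f, t
```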

We will apply our method to a high-dimensional data set of finite element simulations. This specific scenario is based on the virtual product development arising in typical engineering applications, in particular we consider the automotive industry. Here, as part of the development cycle for a new car model, the influence of design parameters on the crash behavior of a car is analyzed by an engineer with the help of numerical crash simulations [14]. Each simulation consists of approximately one million finite element nodes and up to several hundred snapshots in time, and is therefore very high-dimensional, i.e. it comprises approximately 10^8 values (time steps × nodes). The resulting huge databases of finite element simulations can be used for sophisticated data-driven analysis steps, see [15], [16]. For example, the detection of the number of different effects such as tearing or bending behavior in a set of crash simulation data is of special interest. As an example, we will study the deformation of several beams in the front part of the car with the help of our dimension-adaptive PML method.

The outline of this paper is as follows: In section 2 we will introduce the idea of generative dimensionality reduction in its most general setting, followed by a more detailed view of two common generative approaches – the PCA and the PML methods – and of how they fit into our general setting. In section 3 we will briefly review the concept of sparse grids and introduce an adaptive version of the sparse grid PML method. Section 4 deals with a state-of-the-art big data manifold learning problem which stems from current demands in the automotive industry. We apply our adaptive sparse grid PML algorithm to this problem and compare the results to the linear PCA method. Section 5 contains some concluding remarks.

Section snippets

Generative dimensionality reduction

Let us assume that we are given n data points {x_1, …, x_n} ⊂ R^d which have been drawn i.i.d. according to an unknown probability measure ρ on R^d which is absolutely continuous with respect to the Lebesgue measure. Furthermore, let ρ be supported on an unknown compact manifold M ⊂ R^d of dimension m. Since the treatment of general manifolds is very involved, we impose an additional condition on the structure of M and assume that there exists a bijective map ζ : [0,1]^m → M, i.e. the compact manifold M can be described by a single chart.

Discretized PML with adaptive sparse grids

As we showed in (9), the application of the representer theorem leads to a finite expansion of g in terms of the kernel functions K(t_j, t), j = 1, …, n, centered at the data points. Then it is easy to see that, for fixed t_1, …, t_n, a minimizer of (8) solves the system of linear equations (K̄ + λI)α = x with an n × n kernel block matrix given by K̄_{i,j} = K(t_i, t_j) ∈ R^{d×d}. Here, I denotes the nd-dimensional identity matrix, α = (α_1, …, α_n)^T is the coefficient vector and x = (x_1, …, x_n)^T represents the data vector. Since K̄ is usually dense, the costs of assembling and solving this system grow at least quadratically in n.
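A minimal sketch of this solve, under the simplifying assumption that the matrix-valued kernel is a scalar kernel times the d × d identity, K(t_i, t_j) = k(t_i, t_j) I_d, so that the nd × nd block system decouples into one n × n system with d right-hand sides (the Gaussian kernel and all names here are illustrative, not the paper's choices):

```python
import numpy as np

# Minimal sketch of the representer-theorem system (K_bar + lam*I) alpha = x,
# assuming K(t_i, t_j) = k(t_i, t_j) * I_d, which decouples the block system
# into one n x n solve with d right-hand sides.
def solve_kernel_system(T: np.ndarray, X: np.ndarray, lam: float, gamma: float = 1.0):
    # T: n x m latent points, X: n x d data matrix
    sq = ((T[:, None, :] - T[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-gamma * sq)                             # Gaussian kernel matrix
    alpha = np.linalg.solve(K + lam * np.eye(len(T)), X)
    return alpha                                        # g(t) = sum_j alpha_j k(t_j, t)

# The dense n x n solve costs O(n^3), which motivates the grid-based
# discretization whose costs scale only linearly in n.
rng = np.random.default_rng(0)
T, X = rng.random((50, 2)), rng.random((50, 3))
alpha = solve_kernel_system(T, X, lam=1e-2)
```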

Application of the dimension-adaptive PML

In this section we present results for the application of the PML method from section 3. We use the H^1_mix-seminorm, i.e. we omit the l = 0 term in the sum in (5), to regularize the dimension-adaptive sparse grid PML approach. We start with a toy problem: a two-dimensional S-shaped manifold embedded in three dimensions, for which we discuss the performance of the adaptivity. Subsequently, we consider a state-of-the-art engineering problem where we aim to build a generative model for numerical crash simulations from the automotive industry.
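As stated in the abstract, the reconstruction error serves as the quality measure for an embedding. A minimal sketch of this metric, assuming data points X, their latent coordinates T, and a fitted generative map f (all names hypothetical):

```python
import numpy as np

# Minimal sketch of the reconstruction error used to assess an embedding:
# the mean squared distance between each data point x_i and its surrogate
# f(t_i) generated from the low-dimensional coordinates.
def reconstruction_error(X: np.ndarray, T: np.ndarray, f) -> float:
    residuals = X - np.array([f(t) for t in T])  # x_i - f(t_i) in R^d
    return float((residuals ** 2).sum(axis=1).mean())
```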

Conclusion

In this paper, we discussed the idea of generative dimensionality reduction for two examples: principal component analysis (PCA) and principal manifold learning (PML). We explained how the PML method can be discretized in terms of sparse grids and introduced a dimension-adaptive generalization of the algorithm in [12]. Furthermore, we pointed out the relation to the ANOVA decomposition of multivariate functions. As a typical field of application for a generative dimensionality reduction method, we studied car crash simulation data from the automotive industry.

Acknowledgements

The authors were partially supported by the Hausdorff Center for Mathematics in Bonn, by the Sonderforschungsbereich 1060 The Mathematics of Emergent Effects funded by the Deutsche Forschungsgemeinschaft and by the project SIMDATA-NL funded by the German Federal Ministry of Education and Research under the grant numbers 05M10AAB, 05M10KTD, and 05M10PDA. We cordially thank Rodrigo Iza-Teran for generating the car crash simulation data.

References (27)

  • B. Bohn et al., Analysis of car crash simulation data with nonlinear machine learning methods.
  • D. Pflüger et al., Spatially adaptive sparse grids for high-dimensional data-driven problems, J. Complex. (2010).
  • F. Cucker et al., Learning Theory (2007).
  • L. Györfi et al., A Distribution-Free Theory of Nonparametric Regression (2002).
  • V. Vapnik, Statistical Learning Theory (1998).
  • B. Schölkopf et al., Learning with Kernels – Support Vector Machines, Regularization, Optimization, and Beyond (2002).
  • J. Lee et al., Nonlinear Dimensionality Reduction (2007).
  • S. Gerber et al., Dimensionality reduction and principal surfaces via kernel map manifolds.
  • C. Bishop et al., GTM: the generative topographic mapping, Neural Comput. (1998).
  • A. Smola et al., Regularized principal manifolds, J. Mach. Learn. Res. (2001).
  • R. Bellman, Adaptive Control Processes: A Guided Tour (1961).
  • M. Griebel et al., Dimensionality reduction of high-dimensional data with a nonlinear principal component aligned generative topographic mapping, SIAM J. Sci. Comput. (2014).
  • M. Griebel et al., A sparse grid based generative topographic mapping for the dimensionality reduction of high-dimensional data.