Elsevier

Neural Networks

Volume 75, March 2016, Pages 1-11
Neural Networks

A graph-based N-body approximation with application to stochastic neighbor embedding

https://doi.org/10.1016/j.neunet.2015.11.007Get rights and content

Abstract

We propose a novel approximation technique, bubble approximation (BA), for repulsion forces in an N-body problem, where attraction has a limited range and repulsion acts between all points. These kinds of systems occur frequently in dimension reduction and graph drawing. Like tree codes, the established N-body approximation method, BA replaces several point-to-point computations by one area-to-point computation. Novelty of BA is to consider not only the magnitudes but also the directions of forces from the area. Therefore, its area-to-point approximations are applicable anywhere in the space. The joint effect of forces from inside the area is calculated analytically, assuming a homogeneous mass of points inside the area. These two features free BA from hierarchical data structures and complicated bookkeeping of interactions, which plague tree codes. Instead, BA uses a simple graph to control the computations. The graph provides a sparse matrix, which, suitably weighted, replaces the full matrix of pairwise comparisons in the N-body problem. As a concrete example, we implement a sparse-matrix version of stochastic neighbor embedding (a dimension reduction method), and demonstrate its good performance by comparisons to full-matrix method, and to three different approximate versions of the same method.

Introduction

Problems defined on a matrix of pairwise comparisons, so called N-body problems (Gray & Moore, 2001), are ubiquitous in machine learning. Solving them on large data sets is challenging, because memory requirements and running times scale quadratically with the number of data points n.

One commonly occurring N-body problem is to find coordinates yr and yc that minimize a force system F=r=1n(ArRr),Ar={c|(r,c)L}g(yryc)(yryc),Rr=c=1nf(yryc)(yryc). A point at coordinates yr experiences attractive forces g from a limited set of points (collected in L) and repulsive forces f from all points. Typical examples occur in force systems of graph drawing methods (Eades, 1984, Fruchterman and Reingold, 1991, Hu, 2006, Noack, 2007, Quigley and Eades, 2001, Walshaw, 2001), where edges connect a node to a limited set of other nodes, and in gradients of cost functions in dimension reduction (Carreira-Perpiñán, 2010, Hinton and Roweis, 2003, Lee and Verleysen, 2009b, van der Maaten and Hinton, 2008, Yang et al., 2009), where the attractive forces outside of a local neighborhood are small enough to be ignored.

The set L  of local relationships is small, so the attractive forces can be evaluated exactly. Straightforward evaluation of the repulsion term Eq. (3), on the contrary, would lead to quadratic scaling with n. We propose a novel type of approximation, bubble approximation (BA), for computing the repulsions. It is applicable in situations, where the overall computation is organized as in Eq. (1). In this work we derive formulas for the two-dimensional yr. We demonstrate the practical usefulness of BA by showing how the cost function gradient of stochastic neighbor embedding (SNE) (Hinton and Roweis, 2003, van der Maaten and Hinton, 2008), a much-used nonlinear dimension reduction method, can be approximated with BA.

BA replaces several point-to-point computations by one area-to-point computation, as do tree codes (Appel, 1985, Barnes and Hut, 1986, Callahan and Kosaraju, 1995, Gray and Moore, 2003, Greengard and Rokhlin, 1987, Greengard and Strain, 1991, Yang et al., 2003), the established N-body approximation technique. Tree codes use hierarchical data structures to manage space: to determine, when an area is far enough (compared to its size) from a point to qualify for approximation, and to find the points yr to be replaced. BA considers not only the magnitudes but also the directions of forces from the area. Therefore, its area-to-point approximations are applicable for any areas, not only for those far away. The joint effect of forces from inside the area is calculated analytically, assuming homogeneous mass of points inside the area, so bookkeeping over individual points is not needed.

Because of these two features, BA does not need hierarchical division of space, but uses a simple graph to control its computations. The graph is turned into a sparse matrix, which replaces the full matrix of pairwise comparisons in the computations. Randomly picked global links (G-links) of the graph encode the area-to-point relationships. As global links represent forces from several points, they give rise to weighted entries. The graph is complemented by local links (L-links). They cover the local regions where the attractive forces are strong, and where exact point-to-point interactions are used.

Resembling divisions in local and global graph links have been used to improve quality of graph embeddings (Andersen, Chung, & Lu, 2007), as heuristics to reduce the amount of computation in dimension reduction (Chalmers, 1996, Parviainen, 2011), and to introduce long-range interactions to speed up convergence of image processing algorithms (Grady & Schwarz, 2004).

We start by introducing the general idea of BA in Section  2. The main contribution of the work is Section  3, where formulas for the bubble approximation are derived. In Section  4 we apply BA to build a large-scale dimension reduction method sparse t-SNE. Section  5 introduces some earlier approximations that serve as comparison methods. Experiments in Section  6 establish the good performance of sparse t-SNE in comparison to the original method, which uses exact computation, and to three earlier approximation methods. Some limitations and directions for future development of BA are discussed in Section  7.

Section snippets

Bubble approximation

In this section we give an overall view of approximating the repulsive force Rr in Eq. (3). We define the sets of local and global links, and discuss their respective roles in BA.

Fig. 1 illustrates the type of area-to-point interaction we consider here. The interaction happens between a spherical bubble and a point. Forces from points that fall inside the bubble B all act on A, but their directions differ. Their resultant gives the expected force which a point from B exerts on A. Instead of

Derivation of the expected force from a bubble

In this section we derive in detail the expected force Irc, which a point inside a yc-centered bubble exerts on point yr. The derivation is for two-dimensional yr. For readability, we mostly omit subscripts r  and c  and other explicit references to the points.

Sparse t-SNE

In this section we give a practical example of how to apply the bubble approximation. We implement a sparse-matrix version of dimension reduction method SNE (Hinton & Roweis, 2003) (stochastic neighbor embedding). Sparse t-SNE is based on an SNE-variant called t-distributed SNE (van der Maaten & Hinton, 2008). The idea of sparse t-SNE was first presented in Parviainen (2011), but the version developed here is so different that this work should be considered a replacement rather than extension

Other approximations

Tree codes have in recent years entered the field of dimension reduction, and, due to their good performance, can be considered the state-of-the-art of approximate nonlinear dimension reduction. Besides tree codes, approximate large-scale dimension reduction has been done with landmark methods and methods that learn continuous functions. In this section we discuss these three types of approximations, concentrating on their use with SNE but including references to some earlier use in dimension

Experiments

In this section we compare sparse t-SNE to the original method (referred to as full t-SNE), and to three other types of approximate SNE: a tree code method, a neural network trained on a subsample of data points, and a landmark approximation. We also study the effect of changing G, and create a series of embeddings with growing n  to guide us in choosing dependence of G  on n.

Discussion and conclusions

We presented a novel approximation method, bubble approximation, for N-body problems that involve minimization of a system of attractive and repulsive forces. As an example of applying the approximation, we implemented a large-scale dimension reduction method, sparse t-SNE, verified that it can produce as good visualizations as the full method, and compared its performance to three other approximation methods: BH-SNE, landmarks, and a neural network approximation. Of these, sparse t-SNE and

References (57)

  • G. Biswas et al.

    Evaluation of projection algorithms

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1981)
  • K. Bunte et al.

    A general framework for dimensionality-reducing data visualization mapping

    Neural Computation

    (2012)
  • P.B. Callahan et al.

    A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields

    Journal of the Association for Computing Machinery

    (1995)
  • Carreira-Perpiñán, M.Á (2010). The elastic embedding algorithm for dimensionality reduction. In Proc. of ICML, vol. 10...
  • Chalmers, M. (1996). A linear iteration time layout algorithm for visualising high-dimensional data. In Proc. of IEEE...
  • C.L. Chang et al.

    A heuristic relaxation method for nonlinear mapping in cluster analysis

    IEEE Transactions on Systems, Man and Cybernetics (Part B)

    (1973)
  • de Freitas, N., Wang, Y., Mahdaviani, M., & Lang, D. (2005). Fast Krylov methods for N-body learning. In Proc. of NIPS...
  • V. de Silva et al.

    Sparse multidimensional scaling using landmark points, Tech. rep.

    (2004)
  • P. Eades

    A heuristic for graph drawing

    Congressus Numerantium

    (1984)
  • Faloutsos, C., & Lin, K.-I. (1995). FastMap: a fast algorithm for indexing, data-mining and visualization of...
  • T.M.J. Fruchterman et al.

    Graph drawing by force-directed placement

    Software: Practice and Experience

    (1991)
  • E.R. Gansner et al.

    A maxent-stress model for graph layout

    IEEE Transactions on Visualization and Computer Graphics

    (2013)
  • A.S. Georghiades et al.

    From few to many: Illumination cone models for face recognition under variable lighting and pose

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2001)
  • Gisbrecht, A., Mokbel, B., & Hammer, B. (2012). Linear basis-function t-SNE for fast nonlinear dimensionality...
  • I.S. Gradshteyn et al.

    Table of integrals, series and products

    (2007)
  • Grady, L., & Schwarz, E.L. (2004). Faster graph-theoretic image processing via small-world and quadtree topologies. In...
  • Gray, A.G., & Moore, A.W. (2001). N-body’ problems in statistical learning. In Proc. of NIPS, vol. 4, (pp....
  • Gray, A.G., & Moore, A.W. (2003). Nonparametric density estimation: toward computational tractability. In Proc. of SDM...
  • Cited by (6)

    • A novel CC-tSNE-SVR model for rapid determination of diesel fuel quality by near infrared spectroscopy

      2020, Infrared Physics and Technology
      Citation Excerpt :

      At this time, the introduction of manifold learning dimension reduction algorithm can achieve twice the result with half the effort. In the area of machine learning, various dimensionality reduction techniques are proposed for nonlinear processes, such as kernel principal component analysis (KPCA) [22], isometric feature mapping (Isomap) [23], stochastic neighbor embedding (SNE) [24] and so on. T-distributed stochastic neighbor embedding (tSNE) is considered to be a very popular technique for manifold dimensionality reduction, which has been widely proved in various fields [25,26].

    • Drawing clustered graphs by preserving neighborhoods

      2017, Pattern Recognition Letters
      Citation Excerpt :

      Both time and memory requirements of t-SNE scale quadratically with |V|, so for large graphs we need an approximation. Sparse SNE [31] was developed for vector data, but uses a graph as its internal representation, so it is straightforward to modify it for graphs. Other large-scale t-SNE variants have been created with tree-codes [32,33].

    • ivga: A fast force-directed method for interactive visualization of complex networks

      2017, Journal of Computational Science
      Citation Excerpt :

      In LargeVis, unlike in rn-mds, before mapping to 2D the high-dimensional data are represented by knn (k-nearest neighbors) graph. Then the graph is visualized by employing principled probabilistic model [8], which is an approximation of the stochastic neighbor embedding (SNE) approach used for high-dimensional data visualization [1,31–34]. The cost function is defined as the divergence (such as the Kullback-Leibrer divergence) between the probability distribution functions of the neighbors in V and v.

    • Symmetric generative methods and tSNE: A short survey

      2018, VISIGRAPP 2018 - Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
    • t-distributed Stochastic Neighbor Embedding with inhomogeneous degrees of freedom

      2016, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    View full text