A graph-based N-body approximation with application to stochastic neighbor embedding

doi:10.1016/j.neunet.2015.11.007

Neural Networks

Volume 75, March 2016, Pages 1-11

https://doi.org/10.1016/j.neunet.2015.11.007 Get rights and content

Abstract

We propose a novel approximation technique, bubble approximation (BA), for repulsion forces in an N-body problem, where attraction has a limited range and repulsion acts between all points. These kinds of systems occur frequently in dimension reduction and graph drawing. Like tree codes, the established N-body approximation method, BA replaces several point-to-point computations by one area-to-point computation. Novelty of BA is to consider not only the magnitudes but also the directions of forces from the area. Therefore, its area-to-point approximations are applicable anywhere in the space. The joint effect of forces from inside the area is calculated analytically, assuming a homogeneous mass of points inside the area. These two features free BA from hierarchical data structures and complicated bookkeeping of interactions, which plague tree codes. Instead, BA uses a simple graph to control the computations. The graph provides a sparse matrix, which, suitably weighted, replaces the full matrix of pairwise comparisons in the N-body problem. As a concrete example, we implement a sparse-matrix version of stochastic neighbor embedding (a dimension reduction method), and demonstrate its good performance by comparisons to full-matrix method, and to three different approximate versions of the same method.

Introduction

Problems defined on a matrix of pairwise comparisons, so called N-body problems (Gray & Moore, 2001), are ubiquitous in machine learning. Solving them on large data sets is challenging, because memory requirements and running times scale quadratically with the number of data points $n$ .

One commonly occurring N-body problem is to find coordinates $y_{r}$ and $y_{c}$ that minimize a force system $F = \sum_{r = 1}^{n} (A_{r} - R_{r}),$ $A_{r} = \sum_{{c | (r, c) \in L}} g (‖ y_{r} - y_{c} ‖) (y_{r} - y_{c}),$ $R_{r} = \sum_{c = 1}^{n} f (‖ y_{r} - y_{c} ‖) (y_{r} - y_{c}) .$ A point at coordinates $y_{r}$ experiences attractive forces $g$ from a limited set of points (collected in $L$ ) and repulsive forces $f$ from all points. Typical examples occur in force systems of graph drawing methods (Eades, 1984, Fruchterman and Reingold, 1991, Hu, 2006, Noack, 2007, Quigley and Eades, 2001, Walshaw, 2001), where edges connect a node to a limited set of other nodes, and in gradients of cost functions in dimension reduction (Carreira-Perpiñán, 2010, Hinton and Roweis, 2003, Lee and Verleysen, 2009b, van der Maaten and Hinton, 2008, Yang et al., 2009), where the attractive forces outside of a local neighborhood are small enough to be ignored.

The set $L$ of local relationships is small, so the attractive forces can be evaluated exactly. Straightforward evaluation of the repulsion term Eq. (3), on the contrary, would lead to quadratic scaling with $n$ . We propose a novel type of approximation, bubble approximation (BA), for computing the repulsions. It is applicable in situations, where the overall computation is organized as in Eq. (1). In this work we derive formulas for the two-dimensional $y_{r}$ . We demonstrate the practical usefulness of BA by showing how the cost function gradient of stochastic neighbor embedding (SNE) (Hinton and Roweis, 2003, van der Maaten and Hinton, 2008), a much-used nonlinear dimension reduction method, can be approximated with BA.

BA replaces several point-to-point computations by one area-to-point computation, as do tree codes (Appel, 1985, Barnes and Hut, 1986, Callahan and Kosaraju, 1995, Gray and Moore, 2003, Greengard and Rokhlin, 1987, Greengard and Strain, 1991, Yang et al., 2003), the established N-body approximation technique. Tree codes use hierarchical data structures to manage space: to determine, when an area is far enough (compared to its size) from a point to qualify for approximation, and to find the points $y_{r}$ to be replaced. BA considers not only the magnitudes but also the directions of forces from the area. Therefore, its area-to-point approximations are applicable for any areas, not only for those far away. The joint effect of forces from inside the area is calculated analytically, assuming homogeneous mass of points inside the area, so bookkeeping over individual points is not needed.

Because of these two features, BA does not need hierarchical division of space, but uses a simple graph to control its computations. The graph is turned into a sparse matrix, which replaces the full matrix of pairwise comparisons in the computations. Randomly picked global links (G-links) of the graph encode the area-to-point relationships. As global links represent forces from several points, they give rise to weighted entries. The graph is complemented by local links (L-links). They cover the local regions where the attractive forces are strong, and where exact point-to-point interactions are used.

Resembling divisions in local and global graph links have been used to improve quality of graph embeddings (Andersen, Chung, & Lu, 2007), as heuristics to reduce the amount of computation in dimension reduction (Chalmers, 1996, Parviainen, 2011), and to introduce long-range interactions to speed up convergence of image processing algorithms (Grady & Schwarz, 2004).

We start by introducing the general idea of BA in Section 2. The main contribution of the work is Section 3, where formulas for the bubble approximation are derived. In Section 4 we apply BA to build a large-scale dimension reduction method sparse t-SNE. Section 5 introduces some earlier approximations that serve as comparison methods. Experiments in Section 6 establish the good performance of sparse t-SNE in comparison to the original method, which uses exact computation, and to three earlier approximation methods. Some limitations and directions for future development of BA are discussed in Section 7.

Section snippets

Bubble approximation

In this section we give an overall view of approximating the repulsive force $R_{r}$ in Eq. (3). We define the sets of local and global links, and discuss their respective roles in BA.

Fig. 1 illustrates the type of area-to-point interaction we consider here. The interaction happens between a spherical bubble and a point. Forces from points that fall inside the bubble B all act on A, but their directions differ. Their resultant gives the expected force which a point from B exerts on A. Instead of

Derivation of the expected force from a bubble

In this section we derive in detail the expected force $I_{r c}$ , which a point inside a $y_{c}$ -centered bubble exerts on point $y_{r}$ . The derivation is for two-dimensional $y_{r}$ . For readability, we mostly omit subscripts $r$ and $c$ and other explicit references to the points.

Sparse t-SNE

In this section we give a practical example of how to apply the bubble approximation. We implement a sparse-matrix version of dimension reduction method SNE (Hinton & Roweis, 2003) (stochastic neighbor embedding). Sparse t-SNE is based on an SNE-variant called t-distributed SNE (van der Maaten & Hinton, 2008). The idea of sparse t-SNE was first presented in Parviainen (2011), but the version developed here is so different that this work should be considered a replacement rather than extension

Other approximations

Tree codes have in recent years entered the field of dimension reduction, and, due to their good performance, can be considered the state-of-the-art of approximate nonlinear dimension reduction. Besides tree codes, approximate large-scale dimension reduction has been done with landmark methods and methods that learn continuous functions. In this section we discuss these three types of approximations, concentrating on their use with SNE but including references to some earlier use in dimension

Experiments

In this section we compare sparse t-SNE to the original method (referred to as full t-SNE), and to three other types of approximate SNE: a tree code method, a neural network trained on a subsample of data points, and a landmark approximation. We also study the effect of changing $G$ , and create a series of embeddings with growing $n$ to guide us in choosing dependence of $G$ on $n$ .

Discussion and conclusions

We presented a novel approximation method, bubble approximation, for N-body problems that involve minimization of a system of attractive and repulsive forces. As an example of applying the approximation, we implemented a large-scale dimension reduction method, sparse t-SNE, verified that it can produce as good visualizations as the full method, and compared its performance to three other approximation methods: BH-SNE, landmarks, and a neural network approximation. Of these, sparse t-SNE and

References (57)

D. de Ridder et al.
Sammon’s mapping using neural networks: A comparison
Pattern Recognition Letters
(1997)
L. Greengard et al.
A fast algorithm for particle simulations
Journal of Computational Physics
(1987)
H. Khosravi et al.
Introducing a very large dataset of handwritten Farsi digits and a study on their varieties
Pattern Recognition Letters
(2007)
J.A. Lee et al.
Quality assessment of dimensionality reduction: Rank-based criteria
Neurocomputing
(2009)
J.K. Salmon et al.
Skeletons from the treecode closet
Journal of Computational Physics
(1994)
D.K. Agrafiotis et al.
Nonlinear mapping networks
Journal of Chemical Information and Computer Sciences
(2000)
R. Andersen et al.
Drawing power law graphs using a local/global decomposition
Algorithmica
(2007)
A.W. Appel
An efficient program for many-body simulation
SIAM Journal of Scientific and Statistical Computing
(1985)
Bae, S.-H., Choi, J.Y., Qiu, J., & Fox, G.C. (2010). Dimension reduction and visualization of large high-dimensional...
J. Barnes et al.
A hierarchical $O (N log N)$ force-calculation algorithm
Nature
(1986)

G. Biswas et al.

Evaluation of projection algorithms

IEEE Transactions on Pattern Analysis and Machine Intelligence

(1981)

K. Bunte et al.

A general framework for dimensionality-reducing data visualization mapping

Neural Computation

(2012)

P.B. Callahan et al.

A decomposition of multidimensional point sets with applications to $k$ -nearest-neighbors and $n$ -body potential fields

Journal of the Association for Computing Machinery

(1995)

Carreira-Perpiñán, M.Á (2010). The elastic embedding algorithm for dimensionality reduction. In Proc. of ICML, vol. 10...

Chalmers, M. (1996). A linear iteration time layout algorithm for visualising high-dimensional data. In Proc. of IEEE...

C.L. Chang et al.

A heuristic relaxation method for nonlinear mapping in cluster analysis

IEEE Transactions on Systems, Man and Cybernetics (Part B)

(1973)

de Freitas, N., Wang, Y., Mahdaviani, M., & Lang, D. (2005). Fast Krylov methods for N-body learning. In Proc. of NIPS...

V. de Silva et al.

Sparse multidimensional scaling using landmark points, Tech. rep.

(2004)

P. Eades

A heuristic for graph drawing

Congressus Numerantium

(1984)

Faloutsos, C., & Lin, K.-I. (1995). FastMap: a fast algorithm for indexing, data-mining and visualization of...

T.M.J. Fruchterman et al.

Graph drawing by force-directed placement

Software: Practice and Experience

(1991)

E.R. Gansner et al.

A maxent-stress model for graph layout

IEEE Transactions on Visualization and Computer Graphics

(2013)

A.S. Georghiades et al.

From few to many: Illumination cone models for face recognition under variable lighting and pose

IEEE Transactions on Pattern Analysis and Machine Intelligence

(2001)

Gisbrecht, A., Mokbel, B., & Hammer, B. (2012). Linear basis-function t-SNE for fast nonlinear dimensionality...

I.S. Gradshteyn et al.

Table of integrals, series and products

(2007)

Grady, L., & Schwarz, E.L. (2004). Faster graph-theoretic image processing via small-world and quadtree topologies. In...

Gray, A.G., & Moore, A.W. (2001). N-body’ problems in statistical learning. In Proc. of NIPS, vol. 4, (pp....

Gray, A.G., & Moore, A.W. (2003). Nonparametric density estimation: toward computational tractability. In Proc. of SDM...

Cited by (6)

A novel CC-tSNE-SVR model for rapid determination of diesel fuel quality by near infrared spectroscopy
2020, Infrared Physics and Technology
Citation Excerpt :
At this time, the introduction of manifold learning dimension reduction algorithm can achieve twice the result with half the effort. In the area of machine learning, various dimensionality reduction techniques are proposed for nonlinear processes, such as kernel principal component analysis (KPCA) [22], isometric feature mapping (Isomap) [23], stochastic neighbor embedding (SNE) [24] and so on. T-distributed stochastic neighbor embedding (tSNE) is considered to be a very popular technique for manifold dimensionality reduction, which has been widely proved in various fields [25,26].
It is particularly important to test the quality of diesel oil. In order to improve the reliability and rapidity of the determination, a novel CC-tSNE-SVR model combined with near infrared spectroscopy (NIRS) was proposed for simultaneous determination of diesel multi parameter properties (density, viscosity and freezing point). The model was established by combining the method of correlation coefficient (CC), t-distributed stochastic neighbor embedding (tSNE) and support vector regression (SVR), which achieved the combination of wavelength selection algorithm and manifold learning technology. In order to test the validity of the method, several other models based on the same diesel data set of NIRS were compared. As a result, the CC-tSNE-SVR model presented in this paper not only had the best root mean square error of prediction (RMSE) and determinant coefficient (R²) in the analysis of diesel properties, but also had the fastest execution speed. Therefore, the combination of NIRS and the proposed CC-tSNE-SVR model open up a new way for the rapid and accurate monitoring of diesel quality.
Drawing clustered graphs by preserving neighborhoods
2017, Pattern Recognition Letters
Citation Excerpt :
Both time and memory requirements of t-SNE scale quadratically with |V|, so for large graphs we need an approximation. Sparse SNE [31] was developed for vector data, but uses a graph as its internal representation, so it is straightforward to modify it for graphs. Other large-scale t-SNE variants have been created with tree-codes [32,33].
Weighted graphs with presumed cluster structure are challenging to many existing graph drawing methods, even though ways of visualizing such graphs would be much needed in complex networks research. In the field of dimension reduction, t-distibuted stochastic neighbor embedding (t-SNE) has proven successful in visualizing clustered data. Here, we extend t-SNE into graph-SNE (GSNE). Our method builds on the sensitivity of random walks to cluster structure in graphs. We use random walks to define a neighborhood probability that realizes the properties behind the success of t-SNE in visualizing clustered data sets: Gaussian-like behavior of neighborhood probabilities, adaptation to local edge density, and an adjustable granularity scale. We show that GSNE correctly visualizes artificial graphs where ground-truth cluster structure is known. Using real-world networks, we show that GSNE is able to produce meaningful visualizations that display plausible cluster structure which is not captured by state-of-the-art visualization methods.
ivga: A fast force-directed method for interactive visualization of complex networks
2017, Journal of Computational Science
Citation Excerpt :
In LargeVis, unlike in rn-mds, before mapping to 2D the high-dimensional data are represented by knn (k-nearest neighbors) graph. Then the graph is visualized by employing principled probabilistic model [8], which is an approximation of the stochastic neighbor embedding (SNE) approach used for high-dimensional data visualization [1,31–34]. The cost function is defined as the divergence (such as the Kullback-Leibrer divergence) between the probability distribution functions of the neighbors in V and v.
Complex networks play a very important role in various fields of science as data structures, which aggregate information about mutual relationships between numerous objects. The structural properties of these large graphs can be scrutinized throughout their interactive visualization. However, visual analysis of complex networks consisting of |V| ∼ 10⁶⁺ vertices represents a great challenge for nowadays computer systems both from computational and storage perspective. Therefore, the existing graph drawing methods involving greater than O(|V|) time and space complexity cannot be regarded as promising tools in the advent of the Big Data era. We present here a new and very fast graph drawing method with O(|V|) time and space complexity – ivga (interactive visualization of graphs). We evaluate its usefulness and performance by testing ivga on the large complex networks from the Stanford Large Network Dataset Collection. We demonstrate that ivga allows for very fast interactive visualization of large graphs consisting of up to a few million vertices on a regular laptop what makes it very competitive to other state-of-art graph drawing methods. Particularly, we recommend ivga method for interactive visualization of large non-planar complex networks such as small-world and scale-free social networks. The main concept of ivga can be seriously considered in developing tools for visualization and analysis of really huge networks, with billions of vertices and edges, on Big Data systems.
Software defect prediction based on stacked contractive autoencoder and multi-objective optimization
2020, Computers, Materials and Continua
Symmetric generative methods and tSNE: A short survey
2018, VISIGRAPP 2018 - Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
t-distributed Stochastic Neighbor Embedding with inhomogeneous degrees of freedom
2016, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

View full text

A graph-based N-body approximation with application to stochastic neighbor embedding

Abstract

Introduction

Section snippets

Bubble approximation

Derivation of the expected force from a bubble

Sparse t-SNE

Other approximations

Experiments

Discussion and conclusions

Pattern Recognition Letters

Journal of Computational Physics

Pattern Recognition Letters

Neurocomputing

Journal of Computational Physics

Nonlinear mapping networks

Journal of Chemical Information and Computer Sciences

Drawing power law graphs using a local/global decomposition

Algorithmica

An efficient program for many-body simulation

SIAM Journal of Scientific and Statistical Computing

A hierarchical O(NlogN) force-calculation algorithm

Nature

Evaluation of projection algorithms

IEEE Transactions on Pattern Analysis and Machine Intelligence

A general framework for dimensionality-reducing data visualization mapping

Neural Computation

A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields

Journal of the Association for Computing Machinery

A heuristic relaxation method for nonlinear mapping in cluster analysis

IEEE Transactions on Systems, Man and Cybernetics (Part B)

Sparse multidimensional scaling using landmark points, Tech. rep.

A heuristic for graph drawing

Congressus Numerantium

Graph drawing by force-directed placement

Software: Practice and Experience

A maxent-stress model for graph layout

IEEE Transactions on Visualization and Computer Graphics

From few to many: Illumination cone models for face recognition under variable lighting and pose

IEEE Transactions on Pattern Analysis and Machine Intelligence

Table of integrals, series and products

A hierarchical $O (N log N)$ force-calculation algorithm

A decomposition of multidimensional point sets with applications to $k$ -nearest-neighbors and $n$ -body potential fields