Nonnegative tensor factorization of Poisson data
Background references in tensor computation with definitions, basic tensor math, and aspects of numerical algorithms include [29–33]. This section summarizes the nonnegative tensor factorization of a data tensor \({\mathbf{D }}\) of independent Poisson samples.
In contemporary numerical math, an \(n{\text{th}}\)-order tensor is an n-dimensional array of real numbers. In this terminology, a 1D vector is an order-1 tensor and an \(M_1\times M_2\) matrix is an order-2 tensor. If tensor \({\mathbf{D }}\) is an \(M_1\times M_2 \times \cdots \times M_n\) array, then its mode-1 index is the integer range \(1:M_1\), its mode-2 index is \(1:M_2\), ..., its mode-n index is \(1:M_n\).
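As a small illustration (a hypothetical array, not one of the datasets in this paper), the following MATLAB fragment builds an order-3 tensor and reads off its mode sizes:

```matlab
% Hypothetical order-3 tensor: a 4 x 3 x 2 array of real numbers.
D = zeros(4, 3, 2);            % mode-1 index is 1:4, mode-2 index is 1:3, mode-3 index is 1:2
[M1, M2, M3] = size(D);        % returns the mode sizes 4, 3, 2
```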
Classic matrix computation is a starting point for developments in tensor computation [30]. In particular, many aspects of the Singular Value Decomposition (SVD) of matrices generalize to the Higher-Order SVD (HOSVD) of \(n\text{th}\)-order tensors [29, 33]. Further results in tensor math concern factorizations for least-squares error (LSE); specifically, given tensor \({\mathbf{D }}\), the tensor \(\hat{\mathbf{D }}\) is computed in a designated factorized form to minimize the norm \(||{\mathbf{D }} - \hat{\mathbf{D }}||_F\) [32, 33]. (The Frobenius norm \(||\mathbf{Y }||_F\) of tensor \(\mathbf{Y }\) is the square root of the sum \(\sum _i y_i^2\) over all the values \(y_i\) in \(\mathbf{Y }\) [30].)
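As a brief sketch in plain MATLAB (placeholder arrays, no toolbox assumed), both the Frobenius norm and the LSE objective can be evaluated by flattening the arrays:

```matlab
Y = rand(4, 3, 2);             % placeholder order-3 tensor
frobY = norm(Y(:));            % Frobenius norm: sqrt of the sum of squares of all entries

D    = rand(4, 3, 2);          % placeholder data tensor
Dhat = rand(4, 3, 2);          % placeholder factorized approximation of D
lse  = norm(D(:) - Dhat(:));   % the LSE objective ||D - Dhat||_F
```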
Implicit in using LSE for a rank-1 tensor factorization of \({\mathbf{D }}\) is an assumption of Gaussian variation in the data [24]. An alternative is to compute a Poisson model \(\mathbf{M }\) for tensors of nonnegative counts, which might be sparse [24, 25]. This includes nonnegative arrays of radiation data \({\mathbf{D }}\), which are Poisson in distribution.
Given an order-\(n\) tensor \({\mathbf{D }}\) of nonnegative integer counts, the tensor \(\mathbf{M }\) is computed to maximize the conditional Poisson likelihood \(P({\mathbf{D }}|\mathbf{M })\). \(\mathbf{M }\) is a tensor of nonnegative real numbers the same size as \({\mathbf{D }}\). Letting index \(i\) denote a location in \({\mathbf{D }}\) and \(x_i\) [or \(m_i\)] the value in \({\mathbf{D }}\) [or \(\mathbf{M }\)] at \(i\), the likelihood \(P({\mathbf{D }}|\mathbf{M })\) is maximized by minimizing the function
$$\sum _i \left( m_i - x_i \log m_i \right),$$
which is the negative log of \(P({\mathbf{D }}|\mathbf{M })\) except for an additive constant that depends only on the dataset \({\mathbf{D }}\). The set \(\{m_i\}\) of local Poisson parameters in \(\mathbf{M }\) optimizes \(P({\mathbf{D }}|\mathbf{M })\) assuming independent samples of Poisson random variables in \({\mathbf{D }}\).
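As a minimal sketch in plain MATLAB (placeholder arrays, not the datasets of this paper), the objective above can be evaluated elementwise; entries with a zero count contribute only their \(m_i\) term, which avoids \(0\cdot \log 0\):

```matlab
D = floor(3*rand(8, 8, 4));    % placeholder tensor of nonnegative integer counts
M = rand(size(D)) + 0.1;       % placeholder tensor of positive Poisson parameters, same size as D

x = D(:);  m = M(:);
nz = x > 0;                    % where the count x_i is 0, the term x_i*log(m_i) is 0
negLogLik = sum(m) - sum(x(nz) .* log(m(nz)));   % sum_i (m_i - x_i log m_i)
```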
The iterative computation developed in [24, 25] computes the optimal tensor \(\mathbf{M }\) in a rank-1 factorized form. For an \(n\text{th}\)-order tensor \({\mathbf{D }}\) and a user-specified number of components \(R\), the method computes scalars \(\alpha _r\) for \(1 \le r \le R\) and matrices \(A^{(i)}\) of respective sizes \(M_i\times R\) for \(1 \le i \le n\); then the tensor \(\mathbf{M }\) is
$$\mathbf{M } = \sum _{r=1}^R \alpha _r a_r^{(1)}\circ a_r^{(2)}\circ \cdots \circ a_r^{(n)},$$
where \(a_r^{(i)}\) is the \(r\text{th}\) column of matrix \(A^{(i)}\) and \(\circ\) is the outer product. The outer product of \(n\) vectors creates an \(n\text{th}\)-order tensor.
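To make the factorized form concrete, the sketch below (plain MATLAB, third-order case, made-up sizes) accumulates \(\mathbf{M }\) from the scalars \(\alpha _r\) and the columns of the factor matrices:

```matlab
% Reconstruct a third-order M from a rank-1 factorized (CP) form; sizes are made up.
M1 = 10; M2 = 12; M3 = 8; R = 10;
alpha = rand(R, 1);                                       % scalars alpha_r
A1 = rand(M1, R); A2 = rand(M2, R); A3 = rand(M3, R);     % factor matrices A^(1), A^(2), A^(3)

M = zeros(M1, M2, M3);
for r = 1:R
    slab = A1(:, r) * A2(:, r).';                         % outer product of the mode-1 and mode-2 columns
    for k = 1:M3
        M(:, :, k) = M(:, :, k) + alpha(r) * A3(k, r) * slab;
    end
end
% With the Tensor Toolbox [34], full(ktensor(alpha, A1, A2, A3)) should produce the same tensor.
```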
Convergence of the iteration to a tensor \(\mathbf{M }\) maximizing \(P({\mathbf{D }}|\mathbf{M })\) has been proved under mild conditions on the nonzero values in \({\mathbf{D }}\) (roughly, both the density of nonzero values and their spread with respect to the size of the component matrices \(A^{(i)}\) must be adequate) [24]. The implementation in MATLAB Tensor Toolbox version 2.6 [34] is a version of the alternating Poisson regression in [25]. We used default values of the control parameters (see [34]) except that the stopping tolerance was 0.5e−03. Random number generation was reset to its default seed at the start of each run so that the iteration began with the same “randomized” initial tensor. As noted in the “Nonnegative tensor factorization: Poisson data” section, the number of components \(R = 10\) for these datasets is an empirical value: for \(R > 10\), the computed tensor \(\mathbf{M }\) shows relatively insignificant changes in numerical values.
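A run along these lines might be set up as sketched below; this assumes the cp_apr function and its 'tol' option in Tensor Toolbox 2.6 [34] behave as documented, and the count tensor here is a placeholder rather than one of the paper's datasets:

```matlab
% Sketch of an alternating Poisson regression run (Tensor Toolbox 2.6 [34], assumed interface).
rng('default');                            % reset random number generation so each run starts
                                           % from the same "randomized" initial tensor
R = 10;                                    % number of components
X = tensor(floor(3*rand(20, 20, 10)));     % placeholder nonnegative count tensor
M = cp_apr(X, R, 'tol', 0.5e-3);           % alternating Poisson regression [25], stopping tolerance 0.5e-03
```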
Tensor factorization reveals multilinear relationships in a multidimensional numerical dataset. There are well-documented numerical methods to maximize the conditional Poisson likelihood without tensor math (cf. [21–23, 38, 40]), but a mathematical model of the source-to-detector projection of the data is required.
Phase congruency in 2D grids
Phase congruency in 2D finds boundaries between subgrids of higher mean values and adjacent neighborhoods of lower mean values, or vice versa. When the grid is sample mean values from a spatial Poisson process like dataset \({\mathbf{D }}\), or from an array of Poisson parameters like the computed tensor \(\mathbf{M }\), the boundaries delineate clusters of samples iid at higher Poisson intensities, or potential hot spots, embedded in background samples iid at a lower Poisson intensity. See [26–28] for details of phase congruency in 2D using wavelets. This section is a high-level description and includes the control parameter settings recommended in the literature [26].
The input matrix is an \(X\times Y\) grid. For each grid location, and each scale and orientation, the initial energy is found by convolving the 2D array with even-odd quadrature filters; the results are then processed in these steps:
1. Randomness in phase due to noise is estimated. Noise-suppression parameter \(k=2\) indirectly sets a reference for noise in phase that is subtracted locally.
2. To ensure that the spread of frequencies is adequate, a sigmoid weighting function penalizes too few in-phase frequencies. Sigmoid function parameter \(c=0.5\) is the cut-off below which deemphasis is applied, and parameter \(g=10\) controls the sharpness of the function.
3. The adjusted energies are summed over all filter orientations and divided by the sum of the response amplitudes of the wavelets over all orientations and scales. This paper uses 4 wavelet scales and 6 filter orientations in 2D. To limit the spatial extent of the phase analysis in 2D, frequencies with wavelengths larger than \(\lambda _{min}=3\) are suppressed.
The maximum and minimum moments of phase congruency covariance are \(X\times Y\) matrices of phase information from all scales and orientations [28]. These matrices are computed in a suite of downloadable MATLAB programs [26], and their average is used in this paper. Phase congruency has been implemented for 3D grids [48] like dataset \({\mathbf{D }}\); however, the 3D computation and visualization are more complicated than in 2D, and a preliminary evaluation did not show substantial changes in the outcomes for the 3D datasets in this paper.
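A sketch of the 2D computation is given below, assuming the phasecong3 interface from the downloadable code [26]; the function name, argument order, and the remaining filter-bank values (mult, sigmaOnf) are assumptions and may differ in the exact version used:

```matlab
% Sketch: phase congruency moments for a 2D grid G (assumes Kovesi's phasecong3.m from [26]).
G = rand(64, 64);                   % placeholder X-by-Y grid, e.g., a 2D slice of D or M
nscale = 4;  norient = 6;           % 4 wavelet scales, 6 filter orientations
minWaveLength = 3;                  % lambda_min = 3
mult = 2.1;  sigmaOnf = 0.55;       % assumed default filter-bank settings
k = 2;  cutOff = 0.5;  g = 10;      % noise-suppression and sigmoid weighting parameters

[Mmax, Mmin] = phasecong3(G, nscale, norient, minWaveLength, mult, sigmaOnf, k, cutOff, g);
avgMoment = (Mmax + Mmin) / 2;      % average of the maximum and minimum moments, as used in this paper
```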