Recent developments in CANDECOMP/PARAFAC algorithms: a critical review

https://doi.org/10.1016/S0169-7439(02)00089-8Get rights and content

Abstract

Several recently proposed algorithms for fitting the PARAFAC model are investigated and compared to more established alternatives. Alternating least squares (ALS), direct trilinear decomposition (DTLD), alternating trilinear decomposition (ATLD), self-weighted alternating trilinear decomposition (SWATLD), pseudo alternating least squares (PALS), alternating coupled vectors resolution (ACOVER), alternating slice-wise diagonalization (ASD) and alternating coupled matrices resolution (ACOMAR) are compared on both simulated and real data. For the recent algorithms, only unconstrained three-way models can be fitted. In contrast, for example, ALS allows modeling of higher-order data, as well as incorporating constraints on the parameters and handling of missing data. Nevertheless, for three-way data, the newer algorithms are interesting alternatives. It is found that the ALS estimated models are generally of a better quality than any of the alternatives even when overfactoring the model, but it is also found that ALS is significantly slower. Based on the results (in particular the poor performance of DTLD), it is advised that (a slightly modified) ASD may be a good alternative to ALS when a faster algorithm is desired.

Introduction

The parallel factor analysis model (PARAFAC) is being increasingly used in chemometrics [1], [2], [3], [4], [5], [6], [7]. PARAFAC is used for curve-resolution and also for quantitative analysis under the name second-order calibration. Second-order calibration is often performed using the algorithm generalized rank annihilation method (GRAM) for fitting the PARAFAC model [8], [9], [10], [11], [12].

In recent years, several new algorithms for fitting PARAFAC models have appeared in the literature. Even though many of them have been developed partly by the same researchers, they have not been compared with the prior algorithms. Consequently, an overview of the relative merits of these new algorithms is lacking, which is highly undesirable. In this paper, an extensive comparison is made leading to suggestions as to which algorithms to use in practical analysis.

The remainder of the paper is organized as follows. First, the PARAFAC model is shortly described and some essential aspects of the algorithms are provided. No detailed description of the algorithms will be given in this paper because these are available in the literature. Considerable attention, however, will be paid to misunderstandings that are used as a basis for some of the algorithms. A detailed study on simulated data is then conducted and the results of this study are compared to applications on two measured data sets. The focus in the comparison is on the quality of the solution in terms of estimated parameters, in terms of quantitative calibration results and in terms of robustness against over-factoring the model.

Section snippets

The CANDECOMP/PARAFAC trilinear model

The PARAFAC model was developed in 1970. It was suggested independently by Carroll and Chang [13] under the name CANDECOMP (canonical decomposition) and by Harshman [14] under the name PARAFAC (parallel factor analysis). In the following, the model will be termed PARAFAC as is common in the chemometric literature. The PARAFAC model has gained increasing attention in chemometrics due to (i) its structural resemblance with many physical models of common instrumental data and (ii) its uniqueness

Calculations

Calculations are performed using Matlab (Mathworks, Natick, MA). The code for DTLD and ALS is taken from the N-way Toolbox [29]. The function for ACOVER is adapted from Jiang et al. [38]. ATLD, ASD, ACOMAR, SWATLD and PALS are performed using in-house functions. The following alterations have been found to be necessary:

  • 1.

    Using the general recipe outlined in Appendix A, improved updating formulas are derived for ACOMAR (Appendix B) and PALS (Appendix C).

  • 2.

    The original implementation of ASD exhibits

Simulated data set I

First, the relative performance of the competing methods is assessed when selecting the correct model dimensionality. It is expected that ALS, which involves the pseudo-inverses of Khatri-Rao products (see Table 1), yields more stable results than the alternatives, because they all work in one way or another with pseudo-inverses of X and Y (DTLD, PALS, ACOVER, ASD and ACOMAR) or X, Y and Z (ATLD and SWATLD). This is confirmed by the median MSE-values (Table 6). To facilitate the interpretation,

Conclusions

A thorough investigation of several algorithms has revealed a number of important aspects:

  • ALS does not seem to suffer as much as recently claimed from over-factoring.

  • DTLD, while being fast, is a surprisingly poor method for fitting the PARAFAC model.

  • All the recent algorithms for PARAFAC described here suffer from lack of proper understanding of their working.

  • However, while none of them turn out to perform better than ALS with respect to the quality of the estimated model and parameters, they

Acknowledgements

R. Bro acknowledges support provided by STVF (Danish Research Council) through project 1179, as well as the EU-project Project GRD1-1999-10337, NWAYQUAL.

References (61)

  • S. Bijlsma et al.

    Estimating rate constants and pure UV–VIS spectra of a two-step reaction using trilinear models

    J. Chemom.

    (1999)
  • W.P. Gardner et al.

    Application of quantitative chemometric analysis techniques to direct sampling mass spectrometry

    Anal. Chem.

    (2001)
  • R.D. Jiji et al.

    Application of PARAFAC for calibration with excitation–emission matrix fluorescence spectra of three classes of environmental pollutants

    J. Chemom.

    (2000)
  • A. Marcos et al.

    Application of a multi-way method to study long-term stability in ICP-AES

    J. Anal. At. Spectrom.

    (2001)
  • K.S. Booksh et al.

    A 2nd-order standard addition method with application to calibration of a kinetics-spectroscopic sensor for quantitation of trichloroethylene

    J. Chemom.

    (1995)
  • N.M. Faber et al.

    Generalized rank annihilation method: I. Derivation of eigenvalue problems

    J. Chemom.

    (1994)
  • N.M. Faber et al.

    Generalized rank annihilation method: II. Bias and variance in the estimated eigenvalues

    J. Chemom.

    (1994)
  • N.M. Faber et al.

    Generalized rank annihilation: III. Practical implementation

    J. Chemom.

    (1994)
  • E. Sanchez et al.

    Generalized rank annihilation factor analysis

    Anal. Chem.

    (1986)
  • J.D. Carroll et al.

    Analysis of individual differences in multidimensional scaling via an N-way generalization of ‘Eckart-Young’ decomposition

    Psychometrika

    (1970)
  • R.A. Harshman

    Foundations of the PARAFAC procedure: models and conditions for an ‘explanatory’ multi-modal factor analysis

    UCLA Work. Pap. Phon.

    (1970)
  • R.A. Harshman

    Determination and proof of minimum uniqueness conditions for PARAFAC1

    UCLA Work. Pap. Phon.

    (1972)
  • J.B. Kruskal

    Rank, decomposition, and uniqueness for 3-way and N-way arrays

  • N.D. Sidiropoulos et al.

    On the uniqueness of multilinear decomposition of N-way arrays

    J. Chemom.

    (2000)
  • R. Bro et al.

    Maximum likelihood fitting using simple least squares algorithms

    J. Chemom.

    (2002)
  • H.A.L. Kiers

    Weighted least squares fitting using ordinary least squares algorithms

    Psychometrika

    (1997)
  • R. Bro, Multi-way analysis in the food industry. Models, algorithms, and applications. PhD thesis, University of...
  • R.A. Harshman

    Substituting statistical for physical decomposition: are there application for parallel factor analysis (PARAFAC) in non-destructive evaluation

  • R.A. Harshman et al.

    The PARAFAC model for three-way factor analysis and multidimensional scaling

  • S.E. Leurgans et al.

    A decomposition for three-way arrays

    Siam J. Matrix Anal. Appl.

    (1993)
  • Cited by (223)

    • On four-way CP model estimation efficiency

      2024, Computational Statistics
    View all citing articles on Scopus
    View full text