Abstract
When the partial least squares estimation methods, the “modes,” are applied to the standard latent factor model against which the methods are designed and calibrated in PLS, they will not yield consistent estimators without adjustments. We specify a different model, in terms of observables only, that satisfies the same rank constraints as the latent variable model, and show that mode B is now perfectly suitable without the need for corrections. The model explicitly uses composites, linear combinations of observables, instead of latent factors. The composites may satisfy identifiable linear structural equations, which need not be regression equations, estimable via 2SLS or 3SLS. Whenever practitioners contemplate the use of PLS’ basic design model, the composites model is a viable alternative. The chapter is mainly conceptual, but a small Monte Carlo study exemplifies the feasibility of the new approach.
This chapter “continues” a sometimes rather spirited discussion with Wold that started in 1977, at the Wharton School in Philadelphia, via my PhD thesis, Dijkstra (1981), and a paper, Dijkstra (1983). There was a long silence until about 2008, when Peter M. Bentler (UCLA) rekindled my interest in PLS, one of the many things for which I owe him my gratitude. Crucial also is the collaboration with Joerg Henseler (Twente), which led to a number of papers on PLS and on ways to get consistency without the need to increase the number of indicators, PLSc, as well as to a software program, ADANCO, for composites. I am very much in his debt too. The present chapter expands on Dijkstra (2010) by avoiding unobservables as much as possible while still adhering to Wold’s fundamental principle of soft modeling.
Working paper version of a chapter in “Recent Developments in PLS SEM,” H. Latan and R. Noonan (eds.) to be published in 2017/2018.
Notes
- 1.
- 2.
“Soft modeling” indicates that PLS is meant to perform “substantive analysis of complex problems that are at the same time data-rich and theory-primitive” (Wold 1982).
- 3.
I am not saying here that methods that are not well-calibrated are intrinsically “bad.” This would be ludicrous given the inherently approximate nature of statistical models. Good predictions typically require a fair amount of misspecification, to put it provocatively. But knowing what happens when we apply a statistical method to “the population” helps answer the question what it is estimating. Besides, consistency, and to a much lesser extent “efficiency,” was very important to Wold.
- 4.
It should be pointed out that I see PLS’ mode B as one of a family of generalized canonical variables estimation methods (Sect. 4.3.1), to be treated on a par with the others, without necessarily claiming that mode B is the superior or inferior method. None of the methods will be uniformly superior in every sensible aspect.
- 5.
Vectors and matrices will be distinguished from scalars by printing them in boldface.
- 6.
A random sample of indicator-vectors and the existence of second-order moments is sufficient for the consistency of the estimators to be developed below; with the existence of fourth-order moments we also have asymptotic normality.
- 7.
- 8.
The estimation method based on these observations is called 2SLS, two-stage-least-squares, for obvious reasons, and was developed by econometricians in the 1950s of the previous century.
- 9.
Kettenring (1971) is the reference for generalized canonical variables.
- 10.
These statements are admittedly a bit nonchalant if not cavalier, but there seems little to gain by elaborating on them.
- 11.
With \(\boldsymbol{\Sigma }\) one does not really need an iterative routine of course: \(\boldsymbol{\Sigma }_{ij} = r_{ij}\boldsymbol{\Sigma }_{ii}\mathbf{w}_{i}\mathbf{w}_{j}^{\intercal }\boldsymbol{\Sigma }_{jj}\) can be solved directly for the weights (and the correlation). But in case we just have an estimate, an algorithm comes in handy.
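The direct (non-iterative) solution can be sketched numerically. The following is a minimal sketch for a hypothetical two-block \(\boldsymbol{\Sigma }\) constructed from known weights and a known composite correlation (the block sizes, within-block covariances, and \(r_{12}\) are my choices, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-block example with known weights and composite correlation.
p1, p2, r12 = 3, 4, 0.6
S11 = 0.7 * np.eye(p1) + 0.3 * np.ones((p1, p1))    # within-block covariances
S22 = 0.8 * np.eye(p2) + 0.2 * np.ones((p2, p2))
w1 = rng.random(p1); w1 /= np.sqrt(w1 @ S11 @ w1)   # normalize: w' S w = 1
w2 = rng.random(p2); w2 /= np.sqrt(w2 @ S22 @ w2)
S12 = r12 * S11 @ np.outer(w1, w2) @ S22            # Sigma_12 = r12 S11 w1 w2' S22

# S11^{-1} S12 = r12 w1 (w2' S22) has rank one, so its leading left
# singular vector is proportional to w1: no iterations needed.
U, s, Vt = np.linalg.svd(np.linalg.solve(S11, S12))
w1_hat = U[:, 0] / np.sqrt(U[:, 0] @ S11 @ U[:, 0])
w1_hat *= np.sign(w1_hat @ S11 @ w1)                # fix the arbitrary SVD sign
r12_hat = w1_hat @ S12 @ w2                         # w1' S12 w2 = r12
print(np.allclose(w1_hat, w1), np.isclose(r12_hat, r12))  # True True
```

With a sample estimate in place of \(\boldsymbol{\Sigma }\) the rank-one structure holds only approximately, which is why an iterative algorithm comes in handy there.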
- 12.
See chapter two of Dijkstra (1985/1981).
- 13.
This is true when applied to the estimate for \(\boldsymbol{\Sigma }\) as well. With an estimate the other methods will usually require more than just one iteration (and all programs will produce different results, although the differences will tend to zero in probability).
- 14.
A working paper version of this paper said that the elements of the mode A loading vector would always be “larger” than the corresponding true values. I am obliged to Michel Tenenhaus for making me realize that the statement was not true.
- 15.
See Dijkstra (2014) for further discussion of Wold’s approach to modeling. There is a subtle issue here. One could generate a sample from a system with B lower-triangular, a full matrix C and a full, non-diagonal covariance matrix for z. Then no matter how large the sample size, we can never retrieve the coefficients (apart from those of the first equation which are just regression coefficients). The regressions for the other equations would yield values different from those we used to generate the observations, since the zero correlation between their equation-residuals would be incompatible with the non-diagonality of cov(z).
- 16.
What follows will be old hat for econometricians, but since non-recursive systems are relatively new for PLS-practitioners, some elaboration could be meaningful.
- 17.
As an example consider a square B with units on the diagonal but otherwise unrestricted, and a square C of the same dimensions, containing zeros only except the last row, where all entries are free. The order condition applies to all equations but the last, but none of the coefficients can be retrieved from \(\boldsymbol{\Pi }\). This matrix is, however, severely restricted: it has rank one. How to deal with this and similar situations is handled by Bekker et al. (1994).
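The rank-one property of \(\boldsymbol{\Pi }\) in this example is easy to verify numerically; a minimal sketch with arbitrary admissible values for B and C (the dimension and the scaling of the off-diagonal entries are my choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# B: units on the diagonal, otherwise unrestricted (small off-diagonal
# entries keep B comfortably invertible in this sketch).
B = np.eye(n) + 0.3 * rng.standard_normal((n, n)) * (1 - np.eye(n))
# C: zeros only, except the last row, where all entries are free.
C = np.zeros((n, n))
C[-1, :] = rng.standard_normal(n)

# Reduced form Pi = B^{-1} C equals (last column of B^{-1}) times
# (last row of C): an outer product, hence rank one.
Pi = np.linalg.solve(B, C)
print(np.linalg.matrix_rank(Pi))  # 1
```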
- 18.
With 2SLS, \(c_{\text{endo},2}\) in the first equation is in the first stage replaced by its regression on the four exogenous variables. In the second stage we regress \(c_{\text{endo},1}\) on the replacement for \(c_{\text{endo},2}\) and two exogenous variables. So the regression matrix with three columns in this stage is spanned by four exogenous columns, and we should be fine in general. If there were four exogenous variables on the right-hand side, the regression matrix in the second stage would have five columns, spanned by only four exogenous columns; the matrix would not be invertible and 2SLS (and all other methods aiming for consistency) would break down.
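The two stages can be sketched in a few lines. The system and its coefficients below are hypothetical, chosen only to mirror the structure described above (one endogenous regressor and two included exogenous variables per equation, four exogenous variables in total):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
X = rng.standard_normal((n, 4))                  # four exogenous variables
E = rng.standard_normal((n, 2))
Z = E @ np.array([[1.0, 0.5], [0.0, 1.0]])       # correlated structural residuals

# Hypothetical structural system:
#   y1 = 0.5*y2 + x1 + x2 + z1
#   y2 = 0.3*y1 + x3 + x4 + z2
B = np.array([[1.0, -0.5], [-0.3, 1.0]])
C = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
Y = np.linalg.solve(B, C @ X.T + Z.T).T          # reduced-form data

def ols(y, R):
    """Least-squares coefficients of y on the columns of R."""
    return np.linalg.lstsq(R, y, rcond=None)[0]

# Stage 1: replace y2 by its regression on all four exogenous variables.
y2_hat = X @ ols(Y[:, 1], X)
# Stage 2: regress y1 on the replacement and the included exogenous x1, x2.
coef = ols(Y[:, 0], np.column_stack([y2_hat, X[:, :2]]))
print(np.round(coef, 1))                         # close to [0.5, 1.0, 1.0]
```

Ordinary least squares applied directly to the first equation would be inconsistent here, because the correlated residuals make \(y_2\) endogenous; the first stage purges that correlation.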
- 19.
For more general models one could ask MATLAB, say, to calculate the rank of the matrices, evaluated at arbitrary values. A very pragmatic approach would be to just run 2SLS: if it breaks down and gives a singularity warning, one should analyze the situation; otherwise one is fine.
- 20.
This is in fact \(\left (\text{vec}\left [\left (\mathbf{BP - C}\right )^{\intercal }\right ]\right )^{\intercal }\); see below.
- 21.
For the standard approach and the classical formulae, see, e.g., Ruud (2000).
- 22.
One might as well have used mode B of course, or any of the other canonical variables approaches. There is no fundamental reason to prefer one to the other. MAXVAR was available, and is essentially non-iterative.
- 23.
The whole exercise takes about half a minute on a slow machine: 4 CPU 2.40 GHz; RAM 512 MB.
- 24.
It is remarkable that the accuracy of the 2SLS and 3SLS estimators is essentially as good, in three decimals, as those reported by Dijkstra and Henseler (2015a,b) for Full Information Maximum Likelihood (FIML) for the same model in terms of latent variables, i.e., FIML as applied to the true latent variable scores. See Table 2 on p. 18 there. When the latent variables are not observed directly but only via indicators, the performance of FIML clearly deteriorates (standard deviations are doubled or worse).
- 25.
“Capitalization on chance” is sometimes used when “small-sample-bias” is meant. That is quite something else.
- 26.
Freedman gives the following example. Let the 100 × 51 matrix \(\left [\mathbf{y,X}\right ]\) consist of independent standard normals. So there is no (non-)linear relationship whatsoever. Still, a regression of y on X can be expected to yield an R-square of 0.50. On average there will be 5 regression coefficients that are significant at 10%. If we keep the corresponding X-columns in the spirit of “exploratory research” and discard the others, a regression could easily give a decent R-square and “dazzling t-statistics” (Freedman 2009, p. 75). Note that here the “dedicated” model search consisted of merely two regression rounds. Just think of what one can accomplish with a bit more effort; see also, e.g., Dijkstra (1995).
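Freedman’s first regression round is easy to replicate; a small simulation sketch (the 200 replications and the use of the uncentered R-square are my choices, not Freedman’s):

```python
import numpy as np

rng = np.random.default_rng(3)
reps, n, k = 200, 100, 50
crit = 1.676                                   # two-sided 10% critical value, t(50)
r2s, n_sig = [], []
for _ in range(reps):
    X = rng.standard_normal((n, k))
    y = rng.standard_normal(n)                 # no relation to X whatsoever
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                      # OLS coefficients
    rss = ((y - X @ b) ** 2).sum()
    r2s.append(1 - rss / (y @ y))              # uncentered R-square
    s2 = rss / (n - k)                         # residual variance, 50 regressors
    t = b / np.sqrt(s2 * np.diag(XtX_inv))
    n_sig.append((np.abs(t) > crit).sum())
mean_r2, mean_sig = np.mean(r2s), np.mean(n_sig)
print(round(mean_r2, 2), round(mean_sig, 1))   # roughly 0.5 and 5
```

Keeping the roughly five “significant” columns and rerunning the regression is the second round that produces the deceptively decent fit.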
- 27.
At one point I thought that “a way out” would be to condition on the set of samples that favor the chosen model using the same search procedure (Dijkstra and Veldkamp, 1988): if the model search has led to the simplest true model, the conditional estimator distribution equals, asymptotically, the distribution that the practitioner reports. This conditioning would give substance to the retort given in practice that “we always condition on the given model.” But the result referred to says essentially that we can ignore the search if we know it was not needed. So much for comfort. It is even a lot worse: Leeb and Pötscher (2006) show that convergence of the conditional distribution is only pointwise, not uniform, not even on compact subsets of the parameter space. The bootstrap cannot alleviate this problem, Leeb and Pötscher (2006), Dijkstra and Veldkamp (1988).
- 28.
- 29.
The estimators based on minimization of these distances are asymptotically equivalent. The value of the third derivative of f appears to affect the bias: high values tend to be associated with small residual variances. So the first example, “GLS,” with \(f'''(1) = 0\), will tend to underestimate these variances more than the second example, “LISREL,” with \(f'''(1) = -4\). See Swain (1975).
- 30.
- 31.
- 32.
One can verify directly that the regression yields \(\boldsymbol{\Lambda }\). Also note that here \(\mathbf{F}\boldsymbol{\Lambda } = \mathbf{I}\).
- 33.
One may wonder about the “best linear predictor” of f in terms of y: \(E\left (\mathbf{f}\,\vert \,\mathbf{y}\right )\). Since f equals \(E\left (\mathbf{f}\,\vert \,\mathbf{y}\right )\) plus an uncorrelated error vector, \(\text{cov}\left (E\left (\mathbf{f}\,\vert \,\mathbf{y}\right )\right )\) is not “larger” but “smaller” than \(\text{cov}\left (\mathbf{f}\right )\). So \(E\left (\mathbf{f}\,\vert \,\mathbf{y}\right )\) satisfies neither of the two desiderata.
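For readers who prefer a numeric check, the following sketch (with a hypothetical joint covariance for (f, y), and assuming joint normality so that the best linear predictor coincides with the conditional expectation) verifies that cov(f) minus cov(E(f|y)) is positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(4)

# A random positive definite joint covariance for (f, y): f 2-dim, y 5-dim.
L = rng.standard_normal((7, 7))
S = L @ L.T
Sff, Sfy, Syy = S[:2, :2], S[:2, 2:], S[2:, 2:]

# Under joint normality E(f|y) = Sfy Syy^{-1} y, so
# cov(E(f|y)) = Sfy Syy^{-1} Syf.
cov_pred = Sfy @ np.linalg.solve(Syy, Sfy.T)

# cov(f) - cov(E(f|y)) is the conditional covariance of f given y,
# which is positive semi-definite: the predictor is "smaller", not "larger".
eigs = np.linalg.eigvalsh(Sff - cov_pred)
print(eigs.min() >= -1e-8)  # True
```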
- 34.
Dijkstra (2015b).
- 35.
PLSc exploits the lack of correlation between some of the measurement errors within blocks. It is sometimes equated to a particular implementation (e.g., assuming all errors are uncorrelated, and a specific correction), but that is selling it short. See Dijkstra (2011, 2013a,b) and Dijkstra and Schermelleh-Engel (2014).
References
Bekker, P. A., & Dijkstra, T. K. (1990). On the nature and number of the constraints on the reduced form as implied by the structural form. Econometrica, 58(2), 507–514.
Bekker, P. A., Merckens, A., & Wansbeek, T. J. (1994). Identification, equivalent models and computer algebra. Boston: Academic.
Bentler, P. M., & Dijkstra, T. K. (1985). Efficient estimation via linearization in structural models. In P. R. Krishnaiah (Ed.), Multivariate analysis (Chap 2, pp. 9–42). Amsterdam: North-Holland.
Bentler, P. M. (2006). EQS 6 structural equations program manual. Multivariate Software Inc.
Berk, R. A. (2008). Statistical learning from a regression perspective. New York: Springer.
Boardman, A., Hui, B., & Wold, H. (1981). The partial least-squares fix point method of estimating interdependent systems with latent variables. Communications in Statistics-Theory and Methods, 10(7), 613–639.
DasGupta, A. (2008). Asymptotic theory of statistics and probability. New York: Springer.
Dijkstra, T. K. (1983). Some comments on maximum likelihood and partial least squares methods. Journal of Econometrics, 22(1/2), 67–90 (Invited contribution to the special issue on the Interfaces between Econometrics and Psychometrics).
Dijkstra, T. K. (1985). Latent variables in linear stochastic models. (2nd ed. of 1981 PhD thesis). Amsterdam: Sociometric Research Foundation.
Dijkstra, T. K. (1989). Reduced Form estimation, hedging against possible misspecification. International Economic Review, 30(2), 373–390.
Dijkstra, T. K. (1990). Some properties of estimated scale invariant covariance structures. Psychometrika 55(2), 327–336.
Dijkstra, T. K. (1995). Pyrrho’s Lemma, or have it your way. Metrika, 42(1), 119–125.
Dijkstra, T. K. (2010). Latent variables and indices: Herman Wold’s basic design and partial least squares. In V. E. Vinzi, W. W. Chin, J. Henseler & H. Wang (Eds.), Handbook of partial least squares, concepts, methods and applications (Chap, 1, pp. 23–46). Berlin: Springer.
Dijkstra, T. K. (2011). Consistent partial least squares estimators for linear and polynomial factor models. Technical Report. Research Gate. doi:10.13140/RG.2.1.3997.0405.
Dijkstra, T. K. (2013a). A note on how to make PLS consistent. Technical Report. Research Gate, doi:10.13140/RG.2.1.4547.5688.
Dijkstra, T. K. (2013b). The simplest possible factor model estimator, and successful suggestions how to complicate it again. Technical Report. Research Gate. doi:10.13140/RG.2.1.3605.6809.
Dijkstra, T. K. (2013c) Composites as factors, generalized canonical variables revisited. Technical Report. Research Gate, doi:10.13140/RG.2.1.3426.5449.
Dijkstra, T. K. (2014). PLS’ Janus face. Long Range Planning, 47(3), 146–153.
Dijkstra, T. K. (2015a). PLS & CB SEM, a weary and a fresh look at presumed antagonists. In Keynote Address at the Second International Symposium on PLS Path Modeling, Sevilla.
Dijkstra, T. K. (2015b). All-inclusive versus single block composites. Technical Report. Research Gate. doi:10.13140/RG.2.1.2917.8082.
Dijkstra, T. K., & Henseler, J. (2011). Linear Indices in nonlinear structural equation models: Best fitting proper indices and other composites. Quality and Quantity, 45, 1505–1518.
Dijkstra, T. K., & Henseler, J. (2015a). Consistent and asymptotically normal PLS estimators for linear structural equations. Computational Statistics and Data Analysis, 81, 10–23.
Dijkstra, T. K., & Henseler, J. (2015b). Consistent partial least squares path modeling. MIS Quarterly, 39(2), 297–316.
Dijkstra, T. K., & Schermelleh-Engel, K. (2014). Consistent partial least squares for nonlinear structural equation models. Psychometrika, 79(4), 585–604 [published online (2013)].
Dijkstra, T. K., & Veldkamp, J. H. (1988). Data-driven selection of regressors and the bootstrap. In T. K. Dijkstra (Ed.), On model uncertainty and its statistical implications (Chap. 2, pp. 17–38). Berlin: Springer.
Freedman, D. A. (2009). Statistical models, theory and practice (Revised ed.). Cambridge: Cambridge University Press.
Freedman, D. A., Navidi, W., & Peters, S. C. (1988). On the impact of variable selection in fitting regression equations. In T. K. Dijkstra (Ed.), On model uncertainty and its statistical implications (Chap. 1, pp. 1–16). Berlin: Springer.
Haavelmo, T. (1944). The probability approach in econometrics. PhD thesis. Econometrica, 12(Suppl.), 118 pp. http://cowles.econ.yale.edu/.
Kettenring, J. R. (1971). Canonical analysis of several sets of variables. Biometrika, 58(3), 433–451.
Leeb, H., & Pötscher, B. M. (2006). Can one estimate the conditional distribution of post-model-selection estimators? The Annals of Statistics, 34(5), 2554–2591.
Pearl, J. (2009). Causality—models, reasoning and inference. Cambridge: Cambridge University Press.
Ruud, P. A. (2000). Classical econometric theory. New York: Oxford University Press.
Shmueli, G., Ray, S., Velasquez Estrada, J. M., & Chatla, S. (2016). The elephant in the room: Predictive performance of PLS models. Journal of Business Research, 69, 4552–4564.
Swain, A. J. (1975). A class of factor analysis estimation procedures with common asymptotic sampling properties. Psychometrika, 40, 315–335.
Wansbeek, T. J. & Meijer, E. (2000). Measurement error and latent variables in econometrics. Amsterdam: North-Holland.
Wold, H. (1966). Nonlinear estimation by iterative least squares procedures. In F. N. David (Ed.), Research papers in statistics. Festschrift for J. Neyman (pp. 411–444). New York: Wiley.
Wold, H. (1975). Path models with latent variables: The NIPALS approach. In H. M. Blalock et al. (Eds.), Quantitative sociology (Chap. 11, pp. 307–358). New York: Academic.
Wold, H. (1982). Soft modeling: The basic design and some extensions. In K. G. Jöreskog & H. Wold (Eds.), Systems under indirect observation, Part II (Chap. 1, pp. 1–54). Amsterdam: North-Holland.
Appendix
Here we will prove that \(\boldsymbol{\Sigma }\) is positive definite when and only when the correlation matrix of the composites, \(\mathbf{R}_{c}\), is positive definite. The “only when”-part is trivial. The proof that {\(\mathbf{R}_{c}\) is p.d.} implies {\(\boldsymbol{\Sigma }\) is p.d.} is a bit more involved. It is helpful to note for that purpose that we may assume that each \(\boldsymbol{\Sigma }_{ii}\) is a unit matrix (pre-multiply and post-multiply by a block-diagonal matrix with \(\boldsymbol{\Sigma }_{ii}^{-\frac{1}{2}}\) on the diagonal, and redefine \(\mathbf{w}_{i}\) such that \(\mathbf{w}_{i}^{\intercal }\mathbf{w}_{i} = 1\) for each i). So if we want to know whether the eigenvalues of \(\boldsymbol{\Sigma }\) are positive it suffices to study the eigenvalue problem \(\widetilde{\boldsymbol{\Sigma }}\mathbf{x} =\gamma \mathbf{x}\), written out blockwise as
\[\mathbf{x}_{i} +\sum _{j\neq i}r_{ij}\left (\mathbf{w}_{j}^{\intercal }\mathbf{x}_{j}\right )\mathbf{w}_{i} =\gamma \mathbf{x}_{i},\quad i = 1,2,\ldots,N,\]
with obvious implied definitions. Observe that every nonzero solution of
\[\mathbf{w}_{i}^{\intercal }\mathbf{x}_{i} = 0,\quad i = 1,2,\ldots,N,\]
corresponds with γ = 1, and there are \(\sum _{i=1}^{N}p_{i} - N\) linearly independent solutions. The multiplicity of the root γ = 1 is therefore \(\sum _{i=1}^{N}p_{i} - N\) and we need to find N more roots. By assumption \(\mathbf{R}_{c}\) has N positive roots. Let u be an eigenvector with eigenvalue μ, so \(\mathbf{R}_{c}\mathbf{u} =\mu \cdot \mathbf{u}\). Taking \(\mathbf{x}_{i} = u_{i}\mathbf{w}_{i}\) we have
\[u_{i}\mathbf{w}_{i} +\sum _{j\neq i}r_{ij}u_{j}\mathbf{w}_{i} = \left (\mathbf{R}_{c}\mathbf{u}\right )_{i}\mathbf{w}_{i} =\mu \,u_{i}\mathbf{w}_{i}.\]
In other words, the remaining eigenvalues are those of \(\mathbf{R}_{c}\), and so all eigenvalues of \(\widetilde{\boldsymbol{\Sigma }}\) are positive. Therefore \(\boldsymbol{\Sigma }\) is p.d., as claimed.
Note for the determinant of \(\boldsymbol{\Sigma }\) that
\[\det \left (\boldsymbol{\Sigma }\right ) =\det \left (\mathbf{R}_{c}\right )\cdot \prod _{i=1}^{N}\det \left (\boldsymbol{\Sigma }_{ii}\right ),\]
and so the Kullback–Leibler divergence between the Gaussian density for block-independence and the Gaussian density for the composites model is \(-\frac{1}{2}\log \left (\det \left (\mathbf{R}_{c}\right )\right )\). It is well known that \(0 \leq \det \left (\mathbf{R}_{c}\right ) \leq 1\): the value 0 occurs in case of a perfect linear relationship between the composites, when the Kullback–Leibler divergence is infinitely large, and the value 1 occurs in case of zero correlations between all composites, with zero divergence.
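The eigenvalue and determinant structure established in this appendix can be checked numerically; a minimal sketch (the block sizes, weights, and \(\mathbf{R}_{c}\) below are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
p = [2, 3, 4]                                  # block sizes; N = 3 blocks
N, P = len(p), sum(p)
Rc = np.array([[1.0, 0.4, 0.3],
               [0.4, 1.0, 0.5],
               [0.3, 0.5, 1.0]])               # a p.d. composite correlation matrix
w = [rng.random(k) for k in p]
w = [v / np.linalg.norm(v) for v in w]         # unit diagonal blocks: w_i' w_i = 1

# Sigma-tilde: identity diagonal blocks, r_ij w_i w_j' off-diagonal blocks.
S = np.eye(P)
offs = np.cumsum([0] + p)
for i in range(N):
    for j in range(N):
        if i != j:
            S[offs[i]:offs[i+1], offs[j]:offs[j+1]] = Rc[i, j] * np.outer(w[i], w[j])

# Eigenvalues: 1 with multiplicity P - N, plus the N eigenvalues of Rc;
# hence det(Sigma-tilde) = det(Rc).
eigs = np.sort(np.linalg.eigvalsh(S))
expected = np.sort(np.concatenate([np.ones(P - N), np.linalg.eigvalsh(Rc)]))
print(np.allclose(eigs, expected))                       # True
print(np.isclose(np.linalg.det(S), np.linalg.det(Rc)))   # True
```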
© 2017 Springer International Publishing AG. Dijkstra, T. K. (2017). A Perfect Match Between a Model and a Mode. In H. Latan & R. Noonan (Eds.), Partial Least Squares Path Modeling. Cham: Springer. https://doi.org/10.1007/978-3-319-64069-3_4