In some applications, simple linear restrictions may be imposed on the elements of matrix \(\mathbf{A}\). For instance, some of the probabilities in the joint distribution of \(Q\) and \(X\) may be set equal to zero, say for combinations of \(Q\) and \(X\) that cannot occur in practice. After imposing such zero constraints, the remaining non-zero cell probabilities should still add to one. The quadratic loss function \(\varphi\) can be minimized under equality constraints on the unknown elements of matrix \(\mathbf{A}\) by applying the method of Lagrangian multipliers.
We first rewrite the quadratic loss function \(\varphi\) using vectorization operations on matrices (see Schott 1997, pp. 261–266). For the vector of residuals \(\mathbf{r}\), we obtain
$$\begin{array}{@{}rcl@{}} {\mathbf{r}} &= &\text{vec}({\mathbf{AD}}-{\mathbf{E}}) \\ &=& \text{vec}({\mathbf{I}}_{n\times n}{\mathbf{AD}})-\text{vec}({\mathbf{E}}), \end{array} $$
where \(\mathbf{I}_{n\times n}\) is an \(n \times n\) identity matrix. Applying Theorem 7.15 from Schott (1997, p. 263) yields
$${\mathbf{r}} =({\mathbf{D}}^{\prime}\otimes{\mathbf{I}}_{n\times n})\cdot\text{vec}({\mathbf{A}})-\text{vec}({\mathbf{E}}), $$
in which \(\otimes\) is the Kronecker product of two matrices (Graham 1982). Defining \(\mathbf{P} = \mathbf{D}^{\prime}\otimes\mathbf{I}_{n\times n}\), \(\mathbf{a} = \text{vec}(\mathbf{A})\), and \(\mathbf{e} = \text{vec}(\mathbf{E})\), we can write
$$\mathbf{r} = {\mathbf{Pa}}-{\mathbf{e}}, $$
so that the least squares function becomes
$$\begin{array}{@{}rcl@{}} \varphi &=& \frac{1}{2}{\mathbf{r}}^{\prime}{\mathbf{r}}\\ &=& \frac{1}{2}({\mathbf{a}}^{\prime}{\mathbf{P}}^{\prime}{\mathbf{P}} {\mathbf{a}}-2{\mathbf{e}}^{\prime}{\mathbf{Pa}}+{\mathbf{e}}^{\prime}{\mathbf{e}}). \end{array} $$
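As a concrete check of this rewriting, the following minimal NumPy sketch (with illustrative matrix sizes, not taken from the application) builds \(\mathbf{P} = \mathbf{D}^{\prime}\otimes\mathbf{I}_{n\times n}\) and verifies numerically that \(\mathbf{Pa}-\mathbf{e}\) reproduces \(\text{vec}(\mathbf{AD}-\mathbf{E})\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4                  # illustrative sizes: A is n x n; D and E are n x m

A = rng.random((n, n))
D = rng.random((n, m))
E = rng.random((n, m))

def vec(M):
    # vec() stacks the columns of a matrix into one long vector
    # (column-major order in NumPy).
    return M.flatten(order="F")

P = np.kron(D.T, np.eye(n))  # P = D' (Kronecker product) I_{n x n}
a = vec(A)
e = vec(E)

r = P @ a - e                             # vectorized residual
assert np.allclose(r, vec(A @ D - E))     # equals vec(AD - E), as derived above

phi = 0.5 * r @ r                         # quadratic loss (1/2) r'r
```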
The completely unconstrained solution to the minimization problem is given by
$$\mathbf{a}_{0} = ({\mathbf{P}}^{\prime}{\mathbf{P}})^{-1}\cdot{\mathbf{P}}^{\prime}{\mathbf{e}}. $$
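Continuing the sketch above with the same hypothetical \(\mathbf{P}\) and \(\mathbf{e}\), the unconstrained minimizer can be cross-checked against a generic least squares routine:

```python
# Unconstrained minimizer a0 = (P'P)^{-1} P'e, via the normal equations.
a0 = np.linalg.solve(P.T @ P, P.T @ e)

# Both lines minimize (1/2)||Pa - e||^2, so the results should agree.
a0_lstsq, *_ = np.linalg.lstsq(P, e, rcond=None)
assert np.allclose(a0, a0_lstsq)
```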
Now suppose that the \(S\) linear equality constraints can be represented by the matrix equation
$$\mathbf{Ha} = {\mathbf{c}}. $$
The matrix \(\mathbf{H}\) is of order \(S \times N\), where \(N\) is the number of cells in matrices \(\mathbf{A}\) and \(\mathbf{E}\). We may assume that \(\mathbf{H}\) is of rank \(S\); otherwise, the linear equality constraints would not be linearly independent. To minimize the least squares function \(\varphi\) under a set of \(S\) linear constraints on the elements of \(\mathbf{A}\), the Lagrangian is defined as
$$ {\mathbf{L}} = \varphi-\boldsymbol{\lambda}^{\prime}({\mathbf{Ha}}-{\mathbf{c}}). \tag{2} $$
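Written out, the first derivative of Eq. 2 with respect to \(\mathbf{a}\) is

$$\frac{\partial{\mathbf{L}}}{\partial{\mathbf{a}}} = {\mathbf{P}}^{\prime}{\mathbf{P}}{\mathbf{a}}-{\mathbf{P}}^{\prime}{\mathbf{e}}-{\mathbf{H}}^{\prime}\boldsymbol{\lambda}. $$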
Setting this derivative equal to the zero vector and solving for \(\mathbf{a}\) yields:
$$\mathbf{a} = ({\mathbf{P}}^{\prime}{\mathbf{P}})^{-1}({\mathbf{P}}^{\prime}{\mathbf{e}}+{\mathbf{H}}^{\prime}\boldsymbol{\lambda}), $$
which can be rewritten as:
$$\mathbf{a} ={\mathbf{a}}_{0}+({\mathbf{P}}^{\prime}{\mathbf{P}})^{-1}{\mathbf{H}}^{\prime}\boldsymbol{\lambda}. $$
Solving for the unknown Lagrangian multipliers by taking the derivative of the Lagrangian (Eq. 2) with respect to \(\boldsymbol{\lambda}\) and setting it to zero, or equivalently by imposing the linear constraints \(\mathbf{Ha} - \mathbf{c} = \mathbf{0}\), yields:
$$\boldsymbol{\lambda} = [{\mathbf{H}}({\mathbf{P}}^{\prime}{\mathbf{P}})^{-1}{\mathbf{H}}^{\prime}]^{-1}({\mathbf{c}}-{\mathbf{Ha}}_{0}). $$
The final solution for \(\mathbf{a}\) is then:
$${\mathbf{a}} = {\mathbf{a}}_{0}+({\mathbf{P}}^{\prime}{\mathbf{P}})^{-1}{\mathbf{H}}^{\prime}[{\mathbf{H}}({\mathbf{P}}^{\prime}{\mathbf{P}})^{-1}{\mathbf{H}}^{\prime}]^{-1}({\mathbf{c}}-{\mathbf{Ha}}_{0}). $$
Note that the vector \(\mathbf{c} - \mathbf{Ha}_{0}\) represents the deviations of the unconstrained solution from the linear equality constraints. Again, a consistent estimate of \(\mathbf{a}\) can be obtained by replacing \(\mathbf{P}\) and \(\mathbf{a}_{0}\) with their sample estimates.
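To close the derivation, here is a self-contained sketch of the complete constrained solution. The constraint set is hypothetical, chosen only to mirror the situation described at the start of this section (one cell of \(\mathbf{A}\) fixed at zero, all cells summing to one); the particular \(\mathbf{H}\), \(\mathbf{c}\), and matrix sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4                          # illustrative sizes, as in the earlier sketch
D, E = rng.random((n, m)), rng.random((n, m))
P = np.kron(D.T, np.eye(n))          # P = D' (Kronecker product) I_{n x n}
e = E.flatten(order="F")             # e = vec(E)
N = P.shape[1]                       # number of cells in A

# Hypothetical constraints Ha = c (S = 2 linearly independent rows):
H = np.zeros((2, N))
H[0, :] = 1.0                        # all cells of A sum to one
H[1, 0] = 1.0                        # first cell of A equals zero
c = np.array([1.0, 0.0])

PtP_inv = np.linalg.inv(P.T @ P)
a0 = PtP_inv @ P.T @ e               # unconstrained solution a0 = (P'P)^{-1} P'e

# lambda = [H (P'P)^{-1} H']^{-1} (c - H a0)
lam = np.linalg.solve(H @ PtP_inv @ H.T, c - H @ a0)

# a = a0 + (P'P)^{-1} H' lambda
a = a0 + PtP_inv @ H.T @ lam
assert np.allclose(H @ a, c)         # the constraints now hold exactly
A_hat = a.reshape(n, n, order="F")   # constrained estimate of A (inverse of vec)
```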