Prediction and confidence intervals for nonlinear measurement error models without identifiability information
Introduction
A measurement error model is one in which the explanatory variable is not observed exactly but with error, a situation that arises often in applications. The problem has a long history, dating back to Adcock (1878), and it remains important in both theory and application. See Carroll et al. (1995), Fuller (1987), Cheng and Van Ness (1999), and the references cited therein.
Error in the explanatory variable, however, causes serious difficulty in both theory and application because of unidentifiability. A model is said to be unidentifiable when two or more distinct sets of parameters give rise to the same distribution of the observations. Consequently, given the observations, or even the exact distribution of the observations, it is impossible to tell which set of parameters actually governs the data. Statisticians who apply measurement error models therefore usually need additional information, for example that some of the parameters involved are known or estimable. In a few situations these parameters are known. In others, especially when validation data are available, some parameters can be estimated from the validation data. In many situations, however, no such information is available and the model remains unidentifiable. This seems to make measurement error models useless.
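The definition of unidentifiability can be illustrated with a small numerical check. The sketch below assumes the classical normal linear measurement error model Y = b0 + b1·U + ε, X = U + δ with independent normal U, δ, and ε; the function name and both parameter sets are hypothetical and chosen only for illustration. Because (X, Y) is bivariate normal in this setting, two parameter sets that imply the same mean vector and covariance matrix imply the same distribution of the observations.

```python
import numpy as np

def observable_moments(m_u, var_u, var_delta, b0, b1, var_eps):
    """Mean and covariance of (X, Y) when Y = b0 + b1*U + eps and X = U + delta,
    with U, delta, eps independent normal (an assumed illustrative model)."""
    mean = np.array([m_u, b0 + b1 * m_u])
    cov = np.array([
        [var_u + var_delta, b1 * var_u],
        [b1 * var_u, b1**2 * var_u + var_eps],
    ])
    return mean, cov

# Two different (hypothetical) parameter sets.
params_a = dict(m_u=0.0, var_u=1.0, var_delta=1.0, b0=0.0, b1=1.0, var_eps=1.0)
params_b = dict(m_u=0.0, var_u=1.5, var_delta=0.5, b0=0.0, b1=2.0 / 3.0, var_eps=4.0 / 3.0)

mean_a, cov_a = observable_moments(**params_a)
mean_b, cov_b = observable_moments(**params_b)

# Identical first and second moments of a bivariate normal mean identical distributions.
same_distribution = bool(np.allclose(mean_a, mean_b) and np.allclose(cov_a, cov_b))
print(same_distribution)
```

No amount of data on (X, Y) alone can separate these two parameter sets, which is why extra information such as a known error variance is usually required for estimation.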
In the linear measurement error model, it is known that if the objective is prediction, it is not necessary to adjust for measurement errors (see Fuller, 1987, p. 74). The same rationale holds in the probit measurement error model (Buzas and Stefanski, 1996). Hence, in these two models we can directly construct statistical intervals for a future observation or its mean as if we had observed the true explanatory variable (i.e., treating the surrogate variable as the true explanatory variable), and there is no need to modify the models to adjust for measurement errors. In this paper we investigate two further nonlinear measurement error models, the exponential and loglinear models. We find that identifiability information is still not needed when the goal is prediction, but the models must be modified to adjust for measurement errors. We show that, after this modification, one can use pseudo-likelihood estimation of variance functions in the weighted least squares method to construct a prediction interval for Yn+1 and a confidence interval for E(Yn+1 | Xn+1), the conditional expectation of the future observation Yn+1 given Xn+1.
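The claim for the linear model can be checked by simulation. The following is a minimal sketch, assuming a normal linear measurement error model with hypothetical parameter values: an ordinary least squares fit of Y on the surrogate X, with no correction for measurement error, estimates the attenuated slope r·b1 rather than b1, yet its predictions for new observations are still unbiased given X.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, m_u=1.0, var_u=1.0, var_delta=0.5, b0=2.0, b1=3.0, var_eps=1.0):
    """Draw (X, Y) pairs; the true covariate U is never returned (assumed model)."""
    u = rng.normal(m_u, np.sqrt(var_u), n)
    x = u + rng.normal(0.0, np.sqrt(var_delta), n)        # surrogate X = U + delta
    y = b0 + b1 * u + rng.normal(0.0, np.sqrt(var_eps), n)
    return x, y

# Naive fit: regress Y on the error-prone X as if it were the true covariate.
x_train, y_train = simulate(200_000)
slope, intercept = np.polyfit(x_train, y_train, 1)

# Predict fresh observations from their surrogate values.
x_new, y_new = simulate(200_000)
pred_error = y_new - (intercept + slope * x_new)

r = 1.0 / (1.0 + 0.5)      # reliability ratio var_u / (var_u + var_delta)
print(slope, r * 3.0)      # fitted slope versus attenuated slope r * b1
print(pred_error.mean())   # average prediction error on new data
```

The fitted slope is a biased estimate of b1, which matters for estimation but not for predicting Y from X, since the regression of Y on X is itself linear in this setting.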
In conclusion, unidentifiable measurement error models can be useful if the goal is prediction. In some models (for example, linear and probit models), no adjustment for measurement errors is needed. In other models (such as exponential and loglinear models), the model must be modified to adjust for measurement errors. Notably, in many situations prediction requires no additional identifiability information.
Section snippets
General approach
Assume a measurement error model where we observe Yi and Xi satisfying (i.e., the conditional p.d.f. of Yi given Ui and Xi is g(ui,Θ), where Θ is the unknown vector of parameters) and
In this paper, we mainly focus on univariate Xi. However, the idea can be generalized to multivariate Xi in a straightforward manner. We assume that Ui and δi are independently normally distributed. Consequently, the conditional p.d.f. of
Exponential model
Consider the model where Ui and δi are distributed as in (2.1) and εi has a distribution with mean 0 and variance σε². Assume that Ui and δi are independent of εi. By (2.2), we can write Yi in the form of (2.3). Since Xi and the error term are uncorrelated, we can proceed as if we were dealing with an ordinary exponential regression model. However, by
Loglinear model
In this section, we consider the model where Ui and δi are independently normally distributed as in (2.1). By (2.2), we can write Yi in the form of (2.3), where f(Xi,Θ) = exp(β0 + β1Xi), β0 = b0 + b1(1−r)mU + b1²rσδ²/2, and β1 = rb1. Again, Xi and the error term are uncorrelated. Note that although the conditional expectation of Yi given Xi has the same form as that of Yi given Ui, the conditional distribution of Yi given Xi is not a
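The expressions for β0 and β1 can be checked numerically. The sketch below rests on assumptions not shown in this excerpt: that the loglinear model has E(Yi | Ui) = exp(b0 + b1Ui), that r = σU²/(σU² + σδ²), and that Ui given Xi is normal with mean (1−r)mU + rXi and variance rσδ². Under these assumptions the lognormal mean formula gives E(Yi | Xi) = exp(β0 + β1Xi). The function names and parameter values are hypothetical; the integral is evaluated by Gauss-Hermite quadrature.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def conditional_mean_closed_form(x, b0, b1, m_u, var_u, var_delta):
    """exp(beta0 + beta1 * x) with beta0, beta1 as given in the text."""
    r = var_u / (var_u + var_delta)          # assumed definition of r
    beta0 = b0 + b1 * (1.0 - r) * m_u + b1**2 * r * var_delta / 2.0
    beta1 = r * b1
    return np.exp(beta0 + beta1 * x)

def conditional_mean_quadrature(x, b0, b1, m_u, var_u, var_delta, order=60):
    """E[exp(b0 + b1*U) | X = x], integrating over the assumed normal law of U given X."""
    r = var_u / (var_u + var_delta)
    cond_mean = (1.0 - r) * m_u + r * x
    cond_sd = np.sqrt(r * var_delta)
    nodes, weights = hermegauss(order)       # weight function exp(-z**2 / 2)
    values = np.exp(b0 + b1 * (cond_mean + cond_sd * nodes))
    return np.sum(weights * values) / np.sqrt(2.0 * np.pi)

args = dict(b0=0.2, b1=0.7, m_u=1.0, var_u=2.0, var_delta=0.5)
closed = conditional_mean_closed_form(1.5, **args)
numeric = conditional_mean_quadrature(1.5, **args)
print(bool(np.isclose(closed, numeric)))
```

The agreement only confirms the mean function; it says nothing about the conditional distribution of Yi given Xi, which is why the variance function has to be handled separately.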
Conclusion
In this paper, we discuss how to construct valid statistical intervals in two nonlinear measurement error models without additional information. The problem is important since additional information is often unavailable in practice. By using pseudo-likelihood estimation of variance functions in the weighted least squares method, this is possible if the target is the future response variable Yn+1 or the conditional mean of Yn+1 given Xn+1, where Xn+1 is the observed surrogate variable
References (6)
Adcock. Note on the method of least squares. Analyst (1878).
Buzas and Stefanski. Instrumental variable estimation in generalized linear measurement error models. J. Amer. Statist. Assoc. (1996).
Carroll et al. Transformation and Weighting in Regression (1987).