Fuzzy estimates of regression parameters in linear regression models for imprecise input and output data

https://doi.org/10.1016/S0167-9473(02)00116-0

Abstract

A method is proposed for obtaining fuzzy estimates of regression parameters with the help of the “Resolution Identity” in fuzzy set theory. The α-level least-squares estimates are obtained from the usual linear regression model by using the α-level real-valued data of the corresponding fuzzy input and output data. The membership functions of the fuzzy estimates of regression parameters are then constructed from the α-level least-squares estimates according to the form of the “Resolution Identity”. To obtain the membership degree of any given value taken from a fuzzy estimate, optimization problems have to be solved. Two computational procedures are provided to solve these optimization problems.

Introduction

In the real world, data sometimes cannot be recorded or collected precisely. For instance, the water level of a river cannot be measured exactly because it fluctuates, and the temperature in a room cannot be measured precisely for a similar reason. A more appropriate way to describe the water level is to say that it is around 30 m. The phrase “around 30 m” can be regarded as a fuzzy number $\widetilde{30}$. Fuzzy set theory is therefore a natural tool for building statistical models when fuzzy data have been observed. This is the main concern of this paper.

Since Zadeh (1965) introduced the concept of fuzzy sets, applications of fuzzy data to regression models have been proposed in the literature. Tanaka et al. (1982) initiated this research topic and later generalized their approach to more general models in Tanaka and Watada (1988), Tanaka et al. (1989), and Tanaka and Ishibuchi (1991). The collection of papers edited by Kacprzyk and Fedrizzi (1992) gives an insightful survey.

In their approach, Tanaka et al. (1982) considered L-R fuzzy data and minimized the index of fuzziness of the fuzzy linear regression model. Yager (1982) used a linguistic variable to represent imprecise information in regression models. Moskowitz and Kim (1993) proposed a method to assess the H-value in the fuzzy linear regression model of Tanaka et al. (1982). Redden and Woodall (1994) compared various fuzzy regression models, described the differences between fuzzy regression analysis and usual regression analysis, and pointed out some weaknesses of the approaches proposed by Tanaka et al. Chang and Lee (1994) pointed out another weakness of those approaches. Wang and Tsaur (2000) proposed a new model to improve the predictability of Tanaka's model. Bárdossy (1990) proposed many different measures of fuzziness, which must be minimized with respect to some suggested constraints. Peters (1994) introduced a new fuzzy linear regression model based on Tanaka's approach by considering a fuzzy linear programming problem. Diamond (1988) introduced a metric on the set of fuzzy numbers by invoking the Hausdorff metric on the compact α-level sets, and used this metric to define a least-squares criterion function, to be minimized as in the usual sense. Ma et al. (1997) generalized Diamond's approach by embedding the set of fuzzy numbers into a Banach space isometrically and isomorphically. Näther (1997, 2000), Näther and Albrecht (1990), and Körner and Näther (1998) introduced the concept of random fuzzy sets (fuzzy random variables) into the linear regression model and developed an estimation theory for the parameters. Chang and Ayyub (2001) described the differences between fuzzy regression and ordinary regression analysis, and Kim et al. (1996) compared fuzzy regression and statistical regression conceptually and empirically.
Chang (2001) proposed a method for hybrid fuzzy least-squares regression by defining weighted fuzzy arithmetic and using the well-accepted least-squares fitting criterion. Celminš (1987, 1991) proposed a methodology for fitting differentiable fuzzy model functions by minimizing a least-squares objective function. Chang and Lee (1996) proposed a fuzzy regression technique based on the least-squares approach to estimate the modal value and the spreads of L-R fuzzy numbers. Dunyak and Wunsch (2000) described a method for nonlinear fuzzy regression using a special training technique for fuzzy number neural networks. D'Urso and Gastaldi (2000) proposed a doubly linear adaptive fuzzy regression model based on a core regression model and a spread regression model. D'Urso (2002) developed unconstrained and constrained least-squares estimation procedures. Jajuga (1986) calculated the linear fuzzy regression coefficients using a generalized version of the least-squares method, considering a fuzzy classification of a set of observations and obtaining homogeneous classes of observations. Kim and Bishu (1998) used a criterion that minimizes the difference of the membership degrees between the observed and estimated fuzzy numbers. Sakawa and Yano (1992) introduced three indices for equalities between fuzzy numbers and, from these indices, formulated three types of multiobjective programming problems. Tanaka and Lee (1998) used a quadratic programming approach to obtain the possibility and necessity regression models simultaneously. The advantage of quadratic programming is that it integrates the central-tendency property of least squares with the possibilistic property of fuzzy regression.

In this paper, we first obtain the α-level least-squares estimates from the usual linear regression model by using the α-level real-valued data of the corresponding fuzzy input and output data. We then construct the fuzzy estimates of regression parameters according to the form of the “Resolution Identity” in fuzzy set theory, which was introduced by Zadeh (1975). To obtain the membership degree of any value taken from a fuzzy estimate, optimization problems have to be solved. We also develop two computational procedures to solve these optimization problems.

In Section 2, we give some properties of fuzzy numbers. In Section 3, we obtain the α-level least-squares estimates from the usual linear regression model by using the α-level real-valued data of the corresponding fuzzy input and output data. The membership functions of the fuzzy estimates are then constructed from these α-level least-squares estimates according to the form of the “Resolution Identity”. In Section 4, we develop two computational procedures to obtain the membership degree of any given value taken from the fuzzy estimates. We also provide a method for handling the predicted fuzzy output data. In Section 5, numerical examples are given to clarify the theoretical results and to show possible applications in linear regression analysis for imprecise data.

Section snippets

Fuzzy numbers

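Let $X$ be a universal set. Then a fuzzy subset $\tilde{A}$ of $X$ is defined by its membership function $\xi_{\tilde{A}}: X \to [0,1]$. We denote by $\tilde{A}_\alpha = \{x : \xi_{\tilde{A}}(x) \ge \alpha\}$ the α-level set of $\tilde{A}$, where $\tilde{A}_0$ is the closure of the set $\{x : \xi_{\tilde{A}}(x) \ne 0\}$. $\tilde{A}$ is called a normal fuzzy set if there exists an $x$ such that $\xi_{\tilde{A}}(x) = 1$. $\tilde{A}$ is called a convex fuzzy set if $\xi_{\tilde{A}}(\lambda x + (1-\lambda) y) \ge \min\{\xi_{\tilde{A}}(x), \xi_{\tilde{A}}(y)\}$ for $\lambda \in [0,1]$ (that is, $\xi_{\tilde{A}}$ is a quasi-concave function).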

In this paper, the universal set $X$ is assumed to be the real number system; that is, $X = \mathbb{R}$. Let f be a

Fuzzy estimates of regression parameters

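The linear regression model is displayed as follows:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \epsilon_i$$
for $i = 1, \ldots, n$, where $\epsilon_i$ are the errors. Let
$$X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1,p-1} \\ 1 & X_{21} & \cdots & X_{2,p-1} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{n,p-1} \end{pmatrix} \quad \text{and} \quad Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}.$$
It is well known that the least-squares estimates are
$$\hat{\beta} = (X^t X)^{-1} X^t Y,$$
where $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_{p-1})$, for the following linear model:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1}$$
for $i = 1, \ldots, n$. Now let us consider the following two α-level linear models:
$$(\tilde{Y}_i)_\alpha^L = \beta_{0,\alpha}^L + \beta_{1,\alpha}^L (\tilde{X}_{i1})_\alpha^L + \beta_{2,\alpha}^L (\tilde{X}_{i2})_\alpha^L + \cdots + \beta_{p-1,\alpha}^L (\tilde{X}_{i,p-1})_\alpha^L$$
and
$$(\tilde{Y}_i)_\alpha^U = \beta_{0,\alpha}^U + \beta_{1,\alpha}^U (\tilde{X}_{i1})_\alpha^U + \beta_{2,\alpha}^U (\tilde{X}$$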
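
The α-level fitting step described above can be illustrated with a minimal Python sketch. It assumes that every input and output observation is a triangular fuzzy number (a1, a2, a3), so that the lower and upper α-level endpoints are (1−α)a1+αa2 and (1−α)a3+αa2 as in the numerical examples section. The function names `tri_cut`, `ols` and `alpha_level_estimates` are hypothetical and chosen only for this illustration, and `numpy.linalg.lstsq` is used in place of the explicit inverse; the two agree when the design matrix has full column rank.

```python
import numpy as np


def tri_cut(tri, alpha):
    """Lower and upper endpoints of the alpha-level sets of triangular data.

    `tri` has shape (..., 3) and holds (a1, a2, a3) along the last axis.
    """
    tri = np.asarray(tri, dtype=float)
    lower = (1 - alpha) * tri[..., 0] + alpha * tri[..., 1]
    upper = (1 - alpha) * tri[..., 2] + alpha * tri[..., 1]
    return lower, upper


def ols(x, y):
    """Ordinary least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(y)), x])  # prepend the column of ones
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def alpha_level_estimates(x_tri, y_tri, alpha):
    """Fit the lower-endpoint and the upper-endpoint model at one alpha.

    x_tri: shape (n, p-1, 3), y_tri: shape (n, 3). Returns (beta_L, beta_U).
    """
    x_lo, x_up = tri_cut(x_tri, alpha)
    y_lo, y_up = tri_cut(y_tri, alpha)
    return ols(x_lo, y_lo), ols(x_up, y_up)
```

Calling `alpha_level_estimates` over a grid of α values in [0, 1] gives, for each coefficient, a family of lower and upper estimates from which a fuzzy estimate could be assembled.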

Computational procedures

Given a candidate value $r$ taken from the fuzzy estimate $\hat{\beta}_j$, we want to know its membership degree α. If the decision-makers are comfortable with this membership degree α, then it is reasonable to take the value $r$ as the estimate of $\beta_j$. In this case, the decision-makers can accept $r$ as the estimate of $\beta_j$ with confidence degree α.

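Now from (5), the membership value of any given value $r$ of $\hat{\beta}_j$ can be obtained by solving the following optimization problem:
$$\text{(MP1)} \quad \max\ \alpha \quad \text{s.t.} \quad \min\{\hat{\beta}^L, \hat{\beta}^U\} \le r \le \max\{\hat{\beta}^L, \beta$$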
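
As a rough illustration of what solving such a problem involves, the sketch below scans a grid of α values from 1 down to 0 and returns the largest one whose interval contains r. This is a brute-force stand-in and not either of the two computational procedures of the paper, which are not shown in this excerpt. It also assumes that the constraint is simply that r lies between the smaller and the larger of the two α-level estimates. The function name `membership_degree` and the callables `beta_lower` and `beta_upper` are hypothetical.

```python
def membership_degree(r, beta_lower, beta_upper, steps=1000):
    """Largest grid alpha in [0, 1] whose interval contains r, else 0.0.

    beta_lower and beta_upper are callables alpha -> float giving the
    lower-model and upper-model estimates of one coefficient.
    """
    for k in range(steps, -1, -1):  # scan from alpha = 1 down to alpha = 0
        alpha = k / steps
        lo, up = beta_lower(alpha), beta_upper(alpha)
        # The two estimates need not be ordered, hence min and max.
        if min(lo, up) <= r <= max(lo, up):
            return alpha
    return 0.0
```

The resolution of the answer is limited by `steps`; a finer grid or a bisection search would be natural refinements when the intervals are nested in α.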

Numerical examples

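The membership function of a triangular fuzzy number $\tilde{a}$ is defined by
$$\xi_{\tilde{a}}(r) = \begin{cases} (r - a_1)/(a_2 - a_1) & \text{if } a_1 \le r \le a_2, \\ (a_3 - r)/(a_3 - a_2) & \text{if } a_2 < r \le a_3, \\ 0 & \text{otherwise}, \end{cases}$$
which is denoted by $\tilde{a} = (a_1, a_2, a_3)$. The triangular fuzzy number $\tilde{a}$ can be expressed as “around $a_2$” or “being approximately equal to $a_2$”. $a_2$ is called the core value of $\tilde{a}$, and $a_1$ and $a_3$ are called the left and right spread values of $\tilde{a}$, respectively. The α-level set (a closed interval) of $\tilde{a}$ is then $\tilde{a}_\alpha = [(1-\alpha) a_1 + \alpha a_2, (1-\alpha) a_3 + \alpha a_2]$; that is, $\tilde{a}_\alpha^L = (1-\alpha) a_1 + \alpha a_2$ and $\tilde{a}_\alpha^U$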
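
The membership function and the α-level interval above translate directly into code. The following Python sketch is only an illustration; the function names and the example values (28, 30, 33) for “around 30” are assumptions made here, not taken from the paper.

```python
def triangular_membership(r, a1, a2, a3):
    """Membership degree of r in the triangular fuzzy number (a1, a2, a3)."""
    if a1 <= r <= a2:
        # Rising edge; if a1 == a2 the only point here is the core value.
        return 1.0 if a2 == a1 else (r - a1) / (a2 - a1)
    if a2 < r <= a3:
        # Falling edge; a3 > a2 is guaranteed by the condition above.
        return (a3 - r) / (a3 - a2)
    return 0.0


def alpha_level_set(alpha, a1, a2, a3):
    """Closed interval (lower, upper) of the alpha-level set."""
    lower = (1 - alpha) * a1 + alpha * a2
    upper = (1 - alpha) * a3 + alpha * a2
    return lower, upper
```

For the illustrative number (28, 30, 33), `alpha_level_set(0.5, 28, 30, 33)` gives the interval from 29.0 to 31.5, and both endpoints have membership degree 0.5.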

Conclusions

We have obtained the fuzzy estimates of regression parameters with the help of “Resolution Identity”. That is to say, the fuzzy estimates are constructed from the α-level least-squares estimates using the α-level real-valued data of the corresponding fuzzy input and output data. In order to obtain the membership degree of any given value taken from the fuzzy estimates of regression parameters, we have to solve the optimization problems. We also propose two computational procedures to solve the

Acknowledgements

The author would like to thank the anonymous referees for their valuable comments and suggestions.
