2.1 Technical efficiency estimation with DEA approach
DEA is a non-parametric approach that uses a linear programming technique to measure the efficiency of decision-making units (DMUs) such that the observed input–output vectors are enveloped as tightly as possible (Lee et al.
2009). In DEA, multiple inputs and outputs can be considered at the same time, with no assumption about the functional form or the distribution of the data. There are two types of DEA, namely input-oriented and output-oriented DEA. The input-oriented DEA minimizes the input requirements for at least a given output level, while the output-oriented DEA maximizes the output levels for a given set of inputs (without requiring additional inputs). DEA can further be sub-divided in terms of return-to-scale through the addition of weight constraints. Originally, Charnes et al. (
1978) proposed a constant return-to-scale (CRS) model for the measurement of efficiency, in which all DMUs are assumed to operate at their optimal scale. Later, efforts were made to break down the efficiency of the DMUs into components in order to identify specific sources of inefficiency. Banker et al. (
1984) introduced the concept of a variable return-to-scale (VRS) measurement of efficiency, where the overall efficiency can be subdivided into technical and scale efficiencies. Technical efficiency refers to the ability of a DMU to produce a given output with a minimum set of inputs (input-oriented) or to produce a maximum output level from a given set of inputs (output-oriented) (Coelli et al.
2005). A DMU is said to achieve scale efficiency when it operates at an optimal scale. The fundamental principle of calculating the efficiency of DMUs is to construct a frontier on which all efficient DMUs lie; those found below the frontier are considered inefficient. The efficiency index ranges between zero (fully inefficient) and one (fully efficient).
DEA initially emerged as a practical technique of measuring efficiency and sustainability of industrial production systems (Ullah and Perret
2014). More recently, however, DEA has been used in agricultural production sectors following the pioneering works of De Koeijer et al. (
2002) and Reig-Martínez and Picazo-Tadeo (
2004). This was motivated by the heterogeneity of smallholder farmers, particularly in developing countries, who are less likely to exhibit a uniform production function, rendering the stochastic frontier technique of estimating efficiency less appropriate. In this vein, the study used DEA to generate a technical efficiency index for smallholder cocoa farmers in Ghana. In agricultural production in general, and cocoa production in particular, farmers have less control over output but more control over the quantities of inputs used. Hence, the input-oriented efficiency model was applied in this study. As noted by Coelli et al. (
2005), the decision on which orientation to use depends on which side of the production system (input or output) the DMU has more control over. Previous studies such as Davidova and Latruffe (
2003), Krasachat (
2004), and Reig-Martínez and Picazo-Tadeo (
2004), and recent studies such as Ogada et al. (
2014b), Ullah and Perret (
2014) and Rahman and Awerije (
2015) applied input-oriented DEA in agricultural production systems.
The input-oriented DEA model under CRS is also called the CCR model, named after the initials of Charnes et al. (1978), who developed it. In the CCR model, the jth cocoa farm household uses the input vector \(X_{j} = (x_{1j}, x_{2j}, \ldots, x_{Zj}) \in R_{+}^{Z}\) to produce a desirable output vector \(Y_{j} = (y_{1j}, y_{2j}, \ldots, y_{Mj}) \in R_{+}^{M}\). Following Cooper et al. (2007), the technical efficiency (TE) can be calculated using the DEA model:
Minimize \(\theta\) subject to:
$$\begin{aligned} \theta \varvec{X}_{\varvec{j}} - \varvec{X}^{/} \lambda \ge 0, \hfill \\ \varvec{Y}^{/} \lambda \ge \varvec{Y}_{\varvec{j}} , \hfill \\ \lambda \ge 0, \hfill \\ \end{aligned}$$
(1)
where \(\theta\) is a scalar representing the TE score [also known as the overall technical efficiency (OTE)] of the jth cocoa farm household, and \(\lambda\) is the \(n \times 1\) intensity vector of weights attached to the efficient cocoa farmers. The vector \(\lambda\) is used to project an inefficient cocoa farmer onto the efficient frontier. The inputs used and the outputs produced by the farmers are represented by the \(z \times n\) input matrix \(X^{/}\) and the \(m \times n\) output matrix \(Y^{/}\), respectively, where \(X_{j}\) is the input vector of the jth cocoa farm and \(Y_{j}\) is the output vector of the jth cocoa farm. If the above model satisfies the three basic assumptions of DEA (convexity, scalability, and free disposability), then the model exhibits a constant return-to-scale. If the assumption of scalability is not satisfied, then the model exhibits a variable return-to-scale. Farming, however, is an activity that exhibits a variable return-to-scale (VRS) due to its potential economies of scale (Ullah and Perret
2014). Following Banker et al. (1984), adding the extra constraint \(\sum \lambda_{j} = 1\) to Eq. (1) leads to a VRS frontier. The efficiency score under VRS is also referred to as pure technical efficiency (PTE), and the model is known as the BCC model, named after the initials of the authors who suggested it (Banker et al. 1984). Unfortunately, the BCC model fails to indicate whether a farmer is exhibiting an increasing or a decreasing return-to-scale. To resolve this problem, Cooper et al. (2007) introduced a non-increasing return-to-scale (NIRS) model in which the constraint \(\sum \lambda_{j} \le 1\) is added to Eq. (
1). Comparing technical efficiency under constant return-to-scale (\(TE_{\text{CRS}}\)) with technical efficiency under non-increasing return-to-scale (\(TE_{\text{NIRS}}\)) indicates whether a production unit exhibits increasing or decreasing return-to-scale. If \(TE_{\text{CRS}} < 1\) and \(TE_{\text{CRS}} = TE_{\text{NIRS}}\), then the farmer produces at an inefficiently small output level, and the inefficiency emanates from increasing return-to-scale. On the other hand, if \(TE_{\text{CRS}} < 1\) and \(TE_{\text{NIRS}} > TE_{\text{CRS}}\), then the farmer's inefficiency results from decreasing return-to-scale (Wossink and Denaux 2006).
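As an illustration only, the input-oriented program in Eq. (1) and its VRS and NIRS variants can be sketched with a generic linear programming solver. The sketch below assumes SciPy's `linprog`; the function names `dea_input_te` and `returns_to_scale`, the four-farm data set, and the extra check for scale efficiency (\(TE_{\text{CRS}} = TE_{\text{VRS}}\)) are hypothetical additions for exposition and are not taken from the study.

```python
import numpy as np
from scipy.optimize import linprog


def dea_input_te(X, Y, j, rts="crs"):
    """Input-oriented DEA score of farm j.

    X is a (z, n) input matrix and Y an (m, n) output matrix,
    with one column per farm. rts is "crs", "vrs" or "nirs".
    """
    z, n = X.shape
    m = Y.shape[0]
    # Decision variables: [theta, lambda_1, ..., lambda_n]; minimize theta.
    c = np.r_[1.0, np.zeros(n)]
    # Input rows:  X @ lambda - theta * x_j <= 0
    a_in = np.hstack([-X[:, [j]], X])
    # Output rows: -Y @ lambda <= -y_j  (i.e. Y @ lambda >= y_j)
    a_out = np.hstack([np.zeros((m, 1)), -Y])
    A_ub = np.vstack([a_in, a_out])
    b_ub = np.r_[np.zeros(z), -Y[:, j]]
    A_eq = b_eq = None
    sum_lambda = np.r_[0.0, np.ones(n)][None, :]
    if rts == "vrs":      # BCC model: sum(lambda) = 1
        A_eq, b_eq = sum_lambda, [1.0]
    elif rts == "nirs":   # NIRS model: sum(lambda) <= 1
        A_ub = np.vstack([A_ub, sum_lambda])
        b_ub = np.r_[b_ub, 1.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.fun


def returns_to_scale(te_crs, te_vrs, te_nirs, tol=1e-6):
    """Classify scale behaviour by comparing CRS, VRS and NIRS scores."""
    if abs(te_crs - te_vrs) <= tol:
        return "constant"      # scale efficient (assumed extra check)
    if abs(te_crs - te_nirs) <= tol:
        return "increasing"    # TE_CRS = TE_NIRS
    return "decreasing"        # TE_NIRS > TE_CRS


# Hypothetical data: one input and one output for four farms.
X = np.array([[2.0, 4.0, 8.0, 6.0]])
Y = np.array([[1.0, 4.0, 6.0, 3.0]])
for j in range(X.shape[1]):
    scores = [dea_input_te(X, Y, j, r) for r in ("crs", "vrs", "nirs")]
    print(j, np.round(scores, 4), returns_to_scale(*scores))
```

In this toy example the ratio \(TE_{\text{CRS}} / TE_{\text{VRS}}\) gives the scale efficiency of each farm, so a farm can be fully efficient under VRS while still being scale inefficient.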
2.2 The two-way impact of welfare and technical efficiency
The study performs a regression analysis in the second stage of the efficiency analysis to estimate the impact of welfare (proxied by consumption expenditure per capita) on farm-level technical efficiency. Technical efficiency indices were regressed on a set of socioeconomic (including the welfare variable), farm-specific, and policy factors to explain the variation in technical efficiency. In many studies, a Tobit regression model has been used due to the censored nature of the efficiency score, as recently done by Ogada et al. (
2014a) and previously by Wossink and Denaux (
2006). A simplified farm-level technical efficiency model is specified as:
$$\theta_{i} = f(W_{i} ,S_{i} ,F_{i} ,I_{i} ,L_{i} ),$$
(2)
where \(\theta_{i}\) denotes the ith farm efficiency score estimated from the DEA approach, \(W_{i}\) is the welfare indicator, \(S_{i}\) is a vector of other socioeconomic variables, \(F_{i}\) is a vector of farm-specific variables, \(I_{i}\) is a vector of institutional variables, and \(L_{i}\) is a vector of location-specific variables. Given the censored nature of the efficiency score, the observed score \(\theta_{i}\) relates to the latent score \(\theta_{i}^{*}\) as follows:
$$\theta_{i} = \left\{ {\begin{array}{ll} {\theta_{i}^{*} } & {{\text{if}}\,\,0 < \theta_{i}^{*} < 1} \\ 0 & {{\text{if}}\,\,\theta_{i}^{*} \le 0} \\ 1 & {{\text{if}}\,\,\theta_{i}^{*} \ge 1} \\ \end{array} } \right..$$
(3)
Thus, a Tobit regression model could be used to estimate Eq. (2) given the conditions in Eq. (3). To estimate the reverse relationship, that is, the impact of technical efficiency on welfare, a simplified OLS model could be used in which the welfare indicator (\(W_{i}\)) is the dependent variable and the technical efficiency score (\(\theta_{i}\)) is an explanatory variable, as specified in Eq. (4):
$$W_{i} = f(\theta_{i} ,S_{i} ,F_{i} ,I_{i} ,P_{i} ,L_{i} ),$$
(4)
where variables are as defined earlier.
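To make the censoring rule in Eq. (3) concrete, the following is a minimal sketch of a two-limit Tobit likelihood for an efficiency score bounded between zero and one, maximized numerically with SciPy. It is not the estimator used in the study: the function names, the simulated data, and the coefficient values (0.7, 0.2, and a standard deviation of 0.15) are assumptions chosen purely for illustration, and a single regressor stands in for the welfare indicator.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def tobit_negloglik(params, X, y, lower=0.0, upper=1.0):
    """Negative log-likelihood of a two-limit Tobit model."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)          # keeps sigma positive
    xb = X @ beta
    at_lo = y <= lower                 # scores censored at the lower limit
    at_hi = y >= upper                 # scores censored at the upper limit
    mid = ~(at_lo | at_hi)             # uncensored scores
    ll = np.empty(len(y))
    ll[at_lo] = norm.logcdf((lower - xb[at_lo]) / sigma)
    ll[at_hi] = norm.logsf((upper - xb[at_hi]) / sigma)
    ll[mid] = norm.logpdf((y[mid] - xb[mid]) / sigma) - log_sigma
    return -ll.sum()


def fit_tobit(X, y, lower=0.0, upper=1.0):
    """Maximum likelihood fit, started from the OLS solution."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta0
    start = np.r_[beta0, np.log(resid.std())]
    res = minimize(tobit_negloglik, start, args=(X, y, lower, upper),
                   method="BFGS")
    return res.x[:-1], float(np.exp(res.x[-1]))


def simulate_scores(n=2000, seed=0):
    """Hypothetical data: latent score = 0.7 + 0.2 * w + noise."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)                       # stand-in welfare variable
    latent = 0.7 + 0.2 * w + 0.15 * rng.normal(size=n)
    observed = np.clip(latent, 0.0, 1.0)         # censoring as in Eq. (3)
    X = np.column_stack([np.ones(n), w])
    return X, observed


X, y = simulate_scores()
beta_hat, sigma_hat = fit_tobit(X, y)
print(np.round(beta_hat, 3), round(sigma_hat, 3))
```

The sketch treats the regressor as exogenous, which is exactly the assumption questioned in the discussion of endogeneity that follows.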
However, the problem with estimating Eq. (2) directly with a Tobit model is that welfare, as one of the explanatory variables, is assumed to be exogenous while it is potentially endogenous. Thus, while welfare explains variation in technical efficiency, it is itself explained by other variables. Hence, estimating Eq. (2) without accounting for such endogeneity leads to biased and inconsistent estimates. Likewise, estimating Eq. (4) directly with OLS could also lead to biased estimates due to the endogenous nature of the efficiency score (\(\theta\)) as an explanatory variable.
2.2.1 The Conditional Mixed-process (CMP)
The endogenous nature of the welfare and efficiency variables can lead to under- or over-estimation of the true impact of welfare on efficiency, and vice versa. To account for this possibility, the study employed the Conditional Mixed-process (CMP) estimator proposed by Roodman (
2011) to estimate Eqs. (
2) and (
4) separately.
The CMP framework covers a broad family of multi-equation systems and accommodates dependent variables of different types. It also controls for both simultaneity and endogeneity, producing consistent estimates for recursive systems in which all endogenous variables appear on the right-hand side as observed (Asfaw and Lipper 2015). Moreover, the CMP builds on the seemingly unrelated regression framework, in which the error terms are correlated across equations (Makate et al.
2016). From Eq. (
2), assume that welfare (as an endogenous variable) is determined by:
$$W_{i} = f(X_{i} ),$$
(5)
where
\(X_{i}\) is a vector of variables (excluding efficiency score in this case) explaining the variation in welfare. Allowing for potential endogeneity of the welfare variable
\(W_{i}\), Eqs. (
2) and (
5) can be jointly estimated, and the joint marginal likelihood can be expressed as follows:
$$\iint_{{\eta_{2} \eta_{5} }} {\left[ {\prod {L_{5} (\eta_{5} )\prod {L_{2} (\eta_{2} )} } } \right]f(\eta_{5} ,\eta_{2} )d\eta_{5} d\eta_{2} },$$
(6)
where \(L_{2}\) and \(L_{5}\) are the conditional likelihood functions of Eqs. (2) and (5), respectively, and \(f(\eta_{5} ,\eta_{2} )\) is the joint distribution of the unobserved heterogeneity components \(\eta_{5}\) and \(\eta_{2}\). This joint distribution is assumed to be bivariate normal, characterized as follows:
$$\left( {\begin{array}{c} {\eta_{5} } \\ {\eta_{2} } \\ \end{array} } \right) \sim N\left( {\left[ {\begin{array}{c} 0 \\ 0 \\ \end{array} } \right],\left[ {\begin{array}{cc} {\sigma_{5}^{2} } & {\rho_{25} \sigma_{5} \sigma_{2} } \\ {\rho_{25} \sigma_{5} \sigma_{2} } & {\sigma_{2}^{2} } \\ \end{array} } \right]} \right).$$
(7)
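The role of the cross-equation correlation \(\rho_{25}\) can be illustrated with a stylized simulation. The sketch below is a linear, uncensored simplification and not the CMP estimator itself: the errors are drawn from a bivariate normal distribution with unit variances, the variable `x` is a hypothetical exogenous instrument that shifts welfare only, and the true effect of welfare on the latent efficiency score is set to 0.5 by assumption. Under these assumptions, single-equation OLS is biased when \(\rho_{25} \ne 0\), whereas a simple instrumental-variable ratio recovers the assumed effect.

```python
import numpy as np


def simulate(n=20000, rho=0.6, beta=0.5, seed=0):
    """Draw correlated errors and build welfare W and a latent score theta."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho],
                    [rho, 1.0]])                 # unit variances, correlation rho
    eta5, eta2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    x = rng.normal(size=n)                       # hypothetical instrument
    W = 1.0 * x + eta5                           # welfare equation
    theta = beta * W + eta2                      # efficiency equation (latent)
    return x, W, theta


def ols_slope(y, regressor):
    """Slope from a simple regression of y on one regressor."""
    c = np.cov(regressor, y)
    return c[0, 1] / c[0, 0]


def iv_slope(y, endogenous, instrument):
    """Simple IV (Wald) estimator: cov(z, y) / cov(z, endogenous)."""
    return np.cov(instrument, y)[0, 1] / np.cov(instrument, endogenous)[0, 1]


x, W, theta = simulate()
print("OLS:", round(ols_slope(theta, W), 3))     # biased away from 0.5
print("IV: ", round(iv_slope(theta, W, x), 3))   # close to the assumed 0.5
```

With \(\rho_{25} = 0\) the two estimators coincide in large samples, which is why a test of the error correlation is informative about endogeneity.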
Similarly, the endogenous efficiency score (
\(\theta\)) in the welfare Eq. (
4) is given by:
$$\theta_{i} = f(Z_{i} ),$$
(8)
where
\(Z_{i}\) is a vector of explanatory variables (excluding the welfare variable in this case). Eqs. (
4) and (
8) could also be jointly estimated by the CMP estimator as elaborated above. The full model is jointly estimated through the conditional mixed process, which utilizes the Geweke, Hajivassiliou, and Keane (GHK) algorithm to consistently estimate the likelihood function in Eq. (
6) (Roodman
2011). The main objective of jointly estimating Eqs. (
2) and (
5), and Eqs. (
4) and (
8) is to deal with potential self-selection bias. Maitra (
2004) noted that joint estimation allows for a non-zero covariance between the error terms of the equations under consideration; for example, for Eqs. (2) and (5), \(\text{cov} (\eta_{5} ,\eta_{2} ) \ne 0\). The joint estimation of Eqs. (
2) and (
5), and Eqs. (
4) and (
8) (with correlated errors) allows selectivity bias to be accounted for in the estimates of the two-way impact of welfare and technical efficiency. Following the arguments of Chamberlin et al. (1975) as cited in Makate et al. (
2016), a system of equations does not necessarily require a set of instruments for identification. However, as a matter of good practice, the study included some instrumental variables for the identification of welfare in Eq. (
2) and TE in Eq. (
4). Instruments for identification must satisfy two conditions. First, they must be correlated with the endogenous variable; second, they must not be correlated with the error term of the outcome equation. In this study, income from other crops and engagement of the spouse in non-farm activities were used as instrumental variables for the endogenous variable, welfare, in Eq. (
5). Thus, these two variables significantly influence welfare but are redundant in explaining farm efficiency. For Eq. (
8), location-specific variables and the frequency of pesticide application were used to instrument the endogenous variable, TE. Similarly, the location-specific variables and the frequency of pesticide application explain the variation in the farm efficiency score but not welfare,