Data sources, measures, and sample
The field study analyzes the effectiveness of recalls announced through the CPSC. Together with the affected firm, the CPSC releases a standardized recall announcement if the CPSC, the firm, a consumer, or any other supply chain member identifies a significant product hazard. The two primary objectives of any recall are to (a) locate and remove defective products as quickly as possible and (b) “communicate accurate and understandable information in a timely manner to the public about the product defect, the hazard, and the corrective action” (CPSC, 2012: 18). Each recall announcement includes the exact recall date, product details, hazard, remedy, incidents and injuries, number of units recalled, time frame during which the product was sold, and price.
Remedy. Following Chen et al. (2009), Liu et al. (2016), Mafael et al. (2022), and Raithel and Hock (2021), and based on the CPSC recall report, we construct the dummy variable Remedy, which takes the value 1 if the firm offered full remedy (i.e., the firm is responsible for fixing the issue, such as through a free repair, exchange, or refund) and 0 if the firm offered partial remedy (i.e., the firm shifts some or all of the responsibility for fixing the issue to its customers, such as through free repair kits).
Incident likelihood measures the likelihood of future incidents if the defective product is not corrected. We measure this variable as the ratio of (i) the natural logarithm of the number of consumer safety incidents reported to the CPSC before the product recall was announced to (ii) the natural logarithm of the number of recalled units (i.e., recall volume). The variable's values can range from 0 to 1.
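As a worked illustration of the ratio described above, the following minimal Python sketch computes incident likelihood for a hypothetical recall with 10 reported incidents and 10,000 recalled units. The function name and the handling of recalls without any reported incident are assumptions made for illustration only; they are not taken from the study.

```python
import math


def incident_likelihood(incidents: int, units: int) -> float:
    """Ratio of ln(reported incidents) to ln(recalled units)."""
    if incidents < 1 or units < 2:
        # Assumption: the ratio is treated as undefined here, because
        # ln(0) does not exist and ln(1) = 0 would make the denominator zero.
        raise ValueError("need at least 1 incident and at least 2 recalled units")
    return math.log(incidents) / math.log(units)


# Hypothetical recall: 10 incidents reported before the announcement,
# 10,000 units recalled.
example = incident_likelihood(incidents=10, units=10_000)
print(round(example, 4))  # 0.25
```

With a single reported incident the ratio is 0, and it reaches 1 only when the number of incidents equals the number of recalled units, which matches the stated range of 0 to 1.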
Firm reputation is measured by Fortune magazine's reputation score one year before the recall. This measure has been used in prior product recall research (e.g., Raithel & Hock, 2021). The underlying survey is conducted each year among high-ranking executives, directors, and financial analysts in the US and addresses quality, innovativeness, investment value, financial soundness, employer-related aspects, community and environmental responsibility, and corporate assets.
The model includes several control variables:
Product price is the natural logarithm of the maximum retail price in US dollars (from CPSC). With an increasing financial value of the product (and thus increasing financial threat if the product is malfunctioning), customers’ willingness to return the product should be higher.
Product sell time is the natural logarithm of days the products have been sold before the recall (from CPSC). The longer the product is being sold, the more difficult it is to trace all units.
Product volume is the natural logarithm of the number of recalled units (from CPSC). With increasing product volume, it becomes more difficult (Hall & Johnson-Hall, 2021) and costly for firms (Raithel et al., 2021) to trace all units and achieve high recall effectiveness.
Hazard high / hazard medium. Following Raithel and Hock (2021), we code two dummy variables for Hazard (from CPSC). Hazard High is 1 if a very serious injury is likely or death is possible (e.g., fire); Hazard Medium is 1 if a major injury is possible but death is very unlikely (e.g., laceration). Hazard Low (only a minor injury or no injury is possible, e.g., bruise) serves as the baseline condition. Since a higher failure hazard poses a greater threat to customers' health, recall effectiveness is expected to be higher (Hoffer et al., 1994; Rupp & Taylor, 2002).
Percentage product registration. A total of 581 U.S.-based “CloudResearch approved participants” (approval rate > 80%, < 5,000 studies completed) participated in the study (M_age = 38.96, 57% female). To avoid fatigue, each participant rated only 15 different products. Each product was rated by at least 100 participants. We asked participants whether they had ever registered the product (e.g., bicycle). The answer choices were “I never purchased one before,” “I have purchased and registered one before,” and “I have purchased one before but did not register it.” The more products are registered, the more easily they can be traced, which increases recall effectiveness.
Product relevance. A total of 148 U.S.-based “CloudResearch approved participants” (approval rate > 80%, < 5,000 studies completed) participated in the study (M_age = 32.04, 43% female). To avoid fatigue, each participant rated only 20 different products (e.g., “How frequently do you use a bicycle?”; 1 = not at all, 7 = very). Each product was rated by at least 64 participants. Consumers who use a product more (vs. less) frequently are more (vs. less) dependent on the proper functioning of the recalled product. Thus, they could have either a higher incentive to participate in the recall because they need a working product or a lower incentive because they do not have an alternative while their defective product is being fixed or replaced (e.g., child car seat).
Media attention is the natural logarithm of the number of Associated Press (AP) articles mentioning the product recall on the announcement day (from Factiva). It controls for recall salience, which could covary positively with customer awareness of the recall and, thus, recall participation.
Investor response is the abnormal stock return (difference between actual and expected stock return) on the recall announcement day (from CRSP) and controls for investors' sentiment and expected financial implications of the recall (e.g., Chen et al., 2009; Raithel & Hock, 2021).
Period since last recall is the natural logarithm of days since the announcement of the firm’s last product recall before the current recall. To avoid the impact of outliers, the maximum value is capped at 1,000 days.
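The capping rule can be illustrated with a short Python sketch. It assumes that the cap of 1,000 days is applied to the raw day count before the logarithm is taken; the text does not spell out this order, and the function and constant names are hypothetical.

```python
import math

CAP_DAYS = 1_000  # maximum value stated in the text


def period_since_last_recall(days: float) -> float:
    """Natural log of the days since the firm's previous recall, capped."""
    if days <= 0:
        raise ValueError("days must be positive")
    # Assumption: cap the raw day count first, then take the logarithm.
    return math.log(min(days, CAP_DAYS))


# A firm whose previous recall was 5,000 days ago is treated like one
# whose previous recall was 1,000 days ago.
print(period_since_last_recall(5_000) == period_since_last_recall(1_000))  # True
```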
Industry fixed effects control for product category-specific recall effectiveness. For example, it is easier to track electronic devices, such as smartphones, as opposed to furniture.
Year fixed effects control for changes in technology (e.g., faster dissemination of product recall information through social media, advanced product tracking) as well as general regulatory activities (e.g., the OECD's launch of the global recall portal in 2012, which also covers CPSC recalls; new product safety standards; and product safety awareness campaigns).
Correction for sample self-selection
Firms can decide not to share any recall effectiveness data. To correct for this potential bias, we adopt a Heckman-type correction (Heckman, 1979), which has been applied in the product recall context before (e.g., Liu et al., 2016). It involves a selection equation (sample self-selection) and one outcome equation (recall effectiveness). The selection equation is a binomial probit model, which models the (assumed) exogenous factors (exclusion restrictions) influencing sample self-selection. We follow the recommendations of Certo et al. (2016) to test for potential sample selection, test the (empirical) validity of exclusion restrictions, and report the results.
We identified three potential exclusion restrictions that have a bearing on firms' decision to share recall effectiveness data but are unlikely to have a direct impact on recall effectiveness. We use (1) the natural logarithm of the cumulative number of prior injuries. The risk of costly litigation should increase the firm's motivation and ability to collect and share information about recall compliance. For example, IKEA has started to collect contact information from consumers who buy specific products (Real Homes, 2021) after the firm was fined $46 million in a lawsuit related to a prior product recall (The New York Times, 2020). Consumers, on the other hand, are less likely to be aware of the total number of injuries associated with prior product recalls. Their recall participation is more likely affected by the characteristics of the current recall. We use (2) the percentage of firms reporting recall effectiveness in a year because it is more likely that firms have the willingness and ability to restore the records for more recent recalls. The correlation of this exclusion restriction with the sample indicator is 0.478 (p < 0.001). However, this ability and motivation to restore and share more recent data should not directly correlate with recall effectiveness. The correlation of the exclusion restriction with recall effectiveness is only 0.052 (p = 0.451). Finally, we use (3) the firm's batch number (dummies for eight clusters based on alphabetical order) because the CPSC does not approach all firms at once but rather on a rolling basis over a longer period (two years). This increases the likelihood that response rates differ for reasons unrelated to the recall (e.g., the CPSC approaches one batch of firms around a holiday period, which lowers response rates). It seems unlikely that customers' decision to participate (or not) in the recall is affected by the position of the firm's name in the alphabet.
Model estimation
We estimate two first-stage models. First, for each recall i, the probability of T_i = 1 (recall effectiveness reported = 1 vs. not reported = 0) is modeled as a function of the sample self-selection exclusion restrictions z_i:
$$P\left(T_i = 1\right) = \boldsymbol{\alpha}^{T}\boldsymbol{z}_i + \boldsymbol{\beta}^{T}\boldsymbol{x}_i + \epsilon_i$$
(1)
In line with Certo et al. (2016), we enter the focal X-variables (Remedy, Incident likelihood, and Reputation) into the first-stage model. If they are significant, a sample-selection bias is likely, which requires the inclusion of the sample-selection correction term in the second-stage model. The results suggest that the three focal variables are jointly associated with the sample indicator in the first-stage model at a marginal level of significance (Δχ²(3) = 7.28, p = 0.063, ΔPseudo-R² = 0.017), indicating that a self-selection bias in the second-stage estimates is possible. The three exclusion restrictions have relatively strong predictive power for the sample indicator, as the incremental increase in Pseudo-R² is 0.222. Table C1 in Web Appendix C shows the first-stage model results.
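A minimal Python sketch of how a first-stage probit index can be turned into the correction term of a Heckman-type procedure, the inverse Mills ratio (IMR). The formula φ(index)/Φ(index) is the standard IMR for observations that are in the sample (T_i = 1). All coefficient and covariate values below are hypothetical and are not the estimates reported in Web Appendix C; the function names are likewise illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def probit_index(coefs, values, intercept=0.0):
    """Linear index of the selection equation (intercept + sum of coef * value)."""
    return intercept + sum(c * v for c, v in zip(coefs, values))


def inverse_mills_ratio(index: float) -> float:
    """IMR for a selected observation: normal density over normal CDF.

    Note: for extremely negative indices the CDF underflows to 0 and the
    division fails; a production implementation would guard against that.
    """
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


# Hypothetical first-stage coefficients and covariates for one recall.
hypothetical_coefs = [0.8, -0.3, 0.5]
hypothetical_values = [1.2, 0.4, -0.6]

index = probit_index(hypothetical_coefs, hypothetical_values, intercept=-0.2)
print(f"P(T=1) = {STD_NORMAL.cdf(index):.3f}, IMR = {inverse_mills_ratio(index):.3f}")
```

The IMR falls as the predicted probability of reporting rises, so recalls that were unlikely to appear in the sample receive the largest correction.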
Second, for each recall i, the probability of R_i = 1 (remedy is full = 1 vs. partial = 0) is modeled as a function of the exclusion restrictions y_i associated with remedy choice:
$$P\left(R_i = 1\right) = \boldsymbol{\alpha}^{T}\boldsymbol{y}_i + \epsilon_i$$
(2)
The two exclusion restrictions have relatively strong predictive power for remedy choice (Pseudo-R² = 0.316). Table C2 in Web Appendix C shows the first-stage model results.
In the second stage, we enter the two control functions into the outcome equation:
$$\begin{array}{l}
\mathit{Recall\;Effectiveness}_i = \boldsymbol{\beta}^{T}\left(\begin{array}{c}\mathit{Remedy}_i\\ \mathit{Incident\;Likelihood}_i\\ \mathit{Reputation}_i\\ \mathit{Remedy}_i \ast \mathit{Reputation}_i\\ \mathit{Incident\;Likelihood}_i \ast \mathit{Reputation}_i\end{array}\right)\\
\qquad + \boldsymbol{\gamma}^{T}\mathbf{Controls}_i + \lambda_1\,\mathit{IMR(sample\;selection)}_i + \lambda_2\,\mathit{IMR(remedy\;choice)}_i + \varepsilon_i
\end{array}$$
(3)
For each recall i, recall effectiveness is a function of the focal covariates Remedy (full vs. partial), Incident Likelihood, Reputation, and the focal interaction terms. The vector β includes the regression coefficients. The vector Controls (including industry fixed effects, year fixed effects, and intercept) contains the control variables. The coefficients λ_1 and λ_2 control for the sample self-selection and remedy choice biases. It is common practice to add several control functions simultaneously (e.g., Lawrence et al., 2021). Finally, ε is the error term. Since the outcome is a percentage ranging between 0 and 1, we estimate a fractional probit regression (Papke & Wooldridge, 1996) and adjust for heteroscedasticity using industry cluster-robust standard errors and z-standardization of all metric covariates. The z-standardization includes the focal metric variables Incident Likelihood and Reputation, which are included in interaction terms, thereby allowing for interpretation of the focal variables' main effects (Spiller et al., 2013). To compare effect sizes, we also show the Average Marginal Effects (AME).
Results
Table D1 in Web Appendix D shows the descriptive statistics and correlations of second-stage model variables. Table 1 below shows the results for the fractional probit regression (Table D2 in Web Appendix D shows the main-effects-only model).
Table 1. Fractional probit regression results for Study 1

| | Variable | Hypothesis (expected sign) | Coefficient | SE | p | AME |
|---|---|---|---|---|---|---|
| Focal effects | Remedy (full vs. partial) | H1 (+) | .333* | .156 | .032 | .114 |
| | Incident Likelihoodᵃ | H2 (+) | -.020 | .050 | .685 | -.007 |
| | Reputationᵃ | (±) | -.390*** | .098 | .000 | -.133 |
| | Remedy*Reputationᵃ | H3 (+) | .371** | .129 | .004 | .127 |
| | Incident Likelihoodᵃ*Reputationᵃ | H4 (+) | .111** | .042 | .008 | .038 |
| Controls | Product priceᵃ | (+) | .361** | .112 | .001 | .123 |
| | Product sell timeᵃ | (-) | -.327*** | .067 | .000 | -.111 |
| | Recall volumeᵃ | (-) | -.116** | .044 | .008 | -.040 |
| | Hazard: Medium (vs. low) | (±) | -.011 | .177 | .951 | -.004 |
| | Hazard: High (vs. low) | (±) | -.173$ | .104 | .095 | -.059 |
| | Product relevanceᵃ | (+) | .092$ | .049 | .061 | .031 |
| | Percentage product registrationᵃ | (+) | -.058 | .078 | .455 | -.020 |
| | Media attentionᵃ | (±) | -.092* | .037 | .014 | -.031 |
| | Investor responseᵃ | (±) | -.056 | .064 | .381 | -.019 |
| | Period since last recallᵃ | (±) | .081* | .038 | .032 | .027 |
| | Industry fixed effects | | YES | | | |
| | Year fixed effects | | YES | | | |
| Endogeneity correction | Control Function Sample Selection (IMRᵇ) | | .127 | .254 | .616 | |
| | Control Function Remedy Choice (IMRᵇ) | | .314 | .371 | .397 | |
| Model Fit | Wald χ² | | 81.000*** | | | |
| | Pseudo-R² | | .135 | | | |
| | Sq. correlation btw. observed and predicted | | .454 | | | |
| | N | | 217 | | | |
The model is significant (χ²(5) = 81.00, p < 0.001) and has a good fit (Pseudo-R² = 0.135; the squared correlation of observed and predicted Recall Effectiveness is 0.454). The focal covariates (including interactions) produce a Pseudo-R² of 0.024, and the squared correlation of observed and predicted Recall Effectiveness is 0.085.
We find the following regarding H1 to H4 (we apply an error rate of α = 5% to all tests):
Discussion
In Study 1, we leverage unique field data to analyze drivers of recall effectiveness. We find support for three out of four hypotheses. First, the results highlight the importance of full (vs. partial) remedy. Generally, firms are advised to offer full remedy to achieve high recall effectiveness (support for H1). There are, however, situations in which firms can offer partial remedy and achieve equally high recall effectiveness (H3). Conversely, in other situations, a firm's full (vs. partial) remedy offer creates disproportionately higher recall effectiveness. Specifically, when firm reputation is low (but not extremely low), there is no significant difference between partial and full remedy. For high-reputation firms, however, offering full remedy is much more important because it leads to significantly higher recall effectiveness than partial remedy. Second, although incident likelihood alone, all else being equal, is not significantly related to recall effectiveness (no support for H2), this relationship differs significantly between low- and high-reputation firms (H4). The results suggest that for low-reputation firms the relationship between incident likelihood and recall effectiveness is negative. This pattern is consistent with customers not trusting low-reputation firms to correct the defective product when incident likelihood is high. In contrast, for high-reputation firms, high incident likelihood goes along with higher recall effectiveness. These findings suggest that recall participation is shaped by the firm's reputational profile. Accordingly, high-, medium-, and low-reputation firms should manage recalls differently to achieve similarly high recall effectiveness.
The goal of the next two experiments is twofold. First, we aim to test the results of the field study in an experimental setting, thereby eliminating endogeneity and self-selection concerns. Second, we aim to home in on the psychological process by testing the extent to which our focal mediators, perceived benefits and self-efficacy, are affected by remedy (Study 2), incident likelihood (Study 3), and firm reputation (Studies 2 and 3), thereby testing H5a–H7b.