1 Introduction

When the milk industry stopped advertising, milk sales remained steady for 12 months (Sutherland 2009, p. 191). This outcome contradicts the prediction of immediate and gradual decay of milk sales based on extant advertising models. The absence of gradual sales decline offers an opportunity to increase profit by stopping advertising if managers seek to minimize costs, for example, during recessions (Gijensberg et al. 2009; Tellis and Tellis 2009). But after remaining steady for the year, milk sales declined sharply, highlighting the risks inherent in not advertising. In another case study, Sutherland (2009, p. 193) found that brand awareness did not decline immediately or gradually after advertising stopped, but it dropped sharply after several months.

Previous marketing literature extensively studied how advertising affects awareness formation (e.g., Zielske and Henry 1980; Mahajan and Muller 1986; Batra et al. 1995; Naik et al. 1998; Dube et al. 2005; Bruce 2008; Srinivasan et al. 2010). According to extant awareness formation models, awareness declines immediately and gradually in the absence of advertising. However, the above anecdotal evidence tells a different story. Because no analytical or empirical study explicitly examines how awareness evolves when advertising stops, we lack an understanding of how the cessation of advertising erodes awareness. This paper aims to fill this void by studying awareness evolution in the absence of advertising.

Consider two brands, one well known and the other obscure, that stop advertising. Awareness of the well-known brand’s advertising would last longer than that of the obscure one because consumers remember the ads for the well-known brand. We note that “awareness” refers to ad awareness, which market research firms (e.g., Millward Brown, Inc.) measure by asking the question: “Which of these brands of cars have you seen advertised on television recently?” Extant advertising models presuppose (i.e., without testing) that consumers forget ads instantly, suggesting a lack of consumer memory. Specifically, Zielske and Henry (1980) and Mahajan and Muller (1986) apply the classical Nerlove and Arrow (1962) model, where awareness decreases in the absence of advertising at a rate proportional to the current awareness level. But if consumers remembered ads for, say, 1 month, then the awareness loss today would depend on the awareness level prevailing a month ago rather than the current awareness level. This memorableness of ads, which delays forgetting, is ignored even in recent advertising models (see Batra et al. 1995; Naik et al. 1998; Dube et al. 2005; Bruce 2008; Srinivasan et al. 2010).

The above discussion brings to the fore the following questions: How can we incorporate memory in standard awareness models? What is the 90% duration of advertising effects in the presence of consumer memory? How would managers estimate consumer memory using readily available market data?

To address these questions, we conceptualize the impact of consumer memory as “delaying the forgetting” of ads. We then capture the role of delayed forgetting in a model of awareness formation via a delay differential equation (DDE). We emphasize that DDEs, which open up a new class of dynamic models exhibiting richer dynamics and promising newer insights than the corresponding ordinary differential equations (e.g., see Bellen and Zennaro 2003; Arino et al. 2006), have not been applied in marketing; this study marks the first application to an important advertising phenomenon (viz., memory for ads). In addition, we explore analytically the duration of advertising effects under various scenarios. Next, we apply Kalman filtering to estimate not only the forgetting rate (i.e., the carryover effect), but also the time delay in forgetting (i.e., ad memorability). Econometric results establish the existence of consumer memory for ads. Specifically, for Peugeot ads, we estimate ad memorability of 3 weeks. Finally, if we ignore consumer memory as in the extant advertising models, we would overstate the forgetting rate as 14.1% per week instead of 8.6%; the excess amounts to about 39% of the overstated estimate.

The rest of the paper proceeds as follows: Section 2 formulates the model; Sections 3 and 4 present analytical and empirical results, respectively; Section 5 provides the managerial implications and new avenues for future research; Section 6 concludes the paper.

2 Awareness formation model in the presence of consumer memory

Awareness formation models describe the growth and decay of a brand’s awareness over time. Marketing literature contains several models of awareness formation (see Mahajan et al. 1984; Mahajan and Muller 1986; Bass et al. 2007; Bruce 2008; Naik et al. 2008; Srinivasan et al. 2010), of which the Nerlove–Arrow (NA) or autoregressive model is the most commonly used in theoretical and empirical analyses (see Fig. 1 for the frequency of published marketing studies using different dynamic specifications in the last 5 years in marketing journals).

Fig. 1 Frequency of different dynamic specifications

Specifically, the NA model is given by the ordinary differential equation:

$$ \dot{A} = \beta u(t) - \delta A(t) $$
(1)

where \( \dot{A} = \frac{{dA}}{{dt}} \) denotes the change in unaided (or aided) awareness A(t) over time t, u(t) is the advertising spending, and β and δ measure ad effectiveness and forgetting rate, respectively. The first term on the right-hand side of Eq. 1 causes awareness to grow in response to advertising spending u(t); the larger the ad effectiveness β, the faster the awareness growth. The second term (−δA) represents a loss in awareness due to forgetting; the larger the forgetting rate δ, the greater the decay in awareness. In the extant models, this awareness loss is proportional to the current awareness level A(t).

Equation 1 embodies an implicit assumption that consumers forget instantaneously. If consumers possess memory, such that they remember ads for τ periods, then forgetting would be delayed by τ periods. In other words, when consumers remember ads for τ periods before forgetting them, there is a delay in the onset of awareness decay. Therefore, delayed forgetting, rather than instantaneous forgetting, should drive awareness decay. To incorporate delayed forgetting in (1), we let consumers possess memory for τ periods. Consequently, awareness loss should be proportional to \( - \delta A(t - \tau ) \) rather than − δA(t). Then, the resulting model is given by the delay differential equation:

$$ \dot{A} = \beta u(t) - \delta A(t - \tau ) $$
(2)

where A(t − τ) is the awareness τ periods ago. Thus, A(t) measures the level of awareness at time t, while τ indicates the memorability of advertisements.

In Eq. 2, forgetting is not instantaneous as in extant models, but delayed by τ periods because consumers remember the ad for τ periods; that is, the awareness loss today depends on those consumers who saw the ad τ periods ago. When τ  = 0, Eq. 2 nests Eq. 1; i.e., the standard NA model is a special case of our more general formulation. Thus, this nesting clarifies that Eq. 1 presupposes instantaneous forgetting: no consumer memory because τ  = 0. The next section explores analytically the consequences of endowing consumers with memory.

3 Analytical results

We examine awareness evolution after the advertising stops. To this end, we set u(t) = 0 and analyze the erosion of awareness over time. Figure 2 compares the awareness evolution from the standard and proposed models using δ = 0.01 and τ = 52. In the presence of consumer memory (bold line), we observe that (a) awareness remains constant for a while, i.e., the decay begins with a delay rather than immediately; and (b) awareness then declines sharply rather than gradually. This pattern is similar to the case study of milk advertising, where sales remained constant for a year (due to memorable ads) before dropping rapidly. Next, we characterize the duration of advertising effects under four market scenarios: no memory, presence of memory, memory that varies across consumers, and memory that varies across exposures.

Fig. 2 Sales evolution in the presence and absence of memory

The duration of advertising effects refers to the time interval required for awareness to drop to a fixed fraction of its initial value after the advertising stops (Clarke 1976; Naik 1999; Tellis 2004). We determine the time taken for awareness to decay by 90% of its current value by setting u(t) = 0 and integrating Eq. 1 to obtain \( \int_{A(t)}^{A(t + {D_0})} {\frac{{dA}}{A}} = - \int_t^{t + {D_0}} {\delta dt} \), where D 0 is the 90% duration of advertising effects and A(t + D 0) = (1 − 0.90) × A(t). In the absence of consumer memory, this duration of advertising effects equals (see Naik 1999, p. 356):

$$ {D_0} = \frac{{\ln (10)}}{\delta } $$
(3)

In the presence of consumer memory, however, we need to set u(t) = 0 in Eq. 2 and solve the resulting delay differential equation \( \dot{A} = - \delta A(t - \tau ) \). The Web Appendix presents the derivation based on the omega function, which is given by \( {\omega_0}(x) = \mathop \sum \limits_{n = 1}^\infty {( - n)^{n - 1}}\frac{{{x^n}}}{{n!}} \) for the principal branch and is similar to the exponential function \( \exp (x) = \mathop \sum \limits_{n = 0}^\infty \frac{{{x^n}}}{{n!}} \) (for further details, see Corless et al. 1996). The duration of ad effects in the presence of consumer memory equals (see the Web Appendix),

$$ {D_\tau } = - \frac{{\tau \ln (10)}}{{{\omega_0}( - \delta \tau )}} $$
(4)
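
One way to see where Eq. 4 comes from is sketched below. It assumes that, after advertising stops, the decay follows a single exponential mode \( A(t) = A(0){e^{\lambda t}} \); the symbol λ is introduced only for this sketch, and the Web Appendix may proceed differently. Substituting this form into \( \dot{A} = - \delta A(t - \tau ) \) gives

$$ \lambda = - \delta {e^{ - \lambda \tau }}\quad \Rightarrow \quad \lambda \tau \,{e^{\lambda \tau }} = - \delta \tau \quad \Rightarrow \quad \lambda = \frac{{{\omega_0}( - \delta \tau )}}{\tau } $$

because the omega function is defined by \( {\omega_0}(x)\,{e^{{\omega_0}(x)}} = x \). Setting \( {e^{\lambda {D_\tau }}} = 1 - 0.90 \) then yields

$$ {D_\tau } = - \frac{{\ln (10)}}{\lambda } = - \frac{{\tau \ln (10)}}{{{\omega_0}( - \delta \tau )}} $$

which agrees with Eq. 4.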

In Eq. 4, D τ is positive because ω 0(−δτ) is negative. Furthermore, the durations in (3) and (4) relate to each other via the expression \( {D_\tau } = {D_0}\exp ({\omega_0}( - \delta \tau )) \). Consequently, \( \mathop {{\lim }}\limits_{\tau \to 0} {D_\tau } = {D_0} \) because ω 0(0) = 0 and so exp(ω 0(0)) = 1. Thus, the proposed DDE-based model nests the standard NA model.
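
The durations in Eqs. 3 and 4 can be evaluated numerically. The following is a minimal sketch, not the authors' code: the function names `lambert_w0`, `duration_no_memory`, and `duration_with_memory` are hypothetical, and the principal branch of the omega function is computed here by a plain Newton iteration, which is an assumption of this sketch. The principal branch is real only for arguments no smaller than −1/e, so the sketch rejects δτ > 1/e.

```python
import math


def lambert_w0(x, tol=1e-12, max_iter=100):
    """Principal branch of the omega (Lambert W) function for -1/e <= x <= 0.

    Solves w * exp(w) = x by Newton's method (an assumption of this sketch).
    """
    if x < -1.0 / math.e or x > 0.0:
        raise ValueError("this sketch only handles -1/e <= x <= 0")
    w = 0.0
    for _ in range(max_iter):
        ew = math.exp(w)
        slope = ew * (w + 1.0)
        if slope == 0.0:  # only at w = -1, i.e. x = -1/e
            break
        step = (w * ew - x) / slope
        w -= step
        if abs(step) < tol:
            break
    return w


def duration_no_memory(delta):
    """Eq. 3: 90% duration without consumer memory."""
    return math.log(10.0) / delta


def duration_with_memory(delta, tau):
    """Eq. 4: 90% duration with memory of tau periods."""
    if tau == 0:
        return duration_no_memory(delta)  # nesting: tau = 0 gives Eq. 3
    return -tau * math.log(10.0) / lambert_w0(-delta * tau)


# Inputs taken from the estimates reported in the paper (delta = 0.0857, tau = 3).
d0 = duration_no_memory(0.0857)
d_tau = duration_with_memory(0.0857, 3)
w = lambert_w0(-0.0857 * 3)
```

With these inputs the two durations also satisfy the relation between (3) and (4) stated above, which offers a quick consistency check on any implementation.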

What is the 90% duration if ad memory varies across consumers or exposures? To incorporate memory variation across consumers, let a consumer i remember an ad for s i periods. Then the distribution of memory across all consumers, given by ϕ(s i ), results in the average length of memory \( \bar{\tau } = \int\limits_i {{s_i}\phi ({s_i})d{s_i}} \). The resulting duration of ad effects when memory varies across consumers is given by

$$ {D_\tau } = - \bar{\tau }\ln (10)/{\omega_0}( - \delta \bar{\tau }) $$
(5)

To incorporate memory variation among ad exposures, let consumers remember different exposures for different lengths of time. This scenario converts the delay differential equation to an integro-delay differential equation \( \dot{A} = - \delta \int {A(t - \tau )d\tau } \). Although the derivation becomes more complicated, in the Web Appendix, we derive analytically the duration of ad effects:

$$ {D_\tau } = - \tau ln(10)/2{\omega_0}( - \sqrt {{\delta {\tau^2}}} /2) $$
(6)

Thus, we have derived the duration of ad effects under four scenarios: no memory, memory, memory that varies across consumers, and memory that varies across exposures. The next section investigates the existence and estimation of consumer memory using market data.

4 Empirical results

How would managers estimate ad memorability using readily available data? To answer this question, we present the market data, develop an estimation approach, validate it using simulation studies, and report the empirical results.

4.1 Market data

Companies monitor awareness for their brands using tracking studies (see Sutherland 2009, p. 168). PSA Peugeot Citroën, a French company with $73 billion in revenues, tracked the percentage of consumers aware of Peugeot 206’s advertising (for details, see Naik et al. 2008). In contrast to extant advertising studies, we analyze the decline in awareness after Peugeot stopped advertising. Specifically, we study 32 consecutive weeks during which Peugeot did not advertise, allowing us the opportunity to estimate the memory for their advertising campaign, without continued ad spending contaminating the evolution of awareness levels. The mean awareness level in the dataset was 0.625% and the standard deviation was 0.322%. Figure 3 displays the evolution of awareness over the 32 weeks in the absence of advertising. Given no advertising during this period, measurement errors drive the fluctuations up and down in the awareness levels.

Fig. 3 Awareness evolution over the 32 weeks

4.2 Estimation method

To quantify the memory for Peugeot 206 ads, we set u(t) = 0 in Eq. 2 and discretize the resulting model \( \dot{A} = - \delta A(t - \tau ) \) to obtain the transition equation,

$$ {A_t} = {A_{t - 1}} - \delta {A_{t - \tau }} + {\nu_t} $$
(7)

where A t denotes awareness in week t, and the error term νt follows N(0,\( \sigma_\nu^2 \)). In Eq. 7, awareness in week t depends on awareness levels not only from the last week, but also from τ weeks ago (because consumers remember the ads). Hence, Eq. 7 is not a Markov process, in which a future state depends only on the present state and not on the past. Because we need the Markov property to construct the likelihood function, we restore it by expressing (7) as follows:

$$ \left[ {\begin{array}{*{20}{c}} {{A_t}} \\{{A_{t - 1}}} \\{{A_{t - 2}}} \\{{A_{t - 3}}} \\\vdots \\{{A_{t - \tau + 1}}} \\\end{array} } \right] = \left[ {\begin{array}{*{20}{c}} 1 & 0 & 0 & \ldots & \ldots & { - \delta } \\1 & 0 & 0 & \ldots & \ldots & 0 \\0 & 1 & 0 & \ldots & \ldots & \vdots \\\vdots & \ddots & \ddots & \ddots & \ldots & \vdots \\\vdots & \ddots & \ddots & \ddots & \ddots & 0 \\0 & 0 & \ldots & 0 & 1 & 0 \\\end{array} } \right]\left[ {\begin{array}{*{20}{c}} {{A_{t - 1}}} \\{{A_{t - 2}}} \\{{A_{t - 3}}} \\{{A_{t - 4}}} \\\vdots \\{{A_{t - \tau }}} \\\end{array} } \right] + \left[ {\begin{array}{*{20}{c}} {{\nu_t}} \\0 \\0 \\0 \\\vdots \\0 \\\end{array} } \right] $$
(8)

where \( {\alpha_t} = ({A_t}, \cdots, {A_{t - \tau + 1}})\prime \) is the state vector on the left-hand side of Eq. 8. Note that Eq. 8 is a Markov process because α t depends only on α t-1 . Also, the dimension of α t depends on the length of consumer memory τ. We initialize Eq. 8 using the sample mean.
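
To illustrate how the companion form in Eq. 8 reproduces the recursion in Eq. 7, the sketch below builds the τ × τ transition matrix and iterates both forms with the same shocks. It is only an illustration under stated assumptions: the function names are hypothetical, and awareness before the first simulated week is assumed constant at `a0`, which the paper does not specify.

```python
import random


def transition_matrix(delta, tau):
    """Companion matrix of Eq. 8: first row (1, 0, ..., -delta), then a shift."""
    T = [[0.0] * tau for _ in range(tau)]
    T[0][0] = 1.0
    T[0][tau - 1] -= delta  # for tau = 1 this gives the single entry 1 - delta
    for i in range(1, tau):
        T[i][i - 1] = 1.0
    return T


def simulate_direct(delta, tau, a0, n_weeks, sigma_nu=0.0, seed=0):
    """Iterate Eq. 7 directly: A_t = A_{t-1} - delta * A_{t-tau} + nu_t."""
    rng = random.Random(seed)
    history = [a0] * tau  # assumed constant pre-sample awareness
    for _ in range(n_weeks):
        shock = rng.gauss(0.0, sigma_nu)
        history.append(history[-1] - delta * history[-tau] + shock)
    return history[tau:]


def simulate_state_space(delta, tau, a0, n_weeks, sigma_nu=0.0, seed=0):
    """Iterate Eq. 8: alpha_t = T alpha_{t-1} + (nu_t, 0, ..., 0)'."""
    rng = random.Random(seed)
    T = transition_matrix(delta, tau)
    alpha = [a0] * tau
    path = []
    for _ in range(n_weeks):
        shock = rng.gauss(0.0, sigma_nu)
        alpha = [sum(T[i][k] * alpha[k] for k in range(tau)) for i in range(tau)]
        alpha[0] += shock
        path.append(alpha[0])
    return path


direct = simulate_direct(0.15, 3, 1.0, 10, sigma_nu=0.05, seed=1)
stacked = simulate_state_space(0.15, 3, 1.0, 10, sigma_nu=0.05, seed=1)
```

Because both functions draw the same shocks from the same seed, the two paths should coincide up to floating-point error, which is the sense in which Eq. 8 restores the Markov property without changing the dynamics.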

Next, we link (8) to the observed awareness Y t via the observation equation,

$$ {Y_t} = [1,0, \cdots, 0]{\alpha_t} + {\varepsilon_t} $$
(9)

where the error term εt follows N(0, \( \sigma_\varepsilon^2 \)), and it captures the fluctuations observed in Fig. 3. The probability of observing the entire sequence \( \{ {Y_1},{Y_2}, \cdots, {Y_T}\} \) is given by the likelihood function:

$$ \begin{array}{*{20}{c}} {{L_\tau }(\theta ) = P({Y_1},{Y_2}, \cdots, {Y_T};\theta, \tau )} \\{ = P({Y_1}|{\Im_0}) \times P({Y_2}|{\Im_1}) \times P({Y_3}|{\Im_2}) \times \cdots \times P({Y_T}|{\Im_{T - 1}})} \\\end{array} $$
(10)

where \( \theta = (\delta, {\sigma_\nu },{\sigma_\varepsilon })\prime \) is the parameter vector, P(⋅) denotes the probability measure, and \( {\Im_t} = {Y_t} \cup {\Im_{t - 1}} \) is the information set. Simplifying (10) for a given τ, we obtain the log-likelihood,

$$ L{L_\tau }(\theta ) = - \frac{1}{2}\mathop \sum \limits_{t = 1}^T \ln ({f_t}) - \frac{1}{2}\mathop \sum \limits_{t = 1}^T \frac{{\varepsilon_t^2}}{{{f_t}}} $$
(11)

where \( {f_t} = Var({Y_t}|{\Im_{t - 1}}) \) and \( {\varepsilon_t} = {Y_t} - E[{Y_t}|{\Im_{t - 1}}] \), which are obtained from the Kalman filter recursions (e.g., see Naik et al. 1998, p. 233).
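
The Kalman filter recursions behind Eq. 11 can be sketched as follows for the state-space model in Eqs. 8 and 9. This is a hedged illustration rather than the authors' implementation: the function name `kalman_loglik` is hypothetical, and the initial state variance `p0` is an assumption of this sketch, since the paper only states that the state is initialized at the sample mean. As in Eq. 11, the additive constant involving 2π is omitted.

```python
import math


def transition_matrix(delta, tau):
    """Companion matrix of Eq. 8."""
    T = [[0.0] * tau for _ in range(tau)]
    T[0][0] = 1.0
    T[0][tau - 1] -= delta
    for i in range(1, tau):
        T[i][i - 1] = 1.0
    return T


def kalman_loglik(y, delta, tau, sigma_nu, sigma_eps, p0=1.0):
    """Log-likelihood of Eq. 11 for a given memory length tau.

    p0 is an assumed initial variance for each state element.
    """
    n = tau
    T = transition_matrix(delta, tau)
    mean_y = sum(y) / len(y)
    a = [mean_y] * n  # state mean initialized at the sample mean
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    loglik = 0.0
    for obs in y:
        # Prediction step: a = T a, P = T P T' + Q, with Q[0][0] = sigma_nu^2.
        a = [sum(T[i][k] * a[k] for k in range(n)) for i in range(n)]
        TP = [[sum(T[i][k] * P[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        P = [[sum(TP[i][k] * T[j][k] for k in range(n)) for j in range(n)]
             for i in range(n)]
        P[0][0] += sigma_nu ** 2
        # Innovation and its variance (observation picks the first state).
        innovation = obs - a[0]
        f = P[0][0] + sigma_eps ** 2
        loglik += -0.5 * math.log(f) - 0.5 * innovation ** 2 / f
        # Update step with gain K = P[:, 0] / f.
        gain = [P[i][0] / f for i in range(n)]
        a = [a[i] + gain[i] * innovation for i in range(n)]
        P = [[P[i][j] - gain[i] * gain[j] * f for j in range(n)]
             for i in range(n)]
    return loglik
```

In practice this function would be passed, with its sign flipped, to a numerical optimizer over (δ, σν, σε) for each candidate τ.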

Finally, we obtain the parameter estimates by maximizing Eq. 11; that is, \( \hat{\theta } = \arg \max (L{L_\tau }(\theta )) \). Based on the information matrix, we get the standard errors from the square root of the diagonal of the covariance matrix, \( \left[ { - E[\frac{{{\partial^2}L{L_\tau }(\theta )}}{{\partial \theta \partial \theta \prime}}]} \right]_{\theta = \hat{\theta }}^{ - 1} \). To quantify consumer memory, we estimate the model for various τ and retain the model that yields the smallest bias-corrected Akaike information criterion (Hurvich and Tsai 1989), \( {\hbox{AI}}{{\hbox{C}}_C}(\tau ) = - 2{\hbox{LL}}_\tau^* + \frac{{T(T + p)}}{{T - p - 2}} \), where \( {\hbox{LL}}_\tau^* \) is the maximized log-likelihood value and p is the number of parameters in θ. Thus, \( \hat{\tau } = \arg \min \,({\hbox{AI}}{{\hbox{C}}_C}(\tau )) \) furnishes the estimate of ad memorability.
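
The selection rule for τ can be sketched in a few lines. The function names `aicc` and `select_tau` are hypothetical, and the log-likelihood values in the usage line are made up purely to show the call; they are not results from the paper.

```python
def aicc(max_loglik, n_obs, n_params):
    """Bias-corrected AIC as written above: -2 LL* + T (T + p) / (T - p - 2)."""
    return -2.0 * max_loglik + n_obs * (n_obs + n_params) / (n_obs - n_params - 2)


def select_tau(loglik_by_tau, n_obs, n_params=3):
    """Return the tau with the smallest AICc, plus all AICc scores.

    loglik_by_tau maps each candidate tau to its maximized log-likelihood.
    n_params defaults to 3 for theta = (delta, sigma_nu, sigma_eps).
    """
    scores = {tau: aicc(ll, n_obs, n_params) for tau, ll in loglik_by_tau.items()}
    best_tau = min(scores, key=scores.get)
    return best_tau, scores


# Hypothetical maximized log-likelihoods for tau = 1, 2, 3 (illustration only).
best_tau, scores = select_tau({1: -10.0, 2: -9.5, 3: -8.0}, n_obs=32)
```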

4.3 Simulation results

Using Monte Carlo studies, we show that the proposed approach teases apart the effects of memory from carryover. We generate 1,000 data sets using the proposed model as the data-generating process with δ = 0.15, τ = 3, and signal-to-noise ratio (SNR) = 4.30, which is similar to the real data (SNR real = 4.20). If the proposed method works, the estimated values will cluster near the true values. We estimate the parameters via the procedure described in Section 4.2 and present the simulation results in Fig. 4. These figures illustrate that the proposed method recovers ad memorability and forgetting rate effectively.

Fig. 4 Simulation results for parameter recovery

What would be the consequence of ignoring memory? To answer this question, we re-analyze the simulated data using the NA model that ignores the memory effect. Figure 5 presents the results. We observe that, if brand managers ignore the memory effect, they would over-estimate the forgetting rate.

Fig. 5 Simulation results for parameter bias

4.4 Empirical results

4.4.1 Model selection

Applying the above method to Peugeot 206 data, we estimate the proposed model for τ ranging from 1 to 8. At τ = 1, Eq. 7 reduces to A t = (1 − δ)A t-1 + νt, which is the discrete-time NA model. Figure 6 illustrates the AICC values for the different models. We find that the lowest AICC value corresponds to τ = 3 weeks. Thus, the market data indicate that the memory for Peugeot advertisements lingers for 3 weeks.

Fig. 6 Information criterion for various values of consumer memory τ

We compare the retained model with τ = 3 weeks to the NA model without memory. The difference in AICC values between the NA model and the retained model is 2.35. When this difference exceeds 2, the model with the smaller AICC has stronger empirical support (Burnham and Anderson 2002, p. 70). Alternatively, we can compute the AICC weights to determine the evidence ratio, i.e., the ratio of the weights from the best and the competing models. The AICC weight for the NA model is 0.236, and that for the retained model is 0.764. The evidence ratio of 3.23 indicates that the retained model is 3.23 times more likely to be the best model relative to the NA model (Burnham and Anderson 2002, p. 78). Note that we make no claims that model (2) is superior to model (1) in general; rather, we provide an approach to determine which one of the nested models should be retained for any given market data. Thus, based on this empirical analysis, strong evidence exists to support the presence of memory for ads.
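
The AICC weights and the evidence ratio follow directly from the AICC difference of 2.35. The sketch below uses the standard Akaike-weight formula, in which each model's weight is proportional to exp(−Δ/2) with Δ measured from the best model; the function name `akaike_weights` is hypothetical.

```python
import math


def akaike_weights(deltas):
    """Akaike weights from AICc differences relative to the best model."""
    raw = [math.exp(-0.5 * d) for d in deltas]
    total = sum(raw)
    return [r / total for r in raw]


# Differences from the best model: retained model (0.0) and NA model (2.35).
w_retained, w_na = akaike_weights([0.0, 2.35])
evidence_ratio = w_retained / w_na
```

Rounded to three decimals, the weights match the values of 0.764 and 0.236 reported above, and the evidence ratio equals exp(2.35/2).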

4.4.2 Parameter estimates

Does this presence of consumer memory matter? To answer this question, we first examine the estimated parameter values. Table 1 displays them for both the NA model and the retained model. All parameter estimates have expected signs and are significant at the 90% confidence level.

Table 1 Parameter estimates and t values

We find that forgetting rates vary in the absence or presence of memory. Specifically, when we ignore consumer memory (as in the NA model), we overstate the forgetting rate substantially. The forgetting rate is 0.1409 in the absence of memory and it is 0.0857 in the presence of memory. Thus, the NA estimate exceeds the memory-based estimate by 0.0552, which is 39% of the NA estimate (equivalently, the NA estimate is about 64% larger than the memory-based estimate). This overestimation corroborates our simulation results (see Fig. 5).

5 Discussion

5.1 Managerial implications

The existence of memory for ads in the marketplace provides new managerial insights. Managers learn what happens when they stop advertising completely. The milk example, described in Section 1, revealed the delayed nature of the effect of stopping ads on sales. The decision to stop milk advertising did not hurt sales for almost 12 months; however, after 1 year of no advertising, milk sales declined “at a sickening rate” (Sutherland 2009, p. 191). When ad memorability is high, the resulting long delay and rapid decline are what we would observe based on our formulation rather than the standard NA model.

Additionally, the proposed model furnishes a new metric to track the quality of an advertisement. It yields a measure of ad memorability that is different from ad effectiveness (β) and forgetting rates (δ). Our empirical analysis illustrates an approach that managers can employ to estimate the ‘memorability’ of an advertisement.

5.2 Future research

Given the new class of advertising models based on delay differential equations, we list four avenues for future research. First, one extension would be to estimate consumer memory in the presence of advertising. Advertising can change consumer memory for the brand; hence, estimating consumer memory in its presence would have several implications for managerial decision making. For example, if advertising increases consumer memory, what is the optimal advertising schedule? Is it always optimal to maintain advertising at a certain fixed level so that consumer memory never decreases?

Second, researchers could investigate the role of competition in memory for ads. Do competitive ads reduce memory for ads of the focal brand? If so, how should managers alter their advertising schedule in the presence of competing ads? For example, Danaher et al. (2008) show that competitive advertising interference occurs when viewers of advertising of a focal brand are exposed to competitive advertising. They illustrate that competitive interference has a significant impact on the effectiveness of the ads. However, they do not account for the presence of consumer memory via delayed forgetting of ads in the presence of competition.

Third, most advertising models in marketing assume “immediacy” and “homogeneity” of forgetting start time, whereas the proposed model assumes just the latter. Hence, future studies can relax the assumption of homogeneity by estimating the individual-level memory for ads using scanner panel data (we thank an anonymous reviewer for this suggestion).

Finally, future researchers can formulate consumer memory to be state dependent, i.e., \( \tau = f({y_{t - 1}}) \), where y t-1 is the state at time t-1 (e.g., sales or awareness last week). If a brand, say, advertises heavily then its awareness (or sales) could be high, which can enhance the memory for ads. However, consumer memory might be much smaller if awareness (or sales) did not increase sufficiently in spite of brand advertising. Hence, memory is likely to be a function of the state of the awareness (or sales) levels. The methods to estimate such DDE-based models are not available in standard software such as SAS, SPSS, R, Gauss, or Matlab and, hence, future research needs to develop them to facilitate the understanding of which market factors affect the memory for ads. We hope this study stimulates such advancements.

6 Conclusions

This paper formulates a new model of memory for ads, expanding the class of dynamic advertising models to include delay differential equations. We derive the durations of advertising effects under various scenarios of memory. In addition, we propose an estimation method that allows managers and researchers to distinguish between models with different degrees of memory. Applying this method to market data, we establish that ad memorability for the Peugeot 206 brand was about 3 weeks. We also note that other dynamic specifications such as the model of Vidale and Wolfe (1957) \( \dot{S} = \beta u(M - S) - \delta S \) and Sethi (1983) dynamics \( \dot{S} = \beta u\sqrt {{M - S}} - \delta S \) reduce to \( \dot{S} = - \delta S \) when u(t) = 0. Consequently, when advertising ceases (as in our setting), these dynamic specifications are the same as the Nerlove–Arrow model. Hence, the use of the Nerlove–Arrow specification does not limit our contributions. Finally, and most importantly, the new model and method based on delay differential equations allow us to investigate the cessation effects of advertising, which is an under-researched topic in marketing. We encourage researchers to join us in augmenting our understanding of delayed actions or outcomes.