INTRODUCTION

With the rapid growth of e-commerce, an increasing number of marketing channels are available to marketers. In addition to traditional advertising through TV, radio, direct mail, magazines, newspapers, outdoor billboards and so on, digital advertising through paid search, online display, video, social and email has attracted great attention in recent years because of its precise targeting and prompt performance tracking. Whether to drive sales, generate leads or build a brand, it is not unusual for several marketing campaigns to run simultaneously or consecutively, via different online as well as offline channels. A user who is converted (made a purchase or generated a lead in the end) might have been exposed to several advertisements from a variety of channels. To achieve maximal marketing efficiency, one needs to determine how a given budget should be allocated across channels in order to maximize business target measures such as revenue, ROI, lead generation or growth rate. This is where media mix modeling (Tellis, 2006) comes into play.

Mathematically, media mix modeling, as an optimization procedure, has two steps. First, one identifies the response of some business target measure to spend, which is the key to, and the basis of, the whole optimization. Second, a numerical optimization is performed, using existing software (for example, James, 1994) or by implementing an optimization algorithm such as Markov Chain Monte Carlo (Metropolis et al, 1953).

Practically, the target's response to spend can be approached in two ways: top-down and bottom-up. With the bottom-up method, an attribution model (Shao and Li, 2011) is developed at the user level and applied to every conversion; the contribution from each channel can then be quantified by summing over all conversions. Given a precise attribution model, the response curve of the business target, such as revenue or leads generated, versus spend can be built individually for every channel, so that the budget allocation can be optimized (Basu and Batra, 1984). However, building a reliable attribution model can be very challenging because too many factors need to be considered. For example, channel interactions and user-behavior-related features, such as the order of, and time intervals between, the channels used, as well as a user's preferences, may all affect the final conversion probability, in addition to the impact of the channels themselves.

Alternatively, the top-down approach ignores all user-level details and works directly on each channel's spend and the business target output data. Instead of building an attribution model, the top-down method makes appropriate assumptions about the underlying response function forms, and then extracts all the channel response curves at the same time by fitting the data. As the input data are aggregated at the level at which the budget is going to be optimized, individual users' personal preferences are averaged out and thus ignored. Furthermore, because the response curves of all channels are extracted simultaneously and directly from observable measurements, namely the business target metric and channel spend, channel interactions are addressed automatically.

However, there is another problem for the top-down method in media mix modeling: the prolonged or lagged effect of advertising on users' conversion behavior, generally known as advertising adstock, or the carry-over effect (Broadbent, 1979). This time-response effect couples tightly with the so-called shape effect, that is, the response of the business target measure to spend. In the bottom-up approach, the carry-over effect can be addressed by user-history-related timing variables and attribution on a per-conversion basis. With the top-down method, to extract the response of the business target measure to spend precisely and reliably, the time response has to be well modeled and extracted at the same time. The commonly advocated additive or multiplicative regression approach, which folds both effects together simply by introducing time-lagged or transformed variables (Tellis, 2006; Bhattacharya, 2008), may not be a good choice in this case. To the best knowledge of the authors, a unified treatment that models and extracts both effects simultaneously is lacking, and providing one is the contribution of this article.

To make the description concise and easy to understand, we will hereafter use revenue as the business target measure. However, the assumptions, methods and procedures described in this article are very general and applicable to other metrics too.

As spend and revenue data are sensitive information for any company, and restricted from publication, a Monte Carlo simulation study is conducted in this work. The purpose of this study is twofold: (i) to demonstrate that the proposed algorithm works, and (ii) to set up a framework in which real data can be processed and results obtained immediately.

This article is organized as follows. In the next section the modeling of the time response is presented. The following section is devoted to detailing the assumptions on the revenue response. The procedure to extract both the time and revenue responses from historical data is explained in the section after that. The Monte Carlo simulation setup and process, which serve to verify that our algorithm to extract the response model parameters does indeed work, are described in the subsequent section. Then, in the penultimate section the optimization algorithm and results from the simulated data are described. Conclusions and some related issues are summarized and discussed in the final section.

FROM SPEND TO REVENUE – RESPONSE IN TIME

In the real world, if we advertise today, no matter via what channel – online display, paid search or offline TV, newspaper or direct mail – we can never expect to receive the resulting revenue at a single time point, but rather as a distribution of revenue over time. This is because individual consumers' responses to an advertisement vary in time. Some users may act quickly because they had planned to buy long ago and now see a better offer from the advertisement. Others may have to wait until the next payday because of a tight budget. The campaign effect can thus last for days or weeks after a campaign has started or ended.

To model this kind of time-response effect mathematically, rather than the Dirac δ function, which is defined by

δ(t) = 0 for t ≠ 0

and

∫_{−∞}^{+∞} δ(t) dt = 1,
which may properly model the time response to a price promotion, some distribution that can simultaneously model the time latency, time smear and time decay effects of advertising is demanded. Here, by the time latency effect we mean the time from the start of an advertisement to the first purchase resulting from it; the time smear effect characterizes how purchases spread over time; and the time decay effect refers to how long the advertisement effect lasts. Some studies, combining Google analytics and Hewlett Packard (HP) online conversion data, have shown that the average time from Google search to purchase in the HP Home and Home Office store is about 1–2 weeks (Liu, 2012). Obviously, one cannot expect the effects of an advertisement campaign that ended years ago to persist now or last forever.

Among many choices, the Gaussian convoluted exponential decay, formulated as the convolution

ƒ(t; μ, σ, τ) = (1/τ) ∫_{−∞}^{t} (1/(σ√(2π))) exp(−(t′ − μ)²/(2σ²)) exp(−(t − t′)/τ) dt′

and, in closed form,

ƒ(t; μ, σ, τ) = (1/(2τ)) exp(σ²/(2τ²) − (t − μ)/τ) erfc((σ/τ − (t − μ)/σ)/√2),
is advocated in our work, where μ is the Gaussian mean, characterizing the time latency; σ is the Gaussian width, quantifying the time smear, or how soon the advertisement effect reaches its maximum; and τ is the decay lifetime, indicating how quickly the advertisement effect diminishes.
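As a concrete illustration, the Gaussian convoluted exponential decay (the exponentially modified Gaussian distribution) can be coded directly from its standard closed form. The function name and parameter values below are illustrative only, a sketch rather than the code used in this work:

```python
import math

def time_response(t, mu, sigma, tau):
    """Gaussian convoluted exponential decay (exponentially
    modified Gaussian), normalized to unit area: mu sets the
    latency, sigma the smear and tau the decay lifetime."""
    z = (sigma / tau - (t - mu) / sigma) / math.sqrt(2.0)
    return (0.5 / tau) * math.exp(
        sigma ** 2 / (2.0 * tau ** 2) - (t - mu) / tau) * math.erfc(z)
```

Because the curve integrates to 1, it distributes a given day's revenue amplitude over the following days without changing the total.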

Figure 1 shows three typical distributions of the Gaussian convoluted exponential decay with different parameter configurations. Note that the curves can be shifted along the time axis, left or right, without changing shape by adjusting μ. The general time response is expected to look like the solid curve. A δ-function-like response can be modeled by a τ that is very small relative to σ, as in the dash-dotted curve. A prompt response followed by an observable decay should look like the dotted curve. Here, the time unit is determined by the level at which the input revenue and spend data are aggregated, and the revenue in this plot is normalized such that the area under the curve equals 1, as the amplitude for a given spend is read from another distribution, introduced in the next section.

Figure 1

Time responses with different parameter configurations.

Although the time response is not involved in the budget allocation optimization, the time-response model parameters extracted from real data provide helpful insights for business planning. For example, when to start a campaign ahead of a certain day, and when to stop it, in order to drive maximal purchases on that day, can be inferred from the model parameters, by inspecting the time-response curve visually or, more scientifically, by an optimization calculation.

FROM SPEND TO REVENUE – RESPONSE IN AMOUNT

The revenue response to spend is expected to be non-linear, monotonically increasing and to eventually saturate when the maximal return is reached. However, different channels may respond differently: some channels may be more sensitive to small spends, whereas others may be more sensitive to large spends because of a threshold effect (Hanssens et al, 2001). In the first case the response curve should be concave down in the low spend range, and in the second case concave up.

The normalized lower incomplete gamma function

P(s; k, θ) = γ(k, s/θ)/Γ(k)

with

γ(k, x) = ∫_0^x u^(k−1) e^(−u) du,
where k is the shape parameter and θ the scale parameter (Abramowitz and Stegun, 1965), could be a good candidate to model the amount of revenue in response to spend.

Shown in Figure 2 are three curves corresponding to different shape and scale parameter configurations. One can see that concave-up, concave-down and nearly straight responses in the low spend range are well approximated by the solid, dash-dotted and dotted curves, respectively.

Figure 2

Revenue response with different parameter configurations.

Note that this function is asymptotic to 1, like the time response, and again it provides the shape that we are looking for. Therefore, one more parameter, Rmax, gauging the maximal absolute amount of revenue in response to infinitely large spend, is still needed. Thus, for each channel we end up with six parameters in total: μ, σ, τ, Rmax, k and θ.
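A minimal sketch of this revenue response follows; the trapezoidal approximation of the incomplete gamma function and the function names are illustrative assumptions (in practice one would call a library routine such as scipy.special.gammainc):

```python
import math

def reg_lower_gamma(k, x, n=4000):
    """Regularized lower incomplete gamma P(k, x) = gamma(k, x)/Gamma(k),
    approximated by trapezoidal integration; assumes k >= 1 so the
    integrand is finite at the origin."""
    if x <= 0.0:
        return 0.0
    h = x / n
    def f(t):
        if t <= 0.0:
            return 1.0 if k == 1.0 else 0.0
        return t ** (k - 1.0) * math.exp(-t)
    total = 0.5 * (f(0.0) + f(x)) + sum(f(i * h) for i in range(1, n))
    return h * total / math.gamma(k)

def revenue_response(s, r_max, k, theta):
    """Revenue from spend s: rises monotonically and saturates at r_max."""
    return r_max * reg_lower_gamma(k, s / theta)
```

Varying k and θ reproduces the concave-up, concave-down and nearly straight low-spend behaviors discussed above, while Rmax sets the saturation level.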

As aforementioned, the revenue response to spend is the basis and key for budget allocation optimization, and modeling this response is the main task of media mix modeling. The optimization output relies completely on how accurately the extracted revenue response explains the real data, and an inaccurate revenue response will bias the budget allocation in the output. Hence, the model parameters Rmax, k and θ have to be reconstructed with high accuracy.

RESPONSE CURVE RECONSTRUCTION

In this work, the objective is, given historical data of daily (or hourly, weekly, monthly and so on) revenue and spend in each channel, to determine how much marketing budget should be allocated to each channel in order to maximize the total returned revenue. The first step is to find the revenue-to-spend response. If, besides the given revenue and spend data, user history and transaction information were also available, one could build an attribution model: for every conversion, split the revenue according to the model-predicted channel attribution; sum the attributed revenue over all conversions for each channel to obtain the channel revenue; and, combining this with the channel spend, build a revenue response. Here, we assume that only revenue and channel spend over time are available, so the problem has to be approached differently. On the basis of the assumptions on the time and revenue responses introduced in the previous two sections, we extract the channel responses by a minimization procedure.

Specifically, the revenue received on day i, r_i, is modeled by

r_i = Σ_{j=1}^{M} Σ_{d≥0} Rmax,j · P(s_{i−d, j}; k_j, θ_j) · ƒ(d; μ_j, σ_j, τ_j),
where i is the data point or time index running from 1 to N, j is the channel label running from 1 to M, and s_{i−d, j} is the spend of channel j on day i − d. As the input data have already been discretized, the time integral is replaced by a summation over a number of days up to and including the day to which the revenue data point r_i corresponds. A cutoff, beyond which the time response ƒ(t; μ_j, σ_j, τ_j) drops below a certain level, say 1 per cent of the time-response maximum, is therefore applied in this work.

The model parameters are determined by

argmin_{μ_j, σ_j, τ_j, Rmax,j, k_j, θ_j} Σ_{i=1}^{N} (r_i^obs − r_i)².
Here, the least-squares loss is adopted. Other loss functions, such as the negative log-likelihood, can also be minimized to fit the model parameters.

As each channel is modeled by six parameters, we end up with 30 parameters to fit when there are five channels. Searching for the global minimum of such a non-linear function in a space of as many as 30 dimensions is not trivial, and turned out to be the biggest challenge in this work: conceptually, the model proposed here is mathematically elegant, but practically we have to prove that the fit is doable, even with limited data points. This is why a Monte Carlo simulation study is needed. With simulated data we know exactly what was put in, so by checking the consistency between the input and reconstructed responses, we can get a sense of how our approach will work when applied to real data.
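To make the fitting objective concrete, the sketch below assembles modeled daily revenue from a spend matrix and evaluates the least-squares loss. All names are hypothetical, and for brevity the revenue response uses the k = 1 special case of the incomplete gamma function, P(1, s/θ) = 1 − exp(−s/θ); this is a simplified illustration of the procedure, not the implementation used in this work:

```python
import math

def time_weight(d, mu, sigma, tau):
    # exponentially modified Gaussian evaluated at integer lag d
    z = (sigma / tau - (d - mu) / sigma) / math.sqrt(2.0)
    return (0.5 / tau) * math.exp(
        sigma ** 2 / (2.0 * tau ** 2) - (d - mu) / tau) * math.erfc(z)

def amount(s, r_max, theta):
    # saturating revenue response; k = 1 special case of the
    # incomplete-gamma form: P(1, s/theta) = 1 - exp(-s/theta)
    return r_max * (1.0 - math.exp(-s / theta))

def model_revenue(spend, params, cutoff=30):
    """spend[i][j] is the spend of channel j on day i; params[j] is
    (mu, sigma, tau, r_max, theta) for channel j."""
    n, m = len(spend), len(spend[0])
    r = [0.0] * n
    for i in range(n):                       # day receiving revenue
        for d in range(min(cutoff, i) + 1):  # lag back to the spend day
            for j in range(m):
                mu, sigma, tau, r_max, theta = params[j]
                r[i] += (amount(spend[i - d][j], r_max, theta)
                         * time_weight(d, mu, sigma, tau))
    return r

def loss(observed, spend, params):
    # least-squares loss to be minimized over all channel parameters
    return sum((o - p) ** 2
               for o, p in zip(observed, model_revenue(spend, params)))

# toy demonstration with two channels and constant spend
spend = [[1.0, 2.0] for _ in range(10)]
true_params = [(2.0, 1.0, 3.0, 100.0, 5.0), (1.0, 0.5, 2.0, 50.0, 4.0)]
observed = model_revenue(spend, true_params)
```

The fit then amounts to searching this parameter space for the configuration that drives the loss to its minimum.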

MONTE CARLO SIMULATION

To verify that our algorithm to model and extract both the time and revenue responses works in reality, a Monte Carlo simulation study is conducted. The study has two steps: data generation and model parameter reconstruction. In the data generation step, each channel's daily spend is generated following some statistical distribution, and the corresponding channel revenue is calculated according to the revenue response for the set of input parameters. Owing to the time response, one can expect that spend on a given day results in revenue distributed over the following days. The sum of revenue over channels and over time up to a given day is then the 'observed' revenue for that day. Finally, the generated spend of each channel and the calculated 'observed' revenue of every day are fed into the reconstruction.

In the parameter reconstruction step, it is assumed that we know nothing about the time and revenue response model parameters used to generate the data. The only input is the daily revenue and the spend of each channel. By assuming a Gaussian convoluted exponential decay for the time response and an incomplete gamma function for the revenue response, we try to extract the set of parameters and see whether we can reproduce those input in the generation step. Only if the reconstructed parameters agree with the input ones within error can we be sure that the algorithm works, and only then can the algorithm be reliably applied to real data.

Shown in Table 1 are the parameters used for our Monte Carlo data generation. Here, we assume there are five channels. The channel spend is generated uniformly between 0 and 10, a range chosen based on the revenue-to-spend response curves shown in Figure 2. Furthermore, note that campaigns may run in one or more channels, start at different times, last for specific periods, run simultaneously with each other or stop at the same time to create a blackout period. To simulate this kind of campaign setup, we assume that Channel 1 runs 90 per cent of the time, Channel 2 60 per cent, Channels 3 and 4 each 40 per cent and Channel 5 30 per cent of the time, at random.
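The spend-generation step can be sketched as follows. Treating each channel's activity as an independent per-day draw is a simplifying assumption for illustration (the text above only states that channels run the given fractions of the time at random), and all names are hypothetical:

```python
import random

random.seed(7)

N_DAYS = 1000
# fraction of days each of the five channels is active
ACTIVE_FRACTION = [0.9, 0.6, 0.4, 0.4, 0.3]

def generate_spend():
    """Daily spend matrix: on a randomly chosen 'active' day a
    channel spends a uniform amount in (0, 10); otherwise zero."""
    spend = []
    for _ in range(N_DAYS):
        spend.append([random.uniform(0.0, 10.0) if random.random() < p
                      else 0.0
                      for p in ACTIVE_FRACTION])
    return spend

spend = generate_spend()
```

The 'observed' revenue is then computed from this matrix using the input time and revenue response parameters, as described above.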

Table 1 Parameters used to generate time and revenue response

The generated revenue and channel spend are shown in Table 2. For this work, 1000 data points are generated, which assumes we have about 3 years of daily revenue and spend data; only the first 100 rows are shown here.

Table 2 Revenue and channel-spend data by Monte Carlo generation

The reconstructed model parameters based on the 1000 data points are tabulated in Table 3. One can see that the input parameters are very well reproduced.

Table 3 Reconstructed model parameters with 1000 data points

The input and reconstructed time and revenue response curves are plotted against each other in Figures 3 and 4, respectively. Although there are minor differences between the input and corresponding reconstructed parameters, the reconstructed curves overlap almost exactly with the corresponding input ones.

Figure 3

Input versus reconstructed time-response curves with 1000 data points.

Figure 4

Input versus reconstructed revenue response curves with 1000 data points.

To account for cases in which fewer data points are available, the stability and quality of the model parameter reconstruction are also investigated with only 50 data points. The reconstructed model parameters are tabulated in Table 4, and the input and reconstructed time and revenue response curves are compared in Figures 5 and 6, respectively. From these tables and plots, one can see that the reconstruction quality is quite good even with few data points. However, one should keep in mind that real data may contain various uncertainties resulting from changing marketing conditions and/or data collection issues. To extract the underlying model parameters accurately, more data points are always desired and preferred.

Table 4 Reconstructed model parameters with 50 data points
Figure 5

Input versus reconstructed time-response curves with 50 data points.

Figure 6

Input versus reconstructed revenue response curves with 50 data points.

In our work, both the TMINUIT minimization routine (James, 1994) and a Markov Chain Monte Carlo optimization algorithm (Metropolis et al, 1953) were tried, and produced very close results. On the basis of our experience, the Markov Chain Monte Carlo algorithm is preferred: with properly chosen step size and temperature parameters, it generally converges very well and reproduces the input parameters with relatively high accuracy.
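The Markov Chain Monte Carlo search can be sketched as a simple Metropolis-style minimizer. The routine below is a generic illustration demonstrated on a toy quadratic; the step size, temperature and iteration count are arbitrary choices for the example, not the settings used in this work:

```python
import math
import random

def metropolis_minimize(loss, x0, step=0.2, temperature=0.01,
                        n_iter=20000, seed=1):
    """Metropolis-style minimization: perturb one parameter at a
    time, always accept downhill moves, and accept uphill moves
    with probability exp(-delta/temperature)."""
    rng = random.Random(seed)
    x, fx = list(x0), loss(x0)
    best, fbest = list(x), fx
    for _ in range(n_iter):
        cand = list(x)
        cand[rng.randrange(len(x))] += rng.gauss(0.0, step)
        fc = loss(cand)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / temperature):
            x, fx = cand, fc
            if fx < fbest:           # remember the best point seen
                best, fbest = list(x), fx
    return best, fbest

# toy check on a quadratic bowl with minimum at (3, -1)
best, fbest = metropolis_minimize(
    lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2, [0.0, 0.0])
```

Occasionally accepting uphill moves is what lets the walker escape local minima in the high-dimensional loss landscape of the 30-parameter fit.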

BUDGET ALLOCATION OPTIMIZATION

Once the revenue responses are extracted for all channels, the budget allocation can be optimized by

max_{s_1, …, s_M} Σ_{k=1}^{M} r(s_k) subject to Σ_{k=1}^{M} s_k = S and s_k ≥ 0,

where S is the total budget for the period being optimized.
Here, s_k is the spend allocated to channel k and r(s_k) the corresponding revenue from the response curve. For our simulation, this optimization is done on a daily basis, that is, each day's total spend across the five channels is re-allocated to maximize the total revenue. In reality, this means optimally allocating the daily budget to achieve the maximal revenue return. This can again be done with TMINUIT or Markov Chain Monte Carlo.
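For concave response curves, this constrained maximization can also be approached with a simple greedy marginal allocation: repeatedly hand the next small increment of budget to the channel with the largest marginal revenue. This is a sketch of the idea, plainly a different technique from the TMINUIT and Markov Chain Monte Carlo optimizers used in this work, and the response curves below are illustrative k = 1 examples:

```python
import math

def allocate(budget, responses, step=0.01):
    """Greedy marginal allocation: hand out the budget in small
    increments, each time to the channel whose revenue response
    gains the most from the next increment. For concave response
    curves this converges to the constrained optimum."""
    spend = [0.0] * len(responses)
    for _ in range(int(round(budget / step))):
        gains = [r(s + step) - r(s) for r, s in zip(responses, spend)]
        spend[gains.index(max(gains))] += step
    return spend

# two illustrative saturating response curves (k = 1 case)
responses = [lambda s: 100.0 * (1.0 - math.exp(-s / 2.0)),
             lambda s: 60.0 * (1.0 - math.exp(-s))]
best = allocate(5.0, responses)
total = sum(r(s) for r, s in zip(responses, best))
```

At the optimum the marginal revenues of all funded channels are equalized, which is exactly the condition a gradient-based or Monte Carlo optimizer converges to on concave curves.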

Note that, for optimization performance evaluation, the spend allocation is always optimized on the basis of the reconstructed model parameters. However, if the resulting revenue were also calculated with the reconstructed parameters, the estimated lift could be biased whenever the reconstruction fails or deviates significantly from the input; in the Monte Carlo performance study, the revenue is therefore calculated with the input parameters instead of the reconstructed ones. For real data, the revenue calculation has to be based on the reconstructed model parameters and compared with the revenue actually received to estimate the optimization lift, as the real underlying parameters are not available in this case.

The generated and re-allocated daily channel spend, as well as the corresponding revenue before and after optimization, are shown in Table 5. Again, only the first 100 of the 1000 data points are shown. One can see that the lift in revenue is very significant: overall, a nearly 60 per cent increase in total revenue is obtained by optimally re-allocating channel spend for this simulated data set.

Table 5 Revenue and channel spend, left – from Monte Carlo generation, right – after daily budget allocation optimization, no time-response smear is applied

CONCLUSION AND DISCUSSION

In this work, an algorithm that models the time response and the revenue-to-spend response at the same time for media mix modeling is presented. A Monte Carlo simulation study is conducted to investigate the feasibility of extracting the time and revenue responses simultaneously from revenue and channel-spend data. The quality and reliability of model parameter reconstruction from data sets of various sizes are also investigated, and the performance of optimally re-allocating channel spend based on the extracted revenue response is evaluated. Our simulation results show that, relative to an arbitrary assignment of the daily budget to each channel, nearly a 60 per cent increase in revenue can be achieved by channel-spend optimization.

The algorithm described here is very general and can be applied to budget allocation optimization at any level, for example, business unit, region or country, product category, retail store and so on, wherever a budget is to be allocated across multiple places whose time responses may behave differently and need to be taken into consideration carefully. At the company level, for instance, when allocating budget to R&D, production, marketing and so on, one may have to keep in mind that R&D generally takes a much longer time than marketing to show its return. Although revenue is taken as the example in this work, it can be replaced by any other business target metric, such as profit, growth rate or lead generation.

Throughout this work, the marketing condition is assumed to be static, so that our target metric, revenue here, depends only on time and channel spend. This is to simplify our modeling effort. In a real marketing environment, many other factors, such as competitive effects, context effects and so on (Tellis, 2006), have to be considered. In addition, possible non-linear effects of the time response to spend, that is, the possibility that the time-response model parameters change with advertising intensity and are thus themselves a function of spend, are ignored.

Compared with an autoregressive model, which is popularly advocated in economics, our algorithm models the time and revenue responses separately, whereas an autoregressive model mixes them together. Although, mathematically, an autoregressive model is as simple as a regression, our approach intuitively offers a clearer picture and description of the problem. When some channel or business function has a very long time lag in responding to an investment or spend, such as research and development, it is handled by the time response in our model in the same way as channels or business functions with no time lag. With limited data points, however, an autoregressive model could be hard to fit, as a number of time-lag-related terms have to be added.

Finally, although our algorithm has been shown to work in a Monte Carlo simulation study, it remains to be tested with real data and in a real marketing environment. Results from the application of this algorithm to business will be reported in future work.