3.1 Preliminaries: the classical framework
The classical queueing system is an M/M/1 model. The first-come-first-served (FCFS) discipline is considered, and there is unlimited waiting space. Arrivals follow a Poisson process with rate \(\lambda \), service times are independent and identically distributed (i.i.d.) exponential random variables with rate \(\mu \), and there is a single server. Customers are delay sensitive, and we let C denote the waiting cost per time unit for a customer (which is assumed to be paid when the customer enters service). Customers also receive a reward R from service.
In Naor [65], a customer inspects, upon arrival, the queue length (number of customers in the system) and decides whether to join or balk. An individual joins a queue of size i if, and only if, her expected utility is nonnegative, i.e., \(R - \frac{C(i+1)}{\mu } \ge 0\). The equilibrium joining strategy, i.e., the individually optimal strategy, is a threshold-based strategy where customers who observe n customers in queue upon arrival join if, and only if, \(n+1 \le n_e\), where \(n_e \equiv \lfloor R\mu /C \rfloor \). The social benefit, per unit of time, assuming a threshold joining strategy with threshold n, is given by \(\lambda (1-p_n) R - C q\), where \(p_n\) is the stationary probability of finding n customers in the system, given a maximum queue length of n, and q is the expected queue length. A pure threshold socially optimal strategy exists, and Naor [65] shows that the social benefit attains its maximum at a value \(n_s \le n_e\). Rooted in this classical result, a general theme in the queueing-games literature is that the selfish behavior of utility-maximizing customers leads to equilibrium outcomes that are sub-optimal relative to the socially optimal solution. The aim is then to investigate ways of correcting this inefficiency. In Naor’s framework, by imposing an appropriate admission fee, i.e., a static, queue-length-independent price, \(\theta \), customers can be motivated to adopt the threshold \(n_s\) instead of \(n_e\). The toll may also be set to meet a revenue maximizer’s objective, i.e., to maximize \(\lambda (1-p_n) \theta \). In this case, the fee levied by the manager is too high from a social standpoint, i.e., \(n_r \le n_s \le n_e\), where \(n_r\) is the corresponding equilibrium threshold.
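The three thresholds can be compared numerically. The following is a minimal sketch, not Naor's own derivation: it assumes that q is the mean number in the system of an M/M/1 queue truncated at n, and that the revenue maximizer charges the largest toll that still sustains threshold n, namely \(\theta = R - Cn/\mu \). The function names and the parameter values are illustrative choices only.

```python
from math import floor

def mm1n_distribution(lam, mu, n):
    """Stationary distribution of an M/M/1 queue that holds at most n customers."""
    rho = lam / mu
    weights = [rho ** k for k in range(n + 1)]  # unnormalized birth-death weights
    total = sum(weights)
    return [w / total for w in weights]

def social_benefit(lam, mu, R, C, n):
    """lam * (1 - p_n) * R - C * q, with q taken as the mean number in system (assumption)."""
    pi = mm1n_distribution(lam, mu, n)
    mean_in_system = sum(k * p for k, p in enumerate(pi))
    return lam * (1 - pi[n]) * R - C * mean_in_system

def revenue(lam, mu, R, C, n):
    """Toll revenue under the largest toll that still sustains threshold n (assumption)."""
    toll = R - C * n / mu  # leaves the customer who joins in position n indifferent
    pi = mm1n_distribution(lam, mu, n)
    return lam * (1 - pi[n]) * toll

def naor_thresholds(lam, mu, R, C):
    """Return (n_e, n_s, n_r); assumes R * mu / C >= 1 so that n_e >= 1."""
    n_e = floor(R * mu / C)
    # Search beyond n_e so that n_s <= n_e is observed rather than imposed.
    candidates = range(1, 3 * n_e + 1)
    n_s = max(candidates, key=lambda n: social_benefit(lam, mu, R, C, n))
    n_r = max(candidates, key=lambda n: revenue(lam, mu, R, C, n))
    return n_e, n_s, n_r

print(naor_thresholds(lam=0.8, mu=1.0, R=10.0, C=1.0))  # prints (10, 4, 2)
```

For these illustrative parameters the ordering \(n_r \le n_s \le n_e\) shows up as 2, 4, and 10.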
Edelson and Hildebrand [24] consider the basic unobservable model, where customers do not observe the queue length upon arrival, and make joining decisions based on the expected waiting time. Customers may either join the queue, not join, or adopt a mixed strategy where they join with probability q. It is found that a unique equilibrium strategy exists, and that it is based on the value of R: If R is “low,” then no customer joins; if R is intermediate, then customers adopt a mixed strategy with joining probability \(q_e \equiv (\mu - C/R)/\lambda \), which makes the expected waiting cost \(C/(\mu - \lambda q_e)\) equal to R; and, if R is large, then everyone joins. The social benefit function attains its maximum at a value \(q_{soc}\) such that \(q_{soc} \le q_e\). Thus, as in the observable case, individual optimization leads to queues that are longer than socially desired, but the gap can be corrected by imposing an appropriate admission fee. We note that, in this unobservable model, the objectives of a profit maximizer and the social planner coincide.
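The three regimes for R, and the gap between the equilibrium and the social optimum, can be made concrete with a short sketch. It assumes the standard M/M/1 expected sojourn time \(1/(\mu - \lambda q)\) when customers join with probability q, so that the indifference condition is \(R = C/(\mu - \lambda q)\) and the social objective is \(\lambda q R - C \lambda q/(\mu - \lambda q)\). The function names and parameter values are illustrative.

```python
from math import sqrt

def equilibrium_join_probability(lam, mu, R, C):
    """Equilibrium joining probability in the unobservable M/M/1 queue."""
    if R <= C / mu:
        return 0.0  # "low" R: even an empty system is not worth joining
    if lam < mu and R >= C / (mu - lam):
        return 1.0  # "large" R: joining pays even if everyone joins
    return (mu - C / R) / lam  # intermediate R: customers are indifferent

def socially_optimal_join_probability(lam, mu, R, C):
    """Maximizer of x * R - C * x / (mu - x) over the joining rate x = lam * q."""
    if R <= C / mu:
        return 0.0
    rate = mu - sqrt(C * mu / R)  # first-order condition R = C * mu / (mu - x)**2
    return min(1.0, rate / lam)

q_e = equilibrium_join_probability(lam=0.95, mu=1.0, R=10.0, C=1.0)
q_soc = socially_optimal_join_probability(lam=0.95, mu=1.0, R=10.0, C=1.0)
print(round(q_e, 4), round(q_soc, 4))  # prints 0.9474 0.7198
```

In this example the equilibrium joining probability exceeds the socially optimal one, which is the over-congestion effect described above.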
3.3 To reveal or not to reveal? Observable versus unobservable queues
We begin by surveying papers which compare the observable and unobservable systems, i.e., address whether or not to reveal queue-length information.
Social welfare, revenue maximization, and throughput. Hassin [33] studied the impact of information suppression from both the social planner’s and revenue maximizer’s perspectives. In both cases, two quantities play a central role: (i) the potential arrival rate, \(\lambda \), and (ii) the value of service, relative to the cost of waiting, \(\nu _s \equiv R \mu /C\). Hassin [33] compares profits, under profit-maximizing admission fees, in the observable and unobservable cases. He finds that if customers are “very” sensitive to delay, \(\nu _s \le 2\), i.e., \(C \ge R \mu /2\), then it is optimal to reveal the queue length for all \(\lambda > 0\). However, if customers are not very delay sensitive, \(\nu _s > 2\), then there exists a threshold, \(\varLambda _R\), such that it is optimal to reveal the queue length only for \(\lambda > \varLambda _R\). The intuition behind these results is as follows: When \(\lambda \) is large, many customers would opt to balk based on average wait-time information, which is high because \(\lambda \) is high. In this case, disclosing the queue-length information encourages more customers to join in low-congestion states. While it is true that it also discourages customers from joining in highly congested states, the key is that these customers would have balked anyway in the unobservable case; thus, revealing information helps the firm. We now turn to the social welfare results. First, we note that the problem would be straightforward if a social-welfare-maximizing fee could be imposed. In this case, revealing delay information can only help the social planner since, in the observable case, a customer would enter only when it is socially desirable to do so, but this is not the case in the unobservable model. The more challenging case is when pricing cannot be socially controlled, for example, because price regulation is not desirable, but information suppression can be socially controlled. Under the assumption of a revenue-maximizing toll, the values of \(\nu _s\) and \(\lambda \) play similar roles, but the threshold on \(\lambda \), \(\varLambda _S\), is different and it is shown that \(\varLambda _S < \varLambda _R\). Thus, a social planner may want to reveal the queue length when it is not optimal for a revenue maximizer to do so, i.e., for \(\varLambda _S< \lambda < \varLambda _R\). However, it is never socially optimal to suppress information when a revenue maximizer voluntarily chooses to reveal it, i.e., for \(\lambda > \varLambda _R\).
Chen and Frank [16] study how information suppression impacts throughput. Intuitions similar to the ones in Hassin [33] continue to apply, so we will be brief. In particular, for a fixed admission fee, the role played by the system’s load is prominent. On the one hand, if the arrival rate is low, in particular \(\lambda < \varLambda ^*\), then customers may be turned away by real-time queue-length information, while they would have joined with (low) average wait-time information. This implies that \(\lambda _O < \lambda _U\), i.e., the effective joining rate is smaller in the observable system than in the unobservable system. On the other hand, if the arrival rate is high, in particular \(\lambda > \varLambda ^*\), then \(\lambda _O > \lambda _U\). Shone et al. [74] take a different view and focus on the situation where the decision of a service provider to reveal the queue-length information does not affect throughput. Shone et al. [74] rule out the possibility of optimizing the admission fee. They compare the observable and unobservable systems in terms of joining rates, both individually optimal (selfish) and socially optimal (altruistic), in addition to various other system performance measures. The authors derive necessary and sufficient conditions for the equality of equilibrium selfish and altruistic joining rates between the observable and unobservable systems and show that both equalities cannot simultaneously hold. Shone et al. [74] also observe that the decision of whether or not to reveal the queue length depends strongly on \(\nu _s\), as was observed in Chen and Frank [16].
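The reversal of the throughput ordering as the load grows can be reproduced with the primitives of Section 3.1. The sketch below is an illustration under simplifying assumptions, not the model of Chen and Frank [16]: there is no admission fee (a fixed fee could be absorbed into R), observable customers follow the threshold \(\lfloor R\mu /C \rfloor \), and unobservable customers join at the equilibrium rate \(\min (\lambda , \mu - C/R)\). The function names and parameter values are hypothetical.

```python
from math import floor

def observable_throughput(lam, mu, R, C):
    """Effective joining rate when customers see the queue and use Naor's threshold."""
    n_e = floor(R * mu / C)
    rho = lam / mu
    weights = [rho ** k for k in range(n_e + 1)]  # M/M/1 queue truncated at n_e
    p_block = weights[-1] / sum(weights)          # probability an arrival balks
    return lam * (1 - p_block)

def unobservable_throughput(lam, mu, R, C):
    """Effective joining rate when customers only know the expected wait."""
    if R <= C / mu:
        return 0.0
    return min(lam, mu - C / R)

for lam in (0.5, 2.0):
    lam_o = observable_throughput(lam, mu=1.0, R=10.0, C=1.0)
    lam_u = unobservable_throughput(lam, mu=1.0, R=10.0, C=1.0)
    print(lam, "observable" if lam_o > lam_u else "unobservable", "has higher throughput")
```

With these values, hiding the queue gives the higher throughput at the low arrival rate and revealing it gives the higher throughput at the high arrival rate.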
A network of providers. The papers above focus on a setting with a single provider. Singh et al. [75] consider a competitive environment with two service providers instead. These providers may choose to disclose different levels of information, either real-time or historical. The paper studies the first mover’s benefit, i.e., the benefit to the first provider to announce real-time information. It considers two parallel M/M/1 queues, in a multi-period setting, where one provider announces the real-time queue length, and the other provider announces the expected delay of the previous period. For a performance comparison, the authors consider the market share and the expected delay, and customers join the lower-delay alternative. The authors find that the benefit of being the first mover depends on the service capacity. In particular, for the lower-capacity provider, being the initiator in announcing real-time information increases the market share and reduces delays. However, the same does not hold for the higher-capacity provider, where results are mixed. The authors also find that social welfare always increases whenever the first mover benefits in terms of both market share and delay.
Dong et al. [19] also consider a network setting with multiple providers, but they focus on a network of hospitals instead. In particular, they study, in the context of an empirical investigation, the impact of delay announcements on coordination in the network. Coordination is measured through the correlations of delays between hospitals: There is synchronization if those correlations are positive. This observation is rooted in a queueing-theoretic result which establishes that the join-the-shortest-queue (JSQ) discipline synchronizes queues in the system. Indeed, if customers check the delay information, then it is reasonable to assume that they would join the shortest queue, which would then lead to synchronization. Thus, exploring the impact of delay information reduces to studying correlations between the waiting times at adjacent hospitals. By relying on data of real-life announcements and patient response (measured through online searches), the authors investigate whether the announcements do indeed impact the behavior of patients. They provide empirical evidence that this is indeed the case. They also conduct an extensive numerical study to investigate how sensitivity of customers to delays, the load of the system, and the heterogeneity between hospitals impact the synchronization level in the system. They show that using average wait predictors may lead to oscillations in the system, where customers systematically flock to one of the two queues; this numerical observation is studied in Pender et al. [66].
3.4 Granularity, timing, and breadth of the delay information
The papers above consider either full revelation or no revelation of real-time system-state information. However, there are other considerations, such as the timing, granularity, and breadth of the shared delay information. We now survey papers which study decisions pertaining to those characteristics.
“Discrete” information: High and low announcements. The idea that full information may not be necessary, and that a discrete high-low type of announcement may suffice, already follows from Naor [65]. Indeed, in the observable case, customers follow a threshold-type joining decision; this indicates that only the information on whether or not the queue length exceeds a threshold, L, should suffice. Because this information structure is much simpler, there is interest in studying it. We note that setting \(L=0\) corresponds to the unobservable model in Edelson and Hildebrand [24].
Altman and Jimenez [6] consider high-low announcements when there is no pricing decision. First, the authors assume that the value of L is fixed (not necessarily at its optimum). In the social planner problem, they optimize the probabilities of accepting an arrival if the queue length is below or above L. Next, they consider the individual optimization problem where utility-maximizing customers make their joining decisions, and investigate the ensuing equilibrium. In both problems, the optimal admission strategy has the form of either accepting all arrivals when the queue length is below L, or rejecting all arrivals when it is above L. The authors also show that imposing a socially optimal L value in the individual optimization problem does not lead to the socially optimal outcome. Hassin and Koshman [39] consider a setting similar to that in Altman and Jimenez [6], albeit with pricing decisions. In particular, customers are charged \(p_L\) when the queue length is below L, and \(p_H\) otherwise. Hassin and Koshman [39] demonstrate how to obtain the maximum value of social welfare in Naor’s model by using their coarse dynamic pricing scheme.
The above two-signal strategy arises at equilibrium in Allon et al. [5]. In this paper, the authors relax two fundamental assumptions: (i) that the firm is truth-telling in revealing information, and (ii) that the information shared is quantifiable and verifiable by customers. As such, they allow for a richer information set which also includes intentional vagueness: A firm is intentionally vague when it provides the same announcement in different states of the system. They show that even though the information provided to customers is non-verifiable, it can improve the profits of the firm and the expected utility of customers. The incentives of the firm and its customers are neither perfectly misaligned (they both prefer shorter waits), nor perfectly aligned (the firm benefits from higher throughput, whereas the customers do not). This misalignment between the firm and its customers plays a key role in the analysis: Depending on its level, different equilibria emerge. Of particular interest are equilibria with influential cheap talk, i.e., ones where the firm can induce distinct customer actions based on different unverifiable messages.
Different levels of information. We now turn to the literature investigating the problem of finding the “best” type of delay information to share. Duenyas and Hopp [20] investigate that problem in a manufacturing setting. Each customer who places an order generates a reward for the firm, and there is a penalty for being late (per unit time exceeding the quoted lead time). In response to a quoted lead time, a, each customer places an order with probability p(a). Duenyas and Hopp [20] derive an optimal quote which maximizes the expected profit (revenue minus penalty cost), under both infinite (\(G/G/\infty \)) and finite (G/G/1) capacity settings. In the infinite-capacity case, the optimal quote does not depend on the current backlog in the system. In the finite-capacity alternative, the optimal lead-time quoting policy is state-dependent and increasing in the state, i.e., the higher the congestion, the higher the lead-time quote. Thus, in the finite-capacity case, a profit-maximizing firm should give granular, state-dependent information rather than rely on a coarse information-sharing scheme.
In their model, Duenyas and Hopp [20] trade off the reliability of the quoted delay against throughput: While there is a penalty for being late, the firm is not, otherwise, restricted in the quote that it provides, i.e., it is not constrained to being reliable. In contrast, Dobson and Pinker [18] consider a similar problem but assume that the firm must provide reliable quotes: The state-dependent lead-time quote provided, \(l_i\), depends on the number i of customers in the system and is a fractile of the conditional wait-time distribution which must be met \((100\tau )\%\) of the time. In other words, letting \(W_i\) denote the conditional steady-state waiting time, we must have that \({\mathbb {P}}(W_i \le l_i) = \tau \). The proportion of customers who join the system, in response to \(l_i\), is given by \(\alpha (l_i, \tau )\). Dobson and Pinker [18] compare alternative scenarios, \(S_k\), which reflect different levels of information granularity: For scenario \(S_k\), customers are provided with a state-dependent announcement \(l_i\) for \(i < k\), and with a static announcement for \(i \ge k\). Increasing k amounts to increasing the granularity of the delay information. The authors derive a sufficient condition under which sharing more information increases throughput, and emphasize that this need not always be the case. Importantly, they demonstrate that higher throughput may also be associated with lower expected waiting times, and less variable waits, because the delay information deters customers from joining in highly congested states, and encourages customers to join in low-congestion states. They also highlight the importance of customer heterogeneity, i.e., the extent to which different information granularity leads to different demand rates: The greater the heterogeneity, the higher the throughput, i.e., the higher the value that can be derived from quoting lead times.
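To illustrate the reliability constraint \({\mathbb {P}}(W_i \le l_i) = \tau \), the sketch below computes such a fractile numerically. It makes an assumption that is not taken from Dobson and Pinker [18]: service is FCFS with i.i.d. exponential service times of rate \(\mu \), so that the time in system of a customer who finds i customers present is Erlang distributed with \(i+1\) phases. The function names are hypothetical.

```python
from math import exp

def erlang_cdf(t, shape, mu):
    """P(sum of `shape` i.i.d. Exp(mu) service times <= t)."""
    term, partial_sum = 1.0, 0.0
    for k in range(shape):
        partial_sum += term          # adds (mu * t)**k / k!
        term *= mu * t / (k + 1)
    return 1.0 - exp(-mu * t) * partial_sum

def reliable_quote(i, mu, tau, tol=1e-10):
    """Smallest l_i with P(W_i <= l_i) >= tau, found by bisection; needs 0 < tau < 1."""
    shape = i + 1  # i customers ahead plus the arriving customer's own service
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, shape, mu) < tau:
        hi *= 2.0  # grow the bracket until it contains the fractile
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if erlang_cdf(mid, shape, mu) < tau:
            lo = mid
        else:
            hi = mid
    return hi

quotes = [reliable_quote(i, mu=2.0, tau=0.9) for i in range(4)]
print([round(q, 3) for q in quotes])  # state-dependent quotes l_0, ..., l_3
```

Under this assumption the quote increases with the number of customers found in the system, and for an empty system it reduces to the exponential fractile \(-\ln (1-\tau )/\mu \).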
The role played by customer heterogeneity is also central in the work of Guo and Zipkin. Guo and Zipkin [28] consider three levels of information: (1) no information, (2) partial information, i.e., queue length upon arrival, and (3) full information, i.e., exact waiting time. For performance measures, they consider throughput and the expected customer utility. Customers are assumed to be heterogeneous in their delay costs. Specifically, each arriving customer has a cost type, \(\theta \), which is drawn from a continuous and bounded distribution, H, with density function, h. There is also a basic cost function, c(w), associated with a wait w. Thus, the cost incurred by a \(\theta \)-customer who is delayed for w is equal to \(\theta c(w)\). Different levels of delay information induce more or fewer customers to join. The information provided also segments customers depending on their delay sensitivity: A customer who joins under one type of information may balk under another type. Guo and Zipkin [28] demonstrate that both system throughput and customer utility, under different information levels, are impacted by the shape of the distribution of customer delay costs. Depending on that distribution, they characterize conditions under which information helps either the customers or the service provider. The main takeaway is that more information may or may not be beneficial, depending on the distribution of customer delay sensitivity. In subsequent papers, the above results are generalized to systems with phase-type service times [29], different levels of information [30], and alternative cost functions [31].
In a series of papers, Burnetas and Economou [15], Economou and Kanta [22, 23], and Economou et al. [21] quantify the impact of state information on system dynamics under various assumptions. Burnetas and Economou [15] consider an M/M/1 queue with setup times. In particular, when a new customer arrives to an empty system, the server requires an exponentially distributed time with rate \(\theta \) before beginning service. At time t, the state of the system is described by the pair (N(t), I(t)), where N(t) is the number of customers in the system and \(I(t) = 0\) or 1 is the state of the server (idle or busy, respectively). Customers may be exposed to different levels of information about the system, corresponding to four cases: (i) fully observable, where customers observe both N(t) and I(t); (ii) almost observable, where customers observe only N(t); (iii) almost unobservable, where customers observe only I(t); (iv) fully unobservable, where customers do not observe either I(t) or N(t). In all cases, customer equilibrium strategies are analyzed, as well as the stationary behavior in the system and the social benefit for all customers. Economou et al. [21] consider an extension of Burnetas and Economou [15] where both general service and general setup times are allowed. Economou and Kanta [23] assume that the waiting space is divided into compartments, to be served sequentially in increasing order, and joining customers may know either the compartment number (but not their position in the compartment that they join) or their position within a compartment (but not the compartment number). Both information levels correspond to partial information since customers do not fully observe the system state in either case. For a frame of reference, if a customer knows both the compartment index and the compartment position, then the model reduces to the model in Naor [65], whereas if neither is known then the model reduces to the model in Edelson and Hildebrand [24]. Economou and Kanta [22] and Wang and Zhang [80] assume that the server may break down and require repair. The time to repair is considered to be equal to 0 in the former and is exponentially distributed in the latter. The authors in those two papers compare two levels of information: (i) fully observable, where customers know both the queue length and the state of the server, and (ii) partially observable, where customers know only the queue length. Both papers compare equilibrium threshold balking strategies in their contexts.
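For the fully observable case of the setup-time model, the effect of the server state on the joining decision can be sketched directly. The sketch is an illustration under stated assumptions rather than the analysis in Burnetas and Economou [15]: it uses the reward R and waiting cost C of Section 3.1, and it relies on the memoryless property so that a customer who finds n customers and a server still in setup expects to spend \(1/\theta + (n+1)/\mu \) in the system. The function names are hypothetical.

```python
def expected_sojourn(n, server_on, mu, theta):
    """Expected time in system for a customer who finds n customers on arrival."""
    setup_remaining = 0.0 if server_on else 1.0 / theta  # memoryless setup time
    return setup_remaining + (n + 1) / mu

def join_threshold(server_on, mu, theta, R, C):
    """Largest observed n at which joining has nonnegative utility, or -1 if none."""
    n = -1
    while R - C * expected_sojourn(n + 1, server_on, mu, theta) >= 0:
        n += 1
    return n

print(join_threshold(True, mu=1.0, theta=0.5, R=10.0, C=1.0))   # prints 9
print(join_threshold(False, mu=1.0, theta=0.5, R=10.0, C=1.0))  # prints 7
```

Customers who see a server in setup therefore use a lower threshold than customers who see a working server, which is why observing I(t) in addition to N(t) changes behavior.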
Timing and breadth. The questions of when to make a delay announcement, and of the extent to which information should be shared, have also been investigated in the literature. He and Down [42] rely on both heavy-traffic analysis and simulation to study performance in a queueing system where only a fraction of customers are informed about waiting times. Specifically, they consider two customer classes and two server pools. Dedicated customers in each class can only be served by one of the two pools, for example, because of a language requirement. A fraction of customers is flexible and may choose one of the two server pools depending on which has the shorter queue. He and Down [42] focus on the expected waiting time for both classes and demonstrate that “a little flexibility goes a long way” in that delay information (the queue length) significantly improves performance even when a small proportion of customers are informed about waiting times. They also address the question of information updating by considering, numerically, a setting where the mean waiting time is updated periodically, and customers use the most recent update in making their joining decisions. They show that there could be significant degradation in performance if the delay information is not updated frequently enough, and the system may experience oscillatory behavior because customers herd toward one queue for a period of time.
Hu et al. [44] also address the question of the breadth of the information shared. They consider a setting where only a fraction of customers are informed about the queue length in the system. Informed customers make their joining decisions based on the observed queue length. Uninformed customers make their joining decisions based on the expected waiting time in the system. The fraction of informed customers is assumed to be exogenous. Informed customers join the system in accordance with the threshold joining policy in an observable queue, as in Naor [65]. Uninformed customers randomize their joining decisions. Uninformed customers indirectly influence informed customers by influencing the distribution of the queue length in the system. The authors find that, in systems which are not under very low loads, informing a fraction of customers about real-time delay information increases either the throughput or the social welfare. Their results depend on both the offered load in the system and the joining behavior of uninformed customers. To relate their results to Chen and Frank [16]: They find that when the offered load is low enough, throughput decreases with the information. Conversely, if the offered load is high enough, then throughput increases with the information. However, in the intermediate region for the offered load, throughput is maximal if only a fraction of customers are informed. Also, while the standard view, as in Hassin [33], is that social welfare is always improved by revealing the queue length, the authors demonstrate that when the offered load is high enough, it is optimal to have only a fraction of informed customers, i.e., social welfare does not always increase by revealing the queue length to everyone. In short, the presence of uninformed customers improves throughput under low offered loads and increases social welfare under high offered loads.
Despite its practical importance, the question of the timing of the announcements remains understudied, with the vast majority of papers assuming that the announcement is given immediately upon arrival of the delayed customer. At a high level, the trade-off is as follows: Postponing the announcement allows the firm to make a more informed decision about whether or not to admit the customer. With more information at its disposal because of the delay in making the announcement, the firm should benefit. However, postponing the announcement also means potentially keeping customers longer in queue. Thus, it is not clear whether a firm would want to resort to this postponement. Allon and Bassamboo [4] address this question in the context of an unobservable M/M/N queue; the model specifics are, otherwise, similar to Allon et al. [5]. The authors focus on identifying conditions under which influential cheap talk emerges in equilibrium. To model the system with postponed announcements, they consider a two-stage system. The first stage, which models, for example, a call center’s IVR, is an infinite-server queue which is essentially a delay station. The second stage is an M/M/N queue: Upon a customer’s entry to this M/M/N queue, the firm makes a non-verifiable cheap talk type of delay announcement. The authors characterize the optimal admission policy for the firm in the second stage and demonstrate that it is of a threshold type where the threshold depends on the number of customers in the first stage. They also characterize the set of possible equilibria in the delayed cheap talk game and compare these to the non-delayed game. They show that such a comparison is complex: The firm may or may not benefit, i.e., create credibility and impact customer behavior, from delaying the delay information.
Pender et al. [66] also consider the impact of delaying the delay announcements. Specifically, they study the oscillation behavior observed in both He and Down [42] and Dong et al. [19]. They use two deterministic fluid models to examine the effect of providing customers with delayed delay information. In particular, they consider two systems: System I consists of two infinite-server queues where arriving customers receive delayed information about the queue length. The delay in information is quantified by a deterministic parameter \(\varDelta \). Customers choose which queue to join depending on the delayed delay information that they receive, in accordance with a multinomial logit customer choice model. By analyzing the dynamics of the resulting fluid model, the authors demonstrate that there is asynchronous behavior between the two queues if \(\varDelta \) is large enough, i.e., there are systematic oscillations and no stable equilibrium. System II also consists of two infinite-server queues, but the delay information is in the form of a time-average of the queue-length information in a window of length \(\varDelta \) instead. In this case as well, the authors demonstrate a similar asynchronous behavior between the two queues if the window over which the average is taken is long enough.
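The onset of oscillations in a System I type model can be observed in a small simulation. The sketch below is an assumption-laden illustration and not the exact model of Pender et al. [66]: it takes the fluid dynamics to be \(q_i'(t) = \lambda P_i(t) - \mu q_i(t)\) for \(i = 1, 2\), with logit choice probabilities \(P_i(t) \propto \exp (-q_i(t-\varDelta ))\), and integrates them with a simple Euler scheme. The function name, the parameter values, and the initial history are illustrative choices.

```python
from math import exp

def simulate_delayed_choice(delta, lam=10.0, mu=1.0, horizon=40.0, dt=0.001):
    """Return max |q1 - q2| over the last 10 time units of an Euler simulation."""
    lag = int(round(delta / dt))
    steps = int(round(horizon / dt))
    # Constant history slightly off the symmetric equilibrium lam / (2 * mu).
    q1 = [lam / (2 * mu) + 0.1]
    q2 = [lam / (2 * mu) - 0.1]
    for k in range(steps):
        j = max(k - lag, 0)  # index of the delayed observation
        # Logit probability of choosing queue 1; the shorter delayed queue is favoured.
        p1 = 1.0 / (1.0 + exp(q1[j] - q2[j]))
        q1.append(q1[k] + dt * (lam * p1 - mu * q1[k]))
        q2.append(q2[k] + dt * (lam * (1.0 - p1) - mu * q2[k]))
    tail = int(round(10.0 / dt))
    return max(abs(a - b) for a, b in zip(q1[-tail:], q2[-tail:]))

print(simulate_delayed_choice(delta=0.1))  # gap between queues dies out
print(simulate_delayed_choice(delta=1.0))  # gap keeps oscillating
```

With a short information delay the two queues settle at the same level, whereas with a long delay the gap between them keeps oscillating, in line with the qualitative finding described above.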
Roet-Green and Hassin [69] also consider a setting where customers learn delayed information about the queue length in the system but, contrary to Pender et al. [66], the delay in information is assumed to be random (exponentially distributed), corresponding to the travel time needed for a customer to join the queue after the delay information is received. In other words, customer joining decisions are not instantaneous. A customer joining strategy is a vector that assigns a probability of traveling to each possible queue length. Because the travel time is not negligible, a customer who had decided to join a system based on “old” queue-length information may decide to balk upon arrival to the system if the real-time queue length is too long. Thus, customer decisions are made at two successive epochs. The authors investigate the structure of a symmetric Nash equilibrium. They find that customers often adopt a double-threshold strategy: Customers travel when the queue length is short, balk or mix between balking and traveling when the queue length is at an intermediate length, and travel when the queue length is long. The intuition is that a customer who observes an intermediate queue assumes that previous customers must have observed short queues, and are now on their way. Thus, the system’s congestion is likely to soon increase and, consequently, the customer decides to balk. The intuition is reversed when a customer observes a long queue: In this case, that customer assumes that previous customers must have observed an intermediate queue and balked. Thus, the congestion in the system is likely to soon decrease, and the customer decides to join the queue. The authors also demonstrate that social welfare may be higher under the no-information model than under the delayed information model.
Hu and Wang [45] consider a setting where customers share queue-length information with each other. Because information is shared at the arrival epoch of an arriving customer, it constitutes lagged information for a future customer who wishes to join the system based on this “historical” information. Customers decide to join or balk based on previous information, but do not update their decisions upon arrival to the system because they do not observe the queue length in the second stage, unlike in Roet-Green and Hassin [69]. Indeed, they observe the queue length only upon entering the system. The authors investigate how this shared information structure affects throughput, expected queue length, and social welfare in the system, and draw comparisons between the full-information and no-information models. They find that (i) throughput under shared information is less than that under full information; (ii) the expected queue length under shared information is less than that under full information; and (iii) social welfare may be lower or higher under shared information, depending on the offered load in the system.
3.6 Empirical studies
The literature above is analytical in nature. The recent availability of granular data, for example, at the call-by-call level in call centers, has made it possible to study changes in customer behavior, in response to the announcements. We now recap the main results from those papers.
Early empirical evidence which illustrates how customers update their patience times in response to delay announcements, in call centers, can be found in Mandelbaum and Zeltyn [61] and Feigin [25]. Akşin et al. [3] undertake a more detailed empirical study to explore the impact of the announcements on customer behavior and, in turn, on system performance. The authors begin by providing empirical evidence, using a Cox regression analysis, substantiating the impact of the announcements on the abandonment behavior of (call center) customers. Their data set has two priority classes, and the announcements are equal to the queue position or the elapsed waiting time of the longest waiting customer; they are also made sequentially over time. The study reveals that both the composition and sequence of the announcements have an impact on customer abandonment behavior, and that customers who receive longer announcements, or see a deteriorating delay condition (increasing announcements during their wait), abandon earlier. The impact of the announcements is also affected by the priority class of the customer.
In order to explore the operational impact of the announcements, the authors use a structural estimation approach: They model callers’ abandonment decisions as in the optimal stopping time model introduced in Akşin et al. [
2]. Specifically, time is divided into periods, and a customer makes a decision on whether or not to abandon at the beginning of each period. Customers are heterogeneous in both the rewards that they receive from service and their per-unit waiting costs (both of these are drawn from lognormal distributions). The announcements received impact the abandonment distribution of callers which, in turn, impacts their decisions on staying or reneging, sequentially over time. The parameters of that endogenous model for caller abandonment are estimated from data, for each priority class. In order to study the impact of the announcements, the authors assume a setting where customers receive only one announcement upon arrival. By relying on the approximation in Whitt [
87], they characterize the equilibrium that arises in the system in steady state, where the equilibrium is defined as one where the distribution of waiting times based on the optimal stopping time model coincides with the distribution of the waiting time using the approximation from Whitt [
87]. Through a simulation study, Akşin et al. [
3] then study the operational impact of the announcements. Their main conclusions are as follows: (i) delay information helps customers make better decisions in the sense that callers who receive a long (short) delay announcement abandon more and faster (less and slower); (ii) the impact of the announcements is strongest when the state of the system is congested; and (iii) the increased granularity of the wait-time announcement (exact queue length position vs. range for the number in queue) leads to a smoother change in caller behavior.
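The period-by-period abandonment decision described above can be illustrated with a minimal backward-induction sketch. It is not the estimated model of Akşin et al.: it assumes a single customer with a known reward and per-period waiting cost, omits idiosyncratic shocks, and takes as given a hypothetical list `service_hazards` of per-period probabilities of entering service (the customer's belief, which an announcement would shift). All names are illustrative.

```python
def abandonment_period(reward, cost_per_period, service_hazards):
    """Return the first period in which abandoning is optimal, or None.

    Assumptions (illustrative only): abandoning yields 0; waiting one
    period costs cost_per_period; with probability service_hazards[t]
    the customer is served in period t and collects reward.
    """
    horizon = len(service_hazards)
    # value[t] is the optimal expected payoff from period t onward;
    # nothing can be earned beyond the horizon.
    value = [0.0] * (horizon + 1)
    wait_values = [0.0] * horizon
    for t in range(horizon - 1, -1, -1):
        h = service_hazards[t]
        wait_values[t] = -cost_per_period + h * reward + (1 - h) * value[t + 1]
        value[t] = max(0.0, wait_values[t])
    # The customer abandons at the first period where waiting has negative value.
    for t in range(horizon):
        if wait_values[t] < 0.0:
            return t
    return None


# Beliefs deteriorate after two periods, so the customer abandons in period 2.
print(abandonment_period(10.0, 1.0, [0.5, 0.5, 0.05, 0.05]))  # 2
```

In this sketch, a longer announcement would correspond to lower entries in `service_hazards`, which pulls the abandonment period earlier.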
Yu et al. [
89] also adopt an empirical approach in studying the impact of delay announcements on customer patience. They begin by introducing the concepts of informative and influential announcements. An informative announcement is one that carries information about the current congestion level in the system, i.e., one where longer delays do indeed correspond to larger announcements. An influential announcement is one where the patience of customers changes in response to the announcements. By statistically comparing the survival distributions of customers, the authors find that the impact of the announcements is ambiguous: Some announcements are influential and/or informative, whereas others are not. This prompts the authors to undertake a deeper investigation into the dynamics of the performance impact of the announcements; they do so by relying on a structural estimation approach.
The structural model is as follows: Customers may return multiple times and, at each return, receive multiple delay announcements during their wait. At each announcement epoch, the caller revisits their decision of staying until service or reneging. Customers are heterogeneous, but their heterogeneity is modeled through their cost–reward ratio rather than separately through their service rewards and waiting costs. The cost–reward ratios and variance of idiosyncratic shocks are then estimated from data. The authors consider two models: (i) a base model where customers update their beliefs about offered waits using the announcements received; (ii) a refined model where not only customer beliefs but also the waiting costs of customers are impacted by the announcements. The authors find that their second model explains the ambiguous impact on customer impatience observed earlier in their data analysis. In particular, they show that while the cost–reward ratio decreases in the offered wait associated with the announcements (“I waited so long already, so why not wait a little longer?”), the variance of the idiosyncratic shocks increases. This dual effect explains the nontrivial impact of the announcements on customer behavior. The authors then explore, through a simulation study, what managerial implications can be drawn from their analysis. In particular, they find that providing delay announcements leads to an increase in the surplus of customers (surplus is equal to reward minus waiting cost), and that less refined delay information (in the form of three signals on the congestion of the system) may lead to higher customer surplus than more granular information.
Yu et al. [
90] undertake a field experiment in an Israeli bank’s call center to explore customers’ loss aversion with respect to time, and its dependence on the delay information available. Specifically, customers who receive delay announcements typically form a reference point based on the announcement received. If the actual waiting time experienced is smaller than that reference point, then the time difference is considered a gain. If the actual waiting time experienced is larger, then the time difference is considered a loss. Loss aversion means that customers value lost time more than they value gained time. Customers are provided with accurate, inaccurate, or no announcements. By using a structural model to infer the customers’ value of time (the abandonment behavior is modeled through an optimal stopping time problem), the authors find that customers indeed exhibit loss aversion, and that this is independent of the correctness of the delay information given. (Loss aversion is measured through an increase in the per-unit waiting cost after the announcement.) However, the accuracy of the delay announcement does have an impact on the reference point formed. Specifically, with accurate information, the reference point coincides with the delay information given, whereas with inaccurate information, customers use the observed average delay as a reference point instead. This contradicts the standard viewpoint that firms should give an inaccurate but high announcement to make the customers “feel better about their waits.” Indeed, the analysis suggests that customers may disregard such inaccurate announcements but retain their loss aversion.
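The reference-point mechanism above can be illustrated with a small sketch. It is a stylized reading of the description, not the authors' structural model: it assumes a piecewise-linear waiting cost whose per-unit rate rises once the wait exceeds the reference point, and the function and parameter names are hypothetical.

```python
def reference_point(announced_delay, observed_average_delay, announcement_is_accurate):
    """Reference point as described above: the announcement itself when it is
    accurate, and the observed average delay otherwise."""
    if announcement_is_accurate:
        return announced_delay
    return observed_average_delay


def waiting_cost(actual_wait, reference, cost_before, cost_after):
    """Total waiting cost with a per-unit cost that changes at the reference point.

    Loss aversion corresponds to cost_after > cost_before (an assumption of
    this sketch): time beyond the reference point weighs more heavily.
    """
    time_before = min(actual_wait, reference)
    time_after = max(actual_wait - reference, 0.0)
    return cost_before * time_before + cost_after * time_after


# An inflated, inaccurate announcement of 6 is ignored in favor of the average delay of 3.
ref = reference_point(6.0, 3.0, announcement_is_accurate=False)
print(waiting_cost(5.0, ref, cost_before=1.0, cost_after=2.0))  # 7.0
```

Under these assumptions, inflating the announcement does not move the reference point, so the customer still incurs the higher per-unit cost on the two time units beyond the average delay.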
In a related paper, Webb et al. [
81] rely on a proportional hazards model for the hazard rate of the abandonment distribution instead. The covariates used in that model include the gain and loss in time effects due to the announcements. In particular, the announcement creates a reference point which is the expectation of the wait time for service. The authors find that a model in which customers react to the announced value of the first announcement, and in which reference points are induced by the first two announcements, is the best fit to their data. They also find that customers are loss averse, that they fall for sunk cost effects, and that a higher announcement leads to more abandonment. Finally, they study implications for staffing decisions and find that firms that take behavioral implications of the announcements into account can significantly reduce their staffing levels.
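A proportional hazards specification of this kind can be sketched as follows. The covariate definitions here are assumptions made for illustration (gain and loss are taken as the time remaining until, and elapsed beyond, the reference point), and the coefficient names are hypothetical; the actual covariates and estimates are those of Webb et al.

```python
import math


def abandonment_hazard(baseline_hazard, elapsed_wait, reference, announced_delay,
                       beta_gain, beta_loss, beta_announcement):
    """Proportional hazards form: baseline hazard times exp(linear predictor).

    Assumed covariates (illustrative): gain = time still remaining before the
    reference point, loss = time elapsed beyond it, plus the announced delay.
    """
    gain = max(reference - elapsed_wait, 0.0)
    loss = max(elapsed_wait - reference, 0.0)
    linear_predictor = (beta_gain * gain
                        + beta_loss * loss
                        + beta_announcement * announced_delay)
    return baseline_hazard * math.exp(linear_predictor)


# With all coefficients at zero, the hazard reduces to the baseline.
print(abandonment_hazard(0.1, 5.0, 3.0, 3.0, 0.0, 0.0, 0.0))  # 0.1
```

In this form, a positive `beta_loss` raises the abandonment hazard once the wait exceeds the reference point, and a positive `beta_announcement` makes higher announcements lead to more abandonment, in line with the findings summarized above.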