Research on agent negotiators has given rise to a broad variety of bidding strategies that have been established both in the literature and in implementations [47, 59, 60, 86, 95, 125]. Examples of such general agent negotiators in the literature include, among others: Zeng and Sycara [206], who introduce a generic agent called Bazaar; Faratin et al. [60], who propose an agent that is able to make trade-offs in negotiations and is motivated by maximizing the joint utility of the outcome; Karp et al. [96], who take a game-theoretic view and propose a negotiation strategy based on game trees; Jonker et al. [95], who propose a concession-oriented strategy called ABMP; and Lin et al. [124], who propose an agent negotiator called QOAgent.
5.4.1 Learning the bidding strategy using regression analysis
An agent is generally unaware of the opponent’s exact negotiation strategy, but might have knowledge about the type of strategy used. If such knowledge is available and can be captured in a formula with unknown parameters, the opponent’s strategy can be estimated by applying regression analysis.
There are two main approaches to this problem. The first is given by Mudgal and Vassileva [139], who employ probabilistic influence diagrams to predict the opponent's counteraction to an offer in a single-issue negotiation. Bayesian learning is used to update a probabilistic influence diagram of the opponent, which yields the probability distribution for the opponent's next action only, so this can be viewed as a one-step regression method.
Another key approach is by Hou [85], who introduces a method to estimate the opponent's strategy in a single-issue negotiation assuming that a standard tactic dependent on time, behavior, or resources is used (as discussed by Faratin et al. [59]). Hou derives a model for the time-dependent and resource-dependent strategies and uses non-linear regression to estimate their parameters. The opponent is estimated to use the best-matching model, except when the error is higher than a threshold, in which case the opponent is assumed to use a behavior-dependent tactic. Following a similar approach, Agrawal and Chari estimate the opponent's decision function as an exponential function [1]; and Haberland et al. [73], Ren and Zhang [161], and Yu et al. [204] present methods to estimate the decision function when the opponent employs a time-dependent tactic.
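To make the regression step concrete, the following sketch fits a time-dependent tactic of the general form discussed by Faratin et al. to a sequence of observed offers. The functional form, the parameter names `k` and `e`, and the residual threshold idea are illustrative assumptions in the spirit of Hou's method, not the exact model from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed time-dependent tactic: the opponent's own utility decays from 1
# toward a reservation level k, with shape parameter e controlling how
# fast it concedes (an illustrative form, not the paper's exact model).
def time_dependent(t, k, e):
    return k + (1.0 - k) * (1.0 - t ** (1.0 / e))

# Simulated opponent offers at normalized times in [0, 1], with noise.
rng = np.random.default_rng(0)
t_obs = np.linspace(0.05, 0.6, 12)
u_obs = time_dependent(t_obs, 0.3, 2.0) + rng.normal(0.0, 0.01, t_obs.size)

# Non-linear least squares recovers the tactic parameters from the offers.
(k_hat, e_hat), _ = curve_fit(time_dependent, t_obs, u_obs,
                              p0=[0.5, 1.0], bounds=([0.0, 0.1], [1.0, 10.0]))

# As in Hou's method, a large residual would suggest the opponent is
# actually using a behavior-dependent tactic instead.
rmse = float(np.sqrt(np.mean((time_dependent(t_obs, k_hat, e_hat) - u_obs) ** 2)))
```

With a low residual, the agent treats the fitted model as the opponent's strategy; above a chosen threshold, it would fall back to the behavior-dependent assumption.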
Brzostowski et al. [28] introduce a more general method than Hou's to predict the opponent's bidding strategy, applying non-linear regression to estimate the parameters of four complex models that mix time- and behavior-based components. The utility gained by using the model is significantly higher than with their earlier method, which uses derivatives to estimate the opponent's strategy [29] (we will discuss this further in Sect. 5.4.2).
With more focus on application, Papaioannou et al. compare the performance of multiple estimators in predicting the opponent's bidding strategy [151, 153]. The estimators are used to predict the opponent's decision function, which is then used to determine which offer should be proposed in the final round to avoid non-agreement. The setting is a single-issue bilateral negotiation where a client and provider exchange offers in turn. Three parameter estimation methods are evaluated: one is based on polynomial interpolation using cubic splines; another uses 7th-degree polynomial interpolation; the third is a genetic algorithm that evolves the parameters of a polynomial function. All methods significantly improve the acceptance ratio of the negotiation.
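The polynomial-interpolation idea can be sketched in a few lines: fit a polynomial to the opponent's observed offers and evaluate it at the deadline to anticipate the final concession. The degree, the data, and the direct use of `numpy.polyfit` are assumptions for illustration; they are not the exact estimators evaluated in the papers.

```python
import numpy as np

# Illustrative sketch: fit a polynomial to the opponent's observed offers
# and extrapolate it to the deadline (t = 1), so the agent can anticipate
# the opponent's final concession and propose a matching final offer.
t = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])   # normalized round times
offers = 1.0 - 0.5 * t ** 2                    # synthetic opponent offers

coeffs = np.polyfit(t, offers, deg=2)          # degree is an assumption
predicted_final = float(np.polyval(coeffs, 1.0))  # expected offer at deadline
```

Because the synthetic offers follow a quadratic, the extrapolated deadline offer here recovers the underlying curve's value at \(t = 1\).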
5.4.2 Learning the bidding strategy using time series forecasting
When little is known about the general structure of the opponent’s bidding strategy, time series forecasting is a viable alternative to the regression-based methods described above. A time series is simply a set of observations that is sequentially ordered in time. In the context of negotiation, the time series typically consists of the utilities of offers received from the opponent, but causally related series can also be used (e.g., perceived cooperativeness of the opponent over time). Learning the opponent’s bidding strategy then boils down to creating a forecast of the time series, using a set of statistical techniques and smoothing methods.
We identified four ways to do so: using neural networks, derivatives, signal processing methods, and Markov chains.
Artificial neural networks The most frequently used method to predict the opponent's offers is to represent the opponent's decision function by an artificial neural network. The network is first trained using a large database of previous negotiation exchanges and is then used to predict the next bid. Neural networks are very powerful and can be used to approximate complex functions; however, studying their structure will not in general give any additional insight into the function being approximated.
Oprea [146] was one of the first to demonstrate the potential of using neural networks to predict the opponent's future offers in bilateral negotiations. The approach focuses on single-issue negotiations and takes only one of the negotiation sides into account: the input neurons are the values of the opponent's last five bids, which means that the agent's own offers are assumed to have no influence on the opponent's behavior.
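The input/output structure of such a predictor can be sketched as follows. The use of scikit-learn, the network size, and the synthetic training data are all assumptions for illustration; Oprea's actual architecture and training setup differ in detail.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Sketch of the idea: the network's five inputs are the opponent's last
# five bids; the target is the next bid. Trained here on a synthetic
# single-issue concession curve with observation noise.
rng = np.random.default_rng(1)
series = 1.0 - 0.5 * np.linspace(0.0, 1.0, 60) ** 2
series += rng.normal(0.0, 0.005, series.size)

# Sliding window: (five previous bids) -> (next bid).
X = np.array([series[i:i + 5] for i in range(series.size - 5)])
y = series[5:]

model = MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs",
                     max_iter=5000, random_state=0).fit(X, y)
next_bid = float(model.predict(series[-5:].reshape(1, -1))[0])
```

Note that, exactly as in Oprea's setting, the agent's own offers never enter the feature vector, so any influence they have on the opponent is ignored by construction.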
To predict the offers of a human negotiator, Carbonneau et al. [35] use an artificial neural network in a specific domain consisting of four issues. They extend their approach in [36], in which all possible pairs of issues are allowed to serve as input to the neural network. While this significantly complicates the structure of the neural network, it allows the network to find patterns between issues. It also facilitates training an opponent model on a particular scenario and makes it easy to apply the model to scenarios where one of the issues is removed from the negotiation domain. A similar approach is discussed by Lee and Ou-Yang [115], who are even able to predict the value for each issue; for this, four output neurons are created, each returning the value for one of the four issues. Even though the model was trained using data derived from other opponents, the authors find a positive correlation between the actual and predicted values.
For the general multi-issue case, multilayer perceptrons (MLPs) can be used, as is done by Masvoula et al. [133]. MLPs are artificial neural networks in which some nodes have a nonlinear activation function. Masvoula et al. test two networks in an experimental setting: one in which each issue is approximated by a separate MLP, and one in which a single MLP is used for all issues. The number of input neurons and hidden-layer neurons is empirically determined for both networks, after which they are shown to reliably predict the opponent's next offer, with the single-MLP network resulting in the lowest mean error. In more recent work [132], Masvoula investigates the performance of two artificial neural networks that learn the opponent's strategy without relying on historical knowledge. The first model is a simple MLP that is retrained every round, using the complete negotiation trace of the opponent. The second one is more advanced (and outperforms the first one), as the structure of the neural network is optimized in every round, using a genetic algorithm that rates neural networks based on their complexity and prediction error.
The methods above do not explicitly constrain the opponent's strategy. If it is known that the opponent employs a time-dependent tactic, work by Rau et al. and Papaioannou et al. [151, 153, 160] can be used. Owing to the reduced search space of time-dependent tactics, Rau et al. [160] find that the concession tactic and the weight of every issue offered by the opponent can be learned exactly. Papaioannou et al. compare the performance of five estimators for the opponent's bidding strategy, of which we have already discussed three regression-based estimators in Sect. 5.4.1. Of the remaining two estimators, one is based on a multilayer perceptron neural network, and the other uses a radial basis function neural network [151, 153]. The latter outperforms all other estimators when it comes to predicting the opponent's future offers: it achieves the lowest overall error, and its application results in the largest number of successful negotiations.
Derivatives Analyzing the derivatives of the opponent's concession function is another way to reveal information on the type of decision function used and on the extent to which the opponent responds to different negotiation factors. In a single-issue negotiation, given a sequence of opponent offers \(b_i \in [0, 1]\), the derivatives of order \(k\) are defined as follows:
$$\begin{aligned} {\varDelta }^1 b_i&= b_{i+1} - b_i, \end{aligned}$$
(6)
$$\begin{aligned} {\varDelta }^{k+1} b_i&= {\varDelta }^{k} b_{i+1} - {\varDelta }^{k} b_i. \end{aligned}$$
(7)
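Equations (6) and (7) are simply repeated first differences of the offer sequence, which can be computed directly:

```python
import numpy as np

# Eqs. (6) and (7) as code: repeated first differences via np.diff.
b = np.array([0.20, 0.35, 0.45, 0.52, 0.56])  # opponent offers b_i

delta1 = np.diff(b)        # Δ^1 b_i = b_{i+1} - b_i
delta2 = np.diff(b, n=2)   # Δ^2 b_i = Δ^1 b_{i+1} - Δ^1 b_i
```

Here `delta1` is positive throughout (the opponent keeps conceding) while `delta2` is negative (the concessions shrink over time), which is exactly the kind of information the metrics below extract.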
Brzostowski et al. [29] argue that there are two main factors contributing to the overall behavior of a negotiating agent: a time-dependent and a behavior-dependent component. They make a prediction for both components and combine the results to estimate the opponent's behavior. The two predictions are combined based on two observational measures: the time influence metric and the behavior influence metric.
For the calculation of the time influence metric, it is assumed that a time-dependent tactic may be modeled using a polynomial or an exponential function. If the opponent uses a pure time-dependent tactic, all derivatives should have the same sign:
$$\begin{aligned} \forall _{i, j, k} \, \mathrm {sgn}\big ({\varDelta }^k b_i\big ) = \mathrm {sgn}\big ({\varDelta }^k b_j\big ). \end{aligned}$$
(8)
For each \(k\)th order derivative, Brzostowski et al. calculate how strongly the time-dependent assumption holds by comparing the number of positive and negative signs within that order. Given a \(k\)th order derivative, if all the differences are either positive or negative, the time-dependent criterion is fully satisfied; otherwise, the time-dependent assumption is violated. The results of all derivatives can be combined to create the time influence metric.
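A minimal sketch of this sign-agreement test, under the assumption that a per-order score is the fraction of the majority sign (the paper's exact scoring and the way orders are combined are not specified here):

```python
import numpy as np

# Hedged sketch of the check behind Eq. (8): for each derivative order k,
# score how close the k-th differences come to sharing a single sign.
# A pure time-dependent tactic scores 1.0 at every order.
def sign_consistency(b, max_order=2):
    scores = []
    for k in range(1, max_order + 1):
        d = np.diff(b, n=k)
        pos = np.count_nonzero(d > 0)
        neg = np.count_nonzero(d < 0)
        scores.append(max(pos, neg) / d.size)  # majority-sign fraction
    return scores  # how the orders combine into one metric is assumed

b = np.array([0.20, 0.35, 0.45, 0.52, 0.56, 0.58])  # monotone concession
scores = sign_consistency(b)
```

For this monotone, concave concession sequence, both the first- and second-order differences have a single common sign, so the criterion is fully satisfied at both orders.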
The behavior influence metric measures to what extent the opponent responds to the agent's actions. For each round \(i\), the agent's own concession \({\varDelta }^1 a_i\) is compared with the opponent's concession \({\varDelta }^1 b_i\), resulting in a metric \(r_i\) for each round:
$$\begin{aligned} r_i = \frac{{\varDelta }^1 a_i}{{\varDelta }^1 b_i}. \end{aligned}$$
(9)
The results for each round are then aggregated in a weighted sum called \(r\). If \(r\) is equal to one, the opponent makes concessions of the same size as the agent. For \(r < 1\), the opponent is competitive, and for \(r > 1\), the opponent is cooperative. This metric can be combined with a measure for the monotonicity of the sequence \(r_i\) to create the behavior influence metric, analogous to the time-dependent metric.
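Equation (9) and its aggregation can be sketched directly; the plain (unweighted) mean used here stands in for the paper's weighted sum, and the offer sequences are illustrative:

```python
import numpy as np

# Sketch of Eq. (9) in code: compare the agent's concessions with the
# opponent's, round by round, then aggregate. A plain mean replaces the
# paper's weighted sum for brevity (an assumption).
a = np.array([0.90, 0.86, 0.83])   # agent's own offers (in its own utility)
b = np.array([0.95, 0.87, 0.80])   # opponent's offers

r_i = np.diff(a) / np.diff(b)      # Δ^1 a_i / Δ^1 b_i, per round
r = float(np.mean(r_i))            # r < 1: competitive; r > 1: cooperative
```

Since both sequences are decreasing, each \(r_i\) is positive, and the aggregate \(r\) summarizes the relative concession sizes across rounds.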
With this information, two predictions can be made of the future behavior of the opponent and then combined to yield the final forecast. The behavior-dependent prediction uses extrapolation of the behavior influence metric to predict the opponent's next offer. The second prediction is based solely on time and uses the negotiation history to determine the opponent's next offer; it relies on the concavity and convexity of the opponent's concession curve as measured by the differentials. The opponent's time-dependent behavior can then be approximated by polynomials to make a prediction of the opponent's future offers. Again, if it is known that the opponent uses a time-dependent strategy, more specific methods can be used to approximate the opponent's concession curve, such as the derivative-based approach by Mok and Sundarray [137].
Signal processing A third way to forecast the opponent's offers is to employ techniques used in signal processing. This type of modeling technique has recently attracted attention from a number of negotiation researchers, and three main methods have been developed since 2010.
The first main approach is to use a Gaussian process to predict the opponent's decision function. During the negotiation, the opponent's bidding history is recorded as a series of ordered pairs of time and observed utility. Next, Gaussian process regression is used to determine a Gaussian distribution of expected utility for each time step. Williams et al. [196, 197] use this technique to estimate the optimal concession rate in a multi-issue negotiation with time-based discounts. Their approach can handle a wider range of scenarios than the derivative-based methods discussed above, because the opponent's tactic can be more complicated than a weighted combination of time- and behavior-dependent tactics. To counter noise, only a small number of time windows are sampled, from which only the maximum utility offered by the opponent is used to make the predictions. The strategy was implemented in the IAMhaggler2011 [199] agent, which finished third in ANAC 2011 [10]. The agent performed much better than the others on large domains, but its performance on small domains was only average.
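The core of the Gaussian-process step can be sketched as follows. The kernel choice, hyperparameters, and window data are assumptions for illustration and are not the authors' exact settings:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hedged sketch: record the maximum utility the opponent offered in each
# time window, fit a Gaussian process to (time, utility) pairs, and
# forecast the expected utility (with uncertainty) of a future window.
t = np.array([[0.1], [0.2], [0.3], [0.4], [0.5]])   # window midpoints
u_max = np.array([0.30, 0.34, 0.40, 0.47, 0.55])    # best offer per window

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3) + WhiteKernel(1e-4),
                              normalize_y=True).fit(t, u_max)
mean, std = gp.predict(np.array([[0.7]]), return_std=True)  # future window
```

The predictive standard deviation is what distinguishes this approach from a point estimator: the agent can weigh how confident the forecast is before committing to a concession rate.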
Chen and Weiss [44, 46] also predict the opponent's preferences to determine the agent's optimal concession rate. Similar to Williams et al., the maximum offered utility in a set of time windows is recorded. This time, discrete wavelet transformation is used to decompose the signal into two parts: an approximation part and a detail part. The idea is that the former captures the trend of the signal, whereas the latter contains the noise, which is therefore omitted. After an initial smoothing procedure, cubic spline interpolation is used to make a prediction for future time windows. The end result is a smooth function that indicates the maximum utility that can be expected in the future.
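A heavily simplified sketch of this pipeline, using a one-level Haar-style decomposition (pairwise averages as the approximation part, pairwise differences as the discarded detail part) followed by spline extrapolation; the actual wavelet family, decomposition depth, and smoothing procedure in the papers are not reproduced here:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hedged sketch: keep the Haar approximation (trend), drop the detail
# part (noise), then extrapolate the denoised samples with a cubic spline.
u = np.array([0.30, 0.32, 0.35, 0.33, 0.40, 0.42, 0.47, 0.45])  # window maxima

approx = (u[0::2] + u[1::2]) / 2.0   # Haar approximation, up to normalization
t = np.arange(approx.size, dtype=float)
spline = CubicSpline(t, approx)
forecast = float(spline(t.size))     # one step past the observed windows
```

Dropping the detail coefficients halves the resolution but removes the high-frequency noise, so the spline extrapolates the trend rather than the jitter.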
An alternative method employed by Chen and Weiss relies on empirical mode decomposition and autoregressive moving averaging [45]. The same procedure is used to sample the opponent's decision function, but now empirical mode decomposition is used to decompose the sampled signal into a finite set of components, after which autoregressive moving averaging is used to predict the future behavior of each of these components.
Finally, when the opponent is expected to change its strategy over time without signaling this explicitly to the agent, the work by Ozonat and Singhal [150] can be used to estimate the opponent's strategy in a multi-issue negotiation. Using switching linear dynamical systems, a technique commonly used in the signal processing literature to model dynamical phenomena, the opponent's decision function is predicted in terms of the utility the agent can expect in the future.
Markov chains The final time series forecasting method relies on Markov chains. The idea is that the set of opponent strategies is known; however, it is undisclosed when the opponent changes its strategy. The strategies make up the states of a Markov chain, and the transition matrix represents the probability of going from one state—a strategy—to another. Narayanan and Jennings [140] use this method to model the opponent's strategy in a single-issue negotiation and apply Bayesian learning to estimate the transition matrix, where each hypothesis represents a possible transition matrix. The hypotheses are updated each round using the received opponent offers and are then used to derive a counter-strategy.
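The Bayesian update over transition-matrix hypotheses can be sketched as follows. The two candidate matrices, the strategy labels, and the assumption that the opponent's active strategy can be classified from its offers each round are all illustrative simplifications:

```python
import numpy as np

# Hedged sketch: the opponent switches between known strategies
# (state 0 = "conceder", 1 = "hardliner" here) per a hidden transition
# matrix. Each hypothesis is a candidate matrix; Bayesian updating
# concentrates posterior mass on the one best explaining the switches.
hypotheses = [
    np.array([[0.9, 0.1], [0.1, 0.9]]),   # "sticky" switching
    np.array([[0.5, 0.5], [0.5, 0.5]]),   # uniform random switching
]
posterior = np.array([0.5, 0.5])          # uniform prior over hypotheses

observed = [0, 0, 0, 1, 1, 1, 1]          # strategy inferred per round

for prev, nxt in zip(observed, observed[1:]):
    likelihood = np.array([T[prev, nxt] for T in hypotheses])
    posterior = posterior * likelihood
    posterior = posterior / posterior.sum()
```

Because the observed sequence switches only once, the "sticky" matrix explains the data far better, and the posterior shifts toward it; a counter-strategy can then be chosen against the most probable switching behavior.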