

DOI 10.1007/s10458-015-9309-1

Learning about the opponent in automated bilateral negotiation: a comprehensive survey of opponent modeling techniques

Tim Baarslag¹ · Mark J. C. Hendrikx² · Koen V. Hindriks² · Catholijn M. Jonker²

© The Author(s) 2015. This article is published with open access at Springerlink.com

Abstract A negotiation between agents is typically an incomplete information game, where the agents initially do not know their opponent's preferences or strategy. This poses a challenge, as efficient and effective negotiation requires the bidding agent to take the other's wishes and future behavior into account when deciding on a proposal. Therefore, in order to reach better and earlier agreements, an agent can apply learning techniques to construct a model of the opponent. There is a mature body of research in negotiation that focuses on modeling the opponent, but there exists no recent survey of commonly used opponent modeling techniques. This work aims to advance and integrate knowledge of the field by providing a comprehensive survey of currently existing opponent models in a bilateral negotiation setting. We discuss all possible ways opponent modeling has been used to benefit agents so far, and we introduce a taxonomy of currently existing opponent models based on their underlying learning techniques. We also present techniques to measure the success of opponent models and provide guidelines for deciding on the appropriate performance measures for every opponent model type in our taxonomy.

Keywords Negotiation · Software agents · Opponent model · Learning techniques · Automated negotiation · Opponent modeling · Machine learning · Survey


Tim Baarslag: T.Baarslag@soton.ac.uk
Mark J. C. Hendrikx: M.J.C.Hendrikx@tudelft.nl
Koen V. Hindriks: K.V.Hindriks@tudelft.nl
Catholijn M. Jonker: C.M.Jonker@tudelft.nl

1 University of Southampton, Southampton, UK
2 Delft University of Technology, Delft, The Netherlands


1 Introduction

Negotiation is a process in which parties interact to settle a mutual concern to improve their status quo. Negotiation is a core activity in human society, and is studied by various disciplines, including economics [147,158], artificial intelligence [69,91,106,107,118,182], game theory [19,69,91,118,120,147,172], and social psychology [170].

Traditionally, negotiation is a necessary, but time-consuming and expensive activity. Therefore, in the last two decades, there has been a growing interest in the automation of negotiation and e-negotiation systems [18,71,91,99,107], for example in the setting of e-commerce [20,79,105,126]. This attention has been growing since the beginning of the 1980s with the work of early adopters such as Smith's Contract Net Protocol [186], Sycara's Persuader [189,190], Robinson's Oz [164], and the work by Rosenschein [168] and Klein [102]. The interest is fueled by the promise of automated agents being able to negotiate on behalf of human negotiators and to find better outcomes than human negotiators [20,55,89,121,126,149,192].

The potential benefits of automation include reduced time and negotiation costs resulting from automation [33–35,126], a potential increase in negotiation usage when the user can avoid social confrontation [27,126], the ability to improve the negotiation skills of the user [80,121,125], and the possibility of finding more interesting deals by exploring more promising portions of the outcome space [80,126].

One of the key challenges for a successful negotiation is that usually only limited information is available about the opponent [143]. Despite the fact that sharing private information can result in mutual gains, negotiators are unwilling to share information in situations with a competitive aspect to avoid exploitation by the other party [50,75,81,158]. In an automated negotiation, this problem can be partially overcome by deriving information from the offers that the agents exchange with each other. Taking advantage of this information to learn aspects of the opponent is called opponent modeling.[1]

Having a good opponent model is a key factor in improving the quality of the negotiation outcome and can further increase the benefits of automated negotiation, including the following: reaching win-win agreements [90,123,206]; minimizing negotiation cost by avoiding non-agreement [151,153,183,184]; and finally, avoiding exploitation by adapting to the opponent's behavior during the negotiation [57,85,199]. Experiments have shown that by employing opponent models, automated agents can reach more efficient outcomes than human negotiators [22,124,149].

Besides improving the quality of the negotiation process, opponent models are essential for the transition of automated negotiation from theory to practice. It has been shown that non-adaptive agents are exploitable given a sufficiently large negotiation history as their behavior becomes predictable [24,135]. The risk of exploitation can be minimized by creating adaptive agents that use opponent models to adapt their behavior.

Despite the advantages of creating an opponent model and two decades of research, there is no recent study that provides either an overview of the field, or a comparison of different opponent modeling techniques. Therefore, in order to stimulate the development of efficient future opponent models, and to outline a research agenda for the field of opponent modeling in negotiation, this survey provides an overview of existing opponent models and their underlying concepts. It discusses how to select the best model depending on the negotiation setting, and identifies a number of problems that are still open.

[1] Despite the usage of the term "opponent", opponent models can be beneficial for both parties. For example, Lin et al. [123,124] and Oshrat et al. [149] use opponent modeling techniques to maximize joint utility. The term agent modeling could also apply to our setting; however, it is in line with current practice to call it opponent modeling.

Table 1  Types of negotiation settings discussed in this work. Classification based on Lomuscio et al. [126]

Parameter                  | Value
---------------------------|----------------------------------------
Agent setting              | Bilateral
Deadline                   | Private/public
Domain configuration       | Single-issue/multi-object/multi-issue
Interaction between issues | Yes/no
Preference profiles        | Private/partially disclosed
Sessions                   | Single/multiple
Strategy                   | Private/partially disclosed

One of our major findings is that despite the variety in opponent modeling techniques, most current models rely on a small, common set of learning techniques. Furthermore, it turns out that there are only four types of opponent attributes that are learned by these techniques.

Apart from employing different techniques to build an opponent model, different benchmarks have been used to test the effectiveness of opponent models. This makes it particularly difficult to compare present techniques. An additional contribution of this work is to give an exhaustive overview of measures that are used throughout the literature. We distinguish two types of measures, and we recommend which measures to use to reliably quantify the quality of an opponent model.

The opponent modeling techniques discussed in this work are applicable to a large diversity of negotiation protocols. Protocols may differ in many aspects, including the domain configuration, the number of agents, the state of the issue(s), and the availability of information. Table 1 provides an overview of the scope of this survey, distinguishing the main parameters of negotiation protocols as defined by Lomuscio et al. [126] and Fatima et al. [65].

Finally, note that the problems involved in automated negotiation are very different from those in human negotiation. In negotiation sessions between humans, as few as ten bids may be exchanged, whereas negotiating agents may exchange thousands of bids in less than a minute. Humans may compensate for this lack of information exchange by explicitly communicating information about their preferences, both verbally and nonverbally. To delimit our scope, we do not discuss attributes that are relevant in human negotiations but are not yet used in automated negotiation, such as emotions [54,100].

The remainder of our work is organized as follows. We start by providing an overview of related surveys in Sect. 2. Section 3 sets out the basic concepts of bilateral negotiation. Section 4 describes the fundamentals underlying the learning methods that have been applied to construct an opponent model. Different opponent models are created to learn different negotiation aspects; we introduce our taxonomy of the various concepts that are learned, and how they are learned, in Sect. 5. Section 6 provides recommendations on how to measure the quality of an opponent model. Finally, in Sect. 7 we cover the lessons learned, examine the latest trends, and provide directions for future work.

2 Related surveys

The field of automated negotiation has produced over 2000 papers in the last two decades. This work covers the period from the first opponent models introduced around 1997 (cf. [205]) to the latest models developed in 2014 (cf. [37,46,76,77,92]). During this period, several surveys have been conducted that are related to our work, including surveys by Beam and Segev [18], Papaioannou et al. [152], Masvoula et al. [134], Yang [203], and Chen and Pu [42]. Our work incorporates all techniques for bilateral negotiations covered in these surveys, as we consider various types of opponent models based on multiple different learning techniques, including Bayesian learning and artificial neural networks. In comparison to these surveys, we discuss a larger body of research and categorize the opponent models based on the aspect of the opponent they aim to model. Furthermore, we provide an overview of measures used to quantify the quality of opponent models, and provide guidelines on how to apply these metrics.

Beam and Segev surveyed the state of the art in automated negotiation in 1997 [18]. Their work describes machine learning techniques applied by intelligent negotiation agents, mainly discussing the potential of genetic algorithms to learn an effective negotiation strategy. Their survey naturally misses out on more recent developments, such as on-line opponent modeling techniques used in one-shot negotiations, as for example introduced by Buffett and Spencer [31,32] and Hindriks and Tykhonov [83]. More recently, Papaioannou et al. surveyed learning techniques based on neural networks to model the opponent’s behavior in both bilateral and multilateral negotiations [152]. Masvoula et al. also surveyed learning methods to enhance the strategies of negotiation agents [134]. One of the strengths of their survey is that it provides a comprehensive overview of learning methods. The modeling techniques are divided based on the type of strategy in which they are applied. Finally, Chen and Pu survey preference elicitation methods for user modeling in decision support systems [42]. The goal of these systems is to capture the user’s preferences in a setting in which the user is willing to share their preferences, or at least does not try to misrepresent them. While the goal of decision support systems differs from opponent modeling in automated negotiation, similar learning techniques—such as pattern matching—are used to estimate the user’s or opponent’s preferences.

A number of surveys have been conducted on the general topic of automated negotiation, for example by Jennings et al. [91], Kraus [107], Braun et al. [25], and Li et al. [118]. Jennings et al. argue that automated negotiation is a main concern for multi-agent system research [91]; and Kraus examines economic theory and game-theory techniques for reaching agreements in multi-agent environments [107]. Braun et al. review electronic negotiation systems and negotiation agents, concisely describing how learning techniques have been used to learn characteristics of the opponent [25]. Li et al. distinguish different types of negotiation, and briefly discuss opponent modeling [118]. Despite the wide scope of all of the surveys above, their discussion of opponent modeling is limited.

Negotiation is also studied as an extensive-form game within the game theory literature [118], a field of study founded on the work by Nash [141] and Raiffa [157]. In cooperative game theory, the aim is to jointly find a solution within the outcome space that satisfies particular axioms, an example being the Nash outcome (see also Sect. 6.1 on accuracy measures) that satisfies the Nash axioms [141]. Non-cooperative game theory is concerned with identifying rational behavior using the concept of strategy equilibrium: the state in which it is not beneficial for any agent to change strategy, assuming the other agents do not switch their tactic [148].

The game theory literature on the topic of negotiation is vast. For an overview we refer to Binmore and Vulkan [19]; Li and Giampapa [118]; and Chatterjee [40]. One prominent example of game theoretic negotiation research is by Rubinstein [172], who considers an alternating offers negotiation protocol without deadline in which two agents negotiate about the division of a pie. Another example is the work by Zlotkin and Rosenschein [208], which investigates a monotonic concession strategy that results in a strategy equilibrium. As outlined in [64], agents do not typically perform opponent modeling in the game theoretic model, but instead determine their strategy through theoretical analysis, which is possible because of the assumption of perfect rationality. The assumption of common knowledge—an assumption typically made in cooperative game theory—can lead to difficulties in practice [40,41,64,118], as competitive agents aim not to share information to prevent exploitation [50,75,81,158]. Other practical issues include the computational intractability of full agent rationality [40,41,91,118] and the applicability of game theoretical results to specific negotiation settings only [19,91,118]. Despite these concerns, several authors have promoted the application of game theory results in the design of heuristic and learning negotiation strategies [91,118]. For instance, evolutionary game theory (EGT) is a framework to describe the dynamics and evolution of strategies under the pressure of natural selection [178]. In this approach, negotiating agents can learn the best strategy through repeated interactions with their opponents. This has just started to make its impact on research into the negotiation dynamics of multi-agent bargaining settings [10,43]. In Sect. 6.2, we discuss EGT as a way to quantify the robustness of a negotiation strategy against exploitation in an open negotiation environment.

An interesting area, although out of scope of this paper, is that of user modeling in general (see e.g., [136] for a survey by McTear on the topic), and in particular that of using machine learning of dialogue-management strategies by Schatzmann and colleagues [180]. McTear’s work surveys artificial intelligence techniques applied to user modeling and is by now 20 years old (a newer one has not been published to date). Characteristics of users modeled by AI techniques include goals, plans, capabilities, attitudes, preferences, knowledge, and beliefs. The relevant parts with respect to our survey are the preference profiling and the distinction between learning models of individual users versus models for classes of users, and between models for one session and models maintained and updated over several sessions.

A survey on preference modeling by Braziunas and Boutilier [26] focuses on direct elicitation methods, i.e., asking direct questions to the user, and is therefore out of the scope of this paper. Schatzmann's survey [180] addresses systems and methods to learn a good dialogue strategy, for which automatic user simulation tools are essential. The methods to learn these strategies can be relevant for argumentation-based negotiation systems.

Another related area of research is the topic of machine learning techniques in game playing, e.g., checkers, rock-paper-scissors, scrabble, go, and bridge. Fürnkranz argues that opponent modeling has not yet received much attention in the computer games community [67]; in chess, for example, opponent modeling is not a critical component. However, it is essential in other games, such as computer poker [171]. This is due to the fact that, as in negotiation, maximizing the reward against an effectively exploitable opponent is potentially more beneficial than exhibiting optimal play [171]. These surveys make several distinctions that we also make, such as between offline and online learning, and they cover many techniques that can also be used in negotiation, such as Bayesian learning and neural networks.

3 Preliminaries

Before we discuss opponent models, we first introduce the terminology used throughout the paper. The defining elements of a bilateral negotiation are depicted in Fig. 1. A bilateral automated negotiation concerns a negotiation between two agents, usually called A and B, or buyer and seller.


Fig. 1 Overview of the defining elements of an automated bilateral negotiation

The negotiation setting consists of the negotiation protocol, the negotiating agents, and the negotiation scenario. The negotiation protocol defines the rules of encounter to which the negotiating agents have to adhere. The negotiation scenario takes place in a negotiation domain, which specifies all possible outcomes (the so-called outcome space). The negotiating agents have a preference profile, which expresses the preference relations between the possible outcomes. Together, this defines the negotiation scenario that takes place between the agents. The negotiation scenario and protocol specify the possible actions an agent can perform, given the negotiation state.

3.1 Negotiation domain

The negotiation domain—or outcome space—is denoted by Ω and defines the set of possible negotiation outcomes. The domain size is the number of possible outcomes |Ω|. A negotiation domain consists of one or more issues, which are the main resources or considerations that need to be resolved through negotiation; for example, the price or the color of a car that is for sale. Issues are also sometimes referred to as attributes, but we reserve the latter term for opponent attributes, which are properties that may be useful to model to gain an advantage in a negotiation.

To reach an agreement, the agents must settle on a specific alternative or value for each negotiated issue. That is, an agreement on n issues is an outcome that is accepted by both parties of the form ω = (ω_1, …, ω_n), where ω_i denotes a value associated with the i-th issue. We will focus mainly on settings with a finite set of discrete values per issue. A partial agreement is an agreement on a subset of the issues. We say that an outcome space defined by a single issue is a single-issue negotiation, and a multi-issue negotiation otherwise.

Negotiating agents can be designed either as general purpose negotiators, that is, domain-independent [122] and able to negotiate in many different settings, or suitable for only one specific domain (e.g., the Colored Trail domain [66,68], or the Diplomacy game [52,56,108]). There are obvious advantages to having an agent designed for a specific domain: it enables the agent designer to construct more effective strategies that exploit domain-specific information. However, this is also one of the major weaknesses, as such agents need to be tailored to every new available domain and application; this is why many of the agents and learning mechanisms covered in this survey are domain-independent.

3.2 Negotiation protocol

A negotiation protocol fixes the rules of encounter [169], specifying which actions each agent can perform at any given moment. Put another way, it specifies the admissible negotiation moves; it is assumed here that the protocol is known to both agents and that the agents strictly adhere to it. Our focus here is on bilateral negotiation protocols. For other work in terms of one-to-many and many-to-many negotiations (for example, to learn when to pursue more attractive outside options in a setting with multiple agents), we refer to [3,119,142,147,156,183]. We do not aim to provide a complete overview of all protocols; instead we recommend Lomuscio et al. [126] for an overview of high-level parameters used to classify them, and Marsa-Maestre et al. [128] for guidelines on how to choose the most appropriate protocol for a particular negotiation problem.

An often-used negotiation protocol in bilateral automated negotiation is the alternating offers protocol, which is widely studied and used in the literature, both in game-theoretic and heuristic settings (a non-exhaustive list includes [61,107,109,147,148]). This protocol dictates that the two negotiating agents propose outcomes, also called bids or offers, in turns. That is, the agents create a bidding history: one agent proposes an offer, after which the other agent proposes a counter-offer, and this process is repeated until the negotiation is finished, for example by time running out, or by one of the parties accepting.

In the alternating offers setting, when agent A receives an offer x_{B→A} from agent B, it has to decide at a later time whether to accept the offer, or to send a counter-offer x_{A→B}. Given a bidding history between agents A and B, we can express the action performed by A with a decision function [62,181]. The resulting action is used to extend the current bidding history between the two agents. If the agent does not accept the current offer, and the deadline has not been reached, it will prepare a counter-offer by using a negotiation strategy or tactic to generate new values for the negotiable issues (see Sect. 3.6).

Various alternative versions of the alternating offers protocol have been used in automated negotiation, extending the default protocol and imposing additional constraints. For example, in a variant called the monotonic concession protocol [143,169], agents are required to initially disclose information about their preference order associated with each issue, and the offers proposed by each agent must form a sequence of concessions, i.e., each consecutive offer has less utility for the agent than the previous one. Other examples are the three protocols discussed by Fatima et al. [65] that differ in the way the issues are negotiated: simultaneously in bundles, in parallel but independently, and sequentially. The first alternative is shown to lead to the highest quality outcomes. A final example is a protocol in which only one offer can be made. In such a situation, the negotiation can be seen as an instance of the ultimatum game, in which a player proposes a deal that the other player may only accept or refuse [185]. A similar bargaining model is explored in [176]; that is, a model with one-sided incomplete information and one-sided offers, which investigates the role of confrontation in negotiations and uses optimal stopping to decide whether or not to invoke conflict.

3.3 Preference profiles

Negotiating agents are assumed to have a preference profile, which is a preference order ≥ that ranks the outcomes in the outcome space. Preferences are said to be ordinal when they are fully specified by a preference order. Together with the domain, they make up the negotiation scenario.

In many cases, the domain and preferences stay fixed during a single negotiation encounter, but while the domain is common knowledge to the negotiating parties, the preferences of each player are private information. This means that the players do not have access to the preferences of the opponent. In this sense, the negotiators play a game of incomplete information. However, the players can attempt to learn as much as they can during the negotiation.


An outcome ω is said to be weakly preferred over an outcome ω′ if ω ≥ ω′. If in addition ω′ ≥ ω does not hold, then ω is strictly preferred over ω′, denoted ω > ω′. An agent is said to be indifferent between two outcomes if ω ≥ ω′ and ω′ ≥ ω. In that case, we also say that these outcomes are equally valued and we write ω ∼ ω′. An indifference curve or iso-curve is a set of outcomes that are equally valued by an agent. In a total preference order, one outcome is always (weakly) preferred over the other outcome for any outcome pair, which means there are no undefined preference relations. Finally, an outcome ω is Pareto optimal if there exists no outcome ω′ that is preferred by an agent without making another agent worse off [158]. For two players A and B with respective preference orders ≥_A and ≥_B, this means that there is no outcome ω′ such that

(ω′ >_A ω ∧ ω′ ≥_B ω) ∨ (ω′ >_B ω ∧ ω′ ≥_A ω).

An outcome that is Pareto optimal is also said to be Pareto efficient. When an outcome is not Pareto efficient, there is potential, through re-negotiation, to reach a more preferred outcome for at least one of the agents without reducing the value for the other.

The outcome space can become quite large, which means it is usually not viable to explicitly state an agent's preference for every alternative. For this reason, there exist more succinct representations of preferences [48,53].

A well-known and compact way to represent preference orders is the formalism of conditional preference networks (CP-nets) [23]. CP-nets are graphical models, in which each node represents a negotiation issue and each edge denotes a preferential dependency between issues. If there is an edge from issue i to issue j, the preferences for j depend on the specific value for issue i. To express conditional preferences, each issue is associated with a conditional preference table, which represents a total order of possible values for that issue, given its parents' values.
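To make this concrete, here is a minimal sketch of a CP-net in Python, assuming two hypothetical issues ("drink" and "snack") with an edge from the former to the latter; all issue names, values, and orders are invented for the example:

```python
# Minimal CP-net sketch: an edge from "drink" to "snack" means the
# snack preferences depend on the chosen drink. Each entry is a
# conditional preference table: a total order of values, best first.
cp_tables = {
    "drink": {(): ["wine", "beer"]},            # no parent issues
    "snack": {("wine",): ["cheese", "chips"],   # if drink = wine
              ("beer",): ["chips", "cheese"]},  # if drink = beer
}

def preferred_value(issue, parent_values):
    """Most preferred value of an issue given its parents' values."""
    return cp_tables[issue][tuple(parent_values)][0]

print(preferred_value("drink", []))        # wine
print(preferred_value("snack", ["wine"]))  # cheese
```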

A preference profile may be specified as a list of ordering relations, but it is more common in the literature to express the agent’s preferences by a utility function. A utility function assigns a utility value to every possible outcome, yielding a cardinal preference structure.

Cardinal preferences are 'richer' than ordinal preferences in the sense that ordinal preferences can only compare between different alternatives, while cardinal preferences allow for expressing the intensity of every preference [48]. Any cardinal preference induces an ordinal preference, as every utility function u defines an order ω ≥ ω′ if and only if u(ω) ≥ u(ω′).

Some learning techniques make additional assumptions about the structure of the utility function [98], the most common in negotiation being that the utility of a multi-issue outcome is calculated by means of a linear additive function that evaluates each issue separately [98,158,159]. Hence, the contribution of every issue to the utility is linear and does not depend on the values of other issues. The utility u(ω) of an outcome ω = (ω_1, …, ω_n) ∈ Ω can be computed as a weighted sum of evaluation functions e_i(ω_i) as follows:

u(ω) = Σ_{i=1}^{n} w_i · e_i(ω_i),    (1)

where the w_i are normalized weights (i.e., Σ_i w_i = 1). Linear additive utility functions make explicit that different issues can be of different importance to a negotiating agent and can be used to efficiently calculate the utility of a bid at the cost of expressive power, as they cannot represent interaction effects (or dependencies) between issues.
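As an illustration, the following sketch computes Equation (1) in Python for a hypothetical three-issue domain; the issue names, weights, and evaluation tables are invented for the example:

```python
# A minimal sketch of the linear additive utility of Equation (1).
weights = {"price": 0.5, "color": 0.2, "delivery": 0.3}  # sum to 1
evaluations = {                       # e_i: issue value -> [0, 1]
    "price":    {"low": 1.0, "mid": 0.6, "high": 0.1},
    "color":    {"red": 0.8, "blue": 0.3},
    "delivery": {"fast": 1.0, "slow": 0.2},
}

def utility(outcome):
    """Weighted sum of per-issue evaluations, as in Equation (1)."""
    return sum(weights[i] * evaluations[i][v] for i, v in outcome.items())

print(utility({"price": "low", "color": "blue", "delivery": "fast"}))  # 0.86
```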

A common alternative is to make use of non-linear utility functions to capture more complex relations between offers at the cost of additional computational complexity. Non-linear negotiation is an emerging area within automated negotiation that considers multiple inter-dependent issues [88,129]. Typically this leads to larger, richer outcome spaces in comparison to linear additive utility functions. A key factor in non-linear spaces is the ability of a negotiator to make a proper evaluation of a proposal, as the utility calculation of an offer might even prove NP-hard [52]. Examples of this type of work can be found in [87,101,127,166].

For non-linear utility functions in particular, a number of preference representations have been formulated to avoid listing the exponentially many alternatives with their utility assessment [48]. The utility of a deal can be expressed as the sum of the utility values of all the constraints (i.e., regions in the outcome space) that are satisfied [87,130]. These constraints may in turn exhibit additional structure, such as being represented by hyper-graphs [74]. One can also decompose the utility function into subclusters of individual issues, such that the utility of an agreement is equal to the sum of the sub-utilities of different clusters [166]. This is a special case of a utility structure called k-additivity, in which the utility assigned to a deal can be represented as the sum of basic utilities of subsets with cardinality ≤ k [49]. For example, for k = 2, the utility u(ω_1, ω_2, ω_3) might be expressed as the utility value of the individual issues u_1(ω_1) + u_2(ω_2) + u_3(ω_3) (as in the linear additive case), plus their 2-way interaction effects u_4(ω_1, ω_2) + u_5(ω_1, ω_3) + u_6(ω_2, ω_3). This is in turn closely related to the OR and XOR languages for bidding in auctions [144], in which the utility is specified for a specific set of clusters, together with rules on how to combine them into utility functions on the whole outcome space.
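A minimal sketch of a 2-additive utility in this spirit; the individual and interaction terms below are invented for the example:

```python
# Sketch of a 2-additive (k = 2) utility over three numeric issue
# values: individual terms plus pairwise interaction terms.
def utility(w1, w2, w3):
    # individual terms u1, u2, u3, as in the linear additive case
    u = 0.3 * w1 + 0.2 * w2 + 0.1 * w3
    # 2-way interaction terms u4, u5, u6
    u += 0.2 * w1 * w2 + 0.0 * w1 * w3 + 0.2 * min(w2, w3)
    return u

print(utility(1.0, 1.0, 0.5))  # ≈ 0.85
```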

Finally, the preference profile of an agent may also specify a reservation value. The reservation value is the minimal utility that the agent still deems an acceptable outcome. That is, the reservation value is equal to the utility of the best alternative to no agreement. A bid with a utility lower than the reservation value should not be offered or accepted by any rational agent. In a single-issue domain, the negotiation is often about the price P of a good [59,62,205,206]. In that case, the agents usually take the roles of buyer and seller, and their reservation values are specified by their reservation prices; i.e., the highest price a buyer is willing to pay, and the lowest price at which a seller is willing to sell.

3.4 Time

Time in negotiation is limited, either because the issues under negotiation may expire, or one or more parties are pressing for an agreement [39]. Without time pressure, the negotiators have no incentive to accept an offer, and so the negotiation might go on forever. Also, with unlimited time an agent may simply try a large number of proposals to learn the opponent’s preferences. The deadline of a negotiation refers to the time before which an agreement must be reached [158]. When the deadline is reached, the negotiators revert to their best alternative to no agreement.

The negotiator's nearness to a deadline is only one example of time pressure [38], which is defined as a negotiator's desire to end the negotiation quickly [154]. An alternative way to model time pressure is to supplement the negotiation scenario with a discount factor, which models the decline of the negotiated goods over time. Let δ ∈ [0, 1] be the discount factor and let t ∈ [0, 1] be the current normalized time. A way to compute the real-time discounted utility u^δ(ω) from the undiscounted utility u(ω) is as follows:

u^δ(ω) = u(ω) · δ^t.    (2)

If δ = 1, the utility is not affected by time, and such a scenario is considered to be undiscounted, while if δ is very small, there is high pressure on the agents to reach an agreement.


Alternatively, time may be viewed as a discrete variable, in which the number of negotiation exchanges (or rounds) is counted. In that case, the deadline is specified as a maximum number of rounds n, and discounting is applied in every round k ≤ n as u^δ(ω) = u(ω) · δ^k.

Note that, from a utility point of view, the presence of a discount factor δ is equivalent to a probability of 1 − δ that the opponent walks away from the negotiation in any given negotiation round.
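As a small illustration, the following sketch computes the real-time discounted utility of Equation (2) and the round-based variant; the numbers are arbitrary:

```python
def discounted_utility_time(u, delta, t):
    """Equation (2): u(w) * delta^t for normalized real time t in [0, 1]."""
    return u * delta ** t

def discounted_utility_rounds(u, delta, k):
    """Round-based variant: u(w) * delta^k after k rounds."""
    return u * delta ** k

print(discounted_utility_time(0.9, 0.8, 0.5))   # ≈ 0.805
print(discounted_utility_rounds(0.9, 0.8, 3))   # ≈ 0.461
```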

Deadlines and discount factors can have a strong effect on the outcome of a negotiation and may also interact with each other. For example, it is shown in [177] that in a game-theoretic setting with fully rational play, time preferences in terms of deadlines may lead to a game of ‘sit and wait’ and may completely override other effects such as time discounting.

3.5 Outcome spaces

A useful way to visualize the preferences of both players simultaneously is by means of an outcome space plot (Fig. 2). The axes of the outcome space plot represent the utilities of players A and B, and every possible outcome ω ∈ Ω maps to a point (u_A(ω), u_B(ω)). The line that connects all of the Pareto optimal agreements is the Pareto frontier.

Note that the visualization of the outcome space together with the Pareto frontier is only possible from an external point of view. In particular, the agents themselves are not aware of the opponent's utility of the bids in the outcome space and do not know the location of the Pareto frontier.

From Fig. 2 we can immediately observe certain characteristics of the negotiation scenario that are very important for the learning behavior of an agent. Examples include the domain size, the relative occurrence of Pareto optimal outcomes, and whether the bids are spread out over the domain.

Fig. 2 A typical example of an outcome space between agents A and B. The points represent all outcomes that are possible in the negotiation scenario. The line is the Pareto frontier, which connects all of the Pareto efficient outcomes


3.6 Negotiation tactics

The bidding strategy, also called negotiation tactic or concession strategy, is usually a complex strategy component. Two types of negotiation tactics are very common: time-dependent tactics and behavior-dependent tactics. Each tactic uses a decision function, which maps the negotiation state to a target utility. Next, the agent can search for a bid with a utility close to the target utility and offer this bid to the opponent.

3.6.1 Time-dependent tactics

Functions which return an offer solely based on time are called time-dependent tactics. The standard time-dependent strategy calculates a target utility u(t) at every turn, based on the current time t. Perhaps the most popular time-based decision function can be found in [59,61], which, depending on the current normalized time t ∈ [0, 1], makes a bid with utility closest to

u(t) = P_min + (P_max − P_min) · (1 − F(t)),    (3)

where

F(t) = k + (1 − k) · t^{1/e}.

The constants P_min, P_max ∈ [0, 1] control the range of the proposed offers, and k ∈ [0, 1] determines the value of the first proposal. For 0 < e < 1, the agent concedes only at the end of the negotiation and is called a Boulware negotiation tactic. If e ≥ 1, the function concedes quickly to the reservation value, and the agent is then called a Conceder. Figure 3 shows a plot of several time-dependent tactics for varying concession factors e.

The specification of these strategies given in [59,61] does not involve any opponent modeling; that is, given the target utility, a random bid is offered with a utility closest to it.
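A sketch of Equation (3) in Python; the values chosen for P_min, P_max, and k are illustrative defaults, not taken from the cited work:

```python
# Time-dependent target utility of Equation (3).
def target_utility(t, p_min=0.3, p_max=1.0, k=0.0, e=0.2):
    """Boulware for 0 < e < 1, Conceder for e >= 1; t is normalized time."""
    f = k + (1 - k) * t ** (1 / e)
    return p_min + (p_max - p_min) * (1 - f)

for t in (0.0, 0.5, 0.9, 1.0):
    print(t,
          round(target_utility(t), 3),          # Boulware: concedes late
          round(target_utility(t, e=2.0), 3))   # Conceder: concedes early
```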

3.6.2 Baseline tactics

The Hardliner strategy (also known as take-it-or-leave-it, sit-and-wait [4], or Hardball [117]) can be viewed as an extreme type of time-dependent tactic. This strategy stubbornly makes a bid of maximum utility for itself and never concedes, at the risk of reaching no agreement.


Random Walker (also known as the Zero Intelligence strategy [70]) generates random bids and thus provides the extreme case of a maximally unpredictable opponent. Because of its limited capabilities, it can also serve as a useful baseline strategy when testing the efficacy of other negotiation strategies.

3.6.3 Behavior-dependent tactics

Faratin et al. introduce a well-known set of behavior-dependent tactics or imitative tactics in [59]. The most well-known example of a behavior-dependent tactic is the Tit for Tat strategy, which tries to reproduce the opponent’s behavior of the previous negotiation rounds by reciprocating the opponent’s concessions. Thus, Tit for Tat is a strategy of cooperation based on reciprocity [5].

Tit for Tat has been applied and found successful in many other games, including the Iterated Prisoner's Dilemma [6]. In total three tactics are defined: Relative Tit for Tat, Random Absolute Tit for Tat, and Averaged Tit for Tat. The Relative Tit for Tat agent mimics the opponent in a percentage-wise fashion by proportionally replicating the opponent's concession that was performed a number of steps ago.

The standard Tit for Tat strategies from [59] do not employ any learning methods, but this work has been subsequently extended by the Nice Tit for Tat agent [15] and the Nice Mirroring Strategy [81]. These strategies achieve more effective results by combining a simple Tit for Tat response mechanism with learning techniques to propose offers closer to the Pareto frontier.

4 Learning methods for opponent models

An extensive set of learning techniques has been applied in automated negotiation. Below we provide an introduction to the most commonly used underlying methods. Readers who are already familiar with these techniques can skip to the next section.

The first two sections discuss Bayesian Learning (Sect. 4.1) and Non-linear Regression (Sect. 4.2). Both methods have mainly been applied as online learning techniques, because they do not require a training phase to produce a reasonable estimate, and because their estimates can be improved incrementally during the negotiation.

In contrast, the other two methods, Kernel Density Estimation (Sect. 4.3) and Artificial Neural Networks (Sect. 4.4), generally require a training phase, and are mainly applied when a record of the negotiation history is available. With these methods, it is computationally inexpensive to take advantage of the learned information during the negotiation.

4.1 Bayesian learning

Bayesian learning is the most prominent probabilistic approach in opponent modeling. Bayesian learning is based on Bayes' rule:

P(H | E) = P(E | H) · P(H) / P(E).    (4)

Bayes' rule is a tool for updating the probability that a hypothesis H holds based on observed evidence E. In the formula above, P(H | E) is the posterior probability that the hypothesis H holds given evidence E, and P(E | H) is called the conditional probability of the event E occurring given the hypothesis H. P(H) denotes the prior probability of the hypothesis, independent of any evidence, and similarly, P(E) denotes the prior probability that evidence E is observed.

Bayesian learning is typically used to identify the most likely hypothesis H_i out of a set of hypotheses H = {H_1, …, H_n}. In the negotiation literature, typically a finite set of hypotheses is assumed, for example about the type of the opponent. In that case the likelihood of the hypotheses given observed evidence E can be determined using the alternative formulation of Bayes' rule:

P(H_i | E) = P(E | H_i) · P(H_i) / Σ_{j=1}^{n} P(E | H_j) · P(H_j).    (5)

An agent can formulate a set of independent hypotheses about a property of the opponent and discover, using evidence, which hypothesis is most likely valid. The idea is that each time new evidence E is observed, we can use Equation (5) to update and compute an improved estimate of the posterior probability P(H_i | E). After processing the evidence, an agent can conclude which hypothesis is most probable.
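As an illustration, the following sketch applies Equation (5) to a toy hypothesis set; the opponent types and likelihood values are invented for the example:

```python
# Bayesian update of Equation (5) over a finite hypothesis set. Each
# hypothesis is a candidate opponent type with an assumed probability
# of observing a concession (the evidence E).
priors = {"Hardliner": 1 / 3, "Boulware": 1 / 3, "Conceder": 1 / 3}
likelihood = {"Hardliner": 0.05, "Boulware": 0.3, "Conceder": 0.8}  # P(E | H_i)

def bayes_update(priors, likelihood):
    """Return posteriors P(H_i | E) after observing one concession."""
    evidence = sum(likelihood[h] * p for h, p in priors.items())  # denominator
    return {h: likelihood[h] * p / evidence for h, p in priors.items()}

posteriors = priors
for _ in range(3):                   # three observed concessions
    posteriors = bayes_update(posteriors, likelihood)
print(max(posteriors, key=posteriors.get), posteriors)  # Conceder is most probable
```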

One disadvantage of using Bayesian learning is its computational complexity. Updating a single hypothesis H_i given a piece of evidence E_k may have a low computational complexity; however, there may be many such hypotheses H_i and pieces of evidence E_k. For example, when modeling the opponent's preferences, this set of hypotheses can be custom made, or generated from the structure of the functions assumed to model the preferences. Even in a negotiation scenario with linear additive utility functions, modeling the preferences requires a set of preference profiles for each negotiable issue. This already leads to a number of hypotheses that is exponential in the number of issues. Another challenge lies in defining the right input for the learning method (e.g., finding a suitable representation of the opponent's preference profile); in general it is not straightforward to define a suitable class of hypotheses, and it may be hard to determine the conditional probabilities.

4.2 Non-linear regression

Non-linear regression is a broad field of research, and we only present the aspects needed for the application of this technique to opponent modeling. We provide a brief introduction based on [138]. For a more complete overview of the field of non-linear regression, we refer to [17].

Non-linear regression is used to derive a function which "best matches" a set of observational sample data. It is employed when we expect the data to display a certain functional relationship between input and output, from which we can then interpolate new data points. A typical negotiation application is to estimate the opponent's future behavior from the negotiation history, assuming that the opponent's bidding strategy uses a known formula with unknown parameters.

A simple non-linear regression model consists of four elements: the dependent (or response) variable, the independent (or predictor) variables, the (non-linear) formula, and its parameters. To illustrate, suppose we have a set of observations as shown in Fig. 4, and we want to find the relationship between x and y in order to predict the value of y for new values of x. Suppose the relationship is believed to have the form y′(x) = ax² + bx + c, where a, b, and c are parameters with unknown values. In this formula, y′ is the dependent variable and x is the independent variable. Using non-linear regression, we can estimate the parameters such that the error between the predicted y′ values and the observed y values is minimized. The error is calculated using a loss function. In the negotiation literature, typically the error is calculated as the sum of squared differences between the predicted and observed values.


Fig. 4 Example of a non-linear regression based on a polynomial of the second degree. The best fit is shown as the black line

Alternative loss functions may for example calculate the absolute difference, or treat positive and negative errors differently.

The parameters for the quadratic formula discussed in this example can be solved using a closed-form expression. Non-linear regression is typically used when this is not possible, for example when there are a large number of parameters that have a non-linear relation with the solution. The calculation of the parameters is based on an initial guess, after which an iterative hill-climbing algorithm is applied to refine the guess until the error becomes negligible. Commonly used algorithms are the Marquardt method and the simplex algorithm. An introduction to both these methods is provided by Motulsky and Ransnas [138]. The main problem with hill-climbing algorithms is that they can return a local optimum instead of the global optimum. Furthermore, in extreme cases the algorithm may not converge at all. This can be resolved by using multiple initial estimates and selecting the best fit after a specified number of iterations.
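A minimal sketch of such a fit, here using SciPy's general-purpose curve_fit (which by default uses a Levenberg-Marquardt least-squares routine) rather than the specific algorithms named above; the data are synthetic:

```python
# Fitting y = a*x^2 + b*x + c to noisy samples by least squares.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * x ** 2 + b * x + c

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = model(x, 2.0, -1.0, 0.5) + rng.normal(0, 0.05, x.size)  # noisy observations

params, _ = curve_fit(model, x, y, p0=(1.0, 1.0, 1.0))  # p0 is the initial guess
print(params)  # estimates of a, b, c, close to (2.0, -1.0, 0.5)
```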

4.3 Kernel density estimation

Kernel density estimation (KDE) is a mathematical technique used to estimate the probability distribution of a population given a set of population samples [50]. Figure 5 illustrates the estimated probability density function constructed from six observations.

The first step of KDE consists of converting each sample into a so-called kernel function. A kernel function is a probability distribution which quantifies the uncertainty of the observation. Common choices for kernels are the standard normal distribution or the uniform probability distribution. The second step is to accumulate all kernels to estimate the probability distribution of the population.

While KDE makes no assumptions about the values of the samples, or the order in which the samples are obtained, the kernel function typically requires a parameter called the bandwidth, which determines the width of each kernel. When a large number of samples is available over the complete range of the variable of interest, a small bandwidth can lead to an accurate estimate. With few samples, a large bandwidth can help generalize the limited available information. The choice of bandwidth needs to strike a balance between under-fitting and over-fitting the resulting distributions. As there is no choice that works optimally in all cases, heuristics have been developed for estimating the bandwidth. Jones et al. provide an overview of commonly used bandwidth estimators [93]. The heuristics are based on statistical characteristics of the sample set, such as the sample variance and sample count. The estimation quality of KDE can be further improved by varying the bandwidth for each kernel, for example based on the amount of samples found in a window centered at each observation. Using an adaptive bandwidth is called adaptive (or variable) KDE and can further decrease the estimation error at the cost of additional workload.

Fig. 5 Example of the use of KDE. Observations are marked with a line on the x axis. The dashed lines mark the standard normal kernels that sum up to the probability density estimation indicated by the solid line

KDE is a computationally attractive learning method. The computationally intensive parts (automatic bandwidth selection and the construction of a kernel density estimate) can be done offline, after which the lookup can be performed during the negotiation.
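A minimal sketch using SciPy's Gaussian KDE, which applies a rule-of-thumb bandwidth (Scott's rule) by default; the six observations are invented to mirror the setup of Fig. 5:

```python
# Kernel density estimation with Gaussian kernels and a heuristic bandwidth.
import numpy as np
from scipy.stats import gaussian_kde

samples = np.array([0.2, 0.25, 0.4, 0.55, 0.6, 0.9])  # illustrative data
kde = gaussian_kde(samples)        # bandwidth chosen by Scott's rule
grid = np.linspace(0, 1, 5)
print(kde(grid))                   # estimated density at the grid points
```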

4.4 Artificial neural networks

Artificial neural networks are networks of simple computational units that together can solve complex problems. Below we provide a short introduction to artificial neural networks (ANNs) based on Kröse et al. [110]. Our overview is necessarily incomplete due to the broadness of the field; for a more complete overview we refer to Haykin [78], and for a survey of the applications of neural networks in automated negotiation to Papaioannou et al. [152].

An ANN is a computational model with the ability to learn the relationship between the input and output by minimizing the error between the output signal of the ANN and the expected output. Since all that is required is a mechanism to calculate the error, ANNs can be applied when the relation between input and output is unknown. ANNs have been used for several purposes, including classification, remembering, and structuring of data.

A neural network consists of computational units called neurons, which are connected by weighted edges. Figure 6 visualizes a simple neural network consisting of six neurons. A single neuron can have several incoming and outgoing edges. When a neuron has received all inputs, it combines them according to a combination rule, for example the sum of the inputs. Next, it tests whether it is triggered by this input by using an activation function; e.g., whether a threshold has been exceeded or not. If the neuron is triggered, it propagates the combined signal over the output lines; otherwise it sends a predefined signal.

The set of neurons functions in an environment that provides the input signals and processes the output signals of the ANN. The environment calculates the error of the output, which the neural network uses to better learn the relation between the input and output by adjusting the weights on the edges between the neurons.

Fig. 6 Example of an ANN symbolizing the logical XOR. The input neurons expect a value of 0 or 1. The edges show the weights. The activation functions are threshold functions, whose values are depicted inside the nodes. The combination rule is the sum of the inputs. If the combined input is larger than or equal to the threshold, the combined input is propagated. Otherwise, the value 0 is propagated on the output line

Neurons can be ordered in successive layers based on their depth. In Fig. 6 each layer has a unique color. The first layer is called the input layer, the last one is the output layer. Both the input and output neurons generally have no activation function. The layers in between are called hidden layers, as they are not directly connected to the environment.

To illustrate how a simple neural network works, assume that the inputs x = 0 and y = 1 are fed to the network in Fig. 6. In that case, the leftmost light gray neuron receives input 0, which results in the output 0, as the neuron is not triggered. The rightmost light gray neuron, however, is triggered since it receives input 1 and therefore propagates the output 1. The middle light gray neuron receives the inputs 0 and 1, which are integrated using the combination rule. The combined signal is insufficient to trigger the neuron, resulting in a 0 as output. Since the rightmost light gray neuron is the only neuron that produced a non-zero output, the final output is 1.
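The following is a minimal sketch of a feed-forward threshold network computing XOR; the particular weights and thresholds are one standard choice and are not read off from Fig. 6, but they illustrate the same combine-then-threshold mechanism:

```python
# A tiny threshold network for XOR: combine inputs by a weighted sum,
# then fire if the sum reaches the neuron's threshold.
def step(x, threshold):
    return 1 if x >= threshold else 0

def xor_net(x, y):
    h_or  = step(1 * x + 1 * y, 0.5)          # hidden neuron: OR
    h_and = step(1 * x + 1 * y, 1.5)          # hidden neuron: AND
    return step(1 * h_or - 1 * h_and, 0.5)    # output: OR AND NOT AND

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, xor_net(x, y))  # 0, 1, 1, 0
```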

The number of neurons and their topology determine the complexity of the input-output relationship that the ANN can learn. Overall, the more neurons and layers, the more flexible the ANN. However, the more complex the ANN, the more complex the learning algorithm and consequently the higher the computational cost of learning.

An ANN is typically used when there is a large amount of sample data available, and when it is difficult to capture the relationship between input and output in a functional description; e.g., when negotiating against humans.

5 Learning about the opponent

A bilateral negotiation may be viewed as a two-player game of incomplete information where both players aim to achieve the best outcome for themselves. In general, an opponent model is an abstracted description of a player (and/or its behavior) during the game [193]. In negotiation, opponent modeling often revolves around three questions:

– Preference estimation: What does the opponent want?
– Strategy prediction: What will the opponent do, and when?
– Opponent classification: What type of player is the opponent, and how should we act accordingly?

These questions are often highly related. For example, some form of preference estimation is needed in order to understand how the opponent has acted according to its own utility. Then, by adequately interpreting the opponent's actions, we may deduce its strategy, which in turn can help predict what the agent will do in the future.

Constructing an opponent model may alternatively be viewed as a classification problem where the type of the opponent needs to be determined from a range of possibilities [179]; one example being the work by Lin et al. [124]. Here the type of an opponent refers to all opponent attributes that may be modeled to gain an advantage in the game. Taking this perspective is particularly useful when a limited number of opponent types are known in advance, which at the same time is its main limitation.

Note that our definition excludes work in which a pool of agents are tuned or evolved to optimize their performance when playing against each other, without having an explicit opponent modeling component themselves. For readers interested in this type of approach we refer to Liang and Yuan [120], Oliver [145], Sánchez-Anguix et al. [175], and Tu et al. [191].

Opponent modeling can be performed online or offline, depending on the availability of historical data. Offline models are created before the negotiation starts, using previously obtained data from earlier negotiations. Online models are constructed from knowledge that is collected during a single negotiation session. A major challenge in online opponent modeling is that the model needs to be constructed from a limited amount of negotiation exchanges, and a real-time deadline may pose the additional challenge of having to construct the model as fast as possible.

Opponent modeling can be performed at many different levels of granularity. The most elementary of preference models may only yield a set of offers likely to be accepted by the opponent, for instance by modeling the reservation value. A more detailed preference model is able to estimate the acceptance probability for every outcome (e.g., using a probabilistic representation of the reservation value). An even richer model can involve the opponent's preference order, allowing us to rank the outcomes. We can achieve the richest preference representations with a cardinal model of preferences, yielding an estimate of the opponent's full preference profile. The preferred form of granularity depends not only on the complexity of the negotiation scenario, but also on the level of information required by the agent. For instance, if the agent is required to locate Pareto optimal outcomes in a multi-issue domain, it will require at least an ordinal preference model.

Note that in most cases, comparing different approaches is impossible due to the variety of quality measures, evaluation techniques, and testbeds in use; we will have more to say on how to evaluate the different approaches in Sect. 6.

Even though there are large differences between the models, a common set of high level motivations behind their construction can be identified. We found the following motivations for why opponent models have been used in automated negotiation:

1. Minimize negotiation cost [7–9,11,50,72,73,90,103,113,137,143,146,149,151,153,155,160,165,166,183,184,188,205–207] In general, it costs time and resources to negotiate. As a consequence, (early) agreements are often preferred over not reaching an agreement. As such, an opponent model of the opponent's strategy or preference profile aids towards minimizing negotiation costs, by determining the bids that are likely to be accepted by the opponent. An agent may even decide that the estimated negotiation costs are too high to warrant a potential agreement, and prematurely end the negotiation.

2. Adapt to the opponent [1,15,28,29,44–46,57,72,73,75,81,82,85,92,133,139,140,150,155,162,173,197,199,204] With the assistance of an opponent model, an agent can adapt to the opponent in multiple ways. One way is to estimate the opponent's reservation value in an attempt to deduce the best possible outcome that the opponent will settle for. Another method is to use an estimate of the opponent's deadline to elicit concessions from the opponent by stalling the negotiation, provided, of course, that the agent itself has a later deadline. Finally, an opponent model can be used to estimate the opponent's concessions in order to accurately reciprocate them.

3. Reach win-win agreements [11,12,15,21,30,50,51,81,83,90,94,103,113,123,124,137,143,146,149,155,160,165,166,174,183,184,188,194,195,198,205–207] In a cooperative environment, agents aim for a fair result, for example because there might be opportunity for future negotiations. Cooperation, however, does not necessarily imply that the parties share explicit information about their preferences or strategy, as agents may still strive for a result that is beneficial for themselves and acceptable for their opponent. An agent can estimate the opponent's preference profile to maximize joint utility.

We found that existing work on opponent models can fulfill any of the goals above by learning a combination of four opponent attributes, which we have listed in Table 2. The remainder of this section discusses, for each attribute, the applicable opponent modeling techniques, following the order of Table 2.

5.1 Learning the acceptance strategy

All negotiation agent implementations need to deal with the question of when to accept. The decision is made by the acceptance strategy of a negotiating agent, which is a boolean function indicating whether the agent should accept the opponent’s offer. Upon acceptance of an offer, the negotiation ends in agreement, otherwise it continues. More complex acceptance strategies may be probabilistic or include the possibility of breaking off the negotiation without an agreement—if that is supported by the protocol.

A common default is for the agent to accept a proposal when the value of the offered contract is higher than that of the offer it is ready to send out at that moment in time. The bidding strategy then effectively dictates the acceptance strategy, making this a significant case in which it suffices to learn the opponent's bidding strategy (see Sect. 5.4). Examples include the time-dependent negotiation strategies defined in [167] (e.g., the Boulware and Conceder tactics). The same principle is used in the equilibrium strategies of [61] and the Trade-off agent [60]. Other agents use much more sophisticated methods to accept; for example, acceptance strategies based on extrapolation of all received offers [97], dynamic time-based acceptance [2,51], and optimal stopping [13,104,116,176,201].
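A sketch of this default rule; the utility function and the bids are placeholders:

```python
def should_accept(utility, opponent_offer, own_next_bid):
    """Accept if the opponent's offer is worth at least as much as the
    counter-offer we are about to send (variants use a strict comparison)."""
    return utility(opponent_offer) >= utility(own_next_bid)

u = {"A": 0.6, "B": 0.7}.get        # toy utility lookup over two bids
print(should_accept(u, "A", "B"))   # False: keep negotiating
print(should_accept(u, "B", "A"))   # True: accept
```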

Learning an opponent's acceptance strategy is potentially of great value, as it can help to find the best possible deal for the agent that is at the same time satisfactory for the opponent. Two general approaches have been used to estimate the acceptance strategy, depending on the negotiation domain:

– Estimating the reservation value (Sect. 5.1.1) In a negotiation about a single quantitative issue where the parties have opposing, publicly known preferences (such as the price of a service), knowledge of the opponent's reservation value is sufficient to determine all acceptable bids. An opponent model can learn the opponent's reservation value by extrapolating the opponent's concessions.

– Estimating the acceptance strategy (Sect. 5.1.2) An alternative approach, applied to multi-issue negotiations, is to estimate the probability that a particular offer is accepted, based on its similarity to bids that the opponent previously offered and/or accepted.


Table 2  All learning techniques and methods that help to learn four different opponent attributes

Opponent attribute         Procedure                                 Learning technique

5.1 Acceptance strategy    Bidding strategy estimation               Bayesian learning [72,92,162,183,184,204–207]
                                                                     Non-linear regression [1,73,85,204]
                           Interpolation of acceptance likelihood    Bayesian learning [113]
                                                                     Kernel density estimation [149]
                                                                     Neural networks [57]
                                                                     Polynomial interpolation [173]

5.2 Deadline               Bidding strategy estimation               Bayesian learning [72,92,184,204]
                                                                     Non-linear regression [72,85,184,204]

5.3 Preference profile     Estimation of issue preference order      Bayesian learning [143]
                                                                     Heuristics [21,37,94]
                                                                     Kernel density estimation [50,58]
                                                                     Simplified genetic algorithm [90]
                           Classification                            Bayesian learning [15,31,32,51,83,123,124,155,198]
                           Data mining aggregate preferences         Bayesian network [174]
                                                                     Graph theory [165,166]
                                                                     Random variable estimation [103,188]
                           Logical reasoning and heuristics          Heuristics [7–9,30,75–77,163,194,195]

5.4 Bidding strategy       Regression analysis                       Bayesian networks [139]
                                                                     Genetic algorithms [151]
                                                                     Non-linear regression [1,28,73,85,153,161,204]
                                                                     Polynomial interpolation [153]
                           Time series forecasting                   Derivatives [29,137]
                                                                     Markov chains [140]
                                                                     Neural networks [35,36,115,133,146,151,153,160]
                                                                     Signal processing [44–46,132,150,197,199]

5.1.1 Learning the acceptance strategy by estimating the reservation value

Current methods for estimating the reservation value stem from the idea that an agent will cease to concede near its reservation value, and that this behavior occurs when the negotiation deadline approaches. These methods make assumptions about the availability of domain knowledge, or assume that the opponent uses a particular strategy.

The oldest and most popular approach is by Zeng and Sycara [205,206], who propose a Bayesian learning method to estimate the reservation value, using data from previous negotiations. One single quantitative issue is negotiated, for which it is assumed the agents have opposing preferences. Before the negotiation, a set of hypotheses H = {H_1, ..., H_n} is formulated, where each hypothesis H_i states that the opponent's reservation value rv is equal to one of its possible values v_i. The hypotheses, the values v_i, and their a priori likelihood are all determined based on domain knowledge derived from previous negotiations. By applying Bayesian learning during the negotiation, the probabilities of the hypotheses are updated based on observed behavior and the available domain knowledge. Intuitively, the idea is that an offer at the beginning of the negotiation is likely to be far from the reservation value. The reservation value is estimated by taking the weighted sum of the hypothesized values according to their likelihood. This method is widely applied, for example in work by Ren and Anumba [162] and Zhang et al. [207].
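The following sketch illustrates this updating scheme in the spirit of the approach just described. The candidate values, the uniform prior, and the Gaussian likelihood model around an assumed concession curve are illustrative assumptions; in the original work they derive from records of previous negotiations.

```python
import numpy as np

def likelihood(offer, rv, t):
    # Hypothetical domain model: under hypothesis rv, an offer at time t
    # is expected to start far above rv and approach it as t -> 1.
    expected = rv + (1.0 - t) * rv
    return np.exp(-((offer - expected) ** 2) / (2 * 10.0 ** 2))

def bayes_update(beliefs, values, offer, t):
    """One update step: posterior ~ likelihood * prior, renormalized."""
    posterior = beliefs * np.array([likelihood(offer, v, t) for v in values])
    return posterior / posterior.sum()

values = np.array([60.0, 70.0, 80.0, 90.0])   # candidate reservation values
beliefs = np.full(len(values), 0.25)          # uniform prior

for t, offer in [(0.1, 150.0), (0.5, 120.0), (0.9, 95.0)]:
    beliefs = bayes_update(beliefs, values, offer, t)

print("estimated rv:", float(beliefs @ values))  # probability-weighted sum
```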

Closely related to the work by Zeng and Sycara is the work by Sim et al., who apply the same procedure when the opponent is constrained to use a particular time-dependent tactic, but with a private deadline [72,92,183,184]. The opponent’s decision function is assumed to be of a particular form in which the reservation value and deadline are related, in the sense that one can be derived from the other.

A different approach is taken by Hou [85], who presents a method to estimate the opponent's tactic in a negotiation about a single quantitative issue with private deadlines. It is assumed that the opponent employs a tactic dependent on either time, behavior, or resources. Non-linear regression is used to estimate which of the three types of strategies is used, and to estimate the values of the parameters associated with the tactic [85], including the reservation value (cf. Sect. 5.4.1). A similar approach is followed by Agrawal and Chari, who model the opponent's decision function as an exponential function [1]. When the deadline is public knowledge, Haberland's method [73] can be used to estimate the opponent's reservation value, assuming the opponent uses a time-dependent tactic.
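As an illustration of this family of regression methods, the sketch below fits a time-dependent concession curve to observed offers with scipy. The functional form, the fixed first offer, and all numbers are assumptions made for the example rather than details of any one paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed time-dependent tactic:
#   offer(t) = first_offer + (rv - first_offer) * (t / deadline) ** (1 / e)
def tactic(t, rv, deadline, e, first_offer=200.0):
    return first_offer + (rv - first_offer) * (t / deadline) ** (1.0 / e)

rng = np.random.default_rng(0)
ts = np.linspace(0.05, 0.6, 12)                  # offer times observed so far
offers = tactic(ts, 90.0, 1.0, 2.0) + rng.normal(0, 1.0, ts.size)

# Fit rv, deadline and e; first_offer keeps its default and is not fitted.
(rv_est, dl_est, e_est), _ = curve_fit(
    tactic, ts, offers,
    p0=[100.0, 1.2, 1.0],
    bounds=([0.0, 0.61, 0.1], [200.0, 5.0, 10.0]))

print(f"rv~{rv_est:.1f}, deadline~{dl_est:.2f}, e~{e_est:.2f}")
```

Note that recovering the reservation value and the deadline together in this way only works because the assumed curve relates the two, which mirrors the strong strategy assumptions these methods make.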

To improve reliability of the estimates, Yu et al. [204] combine non-linear regression with Bayesian learning to estimate the opponent’s reservation value, as well as the deadline. In their model, the opponent is assumed to use a time-dependent tactic with unknown parameters. Each round the parameters are estimated using non-linear regression. Next, that round’s estimate is used to create a more reliable set of hypotheses about the opponent’s reservation value and deadline by using Bayesian learning.

All these methods estimate the opponent's reservation value in a single-issue negotiation using Bayesian learning (which is more computationally involved) or non-linear regression (which is faster, but requires knowledge about the structure of the opponent's strategy). To our knowledge, artificial neural networks and kernel density estimation have not been used for this purpose. Furthermore, all these methods assume that, given the reservation value, all acceptable bids are known due to the known ordering of the possible values. An interesting open problem is how to apply these techniques to situations where such an ordering is not straightforward.

5.1.2 Learning the acceptance strategy by estimating the acceptance probability

The acceptance strategy can be learned by keeping track of what offers were accepted in previous negotiations and by recording the offers the opponent sends out. From this information, an agent can estimate the probability that a bid will be accepted in a particular negotiation state. As it is unlikely that such an estimate can be derived for all possible bids, regression methods can be applied to determine the acceptance probability for the entire outcome space.

It is easiest to apply this method in repeated single-issue negotiations, as Saha and Sen do in [173]. In this scenario, a seller may only propose a price once, which a buyer then accepts or rejects. An increasingly better estimate of the buyer's acceptance strategy allows the seller to maximize its profit over time. To derive the set of samples, the seller first samples the outcome space to find bids that are either always rejected or always accepted. After that, a number of in-between values are sampled, until the acceptance probability of a sufficient number of bids has been determined. In order to estimate the acceptance probability of all possible offers, polynomial interpolation is applied, using Chebyshev polynomials [131]. Given the probability distribution of acceptance of each offer, the seller can determine the optimal price to maximize profit.
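The interpolation step can be sketched as follows; the sampled acceptance probabilities, the price range, and the seller's cost are illustrative numbers, not values from the paper.

```python
import numpy as np
from numpy.polynomial import Chebyshev

prices = np.array([50.0, 70.0, 90.0, 110.0, 130.0])
p_accept = np.array([1.0, 0.9, 0.55, 0.15, 0.0])   # estimated by sampling

cheb = Chebyshev.fit(prices, p_accept, deg=4)      # exact through 5 points

grid = np.linspace(50.0, 130.0, 801)
prob = np.clip(cheb(grid), 0.0, 1.0)               # keep probabilities valid
cost = 40.0                                        # hypothetical unit cost
expected_profit = (grid - cost) * prob
print("best price:", grid[np.argmax(expected_profit)])
```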

Interpolation of the acceptance likelihood does not directly carry over to a multi-issue negotiation setting, because the multi-issue preference space lacks the structure of the single-issue case with opposing preferences. The key approach to overcoming this challenge, by Oshrat et al. [149], relies on a database of negotiations against a set of human negotiators with known preference profiles. During the negotiation, it is assumed that the opponent's preference profile is known, or that the Bayes classifier introduced in [123,124] can be applied to reliably learn the opponent's profile. The database traces then determine which bids have been proposed or accepted by the opponent; these are pooled together under the assumption that if an agent makes an offer, it is also willing to accept it. The authors then use kernel density estimation to estimate the acceptance probability of all the other bids.
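The density-estimation step can be sketched as follows; the numeric bid encoding, the synthetic pool, and the reading of density as a relative acceptability score are simplifying assumptions made for the example, not the authors' exact procedure.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Hypothetical pool: 3-issue bids the opponent offered or accepted before.
pool = rng.normal(loc=[0.8, 0.3, 0.6], scale=0.1, size=(200, 3))

kde = gaussian_kde(pool.T)            # expects shape (n_issues, n_samples)

candidates = np.array([[0.80, 0.30, 0.60],    # close to the pool
                       [0.20, 0.90, 0.10]])   # far from the pool
scores = kde(candidates.T)
print(scores / scores.max())          # first bid scores near 1, second ~0
```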

Lau et al. apply a similar method based on Bayesian learning, with the addition that the effect of time pressure and possible changes of the opponent's negotiation strategy are taken into account [113]. The underlying idea is that a bid which is unacceptable for an opponent at the beginning of the negotiation might be acceptable at the end. The effect of time pressure is modeled by giving recent bids in a negotiation a higher weight. In addition, more recent negotiation traces receive a higher weight to account for possible changes in the opponent's strategy. Finally, Fang et al. [57] present a lesser-known technique for multi-issue negotiation, which assumes that every presented bid is also acceptable to the opponent. The set of acceptable offers from earlier negotiations is used to train a simple neural network that can then test whether any particular bid is acceptable or not.
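A sketch of such a classifier is shown below using scikit-learn; the synthetic training data and the use of explicit negative examples are simplifications, since in the approach above the positive examples come from offers the opponent itself put on the table.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
acceptable = rng.normal([0.8, 0.3, 0.6], 0.1, size=(150, 3))   # label 1
rejected = rng.normal([0.3, 0.8, 0.2], 0.1, size=(150, 3))     # label 0
X = np.vstack([acceptable, rejected])
y = np.array([1] * 150 + [0] * 150)

# A small feed-forward network labels bids (issue-value vectors).
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X, y)
print(net.predict([[0.75, 0.35, 0.55]]))   # -> [1], i.e. likely acceptable
```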

5.2 Learning the deadline

The deadline of a negotiation refers to the time before which an agreement must be reached to achieve an outcome better than the best alternative to a negotiated agreement [158]. Each agent can have its own private deadline, or the deadline can be shared among the agents. The deadline may be specified as a maximum number of rounds [187], or alternatively as a real-time target. Note that when the negotiation happens in real time, the time required to reach an agreement depends on the deliberation time of the agents (i.e., the amount of computation required to evaluate an offer and produce a counter offer).

When the opponent's deadline is unknown, it is of great value to learn more about it, as an agent is likely to concede strongly near the deadline to avoid non-agreement [63]. Because of this strong connection between the two, most of the procedures discussed in Sect. 5.1.1 can also be used to estimate the opponent's deadline. Hou [85], for example, estimates the deadline following the same procedure as for estimating the reservation value. Yu et al. [204] apply a similar method with the additional constraint that the opponent uses a time-dependent tactic. Finally, Sim et al. directly calculate an estimate for the deadline from the estimated reservation value [72,92,184].

As is the case for the reservation value, these methods assume a single-issue negotiation and make strong assumptions about the opponent’s strategy type. How to weaken these assumptions and estimate the deadline in multi-issue negotiations is still an open research topic.


5.3 Learning the preference profile

The preference profile of an agent represents the private valuation of the outcomes. To avoid exploitation, agents tend to keep their preference information private [50,206]; however, when agents have limited knowledge of the other’s preferences, they may fail to reach a Pareto optimal outcome as they cannot take the opponent’s desires into account [83].

In order to improve the efficiency of the negotiation and the quality of the outcome, agents can construct a model of the opponent’s preferences [50,83,206]. Over time, a large number of such opponent models have been introduced, based on different learning techniques and underlying assumptions [12]. Learning the opponent’s preference profile can be of great value, as it provides enough information to allow an agent to propose outcomes that are Pareto optimal and thereby increase the chance of acceptance [60,81].

Four approaches have been used to estimate the opponent's preference information. The first approach, which is discussed in Sect. 5.3.1, assumes that the opponent uses a linear additive utility function. Most of the other three approaches are applicable in negotiation settings with non-linear utility functions as well.

– Estimation of issue preference order (Sect. 5.3.1). The agent can estimate the importance of the issues by assuming that, as the opponent concedes, the issues it values the least are conceded first.

– Classifying the opponent's negotiation trace (Sect. 5.3.2). During the negotiation, the agent classifies the opponent's negotiation trace into one of a finite set of groups whose preferences are known.

– Data mining aggregate preferences (Sect. 5.3.3). The agent is assumed to have available a large database containing aggregate customer data. The problem of opponent modeling then essentially reduces to a data mining problem.

– Applying logical reasoning and heuristics to derive outcome order (Sect. 5.3.4). The agent deduces preference relations of the opponent from the opponent's negotiation trace, using common sense reasoning and heuristics.

5.3.1 Learning the preference profile by estimating the issue preference order

The issue preference order of an agent is the way the agent ranks the negotiated issues according to its preferences; that is, it is an ordinal preference model over the set of issues, rather than over the full set of outcomes. Learning the opponent's ranking of issues can already be sufficient to improve the utility of an agreement [60]. If the opponent is assumed to use a linear additive utility function (as defined in Sect. 3.3), then this reduces the problem of learning the continuous preference over all possible outcomes to learning the ranks of n issues, effectively limiting the size of the search space to n! discrete possibilities. Needless to say, such an assumption might not be realistic, depending on the definition of the issues and the complexity of the negotiation scenario. However, especially in situations where the number of interactions between the negotiating parties is limited, an agent can fall back on learning the opponent's issue preference order.
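For concreteness, a minimal sketch of the linear additive form assumed here is given below; the issues, weights, and evaluation values are illustrative only. Under this assumption, learning the issue preference order amounts to learning the ranking of the weights.

```python
# Linear additive utility: utility(bid) = sum_i w_i * eval_i(bid_i),
# with the issue weights w_i summing to 1. All values are illustrative.

weights = {"price": 0.5, "delivery": 0.3, "colour": 0.2}
evaluations = {
    "price": {"low": 1.0, "mid": 0.5, "high": 0.0},
    "delivery": {"fast": 1.0, "slow": 0.2},
    "colour": {"red": 0.6, "blue": 1.0},
}

def utility(bid):
    """Weighted sum of per-issue evaluation values."""
    return sum(weights[i] * evaluations[i][v] for i, v in bid.items())

print(utility({"price": "low", "delivery": "slow", "colour": "blue"}))  # 0.76
```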

The learning techniques discussed in this section estimate the importance of the issues by analyzing the opponent's concessions, assuming that an opponent concedes more strongly on issues that are valued less. They all follow the same pattern: each issue is first assigned an initial weight. Next, each round, the difference in value for each issue between the current and previous bid is mapped to an issue weight by applying a similarity measure. Finally, the estimated weights are used to update an incremental estimate.
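The sketch below instantiates this pattern with the simplest possible similarity measure, strict equality of issue values; the step size and the frequency-style update rule are illustrative choices rather than the method of any single paper.

```python
# Issues on which the opponent did not concede between consecutive bids
# are assumed more important: their weight is increased, and the weights
# are then renormalized so they keep summing to 1.

def update_issue_weights(weights, prev_bid, cur_bid, step=0.1):
    """One incremental update of the estimated issue weights."""
    for issue in weights:
        if prev_bid[issue] == cur_bid[issue]:   # no concession observed
            weights[issue] += step
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}

weights = {"price": 1 / 3, "delivery": 1 / 3, "colour": 1 / 3}  # uniform start
prev = {"price": "high", "delivery": "slow", "colour": "red"}
cur = {"price": "high", "delivery": "fast", "colour": "red"}

print(update_issue_weights(weights, prev, cur))
# price and colour gain weight relative to delivery
```

Repeating this update every round yields an incremental estimate of the issue preference order, with more sophisticated variants differing mainly in the similarity measure used.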
