PAC Rank Elicitation using
Ranking Procedures
Róbert Busa-Fekete1,2, Balázs Szörényi2,3, Eyke Hüllermeier1
1Computational Intelligence Group, Philipps-Universität Marburg, GERMANY
2Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, HUNGARY 3INRIA Lille - Nord Europe, SequeL project, 40 avenue Halley, 59650 Villeneuve d'Ascq, FRANCE
Poznań, March 27, 2014
(Value-based) Stochastic Multi-armed Bandits
Preference-based Stochastic Multi-armed Bandits
PAC Rank Elicitation using Ranking Procedures
Numerical experiments
Stochastic multi-armed bandit setup
I Setup [Lai and Robbins, 1985, Auer et al., 2002]:
I There are given M arms/items/options: A = {a_1, ..., a_M}
I Each arm a_i is associated with a distribution ν_i
I The arm distributions ν_1, ..., ν_M are not known!
I µ_i = E[ν_i] ⇒ total order on arms if µ_i ≠ µ_j for i ≠ j
I Best arm a_{i*}, where i* = argmax_{1≤i≤M} µ_i
I At each time step t, the (online) learning algorithm selects an arm i_t to be sampled, and receives a reward r_t drawn i.i.d. from ν_{i_t}
Stochastic multi-armed bandit setup
I Goal of the (online) learner (also called decision maker/agent):
I Minimize the expected regret: Σ_{i≠i*} (µ_{i*} − µ_i) Σ_{t=1}^T P(i_t = i) = Σ_{i≠i*} ∆_i Σ_{t=1}^T P(i_t = i), where ∆_i = µ_{i*} − µ_i
I UCB, ε-greedy [Auer et al., 2002], UCBV [Audibert et al., 2007]
I PAC setup: find an ε-optimal arm with probability at least 1 − δ
I ε-optimal arm: µ_{i*} − µ_i = ∆_i < ε
I Based on as few samples as possible
I Sample complexity: number of samples taken prior to termination
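To make the PAC setup concrete, here is a minimal sketch of a (non-adaptive) ε-optimal arm identification routine based on uniform sampling and Hoeffding's inequality; the function name `naive_pac_best_arm` and the sampling interface are hypothetical, and smarter strategies (e.g. median elimination) achieve better sample complexity.

```python
import math
import random


def naive_pac_best_arm(arms, epsilon, delta):
    """Return the index of an epsilon-optimal arm w.p. >= 1 - delta.

    `arms` is a list of zero-argument callables, each drawing one reward
    in [0, 1]. By Hoeffding, sampling each arm n times with
    n >= (2 / epsilon^2) * ln(2M / delta) puts every empirical mean
    within epsilon/2 of its true mean w.p. >= 1 - delta, so the
    empirically best arm is epsilon-optimal.
    """
    m = len(arms)
    n = math.ceil((2.0 / epsilon**2) * math.log(2.0 * m / delta))
    means = [sum(pull() for _ in range(n)) / n for pull in arms]
    return max(range(m), key=lambda i: means[i])
```

With Bernoulli arms of means (0.9, 0.1, 0.2), the routine reliably returns the first arm.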
Preference-based stochastic multi-armed bandit setup
I Setup:
I There are given M items/arms: A = {a_1, ..., a_M}
I Arms can be compared in a pairwise manner: a_i ≻ a_j
I Pairwise comparisons obey a fixed probability distribution for each pair of arms; thus,
p_{i,j} = P(a_i ≻ a_j) = E[I{a_i ≻ a_j}]
I p_{i,j} is called the pairwise probability for arms i and j
I If p_{i,j} > 1/2, then arm a_i is preferred to arm a_j, or concisely, a_i "beats" a_j
I At each time step t, the (online) learning algorithm selects a pair of arms i_t and j_t to be compared, and observes
Y_{i_t,j_t} = I{a_{i_t} ≻ a_{j_t}}
I Goal of the (online) learner (decision maker/agent):
I Optimize some kind of regret
I Find the best arm
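The observation model above is just a Bernoulli draw with parameter p_{i,j}; a minimal simulation sketch (the helper name `duel` is an assumption, not from the slides):

```python
import random


def duel(P, i, j, rng=random):
    """Simulate one comparison Y_{i,j}: returns 1 iff a_i beats a_j.

    P is the matrix of pairwise probabilities, P[i][j] = p_{i,j} =
    P(a_i > a_j), which is *not* known to the learner; the learner only
    sees the outcome of each duel.
    """
    return 1 if rng.random() < P[i][j] else 0
```

Averaging many duels of a fixed pair recovers p_{i,j}, which is exactly what the estimators on the later slides do.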
How to make the setup complete?
I Preferential cycles: p_{i,j} > 1/2, p_{j,k} > 1/2 and p_{k,i} > 1/2 (Condorcet paradox)
I No best arm ⇒ no reasonable regret either!
I Matrix of pairwise probabilities: P = [p_{i,j}]_{1≤i,j≤M}
I Assumptions on P
I Dueling bandit setup [Yue et al., 2012]
I Beat-the-Mean [Yue and Joachims, 2011]
I Preference-based bandits with statistical models [Busa-Fekete et al., 2014]
I No assumption on P, but some ranking procedure is applied
I R : P → O_A, where O_A is the set of orders on A
I Copeland, majority vote, random walk [Busa-Fekete et al., 2013]
I For example, a_i ≺_CO a_j ⇔ d_i < d_j, where d_i = #{k ∈ [M] | 1/2 < p_{i,k}}
I R is assumed to be "smooth" [Urvoy et al., 2013]
I This talk: PAC Rank Elicitation [Busa-Fekete et al., 2014]
Practical applications of preference-based MAB
I Online advertisement [Chapelle et al., 2012]
I Two ads are shown at one time = compare arms
I Crowdsourcing [Chen et al., 2013]
I Amazon Mechanical Turk
I Widely-used platform in Natural Language Processing (NLP) for annotating databases (lexical substitution, machine translation)
I For example: given an English sentence and some possible German translations of it, the goal is to find a ranking that reflects the quality of the translations.
I The annotators are asked simple questions: which alternative is better out of these two?
PAC Rank Elicitation using
Ranking Procedures
PAC Rank Elicitation Setup: Formal problem
I Ranking procedure: R : P → O_A defines a strict order ≺_R over arms
I Ranking distance: d : S_M × O_A → N_0 defines a distance between a complete ranking and a strict order ≺_R over arms
I Definition
An algorithm A is a (ρ, δ)-PAC rank elicitation algorithm with respect to a ranking procedure R and rank distance d if it returns a ranking τ for which d(τ, ≺_R) < ρ with probability at least 1 − δ.
Online learning framework: RankEl
I Parameters: δ, ρ; ranking procedure R; rank distance d(·,·)
I Loop (t := t + 1): select a pair (i_t, j_t); observe o_t ∼ Y_{i_t,j_t}; update the pairwise estimate p̂^t_{i_t,j_t}; continue or terminate
I Recommendation: τ with d(τ, ≺_R) < ρ with probability at least 1 − δ
I The pairwise probabilities P are not known to the learner, but it can gain information about them via sampling!
I What to sample? When to stop? What to recommend?
Pairwise probability estimates and their confidence intervals
I n^t_{i,j}: number of comparisons of arms a_i and a_j up to time t
I t = Σ_{i=1}^M Σ_{j=1}^M n^t_{i,j}
I p̂^t_{i,j}: estimate of the pairwise probability p_{i,j}
I p̂^t_{i,j} = (1/n^t_{i,j}) Σ_{t'∈I^t_{i,j}} o_{t'}, where I^t_{i,j} = {t' ∈ [t] | (i_{t'}, j_{t'}) = (i, j)}
I Confidence interval: c^t_{i,j} = c(n^t_{i,j}, t, δ) = sqrt( (1/(2 n^t_{i,j})) ln(5M²t⁴/(4δ)) )
Lemma (like index-based bandits, such as UCB and LUCB)
The confidence intervals above are valid for any time t, for any pair of arms, and for any sampling strategy; formally,
Σ_{i=1}^M Σ_{j≠i} Σ_{t=1}^∞ P( p_{i,j} ∉ [p̂^t_{i,j} − c^t_{i,j}, p̂^t_{i,j} + c^t_{i,j}] ) < δ
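The confidence radius on the slide is a one-line formula; a sketch of it as code (the function name `conf_radius` is hypothetical):

```python
import math


def conf_radius(n_ij, M, t, delta):
    """Confidence radius c^t_{i,j} from the slide:

        c = sqrt( ln(5 M^2 t^4 / (4 delta)) / (2 n_ij) )

    The union bound in the lemma makes these intervals valid
    simultaneously over all times t, all pairs, and any sampling
    strategy. With no samples yet, the radius is vacuous (infinite).
    """
    if n_ij == 0:
        return float("inf")
    return math.sqrt(math.log(5 * M**2 * t**4 / (4 * delta)) / (2 * n_ij))
```

As expected, the radius shrinks as n^t_{i,j} grows and widens (only logarithmically) with t.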
Rank distances
I Ranking distance: d : S_M × O_A → N_0 defines a distance between a complete ranking and a strict order ≺_R over arms
I Rank distances:
I Number of discordant pairs (NDP), related to Kendall's distance:
d_K(τ, ≺) = Σ_{i=1}^M Σ_{j≠i} I{τ_j < τ_i} I{a_i ≺ a_j}
I Maximum rank difference (MRD):
d_M(τ, ≺) = min_{τ'∈L_≺} max_{1≤i≤M} |τ_i − τ'_i|
(L_≺: the set of linear extensions of ≺)
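The NDP distance can be computed directly from its definition; this is an illustrative sketch where `tau[i]` is the rank of arm i and `prec(i, j)` encodes a_i ≺ a_j (both names are my own; the MRD distance would additionally require minimizing over the linear extensions L_≺ and is omitted here):

```python
def ndp_distance(tau, prec):
    """Number of discordant pairs d_K(tau, prec) from the slide:

        sum over i != j of I{tau[j] < tau[i]} * I{a_i prec a_j}

    Since prec is asymmetric, each discordant unordered pair is
    counted exactly once.
    """
    M = len(tau)
    return sum(1 for i in range(M) for j in range(M)
               if i != j and tau[j] < tau[i] and prec(i, j))
```

For the order a_0 ≺ a_1 ≺ a_2, the ranking (1, 2, 3) has distance 0 and the fully reversed ranking (3, 2, 1) has distance 3.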
Ranking procedures
I R : P → O_A, where O_A is the set of (strict) orders on A
I Copeland's order ≺_CO
I a_i ≺_CO a_j ⇔ d_i < d_j (d_i: skill of arm a_i)
I d_i = #{k ∈ [M] | 1/2 < p_{i,k}}
I Sum of expectations order ≺_SE (majority voting)
I a_i ≺_SE a_j ⇔ y_i < y_j (y_i: skill of arm a_i), where y_i = (1/(M−1)) Σ_{k≠i} p_{i,k}
Example: Bundesliga
I Last 10 seasons
I Average of the outcome of 20 matches
I 1: win, 0: loss, 1/2-1/2: tie
Ranking procedures
I What if p_{i,j} = 1/2 or y_i = y_j?
I ε-sensitive extension to make the relations ≺_CO and ≺_SE more partial
I ε-sensitive Copeland's order: a_i ≺_CO^ε a_j ⇔ d*_i + s*_i < d*_j
I d*_i = #{k ≠ i | 1/2 + ε < p_{i,k}} (number of arms beaten by a_i with a margin ε)
I s*_i = #{k ≠ i : |1/2 − p_{i,k}| ≤ ε} (number of ties with a margin ε)
I ε-sensitive sum of expectations: a_i ≺_SE^ε a_j ⇔ y_i + ε < y_j
I These are interval orders: [a, b] ≺ [a', b'] iff b < a'
I ε-sensitive Copeland's order: [d*_i, d*_i + s*_i] ≺ [d*_j, d*_j + s*_j]
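The quantities d*_i and s*_i are simple counts over a row of P; a sketch under the slide's definitions (the function name `copeland_scores` is my own):

```python
def copeland_scores(P, eps):
    """eps-sensitive Copeland quantities from the slide:

        d_i = #{k != i : p_{i,k} > 1/2 + eps}   (beaten with margin eps)
        s_i = #{k != i : |p_{i,k} - 1/2| <= eps} (ties within margin eps)

    Returns (d, s); arm a_i precedes a_j in the eps-sensitive
    Copeland order iff d[i] + s[i] < d[j].
    """
    M = len(P)
    d = [sum(1 for k in range(M) if k != i and P[i][k] > 0.5 + eps)
         for i in range(M)]
    s = [sum(1 for k in range(M) if k != i and abs(P[i][k] - 0.5) <= eps)
         for i in range(M)]
    return d, s
```

On a 3-arm instance with clear 0.9/0.1 preferences and ε = 0.01 this yields Copeland scores (2, 1, 0) and no ε-ties.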
Example: Bundesliga
I Last 10 seasons
I Average of the outcome of 20 matches
I 1: win, 0: loss, 1/2-1/2: tie
Instantiations of PAC Rank Elicitation Setup

Procedure \ Distance | NDP d_K(·,·)  | MRD d_M(·,·)
Copeland ≺_CO        | RankEl^CO_dK  | RankEl^CO_dM
Sum of Exp. ≺_SE     | RankEl^SE_dK  | RankEl^SE_dM

Implementation for ε-sensitive SE ranking
I Each arm a_i is associated with an interval [y_i, y_i + ε]
I a_i ≺_SE^ε a_j ⇔ y_i + ε < y_j
I y_i = (1/(M−1)) Σ_{k≠i} p_{i,k}
I Estimate for y_i: ŷ^t_i = (1/(M−1)) Σ_{k≠i} p̂^t_{i,k}
I Confidence interval for ŷ^t_i: c^t_i = (1/(M−1)) Σ_{k≠i} c^t_{i,k}
I Since p_{i,j} ∈ [p̂^t_{i,j} − c^t_{i,j}, p̂^t_{i,j} + c^t_{i,j}] for every 1 ≤ j ≤ M, it follows that y_i ∈ [ŷ^t_i − c^t_i, ŷ^t_i + c^t_i]
Implementation for ε-sensitive SE ranking
I Case |y_i − y_j| > ε and the overlap of the confidence intervals is > ε:
I If we are not lucky, then y_j + ε < y_i ⇒ a_j ≺_SE^ε a_i
I Case |y_i − y_j| > ε and the overlap is ≤ ε:
I Even if we are not lucky, we know that y_j + ε ≮ y_i
I Therefore a_j ⊀_SE^ε a_i
I Case |y_i − y_j| < ε: the overlap becomes < ε sooner or later
Implementation for ε-sensitive SE ranking
I Summarising these observations: if the overlap of the confidence intervals of arms a_i and a_j is smaller than ε, then we can decide their order with respect to ≺_SE^ε
I Lemma
Let σ^t be the order according to the ŷ^t_i values, and let O^t_{i,j} indicate that two confidence intervals overlap by more than ε:
O^t_{i,j} = I{ |[ŷ^t_i − c^t_i, ŷ^t_i + c^t_i] ∩ [ŷ^t_j − c^t_j, ŷ^t_j + c^t_j]| > ε }
Then, for any time step t and for any sampling strategy,
d_K(σ^t, ≺_SE^ε) ≤ (1/2) Σ_{i=1}^M Σ_{j≠i} O^t_{i,j}   and   d_M(σ^t, ≺_SE^ε) ≤ max_{1≤i≤M} Σ_{j≠i} O^t_{i,j}
Implementation for -sensitive SE ranking
I When to stop?: I For NDP distance: dK(σt, ≺SE) ≤12 PM i =1 P j 6=iO t i ,j< ρI For MRD distance: dM σt, ≺SE ≤ max1≤i ≤MPj 6=iO t i ,j< ρ
I What to sample?:
I For NDP distance: (i, j) | ∃ j0: (Ot i ,j0 = 1)
I Those pairs whose intervals based on the empirical estimates are overlapping
I For MRD distance: n(i , j ) | ρ ≤P j06=iOti ,j0
o
I What to recommend?:
I σt which sorts the arms based on b yt
1, . . . ,by t M
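The overlap indicators and the two stopping tests above can be sketched as follows; this is an illustrative implementation under the slide's definitions (the function name `se_overlap_stop` is hypothetical), taking the ŷ^t_i estimates and their confidence radii c^t_i as inputs:

```python
def se_overlap_stop(y_hat, c, eps, rho):
    """Overlap indicators O^t_{i,j} and stopping tests for the
    eps-sensitive SE ranking.

    O_{i,j} = 1 iff the intervals [y_hat[i]-c[i], y_hat[i]+c[i]] and
    [y_hat[j]-c[j], y_hat[j]+c[j]] overlap by more than eps.
    Returns (O, stop_ndp, stop_mrd), where the stop flags test the
    NDP bound (1/2) * sum O_{i,j} < rho and the MRD bound
    max_i sum_{j != i} O_{i,j} < rho.
    """
    M = len(y_hat)
    O = [[0] * M for _ in range(M)]
    for i in range(M):
        for j in range(M):
            if i != j:
                lo = max(y_hat[i] - c[i], y_hat[j] - c[j])
                hi = min(y_hat[i] + c[i], y_hat[j] + c[j])
                O[i][j] = 1 if hi - lo > eps else 0
    ndp_bound = 0.5 * sum(O[i][j] for i in range(M) for j in range(M) if i != j)
    mrd_bound = max(sum(O[i][j] for j in range(M) if j != i) for i in range(M))
    return O, ndp_bound < rho, mrd_bound < rho
```

Two well-separated arms with tight intervals pass both stopping tests, while two arms with heavily overlapping intervals do not.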
Implementation for ε-sensitive Copeland's ranking
I ε-sensitive Copeland's order: a_i ≺_CO^ε a_j ⇔ d*_i + s*_i < d*_j
I d*_i = #{k ≠ i | 1/2 + ε < p_{i,k}} (number of arms beaten by a_i with a margin ε)
I s*_i = #{k ≠ i : |1/2 − p_{i,k}| ≤ ε} (number of ties with a margin ε)
I Based on the empirical estimates, define an interval [d^t_i, d^t_i + u^t_i] for each arm a_i such that d*_i ∈ [d^t_i, d^t_i + u^t_i], where
d^t_i = #D^t_i = #{j ≠ i | 1/2 − ε < p̂^t_{i,j} − c^t_{i,j}}
u^t_i = #U^t_i = #{j ≠ i | [1/2 − ε, 1/2 + ε] ⊆ [p̂^t_{i,j} − c^t_{i,j}, p̂^t_{i,j} + c^t_{i,j}]}
I d^t_i is the number of options that are already known to be beaten by a_i
I u^t_i denotes the number of "undecided" pairwise preferences for arm a_i
I Assume [d^t_i, d^t_i + u^t_i] ≺ [d^t_j, d^t_j + u^t_j]; then d*_i ≤ d^t_i + u^t_i < d^t_j ≤ d*_j ≤ d*_j + s*_j ⇒ a_j ⊀_CO^ε a_i
Implementation for ε-sensitive Copeland's ranking
Lemma
Define a ranking τ^t over the arms by sorting the arms a_i in decreasing order according to d^t_i, and in case of a tie (d^t_i = d^t_j) according to the sum d^t_i + u^t_i. Let
I^t_{i,j} = I{(d^t_i < d^t_j + u^t_j) ∧ (d^t_j < d^t_i + u^t_i)}
for all 1 ≤ i ≠ j ≤ M. Then, for any time step t and for any sampling strategy,
d_K(τ^t, ≺_CO^ε) ≤ (1/2) Σ_{i=1}^M Σ_{j≠i} I^t_{i,j}
holds with probability at least 1 − δ, and
d_M(τ^t, ≺_CO^ε) ≤ max_{1≤i≤M} Σ_{j≠i} I^t_{i,j}
Implementation for -sensitive Copeland’s ranking
I When to stop?: I For NDP distance: dK(τt, ≺CO) ≤ 12 PM i =1 P j 6=iIti ,j< ρI For MRD distance: dM(τt, ≺CO) ≤ max1≤i ≤MPj 6=iI t i ,j< ρ
I What to sample?:
I For NDP distance: (i, j) | (j ∈ Ut
i) ∧ ∃ j0 : (Iti ,j0 = 1)
I Those pairs whose “surrogate” intervals based on the empirical estimates are overlapping
I For MRD distance: n(i , j ) | (j ∈ Uit) ∧ ρ ≤P j06=iIti ,j0
o
I What to recommend?:
I τt which sorts the items based on d1t, . . . , dMt
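The surrogate intervals [d^t_i, d^t_i + u^t_i] and the indicators I^t_{i,j} can be sketched from the empirical estimates as follows (the function name `copeland_intervals` is my own, and the membership conditions follow the definitions of D^t_i and U^t_i above):

```python
def copeland_intervals(p_hat, c, eps):
    """Surrogate Copeland intervals [d_i, d_i + u_i].

    D_i: arms j already known to satisfy p_hat[i][j] - c[i][j] > 1/2 - eps.
    U_i: arms j whose whole margin band [1/2-eps, 1/2+eps] still lies
         inside the confidence interval ("undecided" preferences).
    I[i][j] = 1 iff the intervals of arms i and j overlap, i.e. their
    relative order is still undecided.
    """
    M = len(p_hat)
    d = [sum(1 for j in range(M) if j != i
             and p_hat[i][j] - c[i][j] > 0.5 - eps)
         for i in range(M)]
    u = [sum(1 for j in range(M) if j != i
             and p_hat[i][j] - c[i][j] <= 0.5 - eps
             and p_hat[i][j] + c[i][j] >= 0.5 + eps)
         for i in range(M)]
    I = [[1 if i != j and d[i] < d[j] + u[j] and d[j] < d[i] + u[i] else 0
          for j in range(M)] for i in range(M)]
    return d, u, I
```

With tight confidence intervals around a clear 0.9/0.1 preference, the intervals collapse to points and all I^t_{i,j} vanish, so the order is decided.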
Analysis
I Correctness: the probability that some p_{i,j} is not in the confidence interval of p̂^t_{i,j} is small
Lemma
The confidence intervals (defined earlier) are valid for any time t, for any pair of arms, and for any sampling strategy:
Σ_{i=1}^M Σ_{j≠i} Σ_{t=1}^∞ P( p_{i,j} ∉ [p̂^t_{i,j} − c^t_{i,j}, p̂^t_{i,j} + c^t_{i,j}] ) < δ
I Expected sample complexity bound
I Let us run RankEl repeatedly for a given instance of the PAC rank elicitation problem defined by P with fixed parameters
I The number of pairwise comparisons taken is a random variable X
Rank Elicitation Problem Instances: ε = 0.01, ≺_CO^ε

P_1:
     a1   a2   a3   a4   a5  | d*_i s*_i
a1   -    0.9  0.9  0.9  0.9 |  4    0
a2   0.1  -    0.9  0.9  0.9 |  3    0
a3   0.2  0.1  -    0.9  0.9 |  2    0
a4   0.1  0.1  0.1  -    0.9 |  1    0
a5   0.1  0.1  0.1  0.1  -   |  0    0

P_2:
     a1   a2   a3   a4   a5  | d*_i s*_i
a1   -    0.6  0.6  0.6  0.6 |  4    0
a2   0.4  -    0.6  0.6  0.6 |  3    0
a3   0.4  0.4  -    0.6  0.6 |  2    0
a4   0.4  0.4  0.4  -    0.6 |  1    0
a5   0.4  0.4  0.4  0.4  -   |  0    0

Rank Elicitation Problem Instances: ε = 0.01, ≺_CO^ε, ρ = 3

P_3:
     a1   a2   a3   a4   a5  | d*_i s*_i
a1   -    0.9  0.9  0.9  0.9 |  4    0
a2   0.1  -    0.9  0.9  0.9 |  3    0
a3   0.1  0.1  -    0.9  0.9 |  2    0
a4   0.1  0.1  0.1  -    0.9 |  1    0
a5   0.1  0.1  0.1  0.1  -   |  0    0

P_4:
     a1   a2   a3   a4   a5  | d*_i s*_i
a1   -    0.1  0.1  0.9  0.9 |  2    0
a2   0.1  -    0.1  0.1  0.9 |  1    0
a3   0.1  0.1  -    0.1  0.9 |  1    0
a4   0.1  0.1  0.1  -    0.9 |  1    0
a5   0.1  0.1  0.1  0.9  -   |  1    0

Expected sample complexity
Theorem
The expected sample complexity of RankEl^CO_dM is O(R_1 log(R_1/δ)), where
R_1 = Σ_{r=1}^{M²−r_1} 1/(∆_(r) + ε)²
and where
I ∆_{i,j} = |1/2 − p_{i,j}|
I ∆_(r) denotes the r-th smallest value among the ∆_{i,j} over all distinct i, j ∈ [M]
I r_1 depends on the structure of P and on ρ
I For RankEl^SE_dM, one can calculate a similar bound, but with a suitably adapted notion of gap
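The complexity term R_1 can be evaluated numerically for a given instance; a sketch under the theorem's definitions, where the instance-dependent quantity r_1 is taken as an input because the slide does not give it in closed form (the function name is my own):

```python
def sample_complexity_r1(P, eps, r1):
    """Evaluate R_1 = sum_{r=1}^{M^2 - r1} 1 / (Delta_(r) + eps)^2.

    Delta_{i,j} = |1/2 - p_{i,j}| and Delta_(r) is the r-th smallest gap
    over the M(M-1) distinct pairs; the sum keeps the M^2 - r1 smallest
    (i.e. hardest) gaps, which assumes r1 >= M so the index stays in range.
    """
    M = len(P)
    gaps = sorted(abs(0.5 - P[i][j])
                  for i in range(M) for j in range(M) if i != j)
    return sum(1.0 / (g + eps) ** 2 for g in gaps[:M * M - r1])
```

For a two-arm instance with p_{1,2} = 0.9, ε = 0.1 and r_1 = 2, both gaps equal 0.4 and R_1 = 2/(0.5)² = 8.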
Numerical experiments
I GOAL: verify that our sampling strategies are more efficient than uniform sampling
I Bundesliga data, using resampling with replacement
I The confidence parameter δ was set to 0.05; the accuracy achieved was significantly higher than 1 − δ in every case.
Data: Bundesliga
I Last 10 seasons
I Average of the outcome of 20 matches
Empirical sample complexity
I The complexity of uniform sampling is taken as 100%
I Percentage of improvement
I ρ = 3, ε = 0.02:
Procedure \ Distance | NDP d_K(·,·) | MRD d_M(·,·)
≺_CO                 | 25.8 ± 0.4   | 25.1 ± 0.4
≺_SE                 | 64.8 ± 0.8   | 61.5 ± 1.0
I ρ = 3, ε = 0.1:
Procedure \ Distance | NDP d_K(·,·) | MRD d_M(·,·)
≺_CO                 | 32.6 ± 0.5   | 32.8 ± 0.5
≺_SE                 | 60.3 ± 0.4   | 55.4 ± 0.5
Empirical sample complexity
I The complexity of uniform sampling is taken as 100%
I Percentage of improvement
I ρ = 5, ε = 0.1:
Procedure \ Distance | NDP d_K(·,·) | MRD d_M(·,·)
≺_CO                 | 37.6 ± 0.5   | 46.7 ± 0.5
≺_SE                 | 61.5 ± 0.5   | 74.8 ± 1.5
I ρ = 3, ε = 0.1:
Procedure \ Distance | NDP d_K(·,·) | MRD d_M(·,·)
≺_CO                 | 32.6 ± 0.5   | 32.8 ± 0.5
≺_SE                 | 60.3 ± 0.4   | 55.4 ± 0.5
Conclusion and future work
I Our algorithms outperform uniform sampling
I Other ranking procedures can be considered
I Analysis for the Number of Discordant Pairs (NDP) ranking distance
Bibliography
J.-Y. Audibert, R. Munos, and C. Szepesvári. Tuning bandit algorithms in stochastic environments. In Proceedings of Algorithmic Learning Theory, pages 150–165, 2007.
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235–256, 2002.
R. Busa-Fekete, B. Szörényi, P. Weng, W. Cheng, and E. Hüllermeier. Top-k selection based on adaptive sampling of noisy preferences. In Proceedings of the 30th ICML, JMLR W&CP, volume 28, 2013.
R. Busa-Fekete, B. Szörényi, and E. Hüllermeier. PAC rank elicitation through adaptive sampling of stochastic pairwise preferences. Under review, 2014.
O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue. Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems, 30(1):6:1–6:41, 2012.
X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz. Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pages 193–202, 2013.
E. Even-Dar, S. Mannor, and Y. Mansour. PAC bounds for multi-armed bandit and Markov decision processes. In Proceedings of the 15th COLT, pages 255–270, 2002.
R. Herbrich, T. Minka, and T. Graepel. TrueSkill™: A Bayesian skill rating system. In Advances in Neural Information Processing Systems, page 569, 2007.
S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone. PAC subset selection in stochastic multi-armed bandits. In Proceedings of the 29th International Conference on Machine Learning (ICML 2012), pages 655–662, 2012.
T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985.
T. Urvoy, F. Clerot, R. Féraud, and S. Naamane. Generic exploration and k-armed voting bandits. In Proceedings of the 30th ICML, JMLR W&CP, volume 28, pages 91–99, 2013.
Y. Yue and T. Joachims. Beat the mean bandit. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), 2011.
Y. Yue, J. Broder, R. Kleinberg, and T. Joachims. The k-armed dueling bandits problem. Journal of Computer and System Sciences, 78(5):1538–1556, 2012.