PAC Rank Elicitation using
Ranking Procedures
Róbert Busa-Fekete1,2, Balázs Szörényi2,3, Eyke Hüllermeier1
1Computational Intelligence Group, Philipps-Universität Marburg, GERMANY
2Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, HUNGARY 3INRIA Lille - Nord Europe, SequeL project, 40 avenue Halley, 59650 Villeneuve d'Ascq, FRANCE
Poznań, March 27, 2014
(Value-based) Stochastic Multi-armed Bandits
Preference-based Stochastic Multi-armed Bandits
PAC Rank Elicitation using Ranking Procedures
Numerical experiments
Stochastic multi-armed bandit setup
I Setup [Lai and Robbins, 1985, Auer et al., 2002]:
I There are given M arms/items/options: A = {a_1, ..., a_M}
I Each arm a_i is associated with a distribution ν_i
I The arm distributions ν_1, ..., ν_M are not known!
I µ_i = E[ν_i] ⇒ total order on arms if µ_i ≠ µ_j for i ≠ j
I Best arm a_{i*}, where i* = argmax_{1≤i≤M} µ_i
I At each time step t, the (online) learning algorithm selects an arm i_t to be sampled, and receives a reward r_t drawn i.i.d. from ν_{i_t}
Stochastic multi-armed bandit setup
I Goal of the (online) learner (also called decision maker/agent):
I Minimize the expected regret: Σ_{i≠i*} (µ_{i*} − µ_i) Σ_{t=1}^T P(i_t = i) = Σ_{i≠i*} ∆_i Σ_{t=1}^T P(i_t = i), where ∆_i = µ_{i*} − µ_i
I UCB, ε-greedy [Auer et al., 2002], UCBV [Audibert et al., 2007]
I PAC setup: find an ε-optimal arm with probability at least 1 − δ
I ε-optimal arm: µ_{i*} − µ_i = ∆_i < ε
I Based on as few samples as possible
I Sample complexity: number of samples taken prior to termination
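To make the PAC setup concrete, here is a minimal sketch of a (non-adaptive) ε-optimal arm identification routine based on uniform sampling and Hoeffding's inequality; the function name `naive_pac_best_arm` and the sampling interface are hypothetical, and smarter strategies (e.g. median elimination) achieve better sample complexity.

```python
import math
import random


def naive_pac_best_arm(arms, epsilon, delta):
    """Return the index of an epsilon-optimal arm w.p. >= 1 - delta.

    `arms` is a list of zero-argument callables, each drawing one reward
    in [0, 1]. By Hoeffding, sampling each arm n times with
    n >= (2 / epsilon^2) * ln(2M / delta) puts every empirical mean
    within epsilon/2 of its true mean w.p. >= 1 - delta, so the
    empirically best arm is epsilon-optimal.
    """
    m = len(arms)
    n = math.ceil((2.0 / epsilon**2) * math.log(2.0 * m / delta))
    means = [sum(pull() for _ in range(n)) / n for pull in arms]
    return max(range(m), key=lambda i: means[i])
```

With Bernoulli arms of means (0.9, 0.1, 0.2), the routine reliably returns the first arm.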
Preference-based stochastic multi-armed bandit setup
I Setup:
I There are given M items/arms: A = {a_1, ..., a_M}
I Arms can be compared in a pairwise manner: a_i ≻ a_j
I Pairwise comparisons obey a fixed probability distribution for each pair of arms; thus,
p_{i,j} = P(a_i ≻ a_j) = E[I{a_i ≻ a_j}]
I p_{i,j} is called the pairwise probability for arms i and j
I If p_{i,j} > 1/2, then arm a_i is preferred to arm a_j, or concisely, a_i "beats" a_j
I At each time step t, the (online) learning algorithm selects a pair of arms i_t and j_t to be compared, and observes
Y_{i_t,j_t} = I{a_{i_t} ≻ a_{j_t}}
I Goal of the (online) learner (decision maker/agent):
I Optimize some kind of regret
I Find the best arm
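The observation model above is just a Bernoulli draw with parameter p_{i,j}; a minimal simulation sketch (the helper name `duel` is an assumption, not from the slides):

```python
import random


def duel(P, i, j, rng=random):
    """Simulate one comparison Y_{i,j}: returns 1 iff a_i beats a_j.

    P is the matrix of pairwise probabilities, P[i][j] = p_{i,j} =
    P(a_i > a_j), which is *not* known to the learner; the learner only
    sees the outcome of each duel.
    """
    return 1 if rng.random() < P[i][j] else 0
```

Averaging many duels of a fixed pair recovers p_{i,j}, which is exactly what the estimators on the later slides do.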
How to make the setup complete?
I Preferential cycles: p_{i,j} > 1/2, p_{j,k} > 1/2 and p_{k,i} > 1/2 (Condorcet paradox)
I No best arm ⇒ no reasonable regret either!
I Matrix of pairwise probabilities: P = [p_{i,j}]_{1≤i,j≤M}
I Assumptions on P
I Dueling bandit setup [Yue et al., 2012]
I Beat-the-Mean [Yue and Joachims, 2011]
I Preference-based bandits with statistical models [Busa-Fekete et al., 2014]
I No assumption on P, but some ranking procedure is applied
I R : P → O_A, where O_A is the set of orders on A
I Copeland, majority vote, random walk [Busa-Fekete et al., 2013]
I For example, a_i ≺_CO a_j ⇔ d_i < d_j, where d_i = #{k ∈ [M] | 1/2 < p_{i,k}}
I R is assumed to be "smooth" [Urvoy et al., 2013]
I This talk: PAC Rank Elicitation [Busa-Fekete et al., 2014]
Practical applications of preference-based MAB
I Online advertisement [Chapelle et al., 2012]
I Two ads are shown at one time = compare arms
I Crowdsourcing [Chen et al., 2013]
I Amazon Mechanical Turk
I Widely-used platform in Natural Language Processing (NLP) for annotating databases (lexical substitution, machine translation)
I For example: given an English sentence and some possible German translations of it, the goal is to find a ranking that reflects the quality of the translations.
I The annotators are asked simple questions: which alternative is better out of these two?
PAC Rank Elicitation using
Ranking Procedures
PAC Rank Elicitation Setup: Formal problem
I Ranking procedure: R : P → O_A defines a strict order ≺_R over arms
I Ranking distance: d : S_M × O_A → N_0 defines a distance between a complete ranking and a strict order ≺_R over arms
I Definition
An algorithm A is a (ρ, δ)-PAC rank elicitation algorithm with respect to a ranking procedure R and rank distance d if it returns a ranking τ for which d(τ, ≺_R) < ρ with probability at least 1 − δ.
Online learning framework: RankEl
I Parameters: δ, ρ; ranking procedure R; rank distance d(·,·)
I Loop (t := t + 1): select a pair (i_t, j_t); observe o_t ∼ Y_{i_t,j_t}; update the pairwise estimate p̂^t_{i_t,j_t}; continue or terminate
I Recommendation: τ with d(τ, ≺_R) < ρ with probability at least 1 − δ
I The pairwise probabilities P are not known to the learner, but it can gain information about them via sampling!
I What to sample? When to stop? What to recommend?
Pairwise probability estimates and their confidence intervals
I n^t_{i,j}: number of comparisons of arms a_i and a_j up to time t
I t = Σ_{i=1}^M Σ_{j=1}^M n^t_{i,j}
I p̂^t_{i,j}: estimate of the pairwise probability p_{i,j}
I p̂^t_{i,j} = (1/n^t_{i,j}) Σ_{t'∈I^t_{i,j}} o_{t'}, where I^t_{i,j} = {t' ∈ [t] | (i_{t'}, j_{t'}) = (i, j)}
I Confidence interval: c^t_{i,j} = c(n^t_{i,j}, t, δ) = sqrt( (1/(2 n^t_{i,j})) ln(5M²t⁴/(4δ)) )
Lemma (like index-based bandits, such as UCB and LUCB)
The confidence intervals above are valid for any time t, for any pair of arms, and for any sampling strategy; formally,
Σ_{i=1}^M Σ_{j≠i} Σ_{t=1}^∞ P( p_{i,j} ∉ [p̂^t_{i,j} − c^t_{i,j}, p̂^t_{i,j} + c^t_{i,j}] ) < δ
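The confidence radius on the slide is a one-line formula; a sketch of it as code (the function name `conf_radius` is hypothetical):

```python
import math


def conf_radius(n_ij, M, t, delta):
    """Confidence radius c^t_{i,j} from the slide:

        c = sqrt( ln(5 M^2 t^4 / (4 delta)) / (2 n_ij) )

    The union bound in the lemma makes these intervals valid
    simultaneously over all times t, all pairs, and any sampling
    strategy. With no samples yet, the radius is vacuous (infinite).
    """
    if n_ij == 0:
        return float("inf")
    return math.sqrt(math.log(5 * M**2 * t**4 / (4 * delta)) / (2 * n_ij))
```

As expected, the radius shrinks as n^t_{i,j} grows and widens (only logarithmically) with t.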
Rank distances
I Ranking distance: d : S_M × O_A → N_0 defines a distance between a complete ranking and a strict order ≺_R over arms
I Rank distances:
I Number of discordant pairs (NDP), related to Kendall's distance:
d_K(τ, ≺) = Σ_{i=1}^M Σ_{j≠i} I{τ_j < τ_i} I{a_i ≺ a_j}
I Maximum rank difference (MRD):
d_M(τ, ≺) = min_{τ'∈L_≺} max_{1≤i≤M} |τ_i − τ'_i|
(L_≺: the set of linear extensions of ≺)
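The NDP distance can be computed directly from its definition; this is an illustrative sketch where `tau[i]` is the rank of arm i and `prec(i, j)` encodes a_i ≺ a_j (both names are my own; the MRD distance would additionally require minimizing over the linear extensions L_≺ and is omitted here):

```python
def ndp_distance(tau, prec):
    """Number of discordant pairs d_K(tau, prec) from the slide:

        sum over i != j of I{tau[j] < tau[i]} * I{a_i prec a_j}

    Since prec is asymmetric, each discordant unordered pair is
    counted exactly once.
    """
    M = len(tau)
    return sum(1 for i in range(M) for j in range(M)
               if i != j and tau[j] < tau[i] and prec(i, j))
```

For the order a_0 ≺ a_1 ≺ a_2, the ranking (1, 2, 3) has distance 0 and the fully reversed ranking (3, 2, 1) has distance 3.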
Ranking procedures
I R : P → O_A, where O_A is the set of (strict) orders on A
I Copeland's order ≺_CO
I a_i ≺_CO a_j ⇔ d_i < d_j (d_i: skill of arm a_i)
I d_i = #{k ∈ [M] | 1/2 < p_{i,k}}
I Sum of expectations order ≺_SE (majority voting)
I a_i ≺_SE a_j ⇔ y_i < y_j (y_i: skill of arm a_i), where y_i = (1/(M−1)) Σ_{k≠i} p_{i,k}
Example: Bundesliga
I Last 10 seasons
I Average of the outcome of 20 matches
I 1: win, 0: loss, 1/2-1/2: tie
Ranking procedures
I What if p_{i,j} = 1/2 or y_i = y_j?
I ε-sensitive extension to make the relations ≺_CO and ≺_SE more partial
I ε-sensitive Copeland's order: a_i ≺_CO^ε a_j ⇔ d*_i + s*_i < d*_j
I d*_i = #{k ≠ i | 1/2 + ε < p_{i,k}} (number of arms beaten by a_i with a margin ε)
I s*_i = #{k ≠ i : |1/2 − p_{i,k}| ≤ ε} (number of ties with a margin ε)
I ε-sensitive sum of expectations: a_i ≺_SE^ε a_j ⇔ y_i + ε < y_j
I These are interval orders: [a, b] ≺ [a', b'] iff b < a'
I ε-sensitive Copeland's order: [d*_i, d*_i + s*_i] ≺ [d*_j, d*_j + s*_j]
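The quantities d*_i and s*_i are simple counts over a row of P; a sketch under the slide's definitions (the function name `copeland_scores` is my own):

```python
def copeland_scores(P, eps):
    """eps-sensitive Copeland quantities from the slide:

        d_i = #{k != i : p_{i,k} > 1/2 + eps}   (beaten with margin eps)
        s_i = #{k != i : |p_{i,k} - 1/2| <= eps} (ties within margin eps)

    Returns (d, s); arm a_i precedes a_j in the eps-sensitive
    Copeland order iff d[i] + s[i] < d[j].
    """
    M = len(P)
    d = [sum(1 for k in range(M) if k != i and P[i][k] > 0.5 + eps)
         for i in range(M)]
    s = [sum(1 for k in range(M) if k != i and abs(P[i][k] - 0.5) <= eps)
         for i in range(M)]
    return d, s
```

On a 3-arm instance with clear 0.9/0.1 preferences and ε = 0.01 this yields Copeland scores (2, 1, 0) and no ε-ties.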
Example: Bundesliga
I Last 10 seasons
I Average of the outcome of 20 matches
I 1: win, 0: loss, 1/2-1/2: tie
Instantiations of PAC Rank Elicitation Setup

Procedure \ Distance | NDP d_K(·,·)  | MRD d_M(·,·)
Copeland ≺_CO        | RankEl^CO_dK  | RankEl^CO_dM
Sum of Exp. ≺_SE     | RankEl^SE_dK  | RankEl^SE_dM

Implementation for ε-sensitive SE ranking
I Each arm a_i is associated with an interval [y_i, y_i + ε]
I a_i ≺_SE^ε a_j ⇔ y_i + ε < y_j
I y_i = (1/(M−1)) Σ_{k≠i} p_{i,k}
I Estimate for y_i: ŷ^t_i = (1/(M−1)) Σ_{k≠i} p̂^t_{i,k}
I Confidence interval for ŷ^t_i: c^t_i = (1/(M−1)) Σ_{k≠i} c^t_{i,k}
I Since p_{i,j} ∈ [p̂^t_{i,j} − c^t_{i,j}, p̂^t_{i,j} + c^t_{i,j}] for every 1 ≤ j ≤ M, it follows that y_i ∈ [ŷ^t_i − c^t_i, ŷ^t_i + c^t_i]
Implementation for ε-sensitive SE ranking
I Case |y_i − y_j| > ε and the overlap of the confidence intervals is > ε:
I If we are not lucky, then y_j + ε < y_i ⇒ a_j ≺_SE^ε a_i
I Case |y_i − y_j| > ε and the overlap is ≤ ε:
I Even if we are not lucky, we know that y_j + ε ≮ y_i
I Therefore a_j ⊀_SE^ε a_i
I Case |y_i − y_j| < ε: the overlap becomes < ε sooner or later
Implementation for ε-sensitive SE ranking
I Summarising these observations: if the overlap of the confidence intervals of arms a_i and a_j is smaller than ε, then we can decide their order with respect to ≺_SE^ε
I Lemma
Let σ^t be the order according to the ŷ^t_i values, and let O^t_{i,j} indicate that two confidence intervals overlap by more than ε:
O^t_{i,j} = I{ |[ŷ^t_i − c^t_i, ŷ^t_i + c^t_i] ∩ [ŷ^t_j − c^t_j, ŷ^t_j + c^t_j]| > ε }
Then, for any time step t and for any sampling strategy,
d_K(σ^t, ≺_SE^ε) ≤ (1/2) Σ_{i=1}^M Σ_{j≠i} O^t_{i,j}   and   d_M(σ^t, ≺_SE^ε) ≤ max_{1≤i≤M} Σ_{j≠i} O^t_{i,j}
Implementation for -sensitive SE ranking
I When to stop?: I For NDP distance: dK(σt, ≺SE) ≤12 PM i =1 P j 6=iO t i ,j< ρI For MRD distance: dM σt, ≺SE ≤ max1≤i ≤MPj 6=iO t i ,j< ρ
I What to sample?:
I For NDP distance: (i, j) | ∃ j0: (Ot i ,j0 = 1)
I Those pairs whose intervals based on the empirical estimates are overlapping
I For MRD distance: n(i , j ) | ρ ≤P j06=iOti ,j0
o
I What to recommend?:
I σt which sorts the arms based on b yt
1, . . . ,by t M
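The overlap indicators and the two stopping tests above can be sketched as follows; this is an illustrative implementation under the slide's definitions (the function name `se_overlap_stop` is hypothetical), taking the ŷ^t_i estimates and their confidence radii c^t_i as inputs:

```python
def se_overlap_stop(y_hat, c, eps, rho):
    """Overlap indicators O^t_{i,j} and stopping tests for the
    eps-sensitive SE ranking.

    O_{i,j} = 1 iff the intervals [y_hat[i]-c[i], y_hat[i]+c[i]] and
    [y_hat[j]-c[j], y_hat[j]+c[j]] overlap by more than eps.
    Returns (O, stop_ndp, stop_mrd), where the stop flags test the
    NDP bound (1/2) * sum O_{i,j} < rho and the MRD bound
    max_i sum_{j != i} O_{i,j} < rho.
    """
    M = len(y_hat)
    O = [[0] * M for _ in range(M)]
    for i in range(M):
        for j in range(M):
            if i != j:
                lo = max(y_hat[i] - c[i], y_hat[j] - c[j])
                hi = min(y_hat[i] + c[i], y_hat[j] + c[j])
                O[i][j] = 1 if hi - lo > eps else 0
    ndp_bound = 0.5 * sum(O[i][j] for i in range(M) for j in range(M) if i != j)
    mrd_bound = max(sum(O[i][j] for j in range(M) if j != i) for i in range(M))
    return O, ndp_bound < rho, mrd_bound < rho
```

Two well-separated arms with tight intervals pass both stopping tests, while two arms with heavily overlapping intervals do not.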
Implementation for ε-sensitive Copeland's ranking
I ε-sensitive Copeland's order: a_i ≺_CO^ε a_j ⇔ d*_i + s*_i < d*_j
I d*_i = #{k ≠ i | 1/2 + ε < p_{i,k}} (number of arms beaten by a_i with a margin ε)
I s*_i = #{k ≠ i : |1/2 − p_{i,k}| ≤ ε} (number of ties with a margin ε)
I Based on the empirical estimates, define an interval [d^t_i, d^t_i + u^t_i] for each arm a_i such that d*_i ∈ [d^t_i, d^t_i + u^t_i], where
d^t_i = #D^t_i = #{j ≠ i | 1/2 − ε < p̂^t_{i,j} − c^t_{i,j}}
u^t_i = #U^t_i = #{j ≠ i | [1/2 − ε, 1/2 + ε] ⊆ [p̂^t_{i,j} − c^t_{i,j}, p̂^t_{i,j} + c^t_{i,j}]}
I d^t_i is the number of options that are already known to be beaten by a_i
I u^t_i denotes the number of "undecided" pairwise preferences for arm a_i
I Assume [d^t_i, d^t_i + u^t_i] ≺ [d^t_j, d^t_j + u^t_j]; then d*_i ≤ d^t_i + u^t_i < d^t_j ≤ d*_j ≤ d*_j + s*_j ⇒ a_j ⊀_CO^ε a_i
Implementation for ε-sensitive Copeland's ranking
Lemma
Define a ranking τ^t over the arms by sorting the arms a_i in decreasing order according to d^t_i, and in case of a tie (d^t_i = d^t_j) according to the sum d^t_i + u^t_i. Let
I^t_{i,j} = I{(d^t_i < d^t_j + u^t_j) ∧ (d^t_j < d^t_i + u^t_i)}
for all 1 ≤ i ≠ j ≤ M. Then, for any time step t and for any sampling strategy,
d_K(τ^t, ≺_CO^ε) ≤ (1/2) Σ_{i=1}^M Σ_{j≠i} I^t_{i,j}
holds with probability at least 1 − δ, and
d_M(τ^t, ≺_CO^ε) ≤ max_{1≤i≤M} Σ_{j≠i} I^t_{i,j}
Implementation for -sensitive Copeland’s ranking
I When to stop?: I For NDP distance: dK(τt, ≺CO) ≤ 12 PM i =1 P j 6=iIti ,j< ρI For MRD distance: dM(τt, ≺CO) ≤ max1≤i ≤MPj 6=iI t i ,j< ρ
I What to sample?:
I For NDP distance: (i, j) | (j ∈ Ut
i) ∧ ∃ j0 : (Iti ,j0 = 1)
I Those pairs whose “surrogate” intervals based on the empirical estimates are overlapping
I For MRD distance: n(i , j ) | (j ∈ Uit) ∧ ρ ≤P j06=iIti ,j0
o
I What to recommend?:
I τt which sorts the items based on d1t, . . . , dMt
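The surrogate intervals [d^t_i, d^t_i + u^t_i] and the indicators I^t_{i,j} can be sketched from the empirical estimates as follows (the function name `copeland_intervals` is my own, and the membership conditions follow the definitions of D^t_i and U^t_i above):

```python
def copeland_intervals(p_hat, c, eps):
    """Surrogate Copeland intervals [d_i, d_i + u_i].

    D_i: arms j already known to satisfy p_hat[i][j] - c[i][j] > 1/2 - eps.
    U_i: arms j whose whole margin band [1/2-eps, 1/2+eps] still lies
         inside the confidence interval ("undecided" preferences).
    I[i][j] = 1 iff the intervals of arms i and j overlap, i.e. their
    relative order is still undecided.
    """
    M = len(p_hat)
    d = [sum(1 for j in range(M) if j != i
             and p_hat[i][j] - c[i][j] > 0.5 - eps)
         for i in range(M)]
    u = [sum(1 for j in range(M) if j != i
             and p_hat[i][j] - c[i][j] <= 0.5 - eps
             and p_hat[i][j] + c[i][j] >= 0.5 + eps)
         for i in range(M)]
    I = [[1 if i != j and d[i] < d[j] + u[j] and d[j] < d[i] + u[i] else 0
          for j in range(M)] for i in range(M)]
    return d, u, I
```

With tight confidence intervals around a clear 0.9/0.1 preference, the intervals collapse to points and all I^t_{i,j} vanish, so the order is decided.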
Analysis
I Correctness: the probability that some p_{i,j} is not in the confidence interval of p̂^t_{i,j} is small
Lemma
The confidence intervals (defined earlier) are valid for any time t, for any pair of arms, and for any sampling strategy:
Σ_{i=1}^M Σ_{j≠i} Σ_{t=1}^∞ P( p_{i,j} ∉ [p̂^t_{i,j} − c^t_{i,j}, p̂^t_{i,j} + c^t_{i,j}] ) < δ
I Expected sample complexity bound
I Let us run RankEl repeatedly for a given instance of the PAC rank elicitation problem defined by P with fixed parameters
I The number of pairwise comparisons taken is a random variable X
Rank Elicitation Problem Instances: ε = 0.01, ≺_CO^ε

P_1:
     a1   a2   a3   a4   a5  | d*_i s*_i
a1   -    0.9  0.9  0.9  0.9 |  4    0
a2   0.1  -    0.9  0.9  0.9 |  3    0
a3   0.2  0.1  -    0.9  0.9 |  2    0
a4   0.1  0.1  0.1  -    0.9 |  1    0
a5   0.1  0.1  0.1  0.1  -   |  0    0

P_2:
     a1   a2   a3   a4   a5  | d*_i s*_i
a1   -    0.6  0.6  0.6  0.6 |  4    0
a2   0.4  -    0.6  0.6  0.6 |  3    0
a3   0.4  0.4  -    0.6  0.6 |  2    0
a4   0.4  0.4  0.4  -    0.6 |  1    0
a5   0.4  0.4  0.4  0.4  -   |  0    0

Rank Elicitation Problem Instances: ε = 0.01, ≺_CO^ε, ρ = 3

P_3:
     a1   a2   a3   a4   a5  | d*_i s*_i
a1   -    0.9  0.9  0.9  0.9 |  4    0
a2   0.1  -    0.9  0.9  0.9 |  3    0
a3   0.1  0.1  -    0.9  0.9 |  2    0
a4   0.1  0.1  0.1  -    0.9 |  1    0
a5   0.1  0.1  0.1  0.1  -   |  0    0

P_4:
     a1   a2   a3   a4   a5  | d*_i s*_i
a1   -    0.1  0.1  0.9  0.9 |  2    0
a2   0.1  -    0.1  0.1  0.9 |  1    0
a3   0.1  0.1  -    0.1  0.9 |  1    0
a4   0.1  0.1  0.1  -    0.9 |  1    0
a5   0.1  0.1  0.1  0.9  -   |  1    0

Expected sample complexity
Theorem
The expected sample complexity of RankEl^CO_dM is O(R_1 log(R_1/δ)), where
R_1 = Σ_{r=1}^{M²−r_1} 1/(∆_(r) + ε)²
and where
I ∆_{i,j} = |1/2 − p_{i,j}|
I ∆_(r) denotes the r-th smallest value among the ∆_{i,j} over all distinct i, j ∈ [M]
I r_1 depends on the structure of P and on ρ
I For RankEl^SE_dM, one can calculate a similar bound, but with a suitably adapted notion of gap
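The complexity term R_1 can be evaluated numerically for a given instance; a sketch under the theorem's definitions, where the instance-dependent quantity r_1 is taken as an input because the slide does not give it in closed form (the function name is my own):

```python
def sample_complexity_r1(P, eps, r1):
    """Evaluate R_1 = sum_{r=1}^{M^2 - r1} 1 / (Delta_(r) + eps)^2.

    Delta_{i,j} = |1/2 - p_{i,j}| and Delta_(r) is the r-th smallest gap
    over the M(M-1) distinct pairs; the sum keeps the M^2 - r1 smallest
    (i.e. hardest) gaps, which assumes r1 >= M so the index stays in range.
    """
    M = len(P)
    gaps = sorted(abs(0.5 - P[i][j])
                  for i in range(M) for j in range(M) if i != j)
    return sum(1.0 / (g + eps) ** 2 for g in gaps[:M * M - r1])
```

For a two-arm instance with p_{1,2} = 0.9, ε = 0.1 and r_1 = 2, both gaps equal 0.4 and R_1 = 2/(0.5)² = 8.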
Numerical experiments
I GOAL: verify that our sampling strategies are more efficient than uniform sampling
I Bundesliga data, using resampling with replacement
I The confidence parameter δ was set to 0.05; the accuracy achieved was significantly higher than 1 − δ in every case.
Data: Bundesliga
I Last 10 seasons
I Average of the outcome of 20 matches
Empirical sample complexity
I The complexity of uniform sampling is taken as 100%
I Percentage of improvement
I ρ = 3, ε = 0.02:
Procedure \ Distance | NDP d_K(·,·) | MRD d_M(·,·)
≺_CO                 | 25.8 ± 0.4   | 25.1 ± 0.4
≺_SE                 | 64.8 ± 0.8   | 61.5 ± 1.0
I ρ = 3, ε = 0.1:
Procedure \ Distance | NDP d_K(·,·) | MRD d_M(·,·)
≺_CO                 | 32.6 ± 0.5   | 32.8 ± 0.5
≺_SE                 | 60.3 ± 0.4   | 55.4 ± 0.5
Empirical sample complexity
I The complexity of uniform sampling is taken as 100%
I Percentage of improvement
I ρ = 5, ε = 0.1:
Procedure \ Distance | NDP d_K(·,·) | MRD d_M(·,·)
≺_CO                 | 37.6 ± 0.5   | 46.7 ± 0.5
≺_SE                 | 61.5 ± 0.5   | 74.8 ± 1.5
I ρ = 3, ε = 0.1:
Procedure \ Distance | NDP d_K(·,·) | MRD d_M(·,·)
≺_CO                 | 32.6 ± 0.5   | 32.8 ± 0.5
≺_SE                 | 60.3 ± 0.4   | 55.4 ± 0.5
Conclusion and future work
I Our algorithms outperform uniform sampling
I Other ranking procedures can be considered
I Analysis for the Number of Discordant Pairs (NDP) ranking distance
Bibliography
J.-Y. Audibert, R. Munos, and C. Szepesvári. Tuning bandit algorithms in stochastic environments. In Proceedings of Algorithmic Learning Theory, pages 150–165, 2007.
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235–256, 2002.
R. Busa-Fekete, B. Szörényi, P. Weng, W. Cheng, and E. Hüllermeier. Top-k selection based on adaptive sampling of noisy preferences. In Proceedings of the 30th ICML, JMLR W&CP, volume 28, 2013.
R. Busa-Fekete, B. Szörényi, and E. Hüllermeier. PAC rank elicitation through adaptive sampling of stochastic pairwise preferences. Under review, 2014.
O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue. Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems, 30(1):6:1–6:41, 2012.
X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz. Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pages 193–202, 2013.
E. Even-Dar, S. Mannor, and Y. Mansour. PAC bounds for multi-armed bandit and Markov decision processes. In Proceedings of the 15th COLT, pages 255–270, 2002.
R. Herbrich, T. Minka, and T. Graepel. TrueSkill™: A Bayesian skill rating system. In Advances in Neural Information Processing Systems, page 569, 2007.
S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone. PAC subset selection in stochastic multi-armed bandits. In Proceedings of the 29th International Conference on Machine Learning (ICML 2012), pages 655–662, 2012.
T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985.
T. Urvoy, F. Clerot, R. Féraud, and S. Naamane. Generic exploration and k-armed voting bandits. In Proceedings of the 30th ICML, JMLR W&CP, volume 28, pages 91–99, 2013.
Y. Yue and T. Joachims. Beat the mean bandit. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), 2011.
Y. Yue, J. Broder, R. Kleinberg, and T. Joachims. The k-armed dueling bandits problem. Journal of Computer and System Sciences, 78(5):1538–1556, 2012.