PAC Rank Elicitation using Ranking Procedures


(1)

PAC Rank Elicitation using Ranking Procedures

Róbert Busa-Fekete1,2, Balázs Szörényi2,3, Eyke Hüllermeier1

1Computational Intelligence Group, Philipps-Universität Marburg, GERMANY

2Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, HUNGARY
3INRIA Lille - Nord Europe, SequeL project, 40 avenue Halley, 59650 Villeneuve d'Ascq, FRANCE

Poznań, March 27, 2014

(2)

(Value-based) Stochastic Multi-armed Bandits

Preference-based Stochastic Multi-armed Bandits

PAC Rank Elicitation using Ranking Procedures

Numerical experiments

(3)

Stochastic multi-armed bandit setup

- Setup: [Lai and Robbins, 1985, Auer et al., 2002]
- There are given $M$ arms/items/options: $A = \{a_1, \dots, a_M\}$
- Each arm $a_i$ is associated with a distribution $\nu_i$
- Arm distributions $\nu_1, \dots, \nu_M$ are not known!
- $\mu_i = \mathbb{E}[\nu_i]$ ⇒ total order on arms if $\mu_i \ne \mu_j$ for $i \ne j$
- Best arm $a_{i^*}$ where $i^* = \operatorname{argmax}_{1 \le i \le M} \mu_i$
- At each time step $t$, the (online) learning algorithm selects an arm $i_t$ to be sampled, and receives reward $r_t \stackrel{i.i.d.}{\sim} \nu_{i_t}$

(4)

Stochastic multi-armed bandit setup

- Goal of the (online) learner (also called decision maker/agent):
- Minimize the expected regret: $\sum_{i \ne i^*} \underbrace{(\mu_{i^*} - \mu_i)}_{\Delta_i} \sum_{t=1}^{T} \mathbb{P}(i_t = i) = \sum_{i \ne i^*} \Delta_i \sum_{t=1}^{T} \mathbb{P}(i_t = i)$
- UCB, ε-greedy [Auer et al., 2002], UCB-V [Audibert et al., 2007]
- PAC setup: find an ε-optimal arm with probability at least $1 - \delta$
  - ε-optimal arm: $\mu_{i^*} - \mu_i = \Delta_i < \epsilon$
  - Based on as few samples as possible
- Sample complexity: number of samples taken prior to termination
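The index-based strategy mentioned above can be sketched in a few lines. This is a minimal UCB1 sketch, not any particular production implementation; the exploration bonus $\sqrt{2 \ln t / n_i}$ follows [Auer et al., 2002], while the function name, the Bernoulli reward model, and the arm means `mu` are illustrative assumptions.

```python
import math
import random

def ucb1(pull, M, T, seed=0):
    """Minimal UCB1 sketch: pull(i) returns a reward in [0, 1]."""
    random.seed(seed)
    counts = [0] * M    # number of times each arm was pulled
    sums = [0.0] * M    # cumulative reward per arm
    for t in range(1, T + 1):
        if t <= M:
            i = t - 1   # initialisation: pull every arm once
        else:
            # index = empirical mean + exploration bonus sqrt(2 ln t / n_i)
            i = max(range(M), key=lambda k: sums[k] / counts[k]
                    + math.sqrt(2.0 * math.log(t) / counts[k]))
        r = pull(i)
        counts[i] += 1
        sums[i] += r
    return counts

# Bernoulli arms with (hypothetical) means mu; the best arm should end up
# being pulled most often as the exploration bonuses shrink.
mu = [0.2, 0.5, 0.8]
counts = ucb1(lambda i: 1.0 if random.random() < mu[i] else 0.0, M=3, T=2000)
```

With the gaps $\Delta_i$ this large, the pull counts concentrate on the best arm well before $T = 2000$.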

(5)

Preference-based stochastic multi-armed bandit setup

- Setup:
  - There are given $M$ items/arms: $A = \{a_1, \dots, a_M\}$
  - Arms can be compared in a pairwise manner: $a_i \succ a_j$
  - Pairwise comparisons obey a fixed probability distribution for each pair of arms, thus $p_{i,j} = \mathbb{P}(a_i \succ a_j) = \mathbb{E}[\mathbb{I}\{a_i \succ a_j\}]$
  - $p_{i,j}$ is called the pairwise probability for arms $i$ and $j$
  - If $p_{i,j} > 1/2$, then arm $a_i$ is preferred to arm $a_j$, or concisely, $a_i$ "beats" $a_j$
- At each time step $t$, the (online) learning algorithm selects a pair of arms $i_t$ and $j_t$ to be compared, and observes $Y_{i_t,j_t} = \mathbb{I}\{a_{i_t} \succ a_{j_t}\}$
- Goal of the (online) learner (decision maker/agent):
  - Optimize some kind of regret
  - Find the best arm
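The observation model above is just a Bernoulli sample per queried pair. A minimal sketch of such a comparison oracle (the matrix `P` and the function name are illustrative assumptions):

```python
import random

def duel(P, i, j, rng):
    """One comparison of a_i and a_j: Y_{i,j} ~ Bernoulli(p_{i,j})."""
    return 1 if rng.random() < P[i][j] else 0

# Toy matrix with p_{0,1} = 0.9 (and p_{1,0} = 1 - p_{0,1}); the learner
# only ever sees samples like these, never P itself.
P = [[0.5, 0.9], [0.1, 0.5]]
rng = random.Random(42)
wins = sum(duel(P, 0, 1, rng) for _ in range(1000))
# wins / 1000 is an empirical estimate of p_{0,1}
```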

(6)

How to make the setup complete?

- Preferential cycles:
  - $p_{i,j} > 1/2$, $p_{j,k} > 1/2$ and $p_{k,i} > 1/2$ (Condorcet paradox)
  - No best arm ⇒ no reasonable regret either!
- Matrix of pairwise probabilities: $P = [p_{i,j}]_{1 \le i,j \le M}$
- Assumptions on $P$:
  - Dueling bandit setup [Yue et al., 2012]
  - Beat-the-Mean [Yue and Joachims, 2011]
  - Preference-based bandits with statistical models [Busa-Fekete et al., 2014]
- No assumption on $P$, but some ranking procedure is applied:
  - $R : P \to O_A$, where $O_A$ is the set of orders on $A$
  - Copeland, Majority vote, Random walk [Busa-Fekete et al., 2013]
  - For example, $a_i \prec_{CO} a_j \Leftrightarrow d_i < d_j$ where $d_i = \#\{k \in [M] \mid 1/2 < p_{i,k}\}$
  - $R$ is assumed to be "smooth" [Urvoy et al., 2013]
- This talk: PAC Rank Elicitation [Busa-Fekete et al., 2014]
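The Copeland example can be computed directly from $P$. A small sketch (function names are mine; the degree definition is the one on the slide, with ties broken arbitrarily):

```python
def copeland_degrees(P):
    """d_i = #{k != i : p_{i,k} > 1/2}: how many arms a_i beats."""
    M = len(P)
    return [sum(1 for k in range(M) if k != i and P[i][k] > 0.5)
            for i in range(M)]

def copeland_order(P):
    """List the arms from largest to smallest Copeland degree
    (a_i precedes a_j in the output iff d_i > d_j; ties arbitrary)."""
    d = copeland_degrees(P)
    return sorted(range(len(P)), key=lambda i: -d[i])

# Toy matrix with a cycle-free preference structure:
P = [[0.5, 0.9, 0.9],
     [0.1, 0.5, 0.9],
     [0.1, 0.1, 0.5]]
```

Here `copeland_degrees(P)` gives degrees 2, 1, 0 for the three arms, so the induced order lists $a_1$ first.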

(7)

Practical applications of preference-based MAB

- Online advertisement [Chapelle et al., 2012]
  - Two ads are shown at a time = compare arms
- Crowdsourcing [Chen et al., 2013]
  - Amazon Mechanical Turk
  - Widely used platform in Natural Language Processing (NLP) for annotating databases (lexical substitution, machine translation)
  - For example: given an English sentence and several possible German translations of it, the goal is to find a ranking that reflects the quality of the translations
  - The annotators are asked simple questions: which alternative is better out of these two?

(8)

PAC Rank Elicitation using Ranking Procedures

(9)

PAC Rank Elicitation Setup: Formal problem

- Ranking procedure: $R : P \to O_A$ defines a strict order $\prec_R$ over arms
- Ranking distance: $d : S_M \times O_A \to \mathbb{N}_0$ defines a distance between a complete ranking and a strict order $\prec_R$ over arms

Definition

An algorithm $\mathcal{A}$ is a $(\rho, \delta)$-PAC rank elicitation algorithm with respect to a ranking procedure $R$ and rank distance $d$ if it returns a ranking $\tau$ for which $d(\tau, \prec_R) < \rho$ with probability at least $1 - \delta$.


(11)

Online learning framework: RankEl

[Flowchart: select a pair $(i_t, j_t)$ → observe $o_t \sim Y_{i_t,j_t}$ → update the pairwise estimate $\hat p^t_{i_t,j_t}$ → continue ($t := t + 1$, repeat) or terminate? Parameters: $\delta$, $\rho$, ranking procedure $R$, rank distance $d(\cdot,\cdot)$. Recommendation on termination: $\tau$ with $d(\tau, \prec_R) < \rho$ with probability at least $1 - \delta$.]

- The pairwise probabilities $P$ are not known to the learner, but it can gain information about them via sampling!

(12)

Online learning framework: RankEl

- Key questions: What to sample? When to stop? What to recommend?

(13)

Pairwise probability estimates and their confidence intervals

- $n^t_{i,j}$: number of comparisons for arms $a_i$ and $a_j$ up to time $t$
- $t = \sum_{i=1}^{M} \sum_{j=1}^{M} n^t_{i,j}$
- $\hat p^t_{i,j}$: estimate of the pairwise probability $p_{i,j}$
- $\hat p^t_{i,j} = \frac{1}{n^t_{i,j}} \sum_{t' \in I^t_{i,j}} o_{t'}$ where $I^t_{i,j} = \{t' \in [t] \mid (i_{t'}, j_{t'}) = (i, j)\}$
- Confidence interval: $c^t_{i,j} = c(n^t_{i,j}, t, \delta) = \sqrt{\frac{1}{2 n^t_{i,j}} \ln \frac{5 M^2 t^4}{4 \delta}}$

Lemma (like index-based bandits, such as UCB and LUCB)

The confidence intervals above are valid for any time $t$, for any pair of arms, and for any sampling strategy; formally,

$\sum_{i=1}^{M} \sum_{j \ne i} \sum_{t=1}^{\infty} \mathbb{P}\left( p_{i,j} \notin \left[ \hat p^t_{i,j} - c^t_{i,j},\ \hat p^t_{i,j} + c^t_{i,j} \right] \right) < \delta$
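The confidence radius $c(n, t, \delta)$ is a one-liner; this sketch just transcribes the formula above (the function name is mine):

```python
import math

def conf_radius(n_ij, t, M, delta):
    """c(n, t, delta) = sqrt( ln(5 M^2 t^4 / (4 delta)) / (2 n) )."""
    return math.sqrt(math.log(5 * M ** 2 * t ** 4 / (4 * delta)) / (2 * n_ij))

# The radius shrinks roughly like 1/sqrt(n) as a pair gets sampled more:
c_few = conf_radius(n_ij=10, t=100, M=5, delta=0.05)
c_many = conf_radius(n_ij=1000, t=100, M=5, delta=0.05)
```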


(16)

Rank distances

- Ranking distance: $d : S_M \times O_A \to \mathbb{N}_0$ defines a distance between a complete ranking and a strict order $\prec_R$ over arms
- Rank distances:
  - Number of discordant pairs (NDP), related to Kendall's distance: $d_K(\tau, \prec) = \sum_{i=1}^{M} \sum_{j \ne i} \mathbb{I}\{\tau_j < \tau_i\}\, \mathbb{I}\{a_i \prec a_j\}$
  - Maximum rank difference (MRD): $d_M(\tau, \prec) = \min_{\tau' \in \mathcal{L}} \max_{1 \le i \le M} |\tau_i - \tau'_i|$, where $\mathcal{L}$ is the set of linear extensions of $\prec$
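Both distances are easy to compute for small $M$; this sketch follows the slide's indicator convention (under which $\tau$ agrees with $\prec$ when $\tau_i < \tau_j$ for every $a_i \prec a_j$) and brute-forces the linear extensions for MRD. Function names and the pair-set encoding of $\prec$ are my choices.

```python
from itertools import permutations

def ndp(tau, prec):
    """d_K under the slide's indicator convention. tau[i] is the rank of
    arm i; prec is a set of pairs (i, j) meaning a_i < a_j."""
    M = len(tau)
    return sum(1 for i in range(M) for j in range(M)
               if j != i and tau[j] < tau[i] and (i, j) in prec)

def mrd(tau, prec, M):
    """d_M: min over linear extensions tau' of prec of max_i |tau_i - tau'_i|.
    Brute force over all M! permutations; fine for small M."""
    best = M
    for perm in permutations(range(M)):
        rank = {arm: r for r, arm in enumerate(perm)}
        if all(rank[i] < rank[j] for (i, j) in prec):  # linear extension?
            best = min(best, max(abs(tau[i] - rank[i]) for i in range(M)))
    return best
```

For the total order $a_0 \prec a_1 \prec a_2$, the ranking `[0, 1, 2]` has both distances 0, and the fully reversed ranking `[2, 1, 0]` has $d_K = 3$ and $d_M = 2$.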


(18)

Ranking procedures

- $R : P \to O_A$, where $O_A$ is the set of (strict) orders on $A$
- Copeland's order $\prec_{CO}$:
  - $a_i \prec_{CO} a_j \Leftrightarrow d_i < d_j$ ($d_i$: skill of arm $a_i$)
  - $d_i = \#\{k \in [M] \mid 1/2 < p_{i,k}\}$
- Sum of expectations order $\prec_{SE}$ (majority voting):
  - $a_i \prec_{SE} a_j \Leftrightarrow y_i < y_j$ ($y_i$: skill of arm $a_i$)
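The SE skills are simple row averages of $P$, using the definition $y_i = \frac{1}{M-1}\sum_{k \ne i} p_{i,k}$ given later in the talk (the function name is mine):

```python
def se_scores(P):
    """y_i = (1 / (M - 1)) * sum_{k != i} p_{i,k}: the SE 'skill' of a_i."""
    M = len(P)
    return [sum(P[i][k] for k in range(M) if k != i) / (M - 1)
            for i in range(M)]

P = [[0.5, 0.9, 0.9],
     [0.1, 0.5, 0.9],
     [0.1, 0.1, 0.5]]
y = se_scores(P)   # a_1 gets the largest score, a_3 the smallest
```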

(19)

Example: Bundesliga

- Last 10 seasons
- Average of the outcome of 20 matches
- 1: win, 0: loss, 1/2-1/2: tie


(21)

Ranking procedures

- What if $p_{i,j} = 1/2$ or $y_i = y_j$?
- ε-sensitive extension to make the relations $\prec_{CO}$ and $\prec_{SE}$ more partial
- ε-sensitive Copeland's order: $a_i \prec_{CO}^{\epsilon} a_j \Leftrightarrow d_i^* + s_i^* < d_j^*$
  - $d_i^* = \#\{k \mid 1/2 + \epsilon < p_{i,k},\ k \ne i\}$ (number of arms beaten by $a_i$ with a margin $\epsilon$)
  - $s_i^* = \#\{k \mid |1/2 - p_{i,k}| \le \epsilon,\ k \ne i\}$ (number of ties with a margin $\epsilon$)
- ε-sensitive sum of expectations: $a_i \prec_{SE}^{\epsilon} a_j \Leftrightarrow y_i + \epsilon < y_j$
- These are interval orders: $[a, b] \prec [a', b']$ iff $b < a'$
  - ε-sensitive Copeland's order: $[d_i^*, d_i^* + s_i^*] \prec [d_j^*, d_j^* + s_j^*]$
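The two ε-sensitive quantities translate directly into counts over a row of $P$; a small sketch (function names are mine, the definitions are the ones above):

```python
def eps_copeland(P, eps):
    """d*_i: arms beaten by a_i with margin eps; s*_i: eps-margin ties."""
    M = len(P)
    d = [sum(1 for k in range(M) if k != i and P[i][k] > 0.5 + eps)
         for i in range(M)]
    s = [sum(1 for k in range(M) if k != i and abs(0.5 - P[i][k]) <= eps)
         for i in range(M)]
    return d, s

def prec_eps_co(d, s, i, j):
    """a_i <_CO^eps a_j  iff  d*_i + s*_i < d*_j (the interval order)."""
    return d[i] + s[i] < d[j]

# 5 arms where a_i beats every later arm with probability 0.9:
P = [[0.5 if i == j else (0.9 if i < j else 0.1) for j in range(5)]
     for i in range(5)]
d, s = eps_copeland(P, eps=0.01)
```

On this matrix the degrees are $4, 3, 2, 1, 0$ with no ε-margin ties, so the ε-sensitive order is total.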



(26)

Instantiations of PAC Rank Elicitation Setup

Procedure \ Distance        NDP $d_K(\cdot,\cdot)$          MRD $d_M(\cdot,\cdot)$
Copeland $\prec_{CO}$       $\mathrm{RankEl}^{CO}_{d_K}$    $\mathrm{RankEl}^{CO}_{d_M}$
Sum of Exp. $\prec_{SE}$    $\mathrm{RankEl}^{SE}_{d_K}$    $\mathrm{RankEl}^{SE}_{d_M}$


(28)

Implementation for ε-sensitive SE ranking

- Each arm $a_i$ is associated with an interval $[y_i, y_i + \epsilon]$
- $a_i \prec_{SE}^{\epsilon} a_j \Leftrightarrow y_i + \epsilon < y_j$
- $y_i = \frac{1}{M-1} \sum_{k \ne i} p_{i,k}$
- Estimate for $y_i$: $\hat y_i = \frac{1}{M-1} \sum_{k \ne i} \hat p^t_{i,k}$
- Confidence interval for $\hat y_i$: $c^t_i = \frac{1}{M-1} \sum_{k \ne i} c^t_{i,k}$
- Since $p_{i,j} \in [\hat p^t_{i,j} - c^t_{i,j},\ \hat p^t_{i,j} + c^t_{i,j}]$ for any $1 \le j \le M$, therefore $y_i \in [\hat y_i - c^t_i,\ \hat y_i + c^t_i]$
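Aggregating the pairwise estimates into $\hat y_i$ and the radii into $c^t_i$ is a pair of row averages; a minimal sketch (the function name is mine):

```python
def se_estimate(p_hat, c, i):
    """Empirical SE score and its aggregated confidence radius for arm i:
    y_hat_i = mean of p_hat[i][k] over k != i, c_i = mean of c[i][k]."""
    M = len(p_hat)
    others = [k for k in range(M) if k != i]
    y_hat = sum(p_hat[i][k] for k in others) / (M - 1)
    c_i = sum(c[i][k] for k in others) / (M - 1)
    return y_hat, c_i

p_hat = [[0.5, 0.9, 0.9], [0.1, 0.5, 0.9], [0.1, 0.1, 0.5]]
c = [[0.05] * 3 for _ in range(3)]
y0, c0 = se_estimate(p_hat, c, 0)
```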

(29)

Implementation for ε-sensitive SE ranking

- Case 1: $|y_i - y_j| > \epsilon$ and the overlap of the confidence intervals is $> \epsilon$
  [Diagram: intervals $[\hat y^t_i - c^t_i, \hat y^t_i + c^t_i]$ and $[\hat y^t_j - c^t_j, \hat y^t_j + c^t_j]$ on $[0, 1]$, overlapping by more than $\epsilon$]
  - If we are not lucky, then $y_j + \epsilon < y_i$ ⇒ $a_j \prec_{SE}^{\epsilon} a_i$
- Case 2: $|y_i - y_j| > \epsilon$ and the overlap is $\le \epsilon$
  [Diagram: the same intervals, now overlapping by at most $\epsilon$]
  - Even if we are not lucky, we know that $y_j + \epsilon \not< y_i$
  - Therefore $a_j \not\prec_{SE}^{\epsilon} a_i$
- Case 3: $|y_i - y_j| < \epsilon$ (the overlap becomes $< \epsilon$ sooner or later)
  [Diagram: $y_i$ and $y_j$ closer than $\epsilon$ on $[0, 1]$]


(32)

Implementation for ε-sensitive SE ranking

- Summarising these observations: if the overlap of the confidence intervals of arms $a_i$ and $a_j$ is smaller than $\epsilon$, then we can decide their order with respect to $\prec_{SE}^{\epsilon}$

Lemma

Let $\sigma_t$ be the order according to the $\hat y^t_i$ values, and let $O^t_{i,j}$ indicate that two confidence intervals overlap by more than $\epsilon$:

$O^t_{i,j} = \mathbb{I}\{ |[\hat y^t_i - c^t_i, \hat y^t_i + c^t_i] \cap [\hat y^t_j - c^t_j, \hat y^t_j + c^t_j]| > \epsilon \}$

Then for any time step $t$, and for any sampling strategy,

$d_K(\sigma_t, \prec_{SE}^{\epsilon}) \le \frac{1}{2} \sum_{i=1}^{M} \sum_{j \ne i} O^t_{i,j}$ and $d_M(\sigma_t, \prec_{SE}^{\epsilon}) \le \max_{1 \le i \le M} \sum_{j \ne i} O^t_{i,j}$
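The indicators and both bounds of the lemma are cheap to evaluate from the current $\hat y$ and $c$ vectors; a small sketch (function names are mine):

```python
def overlap_len(lo1, hi1, lo2, hi2):
    """Length of the intersection of two closed intervals (0 if disjoint)."""
    return max(0.0, min(hi1, hi2) - max(lo1, lo2))

def se_distance_bounds(y_hat, c, eps):
    """O^t_{i,j} indicators and the lemma's two bounds:
    returns (bound on d_K, bound on d_M) for the current estimates."""
    M = len(y_hat)
    O = [[int(i != j and overlap_len(y_hat[i] - c[i], y_hat[i] + c[i],
                                     y_hat[j] - c[j], y_hat[j] + c[j]) > eps)
          for j in range(M)] for i in range(M)]
    dK_bound = sum(map(sum, O)) // 2   # O is symmetric; each pair counted twice
    dM_bound = max(sum(row) for row in O)
    return dK_bound, dM_bound
```

With well-separated estimates and tiny radii both bounds are 0, i.e. the recommendation is already safe; one wide pair of intervals pushes both bounds to 1.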


(34)

Implementation for ε-sensitive SE ranking

- When to stop?
  - For the NDP distance: $d_K(\sigma_t, \prec_{SE}^{\epsilon}) \le \frac{1}{2} \sum_{i=1}^{M} \sum_{j \ne i} O^t_{i,j} < \rho$
  - For the MRD distance: $d_M(\sigma_t, \prec_{SE}^{\epsilon}) \le \max_{1 \le i \le M} \sum_{j \ne i} O^t_{i,j} < \rho$
- What to sample?
  - For the NDP distance: $\{(i, j) \mid \exists j' : O^t_{i,j'} = 1\}$: those pairs whose intervals based on the empirical estimates are overlapping
  - For the MRD distance: $\{(i, j) \mid \rho \le \sum_{j' \ne i} O^t_{i,j'}\}$
- What to recommend?
  - $\sigma_t$, which sorts the arms based on $\hat y^t_1, \dots, \hat y^t_M$
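Putting the pieces together, the whole loop for the MRD variant can be sketched end to end. This is a runnable illustration, not the authors' implementation: the radius-1 handling of unsampled pairs, the symmetric counting of each queried pair, and the `max_rounds` safety cap are my additions.

```python
import math
import random

def rankel_se_mrd(P, eps, rho, delta, rng, max_rounds=5000):
    """Sketch of the RankEl loop for the eps-sensitive SE procedure with the
    MRD stopping rule; recommends the arms sorted by decreasing y_hat."""
    M = len(P)
    n = [[0] * M for _ in range(M)]   # samples per ordered pair
    w = [[0] * M for _ in range(M)]   # observed wins of i over j
    t = 1
    for _ in range(max_rounds):
        p_hat = [[w[i][j] / n[i][j] if n[i][j] else 0.5 for j in range(M)]
                 for i in range(M)]
        rad = [[math.sqrt(math.log(5 * M * M * t ** 4 / (4 * delta))
                          / (2 * n[i][j])) if n[i][j] else 1.0
                for j in range(M)] for i in range(M)]
        y = [sum(p_hat[i][k] for k in range(M) if k != i) / (M - 1)
             for i in range(M)]
        c = [sum(rad[i][k] for k in range(M) if k != i) / (M - 1)
             for i in range(M)]
        # O[i][j] = 1 iff the confidence intervals still overlap by > eps
        O = [[int(i != j and min(y[i] + c[i], y[j] + c[j])
                  - max(y[i] - c[i], y[j] - c[j]) > eps)
              for j in range(M)] for i in range(M)]
        if max(sum(row) for row in O) < rho:   # d_M bound dropped below rho
            break
        for i in range(M):                     # MRD sampling rule
            if sum(O[i]) >= rho:
                for j in range(M):
                    if O[i][j]:
                        win = int(rng.random() < P[i][j])
                        n[i][j] += 1; n[j][i] += 1
                        w[i][j] += win; w[j][i] += 1 - win
                        t += 1
    return sorted(range(M), key=lambda i: -y[i])

P = [[0.5, 0.9, 0.9], [0.1, 0.5, 0.9], [0.1, 0.1, 0.5]]
result = rankel_se_mrd(P, eps=0.05, rho=1, delta=0.05, rng=random.Random(1))
```

On this well-separated instance the loop stops after a few hundred rounds and recovers the true SE order.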



(39)

Implementation for ε-sensitive Copeland's ranking

- ε-sensitive Copeland's order: $a_i \prec_{CO}^{\epsilon} a_j \Leftrightarrow d_i^* + s_i^* < d_j^*$
  - $d_i^* = \#\{k \mid 1/2 + \epsilon < p_{i,k},\ k \ne i\}$ (number of arms beaten by $a_i$ with a margin $\epsilon$)
  - $s_i^* = \#\{k \mid |1/2 - p_{i,k}| \le \epsilon,\ k \ne i\}$ (number of ties with a margin $\epsilon$)
- Based on the empirical estimates, define an interval $[d^t_i, d^t_i + u^t_i]$ for each arm $a_i$ such that $d_i^* \in [d^t_i, d^t_i + u^t_i]$, where
  - $d^t_i = \#D^t_i = \#\{j \ne i \mid 1/2 - \epsilon < \hat p^t_{i,j} - c^t_{i,j}\}$
  - $u^t_i = \#U^t_i = \#\{j \ne i \mid [1/2 - \epsilon, 1/2 + \epsilon] \subseteq [\hat p^t_{i,j} - c^t_{i,j},\ \hat p^t_{i,j} + c^t_{i,j}]\}$
- $d^t_i$ is the number of options that are already known to be beaten by $a_i$
- $u^t_i$ denotes the number of "undecided" pairwise preferences for arm $a_i$
- Assume $[d^t_i, d^t_i + u^t_i] \prec [d^t_j, d^t_j + u^t_j]$: then $d_i^* \le d^t_i + u^t_i \underset{\text{assumption}}{<} d^t_j \le d_j^* \le d_j^* + s_j^*$ ⇒ $a_j \not\prec_{CO}^{\epsilon} a_i$
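The surrogate interval endpoints are again counts over a row of the current estimates; a small sketch transcribing the two definitions above (the function name is mine):

```python
def copeland_intervals(p_hat, c, eps):
    """Empirical surrogate interval [d^t_i, d^t_i + u^t_i] for d*_i:
    d^t_i counts pairs already decided in favour of a_i, u^t_i the
    'undecided' pairs whose confidence interval still covers the eps-band."""
    M = len(p_hat)
    d = [sum(1 for j in range(M)
             if j != i and 0.5 - eps < p_hat[i][j] - c[i][j])
         for i in range(M)]
    u = [sum(1 for j in range(M)
             if j != i and p_hat[i][j] - c[i][j] <= 0.5 - eps
                       and 0.5 + eps <= p_hat[i][j] + c[i][j])
         for i in range(M)]
    return d, u

p_hat = [[0.5, 0.9], [0.1, 0.5]]
c = [[0.05, 0.05], [0.05, 0.05]]
d, u = copeland_intervals(p_hat, c, eps=0.01)
```

Here the pair is already decided: $a_1$ is known to beat $a_2$, and no pairwise preference is still undecided.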


(42)

Implementation for ε-sensitive Copeland's ranking

Lemma

Define a ranking $\tau_t$ over arms by sorting the arms $a_i$ in decreasing order according to $d^t_i$, and in case of a tie ($d^t_i = d^t_j$) according to the sum $d^t_i + u^t_i$. And let

$I^t_{i,j} = \mathbb{I}\{ (d^t_i < d^t_j + u^t_j) \wedge (d^t_j < d^t_i + u^t_i) \}$

for all $1 \le i \ne j \le M$. Then for any time step $t$, and for any sampling strategy,

$d_K(\tau_t, \prec_{CO}^{\epsilon}) \le \frac{1}{2} \sum_{i=1}^{M} \sum_{j \ne i} I^t_{i,j}$ and $d_M(\tau_t, \prec_{CO}^{\epsilon}) \le \max_{1 \le i \le M} \sum_{j \ne i} I^t_{i,j}$

hold with probability at least $1 - \delta$.
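The recommendation and the lemma's indicators follow mechanically from the $d^t_i$ and $u^t_i$ counts; a small sketch (the function name is mine):

```python
def copeland_recommendation(d, u):
    """tau_t (arms sorted by decreasing d^t_i, ties broken by d^t_i + u^t_i)
    plus the lemma's I^t_{i,j} indicators and the two distance bounds."""
    M = len(d)
    order = sorted(range(M), key=lambda i: (-d[i], -(d[i] + u[i])))
    I = [[int(i != j and d[i] < d[j] + u[j] and d[j] < d[i] + u[i])
          for j in range(M)] for i in range(M)]
    dK_bound = sum(map(sum, I)) // 2   # I is symmetric
    dM_bound = max(sum(row) for row in I)
    return order, dK_bound, dM_bound

# Two arms whose surrogate intervals [1, 1] and [0, 2] still overlap:
order, dk, dm = copeland_recommendation([1, 0], [0, 2])
```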

(43)

Implementation for ε-sensitive Copeland's ranking

- When to stop?
  - For the NDP distance: $d_K(\tau_t, \prec_{CO}^{\epsilon}) \le \frac{1}{2} \sum_{i=1}^{M} \sum_{j \ne i} I^t_{i,j} < \rho$
  - For the MRD distance: $d_M(\tau_t, \prec_{CO}^{\epsilon}) \le \max_{1 \le i \le M} \sum_{j \ne i} I^t_{i,j} < \rho$
- What to sample?
  - For the NDP distance: $\{(i, j) \mid (j \in U^t_i) \wedge \exists j' : I^t_{i,j'} = 1\}$: those pairs whose "surrogate" intervals based on the empirical estimates are overlapping
  - For the MRD distance: $\{(i, j) \mid (j \in U^t_i) \wedge \rho \le \sum_{j' \ne i} I^t_{i,j'}\}$
- What to recommend?
  - $\tau_t$, which sorts the items based on $d^t_1, \dots, d^t_M$


(47)

Analysis

- Correctness: the probability that some $p_{i,j}$ is not in the confidence interval of $\hat p^t_{i,j}$ is small

Lemma

The confidence intervals (defined earlier) are valid for any time $t$, for any pair of arms, and for any sampling strategy:

$\sum_{i=1}^{M} \sum_{j \ne i} \sum_{t=1}^{\infty} \mathbb{P}\left( p_{i,j} \notin \left[ \hat p^t_{i,j} - c^t_{i,j},\ \hat p^t_{i,j} + c^t_{i,j} \right] \right) < \delta$

- Expected sample complexity bound:
  - Run RankEl repeatedly on a given instance of the PAC rank elicitation problem defined by $P$ with fixed parameters
  - The number of pairwise comparisons taken is a random variable $X$



(51)

Rank Elicitation Problem Instances: ε = 0.01, $\prec_{CO}^{\epsilon}$

P1:
        a1    a2    a3    a4    a5    d*    s*
  a1    -     0.9   0.9   0.9   0.9   4     0
  a2    0.1   -     0.9   0.9   0.9   3     0
  a3    0.2   0.1   -     0.9   0.9   2     0
  a4    0.1   0.1   0.1   -     0.9   1     0
  a5    0.1   0.1   0.1   0.1   -     0     0

P2:
        a1    a2    a3    a4    a5    d*    s*
  a1    -     0.6   0.6   0.6   0.6   4     0
  a2    0.4   -     0.6   0.6   0.6   3     0
  a3    0.4   0.4   -     0.6   0.6   2     0
  a4    0.4   0.4   0.4   -     0.6   1     0
  a5    0.4   0.4   0.4   0.4   -     0     0


(53)

Rank Elicitation Problem Instances: ε = 0.01, $\prec_{CO}^{\epsilon}$, ρ = 3

P3:
        a1    a2    a3    a4    a5    d*    s*
  a1    -     0.9   0.9   0.9   0.9   4     0
  a2    0.1   -     0.9   0.9   0.9   3     0
  a3    0.1   0.1   -     0.9   0.9   2     0
  a4    0.1   0.1   0.1   -     0.9   1     0
  a5    0.1   0.1   0.1   0.1   -     0     0

P4:
        a1    a2    a3    a4    a5    d*    s*
  a1    -     0.1   0.1   0.9   0.9   2     0
  a2    0.1   -     0.1   0.1   0.9   1     0
  a3    0.1   0.1   -     0.1   0.9   1     0
  a4    0.1   0.1   0.1   -     0.9   1     0
  a5    0.1   0.1   0.1   0.9   -     1     0


(55)

Expected sample complexity

Theorem

The expected sample complexity of $\mathrm{RankEl}^{CO}_{d_M}$ is $O\left( R_1 \log \frac{R_1}{\delta} \right)$, where

$R_1 = \sum_{r=1}^{M^2 - r_1} \frac{1}{(\Delta_{(r)} + \epsilon)^2}$

where

- $\Delta_{i,j} = |1/2 - p_{i,j}|$
- $\Delta_{(r)}$ denotes the $r$-th smallest value among the $\Delta_{i,j}$ for all distinct $i, j \in [M]$
- $r_1$ depends on the structure of $P$ and on $\rho$
- For $\mathrm{RankEl}^{SE}_{d_M}$, one can calculate a similar bound, but with
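The complexity term $R_1$ is easy to evaluate for a concrete instance; this sketch treats the instance-dependent cutoff $r_1$ simply as a parameter, since its exact definition is not given on the slide (the function name is mine):

```python
def r1_complexity(P, eps, r1):
    """R_1 = sum_{r=1}^{M^2 - r1} 1 / (Delta_(r) + eps)^2, where
    Delta_{i,j} = |1/2 - p_{i,j}|, sorted increasingly over distinct i, j."""
    M = len(P)
    gaps = sorted(abs(0.5 - P[i][j])
                  for i in range(M) for j in range(M) if i != j)
    return sum(1.0 / (g + eps) ** 2 for g in gaps[:M * M - r1])
```

For a 2-arm instance with $p_{1,2} = 0.9$, $\epsilon = 0.1$ and $r_1 = 2$, both gap terms equal $1 / 0.5^2 = 4$, so $R_1 = 8$.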


(58)

Numerical experiments

- GOAL: verify that our sampling strategies are more efficient than uniform sampling
- Bundesliga data, using resampling with replacement
- The confidence parameter was set to 0.05, and thus the accuracy was significantly higher than $1 - \delta$ in every case.

(59)

Data: Bundesliga

- Last 10 seasons
- Average of the outcome of 20 matches

(60)

Empirical sample complexity

- The complexity of uniform sampling is taken as 100%
- Percentage of improvement:

ρ = 3, ε = 0.02:
                   NDP $d_K(\cdot,\cdot)$    MRD $d_M(\cdot,\cdot)$
  $\prec_{CO}$     25.8 ± 0.4                25.1 ± 0.4
  $\prec_{SE}$     64.8 ± 0.8                61.5 ± 1.0

ρ = 3, ε = 0.1:
                   NDP $d_K(\cdot,\cdot)$    MRD $d_M(\cdot,\cdot)$
  $\prec_{CO}$     32.6 ± 0.5                32.8 ± 0.5
  $\prec_{SE}$     60.3 ± 0.4                55.4 ± 0.5

(61)

Empirical sample complexity

- The complexity of uniform sampling is taken as 100%
- Percentage of improvement:

ρ = 5, ε = 0.1:
                   NDP $d_K(\cdot,\cdot)$    MRD $d_M(\cdot,\cdot)$
  $\prec_{CO}$     37.6 ± 0.5                46.7 ± 0.5
  $\prec_{SE}$     61.5 ± 0.5                74.8 ± 1.5

ρ = 3, ε = 0.1:
                   NDP $d_K(\cdot,\cdot)$    MRD $d_M(\cdot,\cdot)$
  $\prec_{CO}$     32.6 ± 0.5                32.8 ± 0.5
  $\prec_{SE}$     60.3 ± 0.4                55.4 ± 0.5

(62)

Conclusion and future work

- Our algorithms outperform uniform sampling
- Other ranking procedures can be considered
- Analysis for the Number of Discordant Pairs (NDP) ranking distance

(63)

Bibliography I

J.Y. Audibert, R. Munos, and C. Szepesvári. Tuning bandit algorithms in stochastic environments. In Proceedings of Algorithmic Learning Theory, pages 150-165, 2007.

P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235–256, 2002.

R. Busa-Fekete, B. Szörényi, P. Weng, W. Cheng, and E. Hüllermeier. Top-k selection based on adaptive sampling of noisy preferences. In Proceedings of the 30th ICML, JMLR W&CP, volume 28, 2013.

R. Busa-Fekete, B. Szörényi, and E. Hüllermeier. PAC rank elicitation through adaptive sampling of stochastic pairwise preferences. Under review, 2014.

Olivier Chapelle, Thorsten Joachims, Filip Radlinski, and Yisong Yue. Large-scale validation and analysis of interleaved search evaluation. ACM Trans. Inf. Syst., 30(1):6:1–6:41, 2012.

X. Chen, P. N Bennett, K. Collins-Thompson, and E. Horvitz. Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of the sixth ACM international conference on Web search and data mining, pages 193–202, 2013.

E. Even-Dar, S. Mannor, and Y. Mansour. PAC bounds for multi-armed bandit and Markov decision processes. In Proceedings of the 15th COLT, pages 255-270, 2002.

R. Herbrich, T. Minka, and T. Graepel. TrueSkill™: A Bayesian skill rating system. In Advances in Neural Information Processing Systems, page 569, 2007.

S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone. PAC subset selection in stochastic multi-armed bandits. In Proceedings of the Twenty-ninth International Conference on Machine Learning (ICML 2012), pages 655-662, 2012.

T. L. Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985.

T. Urvoy, F. Clerot, R. Féraud, and S. Naamane. Generic exploration and k-armed voting bandits. In Proceedings of the 30th ICML, JMLR W&CP, volume 28, pages 91-99, 2013.

Y. Yue, J. Broder, R. Kleinberg, and T. Joachims. The k-armed dueling bandits problem. Journal of Computer and System Sciences, 78(5):1538–1556, 2012.
