1 Introduction
2 Related work
3 Approach
3.1 Problem formulation
3.2 Balancing exploration and exploitation in pairwise learning to rank
3.2.1 Baseline learning approach
1: Input: D, η, λ, w
0
|
2: for query q
t
(t = 1..T) do
|
3: X = ϕ(D|q
t
) // extract features
|
// construct exploitative result list
|
4: S = w
t-1
T
X
|
5: L = sortDescendingByScore(D, S) |
6: I = L [1:10] |
7: Display I and observe clicked elements C. |
8: Construct all labeled pairs P = (x
a
, x
b
, y) from I and C. |
9: for
i in (1 .. P)do
|
10: if
y
i
w
t-1
T
(x
a_i
− x
b_i
) < 1.0 and y
i
≠ 0.0 then
|
11: Update w
t
as: w
t
= w
t-1 + ηy
i
(x
a_i
− x
b_i
) − ηλ w
t-1
|
12: return
w
t
|
3.2.2 Balancing exploration and exploitation
1: Input: \(D, \eta, \lambda, {\bf w}_0, \epsilon\)
|
2: for query q
t
(t = 1..T) do
|
3: X = ϕ(D|q
t
) // extract features
|
// construct exploitative result list
|
4: S = w
t-1
T
X
|
5: L = sortDescendingByScore(D, S) |
6: \(I[r] \leftarrow \) first element of \(L \notin I\) with probability ε; element randomly sampled without replacement from \(L \setminus I\) with probability 1 − ε |
7: Display I and observe clicked elements C. |
8: Construct all labeled pairs P = (x
a
, x
b
, y) from I and C. |
9: for
i in (1 .. P) do
|
10: if
y
i
(x
a_i
− x
b_i
) w
t-1
T
< 1.0 and y
i
≠ 0.0 then
|
11: Update w
t
as: w
t
= w
t-1 + ηy
i
(x
a_i
− x
b_i
) − ηλ w
t-1
|
12: return
w
t
|
3.3 Balancing exploration and exploitation in listwise learning to rank
3.3.1 Baseline learning approach
1: Input: f(l
1, l
2), α, δ, w
0
|
2: for query q
t
(t = 1..T) do
|
3: Sample unit vector u
t
uniformly. |
4: \({\bf w}^{\prime}_t \leftarrow {\bf w}_t+\delta u_t\)
// generate exploratory
w
|
5: if
\(f(l({\bf w}_t), l({\bf w}^\prime_t))\)
then
|
6: \({\bf w}_{t+1} \leftarrow {\bf w}_t+\alpha u_t\)
// update exploitative
w
|
7: else
|
8: \({\bf w}_{t+1} \leftarrow {\bf w}_t\)
|
9: return
w
t+1
|
3.3.2 Balancing exploration and exploitation
1: Input: l
1, l
2, k
|
2: initialize empty result list I
|
// construct result list
|
3: for rank r in (1 .. 10) do
|
4: \(L \leftarrow l_1\) with probability k, l
2 with probability 1 − k
|
5: \(I[r] \leftarrow \) first element of \(L\notin I\)
|
6: display I and observe clicked elements C
|
7: N = length(C); c
1 = c
2 = 0 |
8: for
i in (1 .. N) do
|
9: if
C[i] ∈ l
1[1:N] then
|
10: c
1 = c
1 + 1 // count clicks on
l
1
|
11: if
C[i] ∈ l
2[1:N] then
|
12: c
2 = c
2 + 1 // count clicks on
l
2
|
// compensate for bias (Eq. 2) |
13: n
1 = | l
1[1:N] ∩ I[1:N] | |
14: n
2 = | l
2[1:N] ∩ I[1:N] | |
15: \(c_2=\frac{n_1}{n_2} * c_2\)
|
16: return
c
1 < c
2
|
4 Experiments
4.1 Evaluation setup
4.2 Click model
Model |
p(c|R) |
p(c|N
R) |
p(s|R) |
p(s|N
R) |
---|---|---|---|---|
Perfect | 1.0 | 0.0 | 0.0 | 0.0 |
Navigational | 0.95 | 0.05 | 0.9 | 0.2 |
Informational | 0.9 | 0.4 | 0.5 | 0.1 |
4.3 Data
4.4 Runs
4.4.1 Pairwise approach
4.4.2 Listwise approach
4.5 Discounting
4.6 Evaluation measures
5 Results and discussion
5.1 Pairwise learning
NDCG@10 | P@10 | MAP | |
---|---|---|---|
HP2003 | 0.820 | 0.106 | 0.753 |
HP2004 | 0.803 | 0.099 | 0.705 |
NP2003 | 0.794 | 0.093 | 0.676 |
NP2004 | 0.796 | 0.095 | 0.672 |
TD2003 | 0.271 | 0.153 | 0.208 |
TD2004 | 0.279 | 0.217 | 0.183 |
OHSUMED | 0.321 | 0.392 | 0.371 |
MQ2007 | 0.377 | 0.343 | 0.412 |
MQ2008 | 0.490 | 0.239 | 0.452 |
5.1.1 Balancing exploration and exploitation
r
| 0.0 | 0.2 | 0.4 | 0.6 | 0.8 | 1.0 |
---|---|---|---|---|---|---|
click model: perfect
| ||||||
HP2003 |
124.75
| 122.32 | 109.82▾
| 89.50▾
| 61.12 ▾
| 1.35 ▾
|
HP2004 |
115.08
| 102.15 ▾
| 87.14 ▾
| 72.64 ▾
| 48.79 ▾
| 1.14 ▾
|
NP2003 |
115.75
| 110.72 | 96.21 ▾
| 87.43 ▾
| 54.94 ▾
| 1.84 ▾
|
NP2004 |
111.26
| 104.43 | 94.37 ▾
| 76.60 ▾
| 52.44 ▾
| 1.17 ▾
|
TD2003 |
103.05
| 98.88 | 85.37 ▾
| 69.65 ▾
| 43.63 ▾
| 8.23 ▾
|
TD2004 |
106.77
| 103.93 | 83.06 ▾
| 65.56 ▾
| 44.50 ▾
| 14.62 ▾
|
OHSUMED | 119.13 |
134.53
▴
| 127.31 ▴
| 119.68 | 111.42 ▾
| 100.44 ▾
|
MQ2007 | 104.12 |
107.06
▴
| 105.50 ▴
| 101.53 ▾
| 95.96 ▾
| 90.13 ▾
|
MQ2008 |
104.01
| 103.38 | 98.96 ▾
| 93.70 ▾
| 87.58 ▾
| 80.60 ▾
|
click model: navigational
| ||||||
HP2003 | 104.99 |
111.01
| 108.15 | 102.48 | 69.91 ▾
| 1.39 ▾
|
HP2004 | 91.69 |
110.26
\(^{\vartriangle}\)
| 107.83 \(^{\vartriangle}\)
| 82.70 | 57.20 ▾
| 1.27 ▾
|
NP2003 |
115.85
| 110.40 | 106.50 | 91.94 ▾
| 61.19 ▾
| 1.88 ▾
|
NP2004 | 93.79 |
116.11
▴
| 106.39 | 89.67 | 64.20 ▾
| 1.07 ▾
|
TD2003 | 64.43 |
85.20
▴
| 77.85 ▴
| 66.59 | 42.80 ▾
| 8.08 ▾
|
TD2004 | 87.59 |
97.90
▴
| 82.91 | 66.31 ▾
| 45.46 ▾
| 14.30 ▾
|
OHSUMED | 127.05 |
129.73
| 123.39 ▾
| 117.64 ▾
| 110.04 ▾
| 100.81 ▾
|
MQ2007 | 101.68 |
102.88
\(^{\vartriangle}\)
| 101.99 | 99.13 ▾
| 95.08 ▾
| 90.05 ▾
|
MQ2008 | 100.16 |
100.41
| 97.18 ▾
| 93.06 ▾
| 87.31 ▾
| 81.23 ▾
|
click model: informational
| ||||||
HP2003 | 7.21 | 40.39 ▴
|
72.60
▴
| 66.52 ▴
| 55.31 ▴
| 1.37 ▾
|
HP2004 | 6.39 | 29.81 ▴
| 51.81 ▴
|
67.20
▴
| 46.30 ▴
| 1.11 ▾
|
NP2003 | 5.75 | 24.39 ▴
| 55.16 ▴
|
60.57
▴
| 45.04 ▴
| 1.90 ▾
|
NP2004 | 5.64 | 23.95 ▴
|
69.22
▴
| 65.58 ▴
| 52.01 ▴
| 1.21 ▾
|
TD2003 | 7.47 | 23.64 ▴
| 43.92 ▴
|
43.96
▴
| 36.25 ▴
| 7.85 |
TD2004 | 17.31 | 50.10 ▴
|
60.12
▴
| 54.58 ▴
| 38.44 ▴
| 14.48 ▿
|
OHSUMED | 102.60 | 121.48 ▴
|
122.06
▴
| 116.55 ▴
| 109.53 ▴
| 101.12 |
MQ2007 | 92.76 | 96.58 ▴
|
98.19
▴
| 96.66 ▴
| 95.43 ▴
| 90.00 ▾
|
MQ2008 | 90.00 | 91.14 |
92.45
▴
| 91.12 | 86.88 ▾
| 81.69 ▾
|
5.2 Listwise learning
NDCG@10 | P@10 | MAP | |
---|---|---|---|
HP2003 | 0.792 | 0.102 | 0.721 |
HP2004 | 0.770 | 0.096 | 0.676 |
NP2003 | 0.761 | 0.090 | 0.649 |
NP2004 | 0.787 | 0.093 | 0.659 |
TD2003 | 0.296 | 0.152 | 0.231 |
TD2004 | 0.298 | 0.236 | 0.206 |
OHSUMED | 0.422 | 0.488 | 0.437 |
MQ2007 | 0.375 | 0.335 | 0.410 |
MQ2008 | 0.488 | 0.238 | 0.447 |
5.2.1 Balancing exploration and exploitation for the listwise approach
k
| 0.5 | 0.4 | 0.3 | 0.2. | 0.1 |
---|---|---|---|---|---|
Click model: perfect
| |||||
HP2003 | 119.91 | 125.71 ▴
| 129.99 ▴
|
130.55
▴
| 128.50 ▴
|
HP2004 | 109.21 | 111.57 | 118.54 ▴
|
119.86
▴
| 116.46 ▴
|
NP2003 | 108.74 | 113.61 ▴
| 117.44 ▴
|
120.46
▴
| 119.06 ▴
|
NP2004 | 112.33 | 119.34 ▴
| 124.47 ▴
|
126.20
▴
| 123.70 ▴
|
TD2003 | 82.00 | 84.24 | 88.20 ▴
|
89.36
▴
| 86.20 ▴
|
TD2004 | 85.67 | 90.23 ▴
| 91.00 ▴
|
91.71
▴
| 88.98 \(^{\vartriangle}\)
|
OHSUMED | 128.12 | 130.40 ▴
| 131.16 ▴
|
133.37
▴
| 131.93 ▴
|
MQ2007 | 96.02 | 97.48 | 98.54 ▴
|
100.28
▴
| 98.32 ▴
|
MQ2008 | 90.97 | 92.99 ▴
| 94.03 ▴
|
95.59
▴
| 95.14 ▴
|
Click model: navigational
| |||||
HP2003 | 102.58 | 109.78 ▴
|
118.84
▴
| 116.38 ▴
| 117.52 ▴
|
HP2004 | 89.61 | 97.08 ▴
| 99.03 ▴
| 103.36 ▴
|
105.69
▴
|
NP2003 | 90.32 | 100.94 ▴
| 105.03 ▴
| 108.15 ▴
|
110.12
▴
|
NP2004 | 99.14 | 104.34 \(^{\vartriangle}\)
| 110.16 ▴
| 112.05 ▴
|
116.00
▴
|
TD2003 | 70.93 | 75.20 ▴
|
77.64
▴
| 77.54 ▴
| 75.70 \(^{\vartriangle}\)
|
TD2004 | 78.83 | 80.17 | 82.40 \(^{\vartriangle}\)
|
83.54
▴
| 80.98 |
OHSUMED | 125.35 | 126.92 \(^{\vartriangle}\)
| 127.37 ▴
|
127.94
▴
| 127.21 |
MQ2007 | 95.50 | 94.99 | 95.70 |
96.02
| 94.94 |
MQ2008 | 89.39 | 90.55 | 91.24 \(^{\vartriangle}\)
|
92.36
▴
| 92.25 ▴
|
Click model: informational
| |||||
HP2003 | 59.53 | 63.91 | 61.43 | 70.11 \(^{\vartriangle}\)
|
71.19
▴
|
HP2004 | 41.12 | 52.88 ▴
| 48.54 \(^{\vartriangle}\)
|
55.88
▴
| 55.16 ▴
|
NP2003 | 53.63 | 53.64 | 57.60 | 58.40 |
69.90
▴
|
NP2004 | 60.59 | 63.38 | 64.17 | 63.23 |
69.96
\(^{\vartriangle}\)
|
TD2003 | 52.78 | 52.95 | 51.58 | 55.76 |
57.30
|
TD2004 | 58.49 | 61.43 | 59.75 | 62.88 \(^{\vartriangle}\)
|
63.37
|
OHSUMED | 121.39 | 123.26 | 124.01 \(^{\vartriangle}\)
|
126.76
▴
| 125.40 ▴
|
MQ2007 | 91.57 |
92.00
| 91.66 | 90.79 | 90.19 |
MQ2008 | 86.06 | 87.26 | 85.83 |
87.62
| 86.29 |