Edit: I was wrong. You can ignore this comment thread.
I think that this:
def reporter_loss(human, θ, θ_reporter):
    before, action, after = dataset.sample()
    question = human.pose_question(before, action, after)
    z, ε = posterior(before, action, after, θ).sample_with_noise()
    answer = reporter(question, ε, θ_reporter)
    loss = human.loss_for_answer(before, action, after, question, answer)
    return loss + lambda * regularizer(question, ε, θ_reporter)
Should actually be this (look at ε and z):
def reporter_loss(human, θ, θ_reporter):
    before, action, after = dataset.sample()
    question = human.pose_question(before, action, after)
    ε, z = posterior(before, action, after, θ).sample_with_noise()
    answer = reporter(question, z, θ_reporter)
    loss = human.loss_for_answer(before, action, after, question, answer)
    return loss + lambda * regularizer(question, z, θ_reporter)
Does this resolve your confusion?
We want to give the reporter ε so that it has to reconstruct z itself (and therefore has to pay for rerunning the parts of the predictor’s model that are necessary).
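To make that concrete, here is a minimal, hypothetical sketch (not the report’s code) of the difference, assuming a reparameterized posterior where z is a deterministic function of ε given the predictor’s parameters; mu, sigma, and answer_from_latent are placeholders for whatever the predictor and reporter actually compute.

def answer_from_latent(question, z, θ_reporter):
    # Placeholder readout: stands in for however the reporter turns the
    # latent state z into an answer to the question.
    return float(z @ θ_reporter)

def reporter_given_z(question, z, θ_reporter):
    # If z is handed over directly, the reporter can read it off for free.
    return answer_from_latent(question, z, θ_reporter)

def reporter_given_ε(question, ε, mu, sigma, θ_reporter):
    # If only the noise ε is handed over, the reporter must recompute
    # z = mu + sigma * ε itself, i.e. rerun the relevant part of the
    # predictor's forward computation before it can answer.
    z = mu + sigma * ε
    return answer_from_latent(question, z, θ_reporter)

Both reporters produce the same answer; the only difference is which side pays for computing z from ε, which is the cost being pointed at here.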
Oops, silly me. I jumped to conclusions without even reading the text between the code blocks.
Feel free to delete my comment as it may only cause confusion.
Also, I think you probably still want to flip z and ε in the following line:
z, ε = posterior(before, action, after, θ).sample_with_noise()
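For concreteness, sample_with_noise could be implemented along the lines of the standard reparameterization trick sketched below (a hypothetical sketch, not the report’s definition); whether it returns (z, ε) or (ε, z) is purely a convention, and the unpacking in reporter_loss just has to match whichever convention the posterior object actually uses.

import numpy as np

def sample_with_noise(mu, sigma):
    # Hypothetical reparameterized sampler: draw the noise ε first, then
    # compute the latent z as a deterministic function of it.
    ε = np.random.standard_normal(mu.shape)
    z = mu + sigma * ε
    return z, ε  # sample first, noise second, under this assumed convention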