Dumb question alert:
In the appendix “Details for penalizing depending on ‘downstream’ variables”, I’m not able to wrap my head around what we can expect the reporter to learn—if anything at all—seeing that it has no dependency on the inputs (elsewhere it is dependent on z sampled from the posterior).
Specifically, the only call to the reporter (in the function reporter_loss in this section) contains no information (about before, action, after) from the predictor at all:
answer = reporter(question, ε, θ_reporter)
(unless “question” includes some context from the current (before, action, after) being considered, which I’m assuming is not the case)
My dumb question then is:
-- Why would this reporter be performant in any way?
My reasoning: For a given question Q (say, “Is the diamond in the room?”) we might have some answers of “Yes” and some of “No” in the dataset, but without the context we are essentially training the reporter to map noise that is uncorrelated with (independent of) the context to the answer. For a fixed question Q and a fixed realization of the noise RV, the reporter would then be uniformly uncertain about the value of the answer (or rather, it would mirror the statistics in the data). Since the noise is independent of the context, this would be true for every noise value.
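(To make the worry concrete: if ε really carried no information about (before, action, after), the loss-minimizing reporter could ignore ε entirely and just output per-question answer statistics. A hypothetical sketch of that degenerate reporter, with made-up names, not code from the doc:)

# Hypothetical illustration of the concern: if ε were uninformative, the best
# reporter would reduce to a lookup of P(answer | question) over the dataset,
# e.g. the overall frequency with which “the diamond is in the room” is true.
def baseline_reporter(question, ε, θ_reporter):
    return θ_reporter.marginal_answer_distribution[question]   # ε is ignored completely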
The noise ε isn’t independent of the context—it is chosen to be noise that produces the context when fed into the predictor’s model. So you can reconstruct the situation by rerunning select parts of the predictor. But doing so is then expensive, which encourages you to rerun as little of the predictor as possible.
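(To spell out the mechanism: one natural reading of sample_with_noise is the standard reparameterization trick, in which case ε pins down z for anyone willing to recompute the posterior’s parameters. That reading is an assumption, and mean/std are illustrative names rather than identifiers from the doc:)

# Assumed reading of sample_with_noise (reparameterization-style): the posterior
# returns both the latent z and the specific noise ε that deterministically
# produces z, so ε is tied to this particular (before, action, after).
def sample_with_noise(self):
    ε = gaussian.sample()            # fresh standard normal noise
    z = self.mean + self.std * ε     # z is a deterministic function of ε and the posterior’s parameters
    return z, ε                      # recovering z from ε means recomputing mean and std,
                                     # i.e. rerunning (part of) the predictor’s model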
Sorry, I’m still not sure I got it.
Is it that in the reporter(question, ε, θ_reporter) function you can call the predictor with something like this:
z_part = predict_part(part, before, action, θ).sample_using_noise(ε)
Probably I’m wrong because it does not seem like the reporter function should have access to before and action. But I don’t see how you can reconstruct the context when you cannot calculate the probability distribution from which to sample using the noise.
I suspect that the code doesn’t quite reflect your thinking. Currently you could replace the line
z, ε = posterior(before, action, after, θ).sample_with_noise()
from the reporter_loss function with the line
ε = gaussian.sample()
without changing anything. My guess is that either this is not intended, or the reporter should also have access to before and action.
(But anyway, I think the overall approach is quite clear. I just don’t get your implementation and would implement it differently, so feel free not to bother much about it.)
You’re right that I’ve done something wrong. I was thinking of the predictor as a generative model for everything, starting from scratch (a setting I often consider), so that you can use ε to reconstruct everything. But the way it’s currently set up in the doc, the predictor is a generative model only for after, so we need to give the reporter before and action as well.
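(Roughly, the two settings differ in what ε alone determines. A sketch of the contrast, where generate_everything and value_from_noise are illustrative names rather than identifiers from the doc:)

# First setting (generative for everything): ε alone determines
# (before, action, after), so the reporter needs no other inputs.
before, action, after = generate_everything(ε, θ)

# Doc’s setting (generative only for after, given the inputs): ε by itself is
# not enough, so the reporter must also be handed before and action.
after = prediction(before, action, θ).value_from_noise(ε)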
Edit: I was wrong. You can ignore this comment thread.
I think that the current reporter_loss code in the doc should actually be this (look at ε and z):
def reporter_loss(human, θ, θ_reporter):
    before, action, after = dataset.sample()
    question = human.pose_question(before, action, after)
    ε, z = posterior(before, action, after, θ).sample_with_noise()
    answer = reporter(question, z, θ_reporter)
    loss = human.loss_for_answer(before, action, after, question, answer)
    return loss + lambda * regularizer(question, z, θ_reporter)
Does this resolve your confusion?
We want to give the reporter ε so that it needs to reconstruct z itself (and therefore has to pay for rerunning the parts of the predictor’s model that are necessary).
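(Something in the spirit of the predict_part guess above, then, though the exact signature isn’t pinned down in the doc; parts_needed_for and answer_from are made-up helpers:)

# Illustrative sketch (names are assumptions): the reporter receives ε together with
# before and action, and pays compute for every piece of the predictor’s model it
# reruns in order to turn ε back into the latents it needs for the question.
def reporter(question, before, action, ε, θ, θ_reporter):
    relevant_parts = parts_needed_for(question, θ_reporter)   # ideally a small subset of the model
    z_partial = {}
    for part in relevant_parts:
        # Rerunning this computation is what the “downstream” penalty charges for,
        # so the reporter is pushed to reconstruct as little of z as possible.
        z_partial[part] = predict_part(part, before, action, θ).sample_using_noise(ε)
    return answer_from(question, z_partial, θ_reporter)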
Oops, silly me. I jumped to conclusions without even reading the text between the code snippets.
Feel free to delete my comment as it may only cause confusion.
Also, I think you probably still want to flip z and ε in the following line: