If people train AI systems on random samples of deployment, then “reward” does make sense—it’s just what would happen if you sampled this episode to train on.
I don’t know what this means. Suppose we have an AI which “cares about reward” (as you think of it in this situation). The “episode” consists of the AI copying its network & activations to another off-site server, and then the original lab blows up. The original reward register no longer exists (it got blown up), and the agent is not presently being trained by an RL alg.
What is the “reward” for this situation? What would have happened if we “sampled” this episode during training?
I agree there are all kinds of situations where the generalization of “reward” is ambiguous and lots of different things could happen. But it has a clear interpretation for the typical deployment episode, since we can take counterfactuals over the randomization used to select training data.
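To make that reading concrete, here is a minimal sketch (my own illustration, not anything from this thread) of “reward as a counterfactual over the sampling randomization”: deployment episodes are randomly sampled into training, and an episode’s “reward” is whatever the reward function would have assigned had the sampling picked it. Names like `Episode`, `reward_fn`, and `sample_rate` are assumptions for illustration only.

```python
# Sketch only: "counterfactual reward" over the randomization that selects
# which deployment episodes get used for training. All names are illustrative.
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Episode:
    trajectory: list                   # observations/actions from deployment
    sampled_for_training: bool = False

def run_training_round(
    deployment_episodes: List[Episode],
    reward_fn: Callable[[Episode], float],
    sample_rate: float = 0.01,
    seed: int = 0,
) -> List[Tuple[Episode, float]]:
    """Randomly select a fraction of deployment episodes and score them."""
    rng = random.Random(seed)
    labeled = []
    for ep in deployment_episodes:
        if rng.random() < sample_rate:
            ep.sampled_for_training = True
            labeled.append((ep, reward_fn(ep)))
    return labeled

def counterfactual_reward(ep: Episode, reward_fn: Callable[[Episode], float]) -> float:
    """The 'reward' of a typical deployment episode under this reading:
    what reward_fn would return *if* the sampling randomization had picked it.
    This is well-defined whether or not the episode was actually sampled,
    but it breaks down in edge cases where the reward machinery itself no
    longer exists (e.g. it is destroyed mid-episode)."""
    return reward_fn(ep)
```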
It’s possible that agents may specifically want to navigate towards situations where RL training is not happening and the notion of reward becomes ambiguous, and indeed this is quite explicitly discussed in the document Richard is replying to.
As far as I can tell, the fact that there exist cases where different generalizations of reward behave differently does not undermine the point at all.
Yeah, I think I was wondering about the intended scoping of your statement. I perceive myself to agree with you that there are situations (like LLM training to get an alignment research assistant) where “what if we had sampled during training?” is well-defined and fine. I was wondering if you viewed this as a general question we could ask.
I also agree that Ajeya’s post addresses this “ambiguity” question, which is nice!