I agree there are all kinds of situations where the generalization of “reward” is ambiguous and lots of different things could happen. But it has a clear interpretation for the typical deployment episode, since we can take counterfactuals over the randomization used to select training data.
It’s possible that agents may specifically want to navigate towards situations where RL training is not happening and the notion of reward becomes ambiguous, and indeed this is quite explicitly discussed in the document Richard is replying to.
As far as I can tell, the fact that there exist cases where different generalizations of reward behave differently does not undermine the point at all.
Yeah, I think I was wondering about the intended scoping of your statement. I perceive myself to agree with you that there are situations (like LLM training to get an alignment research assistant) where “what if we had sampled during training?” is well-defined and fine. I was wondering if you viewed this as a general question we could ask.
I also agree that Ajeya’s post addresses this “ambiguity” question, which is nice!