paulfchristiano comments on Richard Ngo’s Shortform

paulfchristiano 26 Dec 2022 20:27 UTC
LW: 4 AF: 4
I agree there are all kinds of situations where the generalization of “reward” is ambiguous and lots of different things could happen. But it has a clear interpretation for the typical deployment episode, since we can take counterfactuals over the randomization used to select training data.
Agents may specifically want to navigate towards situations where RL training is not happening and the notion of reward becomes ambiguous, and indeed this is quite explicitly discussed in the document Richard is replying to.
As far as I can tell, the fact that there exist cases where different generalizations of reward behave differently does not undermine the point at all.
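To make the counterfactual-over-randomization point concrete, here is a minimal sketch. It assumes a toy setup that is not from the discussion itself: episodes are plain strings, `reward` is a hypothetical stand-in for the training signal, and `assign_episodes` is a hypothetical selection rule that flips an independent coin per episode. The only thing it is meant to show is that when selection is random and independent of the episode, "the reward this episode would have received if sampled for training" is defined for deployment episodes too.

```python
import random


def reward(episode: str) -> float:
    # Hypothetical stand-in for the reward signal used in training.
    return float(len(episode) % 3)


def assign_episodes(episodes, train_fraction, seed):
    # Hypothetical selection rule: each episode is chosen for training
    # by an independent coin flip that ignores the episode's content.
    rng = random.Random(seed)
    return {ep: rng.random() < train_fraction for ep in episodes}


def counterfactual_reward(episode: str) -> float:
    # Reward the episode would have received had the coin flip
    # selected it for training. It does not depend on the flip.
    return reward(episode)


episodes = [f"episode-{i}" for i in range(10)]
selected = assign_episodes(episodes, train_fraction=0.2, seed=0)

for ep in episodes:
    label = "training" if selected[ep] else "deployment"
    print(ep, label, counterfactual_reward(ep))
```

The sketch deliberately says nothing about cases where no such randomization exists; those are the ambiguous situations mentioned above.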
- TurnTrout 3 Jan 2023 5:05 UTC
  LW: 2 AF: 2
  Yeah, I think I was wondering about the intended scoping of your statement. I perceive myself to agree with you that there are situations (like LLM training to get an alignment research assistant) where “what if we had sampled during training?” is well-defined and fine. I was wondering if you viewed this as a general question we could ask.
  I also agree that Ajeya’s post addresses this “ambiguity” question, which is nice!