I’d say it adds an extra step of indirection where the causal structure of reality gets “blurred out” by an agent’s judgement, and so a reward model strengthens rather than weakens this dynamic?