I think it makes sense to have a specific word for the thing where you do $w_{\text{new}} = w + r \cdot \nabla_w \log P(o \mid w)$ after the network with weights $w$ has given an output $o$ (or variants thereof, e.g. DPO). TurnTrout seems basically correct in saying that it’s common for rationalists to mistakenly think the network will be consequentialistically aiming to get a lot of these updates, even though it really won’t.
On the other hand I think TurnTrout lacks a story for what happens with stuff like DreamerV3.
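To make the update rule above concrete, here is a minimal sketch of it, assuming the simplest possible "network": a softmax over a vector of logits `w`, so that the gradient of log P(o|w) has a closed form. The function names and the `lr` step size are hypothetical and only for illustration; `lr=1.0` recovers the rule exactly as written.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def grad_log_prob(w, o):
    """Gradient of log P(o | w) w.r.t. w, for P = softmax(w).

    For a softmax over logits this is one_hot(o) - softmax(w).
    """
    g = -softmax(w)
    g[o] += 1.0
    return g

def update(w, o, r, lr=1.0):
    """One step of w_new = w + lr * r * grad_w log P(o | w).

    `lr` is an assumed extra step-size parameter, not part of the rule above.
    """
    return w + lr * r * grad_log_prob(w, o)

# Toy usage: sample an output, then apply the update with reward r.
rng = np.random.default_rng(0)
w = np.zeros(3)                      # uniform policy over 3 outputs
o = rng.choice(3, p=softmax(w))      # the output the "network" gave
w_pos = update(w, o, r=1.0)          # positive reward: o becomes more likely
w_neg = update(w, o, r=-1.0)         # negative reward: o becomes less likely
```

In this sketch the reward only ever appears as a scalar multiplier on the gradient for the output that was actually sampled. Nothing in the step itself requires the network to represent or aim at reward, which seems to be the point at issue.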
As far as I understand, “reward is not the optimization target” is about model-free RL, while DreamerV3 is model-based.
Yep, which is basically my point. I can’t think of any case where I’ve seen him discuss the distinction.
From the third paragraph of “Reward is not the optimization target”:
I see. I think maybe I read it when it came out so I didn’t see the update.
Regarding the “Not worth getting into?” react: I’d guess it’s worth getting into because this disagreement is a symptom of the overall question I have about your approach/view.
Though on the other hand maybe it is not worth getting into because maybe once I publish a description of this you’ll basically go “yeah that seems like a reasonable resolution, let’s go with that”.