Given my above reply to james.lucassen about explicitly using a regressor LLM as a reward model, does that give better insight?
Or are you skeptical of the AI’s mapping from “world state” into language? I’d argue that we might get away with having the AI natively define its world state as language, a la SayCan.
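(To make "regressor LLM as a reward model" concrete, here is roughly the kind of thing I have in mind. This is a purely illustrative sketch under my own assumptions: the base model name, the regression setup, and the example world-state string are placeholders I picked for the example, and the head would need fine-tuning on (state description, utility) pairs before its scores mean anything.)

```python
# Illustrative sketch only: a pretrained LM with a single-output regression head,
# used to map a natural-language description of the world state to a scalar reward.
# Model name and the example state string are placeholder assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "distilbert-base-uncased"  # placeholder; any LM with a regression head works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=1 + problem_type="regression" gives a scalar-output head instead of a classifier.
# Freshly loaded, this head is untrained; in practice you would fine-tune it on
# (world-state description, utility) pairs.
reward_model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=1, problem_type="regression"
)

def reward(world_state_text: str) -> float:
    """Score a SayCan-style language description of the world state."""
    inputs = tokenizer(world_state_text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = reward_model(**inputs).logits  # shape (1, 1)
    return logits.item()

# Usage: the planner describes its state in language, the regressor scores it.
print(reward("The robot has placed the apple in the drawer and closed it."))
```

The point is just that the reward model consumes the same language-level state description the planner already works with, SayCan-style, rather than some separate hand-built world model.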
I have no idea what I mean, on further reflection. I’m as confused as you are about why this is hard if we have an accurate utility function sitting right there. Maybe the idea is that it would fail once you put it under optimization pressure?
Yeah, so I think that’s what the adversarial-example/OOD people worry about.
That just seems… like it buys you a lot? And like we should focus more on those problems specifically.