Given my above reply to james.lucassen about explicitly using a regressor LLM as a reward model, does that give better insight?
Or are you skeptical of the AI’s mapping from “world state” into language? I’d argue that we might get away with having the AI natively define its world state as language, a la SayCan.
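(To make "regressor LLM as a reward model" concrete, here is roughly the kind of thing I have in mind. This is a purely illustrative sketch under my own assumptions: the base model name, the regression setup, and the example world-state string are placeholders I picked for the example, and the head would need fine-tuning on (state description, utility) pairs before its scores mean anything.)

```python
# Illustrative sketch only: a pretrained LM with a single-output regression head,
# used to map a natural-language description of the world state to a scalar reward.
# Model name and the example state string are placeholder assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "distilbert-base-uncased"  # placeholder; any LM with a regression head works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=1 + problem_type="regression" gives a scalar-output head instead of a classifier.
# Freshly loaded, this head is untrained; in practice you would fine-tune it on
# (world-state description, utility) pairs.
reward_model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=1, problem_type="regression"
)

def reward(world_state_text: str) -> float:
    """Score a SayCan-style language description of the world state."""
    inputs = tokenizer(world_state_text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = reward_model(**inputs).logits  # shape (1, 1)
    return logits.item()

# Usage: the planner describes its state in language, the regressor scores it.
print(reward("The robot has placed the apple in the drawer and closed it."))
```

The point is just that the reward model consumes the same language-level state description the planner already works with, SayCan-style, rather than some separate hand-built world model.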
I have no idea what I mean, on further reflection. I’m as confused as you are about why this is hard if we have an accurate utility function sitting right there. Maybe the idea is that it would fail once you put it under optimization pressure?
Yeah, so I think that’s what the adversarial-example/OOD people worry about.
That just seems… like it buys you a lot? And like we should focus more on those problems specifically.