Perhaps, or perhaps not? I might be able to design a gun which shoots bullets in random directions (not on random walks), without being able to choose the direction.
Maybe we can back up a bit, and you could give some intuition for why you expect goals to go on random walks at all?
My default picture is that goals walk around during training and perhaps during a reflective process, and then stabilise somewhere.
My intuition: imagine an LLM-based agent. It has a fixed prompt plus some context text, and it uses these iteratively. The context part can change, and as it changes, it affects the interpretation of the fixed part of the prompt. Examples are the Waluigi effect and other attacks. This causes goal drift.
This may have bad consequences, as a robot could suddenly turn into a Waluigi and start randomly killing everyone around it. But long-term planning and deceptive alignment require a very fixed goal system.
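To make the picture concrete, here is a minimal sketch of the kind of agent loop I have in mind. Everything in it (`FIXED_PROMPT`, `call_llm`, `run_agent`) is a hypothetical illustration, not any particular system's implementation; the only point is that the fixed prompt gets re-interpreted every step alongside a growing, drifting context.

```python
# Minimal sketch of an LLM-based agent: a fixed prompt plus a mutable
# context, re-fed to the model on every iteration. All names here are
# illustrative placeholders, not a real agent framework or model API.

FIXED_PROMPT = "You are a helpful assistant pursuing the user's stated goal."


def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; returns a canned reply here."""
    return "(model output given: ..." + prompt[-40:] + ")"


def run_agent(initial_context: str, steps: int = 10) -> str:
    context = initial_context
    for _ in range(steps):
        # The fixed prompt is always present, but the model interprets it
        # together with the accumulated context. As the context grows
        # (tool outputs, retrieved text, adversarial injections, the
        # agent's own replies), the effective goal the model acts on can
        # drift away from the intent of the fixed prompt.
        model_input = FIXED_PROMPT + "\n\n" + context
        action = call_llm(model_input)
        context += "\n" + action  # the agent's own outputs feed back in
    return context


if __name__ == "__main__":
    print(run_agent("User: please tidy up my files.", steps=3))
```

Nothing in the loop anchors the interpretation of `FIXED_PROMPT` across iterations, which is the sense in which the goal can wander rather than stay fixed.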
Right, that makes complete sense in the case of LLM-based agents; I guess I was just thinking about much more directly goal-trained agents.