This sounds way too capable to be safe. Although someone is probably working on this right now, this line of thought getting traction might increase the number of people doing it 10x. Maybe that's good, since GPT-4 probably isn't smart enough to kill us, even with an agent wrapper. It will just scare the pants off of us.
Aligning the wrapper is somewhat similar to my suggestion of aligning an RL critic network head, like the one humans seem to use. Align the captain, not the crew: let the captain use the crew's smarts without giving the crew much say in what to do or in how they get updated.
It'd be interesting to figure out where the biggest danger in this setup comes from: 1) difficulty of aligning the wrapper, 2) wild behavior from the LLM, or 3) something else. And whether there can be spot fixes for any of it.