Christopher King comments on ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King 15 Mar 2023 14:15 UTC
I guess my question is: what other outcome did you expect? I assumed the test for detecting deceptive alignment was supposed to be run in a sandbox. What's the use of finding out it can avoid shutdown after you've already deployed it to the real world? To retroactively recommend not deploying it to the real world?