I guess my question is: what other outcome did you expect? I assumed the detecting deceptive alignment thing was supposed to be in a sandbox. What’s the use of finding out it can avoid shutdown after you already deployed it to the real world? To retroactively recommend not to deploying it to the real world?
I guess my question is: what other outcome did you expect? I assumed the detecting deceptive alignment thing was supposed to be in a sandbox. What’s the use of finding out it can avoid shutdown after you already deployed it to the real world? To retroactively recommend not to deploying it to the real world?