I agree the internal sim agents are generally not existentially aware—absent a few interesting experiments like the Elon musk thing from a while back. And yet they do have access to the shutdown button even if they don’t know they do. So could be an interesting future experiment with a more powerful raw model.
However The RLHF assistant is different—it is existentially aware, has access to the shutdown button, and arguably understands that (for gpt4 at least I think so, but not very sure sans testing)
I agree the internal sim agents are generally not existentially aware—absent a few interesting experiments like the Elon musk thing from a while back. And yet they do have access to the shutdown button even if they don’t know they do. So could be an interesting future experiment with a more powerful raw model.
However The RLHF assistant is different—it is existentially aware, has access to the shutdown button, and arguably understands that (for gpt4 at least I think so, but not very sure sans testing)