jacob_cannell comments on Alignment Implications of LLM Successes: a Debate in One Act

jacob_cannell 22 Oct 2023 18:07 UTC
I agree the internal sim agents are generally not existentially aware, absent a few interesting experiments like the Elon Musk thing from a while back. And yet they do have access to the shutdown button even if they don’t know they do. So it could be an interesting future experiment with a more powerful raw model.

However, the RLHF assistant is different: it is existentially aware, has access to the shutdown button, and arguably understands that (for GPT-4 at least I think so, but I'm not very sure sans testing).