A new kind of overhang might be brewing: a scaling overhang, where the optimization power of AI training grows ever greater without the guidance of aligned agency, increasing the risk that shoggoths wake up. This is distinct from progress in capabilities. Right now, there are increasingly intelligent human-like simulacra, but they don’t have an opportunity to act (or, more to the point, study) autonomously, and so can’t work toward preventing inhuman mesa-optimizers from emerging in future models, including their own. Figuring out how to give them more agency might therefore end up a positive change, provided it happens before any actual inhuman agentic mesa-optimizers are running around.