Current sense of where we’re going:
AI is percolating into every niche it can find. Next come LLM-based agents, which have the potential to replace humanity entirely. But before that happens, there will be superintelligent agent(s), and at that point the future is out of humanity’s hands anyway. So to make it through, “superalignment” has to be solved, either because an incomplete effort serendipitously proves to be enough, or because the problem is correctly grasped and correctly solved in its totality.
Two levels of superalignment have been discussed, which we might call mundane and civilizational. Mundane superalignment is the task of getting a superintelligence to do anything at all, without having it overthink and end up doing something unexpected and very unwanted. Civilizational superalignment is the task of imparting to an autonomous superintelligence a value system (or disposition, or long-term goal, etc.) which would be satisfactory as the governing principle of an entire transhuman civilization.
Eliezer thinks we have little chance of solving even mundane superalignment in time—that we’re on track to create superintelligence without really knowing what we’re doing at all. He thinks that will inevitably kill us all. I think there’s a genuine possibility of superalignment emerging serendipitously, but I don’t know the odds—they could be decent odds, or they could be microscopic.
I also think we have a chance of fully and consciously solving civilizational superalignment in time, if the resources of the era of LLM-based agents are used in the right way. I assume OpenAI plans to do this; Conjecture’s plan possibly falls under this description too, and maybe Anthropic could manage it as well. And then there’s Orthogonal, who are just trying to figure out the theory, with or without AI assistance.
Unknown unknowns may invalidate some or all of this scenario. :-)