I know that this is a common argument against amplification, but I’ve never found it super compelling.
People often point to evil corporations to show that unaligned behavior can emerge from aligned humans, but I don’t think this analogy is very strong. Humans in fact do not share the same goals and are generally competing with each other over resources and power, which seems like the main source of inadequate equilibria to me.
If everyone in the world were a copy of Eliezer, I don’t think we would have a coordination problem around building AGI. They would probably have an Eliezer government that is constantly looking out for emergent misalignment and suggesting organizational changes to squash it. Since everyone in this world is optimizing for making AGI go well and not for profit or status among their Eliezer peers, all you have to do is tell them what the problem is and what they need to do to fix it. You don’t have to threaten them with jail time or worry that they will exploit loopholes in Eliezer law.
I think it is quite likely that I am missing something here, and it would be great if you could flesh this argument out a little more or direct me towards a post that does.