I’m not sure focusing on individual evil is the right approach. It seems to me that most people become much more evil when they can avoid punishment. A lot of evil is done by organizations, which are made of normal people but act as a shield from punishment. So if we teach AIs to be just as “aligned” as normal people, and then AIs increase in power beyond our ability to punish them, we can expect to be treated as a much-less-powerful group in history—which is to say, not very well.