In AI alignment, we face a very similar but fundamentally easier problem. With human alignment, we must build structures that can handle existing human minds solely by controlling (some of) their inputs. In AI alignment, by contrast, we have direct control over the construction of the mind itself: its architecture, the entirety of its training data, its training process, and, with perfect interpretability tools, the ability to monitor exactly what it is learning and how, and to directly edit and control its internal thoughts and knowledge. None of these abilities are (currently) possible with human minds [1]. In theory, this should mean that we can align AI intelligences much better than we can align humans, and that our alignment abilities should scale much farther than current social technology. Indeed, since we expect AIs to reach extremely high levels of intelligence and capability compared to humans, if we are stuck with current human alignment methods such as markets, governments, etc., then we are likely doomed.
I think the problem is both easier and harder in this case. A key challenge is that large capability differentials basically mean that law or contracts might not matter much, and the situation reduces to the Prisoner's Dilemma, where misaligned goals mean the rational action is to defect, not cooperate.
That’s probably the simplest explanation of the hardness of the problem.
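For concreteness, here is a minimal sketch of the one-shot game structure being referenced. The payoff numbers are purely illustrative (they are not from the original text); they just follow the standard Prisoner's Dilemma ordering in which the temptation to defect exceeds mutual cooperation, which exceeds mutual defection, which exceeds being the lone cooperator.

```python
# Illustrative one-shot Prisoner's Dilemma payoffs (hypothetical values).
# (row_action, col_action): (row_payoff, col_payoff)
payoffs = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

actions = ("cooperate", "defect")

def best_response(opponent_action: str) -> str:
    """Row player's best reply to a fixed opponent action."""
    return max(actions, key=lambda a: payoffs[(a, opponent_action)][0])

# Defection is a dominant strategy: it is the best reply to either
# opponent action, so (defect, defect) is the unique equilibrium even
# though (cooperate, cooperate) is better for both players.
for opp in actions:
    print(f"vs {opp}: best response = {best_response(opp)}")
# vs cooperate: best response = defect
# vs defect: best response = defect
```

The point of the sketch is that when no external enforcement (law, contracts) can change the payoffs, each party's individually rational choice is defection, regardless of what the other does.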
Very fair point! I had intended to include a counterpoint like this but somehow forgot. Updated now.