it doesn’t make sense to talk about “aligning superintelligence”; it makes sense to talk about “aligning civilization” (or some other entity which has the ability to control outcomes).
The key insight here is that
(1) “Entities which do in fact control outcomes”
and
(2) “Entities which are near-optimal at solving the specific problem of grabbing power and wielding it”
and
(3) “Entities which are good at correctly solving a broad range of information processing/optimization problems”
are three distinct sets of entities which the Yudkowsky/Bostrom/Russell paradigm of AI risk has smooshed into one (“The Godlike AI will be (3) so therefore it will be (2) so therefore it will be (1)!”). But reality may simply not work like that, and if you look at the real world, (1), (2) and (3) are all distinct sets.
The gap between (3) and (2) is the advantage of specialization. Problem-solving is not a linear scale of goodness; it’s an expanding cone where advances in some directions are irrelevant to other directions.
The gap between (1) and (2), the difference between being best at getting power and actually having the most power, is the advantage of the incumbent. Powerful incumbents can be highly suboptimal and still win because of things like network effects, agglomerative effects, defender’s advantage and so on.
There is also another gap here. It’s the gap between making entities that are generically obedient, and making a power-structure that produces good outcomes. What is that gap? Well, entities can be generically obedient but still end up producing bad outcomes because of:
(a) coordination problems (see World War I)
(b) information problems (see things like the promotion of lobotomies or HRT for middle-aged women)
(c) political economy problems (see things like NIMBYism, banning plastic straws, TurboTax corruption)
Problems of type (a) happen when everyone wants a good outcome, but they can’t coordinate on it and defection strategies are dominant, so people get the bad Nash equilibrium.
Problems of type (b) happen when everyone obediently walks off a cliff together. Supporting things like HRT for middle-aged women or drinking a glass of red wine per week was backed by science, but the science was actually bunk. People like to copy each other and obedience makes this worse because dissenters are punished more. They’re being disobedient, you see!
Problems of type (c) happen because a small group of people actually benefit from making the world worse, and it often turns out that that small group are the ones who get to decide whether to perpetuate that particular way of making the world worse!
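To make the type (a) case concrete, here is a minimal sketch of a two-player game where defection is dominant and the only equilibrium is the bad one. The payoff numbers and the names `PAYOFF`, `best_response`, `is_dominant` and `nash_equilibria` are made up for illustration; this is the textbook prisoner's dilemma shape, not a model of World War I or of anything else specific.

```python
from itertools import product

# Hypothetical payoffs (higher is better), chosen only for illustration.
# PAYOFF[(my_move, other_move)] is the payoff to "me".
PAYOFF = {
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 5,
    ("defect", "defect"): 1,
}
MOVES = ("cooperate", "defect")


def best_response(other_move):
    """The move that maximizes my payoff, given what the other player does."""
    return max(MOVES, key=lambda move: PAYOFF[(move, other_move)])


def is_dominant(move):
    """True if `move` is the best response to every move the other player could make."""
    return all(best_response(other) == move for other in MOVES)


def nash_equilibria():
    """Pure-strategy profiles where neither player gains by deviating alone."""
    return [
        (a, b)
        for a, b in product(MOVES, MOVES)
        if best_response(b) == a and best_response(a) == b
    ]


if __name__ == "__main__":
    print("defect is dominant:", is_dominant("defect"))
    print("equilibria:", nash_equilibria())
```

With these assumed payoffs, both players would prefer mutual cooperation (3 each) to mutual defection (1 each), yet each player's best response is to defect whatever the other does. Nobody in this toy game is disobedient or wants the bad outcome; the structure of the game produces it anyway.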
For an example of the crushing advantage of specialization, see this tweet about how a tiny LLM with specialized training for multiplication of large numbers is better at it than cutting-edge general-purpose LLMs.
https://twitter.com/yuntiandeng/status/1836114419480166585