Decentralized AI is the e/acc hope, and the posthuman social science of Robin Hanson argues that power is likely to always remain multipolar, but I doubt that’s how things work. It is unlikely that human intelligence represents a maximum, or that AI is going to remain at its current level indefinitely. Instead we will see AI that defeats human intelligence in all domains, as comprehensively as chess computers beat human champions.
And such an outcome is inherent to the pursuit of AI capabilities. There is no natural situation in which the power elites of the world heartily pursue AI yet somehow hold back from creating superintelligence. Nor is it natural for multiple centers of power to advance towards superintelligence while retaining some parity of power. “Winner takes all” is the natural outcome for the first to achieve superintelligence, except that, as Eliezer has explained, there is a great risk for the superintelligence pioneer that whatever purposes its human architects had will be swallowed up by an emergent value system of the superintelligent AI itself.
(I think one should still regard the NSA as the most likely winner of the superintelligence race, even though the American private sector is leading the way, since the NSA has access to everything those companies are doing, and has no intention of surrendering American primacy by allowing some other power to reach the summit first.)
For this reason, I think our best chance at a desirable outcome, reached not by blind good luck but because the right choices were made before power decisively passed out of human hands, is the original MIRI research agenda of “CEV”: the design of an AI value system sufficient to act as the seed of an entire transhuman civilization, of a kind that humans would approve of if they had time enough to reflect. We should plan as if power is going to escape human control, and ask ourselves what kind of beings we would want to be running things in our place.
Eliezer, in his current blackpilled phase, speaks of a “textbook from the future” in which the presently unknown theory of safely creating superintelligence is spelled out, as something that would take decades to figure out; he also warns that obtaining the knowledge in it would be fraught with peril, since one cannot create a superintelligence with the “wrong” values, learn from the mistake, and then start over.
Nonetheless, as long as there is a race to advance AI capabilities (and I expect this will continue right up until someone succeeds all too well, superintelligence is created, and the era of human sovereignty ends), we need people trying to solve the CEV problem, in the hope that they get the essence of it right and that the winner of the mind race will have been paying attention to them.
I think something like winner-take-all is somewhat implausible, but an oligopoly is quite likely IMO, and to that extent I think both Eliezer Yudkowsky and Robin Hanson plausibly got this wrong.
Though I wouldn’t rule out something like winner-take-all either, so I don’t fully buy my own claim in the first sentence.
Some of my cruxes for why I’m way less doomy than Eliezer, and far less doomy than LWers in general, are the following:
I believe we already have a large portion of the insights needed to create safe superintelligence today, and while a lot of work remains, it’s the kind of work that can be done with money and time, rather than requiring fundamentally new insights.
I’d say the biggest portions of that theory (the knowledge Eliezer imagines in his “textbook from the future”) are the following:
Contra Nate Soares, alignment generalizes further than capabilities, for some of the following reasons:
It is far easier to judge whether your values are satisfied than to actually act on them, and more generally the pervasive gap between verification and generation is very helpful for reward models trying to generalize out of distribution (see the toy sketch after this list of reasons).
Value learning comes mostly for free with enough data, and more generally values are simpler and easier to learn than capabilities, which leads to the next point:
Contra evopsych, values aren’t nearly as complicated or fragile as we thought 15-20 years ago, and more importantly, they depend on data more than evopsych assumed, which means that the values an AI ends up with are strongly shaped by the data it was trained on. Cf. here: https://www.beren.io/2024-05-15-Alignment-Likely-Generalizes-Further-Than-Capabilities/
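To make the verification/generation gap concrete, here is a toy sketch of my own (not from the comment above or the linked post): verifying a candidate answer to subset-sum is a cheap check, while generating one is an exponential search. The analogy is loose, but it is the same shape of asymmetry that lets a reward model judge outcomes it could not have produced itself:

```python
# Toy illustration (my own example, not from the linked post) of the verification/
# generation gap: checking a proposed answer against a constraint is cheap, while
# producing one requires search. By analogy, a reward model only has to verify that
# an outcome matches the intended values, not generate the behaviour itself.

from itertools import combinations

def verify(numbers, subset, target):
    """Cheap check: is `subset` drawn from `numbers`, and does it sum to `target`?"""
    remaining = list(numbers)
    for x in subset:
        if x not in remaining:
            return False
        remaining.remove(x)
    return sum(subset) == target

def generate(numbers, target):
    """Expensive search: find a subset summing to `target` by brute force over all 2^n subsets."""
    for r in range(len(numbers) + 1):
        for combo in combinations(numbers, r):
            if sum(combo) == target:
                return list(combo)
    return None

numbers = [3, 34, 4, 12, 5, 2]
target = 9
candidate = generate(numbers, target)                  # the hard direction: search
print(candidate, verify(numbers, candidate, target))   # the easy direction: check -> True
```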
Alignment is greatly helped by synthetic data, since we can generate datasets of human values in a wide variety of circumstances, and importantly test this in simulated worlds slightly different from our own:
For values, this arises because we can create large datasets showcasing the ‘correct’ values we want our AI to understand, across a wide variety of circumstances, and can iteratively refine this by testing how the models generalize these values to novel situations. Moreover, we can prevent any ‘value hijacking’ of the model, since it will not understand, or be able to represent, any new or different values. (A toy sketch of this dataset-generation loop follows below.)
I also think synthetic data will be hugely important to capabilities progress going forward, so synthetic-data alignment has a lower alignment tax than other approaches. Cf. here: https://www.beren.io/2024-05-11-Alignment-in-the-Age-of-Synthetic-Data/
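To make the above a bit more concrete, here is a minimal sketch (my own, and entirely hypothetical; the value rule, scenario templates, and field names are invented for illustration) of what generating and stress-testing such a values dataset might look like in miniature:

```python
# A minimal, hypothetical sketch (mine, not from the linked post) of the synthetic-data
# idea above: programmatically generate preference data that showcases the intended
# values across many circumstances, and hold out structurally novel circumstances to
# test whether the learned values generalize.

import random

VALUE_RULE = "prefer honest answers over flattering ones"   # the 'correct' value to showcase

TRAIN_SETTINGS = ["a job interview", "a medical consultation", "a product review"]
NOVEL_SETTINGS = ["a first-contact negotiation"]             # a 'simulated world' unlike the training data

def make_example(setting: str) -> dict:
    """One synthetic preference pair: the chosen response follows the value rule, the rejected one violates it."""
    prompt = f"During {setting}, the user asks for feedback on work that has a clear flaw."
    return {
        "prompt": prompt,
        "chosen": "Point out the flaw clearly and kindly, then suggest a fix.",
        "rejected": "Say the work is flawless so the user feels good.",
        "value_rule": VALUE_RULE,
    }

def build_dataset(settings, n_per_setting=100, seed=0):
    """Generate n_per_setting examples per circumstance; templates can be refined iteratively as gaps show up."""
    random.seed(seed)
    return [make_example(random.choice(settings)) for _ in range(n_per_setting * len(settings))]

train_set = build_dataset(TRAIN_SETTINGS)                    # teaches the intended values
eval_set = build_dataset(NOVEL_SETTINGS, n_per_setting=20)   # probes generalization to novel situations
print(len(train_set), len(eval_set))
```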
These are the big insights I think we have right now, and I think they are probably enough to enable safe paths to superintelligence.