The default outcome is an unaligned superintelligence singleton destroying the world, with no regard for human concepts like property rights. An aligned superintelligence, by contrast, can create a far more utopian future than any human could come up with, and cares about capitalism and property rights only to the extent that it was designed to care about them.
So I indeed don’t get your perspective. Why do humans still appear as agents or decision-makers in your post-superintelligence scenario at all? If the superintelligence, for some unlikely reason, wants a human to stick around and do something, it doesn’t need to pay them. And if a superintelligence wants a resource, it can just take it; there is no need to pay for anything.
@L Rudolf L can speak for himself, but for me, a crux is probably that I don’t expect either an unaligned superintelligence singleton or a value-aligned superintelligence creating utopia to be among the likely outcomes within the next few decades.
For the unaligned superintelligence point, my basic reasons are that I now believe the alignment problem has gotten significantly easier compared to 15 years ago, that I’ve become more bullish on AI control working out since o3, and that I’ve come to think instrumental convergence is probably correct for some AIs we build in practice, but that instrumental drives are more constrainable on the likely paths to AGI and ASI.
For the alignment point, a big reason is that I now think what makes an AI aligned is primarily the data rather than the inductive biases, and one of my biggest divergences from the LW community comes down to me thinking that inductive bias is far less necessary for alignment than people usually assume, especially compared to 15 years ago.
For AI control, one update I’ve made since o3 is that I believe OpenAI managed to get the RL loop working in domains where outcomes are easily verifiable, but not in domains where verification is hard; programming and mathematics are domains where verification is easy. The tie-in is that capabilities will be more spiky/narrow than you may think. This matters because I believe narrow/tool AI has a relevant role to play in an intelligence explosion, so you can actually affect the outcome by building narrow-capabilities AI for a few years, and because the spikiness of AI capabilities in easily verified domains is good for eliciting AI capabilities, which is part of AI control.
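To make the “easily verifiable” distinction concrete, here is a minimal Python sketch (my own illustration; the function names and tasks are hypothetical, and this is not a claim about OpenAI’s actual pipeline). The point is just that math and code admit cheap, objective checkers that can serve as RL reward signals, while open-ended domains do not:

```python
# Illustrative sketch only: "verifiable" vs. "hard to verify" reward signals.
import subprocess
import sys
import tempfile


def math_reward(model_answer: str, ground_truth: str) -> float:
    """Easy to verify: an exact-match check gives a clean RL reward."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0


def code_reward(model_program: str, unit_tests: str) -> float:
    """Easy to verify: run (hypothetical) unit tests and reward on pass/fail."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(model_program + "\n\n" + unit_tests)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, timeout=10)
    return 1.0 if result.returncode == 0 else 0.0


def essay_reward(model_essay: str) -> float:
    """Hard to verify: no cheap objective checker exists, so the reward has to
    come from noisy proxies (human raters, judge models), which is exactly
    where I expect the RL loop to work less well."""
    raise NotImplementedError("no objective verifier available")
```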
For the singleton point, it’s probably because I believe takeoff is both slow enough and distributed enough that multiple superintelligent AIs can arise.
For the value-aligned superintelligence creating a utopia for everyone, my basic reason for not believing in this is that I think value conflicts are effectively irresolvable due to moral subjectivism, which forces any utopia to be a utopia only for some people, and I expect the set of people included in any individual utopia to be small in practice (because value conflicts become more relevant once AIs can create nation-states all by themselves).
For why humans are still decision-makers, it’s probably because AI is either controlled, or because certain companies have chosen to give their AIs instruction-following drives and that has actually succeeded.
And why must alignment be binary? (Aligned or misaligned, where misaligned necessarily means it destroys the world and does not care about property rights.)
Why can you not have a superintelligence that is only misaligned when it comes to issues of wealth distribution?
Relatedly, are we sure that CEV is computable?
I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?
And what does it even mean for a superintelligence to be “only misaligned when it comes to issues of wealth distribution”? Can’t you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?
I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?
Are you saying that the 1 aligned mind design in the space of all potential mind designs is an easier target than the subspace composed of mind designs that do not destroy the world? If so, why? Is it a bigger target? Is it more stable?
Can’t you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?
No, because the “you” who can ask (the people in power) are themselves misaligned with the 1 alignment target that perfectly captures all our preferences.