Next, you need to consider the consequences of both aligned and misaligned AGIs. And I suspect they net out to much smaller consequences for AGI once you sum up the positives and negatives, assuming a consequentialist ethical system.
I find this sort of argument kinda nonsensical. Like, yes, it’s useful to conceptualise goods and harms as positives and negatives you balance, but in practice you can’t literally put numbers on them and run the sums, especially not with so many uncertainties at stake. It’s always possible to fudge the numbers and decide that some values are unimportant and some are super important and lo and behold, the calculation turns in your favour! In the end it’s no better than deontology or simply saying “I think this is good”; there is no point trying to vest it with a semblance of objectivity that just isn’t there. I am a consequentialist, and I think that overall AGI is probably net bad for humanity, and I include some possible outcomes of aligned AGI in that assessment.
I do think there’s a non-trivial probability of that happening, compared to other alignment people who think the chance is effectively epsilon.
I don’t think it’s that improbable either; I just think it’s irresponsible either way when so much is at stake. I think the biggest possible points of failure of the doom argument are:
we just aren’t able to build AGI any time soon (but in that case the whole affair turns out to be much ado about nothing), or
we are able to build AGI, but then AGI can’t really push past to ASI. This might be pure chance, or the result of us using approaches that merely “copy” human intelligence but aren’t able to transcend it (for example, if becoming superintelligent would require being trained on text written by superintelligent entities).
So, sure, we may luck out, though that leaves us “only” with already plenty disruptive human-level AGI. Regardless, this makes the world potentially a much more unstable powder keg. Even without going specifically down the road EY mentions, I think nuclear and MAD analogies do apply, because the power in play is just that great (in fact I’m writing a post on this; it will go up tomorrow if I can finish it).
It’s always possible to fudge the numbers and decide that some values are unimportant and some are super important and lo and behold, the calculation turns in your favour! In the end it’s no better than deontology or simply saying “I think this is good”; there is no point trying to vest it with a semblance of objectivity that just isn’t there.
Is this not simply the fallacy of gray?
As the saying goes, it’s easy to lie with statistics, but even easier to lie without them. Certainly you can fudge the numbers to make the result say anything, but if you show your work then the fudging gets more obvious.
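To make the “show your work” point concrete, here is a minimal, purely illustrative sketch (in Python, with made-up probabilities and utilities that nobody in this thread actually endorses). The point is only that once the sum is written out, it becomes obvious which subjective weights the conclusion hinges on.

```python
# Toy expected-value sum over AGI outcomes. All numbers are placeholder
# assumptions chosen for illustration, not real estimates.
outcomes = {
    "aligned AGI, used well": (0.30, +100),   # (probability, utility)
    "aligned AGI, misused":   (0.20, -50),
    "misaligned AGI":         (0.50, -1000),
}

ev = sum(p * u for p, u in outcomes.values())
print(f"Expected value: {ev:+.1f}")  # -480.0 with these weights

# Shift the subjective probabilities and the sign flips -- that is exactly the
# "fudging" worry, but at least the fudge is now explicit and criticisable.
outcomes["aligned AGI, used well"] = (0.75, +100)
outcomes["misaligned AGI"] = (0.05, -1000)

ev = sum(p * u for p, u in outcomes.values())
print(f"Expected value: {ev:+.1f}")  # +15.0 with these weights
```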
I agree that laying out your thinking at least forces you to specifically elucidate your values. That way people can criticise the precise assumptions they disagree with, and you can’t easily back out of them. I don’t think the “lying with statistics” saying applies in its original meaning because really this is entirely about subjective terminal values. “Because I like it this way” is essentially what it boils down to no matter how you slice it.
In the end it’s no better than deontology or simply saying “I think this is good”; there is no point trying to vest it with a semblance of objectivity that just isn’t there.
You’re right that it isn’t an objective calculation, and apparently it requires more subjective assumptions, so I’ll agree that we really shouldn’t be treating it as though it were one.
I don’t think it’s that improbable either, I just think it’s irresponsible either way when so much is at stake.
I agree that testing that hypothesis is dangerously irresponsible, given the stakes involved. That’s why I still support alignment work.
I think that if success without dignity happens, it will be due to some of the following factors:
1. Alignment turns out to be really easy by default; that is, naive ideas like RLHF just work, or value learning turns out to be almost trivial.
2. Corrigibility is really easy or trivial to achieve, such that alignment isn’t relevant because humans can redirect the AI’s goals easily. In particular, it’s easy to get AIs to respect a shutdown order.
3. We can’t make AGI, or it’s too hard to progress from AGI to ASI.
These are the major factors I view as likely in a success-without-dignity case, i.e. one where we survive AGI/ASI via luck.
I find 1 unlikely, 2 almost impossible (or rather, it would imply partial alignment, in which case you at least managed to impress Asimov’s Second Law of Robotics into your AGI above all else), and 3 the most likely, but also unstable (what if your 10^8 instances of AGI engineers suddenly achieve a breakthrough after 20 years of work?). So this doesn’t seem particularly satisfying to me.