In the end it’s no better than deontology or simply saying “I think this is good”; there is no point trying to vest it with a semblance of objectivity that just isn’t there.
You’re right that it isn’t an objective calculation and that it rests on further subjective assumptions, so I’ll agree that we shouldn’t treat it as though it were one.
I don’t think it’s that improbable either; I just think it’s irresponsible either way when so much is at stake.
I agree that testing that hypothesis is dangerously irresponsible, given the stakes involved. That’s why I still support alignment work.
If success without dignity happens, I think it will be due to some of the following factors:
1. Alignment turns out to be really easy by default: naive approaches like RLHF just work, or value learning turns out to be almost trivial.
2. Corrigibility turns out to be easy or trivial, so alignment isn’t the crux, because humans can easily redirect an AI’s goals. In particular, it’s easy to get AIs to respect a shutdown order.
3. We can’t build AGI, or it’s too hard to progress from AGI to ASI.
These are the factors I view as most likely to produce a success-without-dignity outcome; failing that, we survive AGI/ASI through sheer luck.
I find (1) unlikely, (2) almost impossible (or rather, it would imply partial alignment, in which you at least managed to instill Asimov’s Second Law of Robotics into your AGI above all else), and (3) the most likely but also unstable (what if your 10^8 instances of AGI engineers suddenly achieve a breakthrough after 20 years of work?). So this doesn’t seem particularly satisfying to me.