Inner alignment failure is a phenomenon that has happened in existing AI systems, weak as they are. So we know it can happen. We are on track to build many superhuman AI systems. Unless something unexpectedly good happens, eventually we will build one that has a failure of inner alignment. And then it will kill us all. Does the probability of any given system failing inner alignment really matter?
Yes, because if the first superhuman AGI is aligned, and if it performs a pivotal act to prevent misaligned AGI from being created, then we will avert existential catastrophe.
If there is a 99.99% chance of that happening, then we should be quite sanguine about AI x-risk. On the other hand, if there is only a 0.01% chance, then we should be very worried.
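To put that contrast in rough numbers, here is a toy sketch (my own illustrative assumptions, not anything established in this thread): suppose each superhuman system independently fails inner alignment with probability $p$, and $N$ such systems eventually get built.

$$
P(\text{catastrophe} \mid \text{no pivotal act}) = 1 - (1-p)^N \;\longrightarrow\; 1 \text{ as } N \to \infty, \text{ for any } p > 0,
$$
$$
P(\text{catastrophe} \mid \text{pivotal act by the first AGI}) \approx P(\text{first AGI is misaligned}).
$$

Under the first model the per-system probability barely matters, which is the parent comment's point; under the second, it is essentially the only thing that matters.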
It’s hard to guess, but it happened when the only general intelligence known to us was created by a hill-climbing process.
I think it’s inappropriate to call evolution a “hill-climbing process” in this context, since those words seem optimized to sneak in parallels to SGD. Separately, I think that evolution is a bad analogy for AGI training.
This seems super important to the argument! Do you know if it’s been discussed in detail anywhere else?