[Question] Convince me that humanity is as doomed by AGI as Yudkowsky et al. seem to believe
I’ve been very heavily involved in the (online) rationalist community for a few months now, and like many others, I have found myself quite freaked out by the apparent despair/lack of hope that seems to be sweeping the community. When people who are smarter than you start getting scared, it seems wise to be concerned as well, even if you don’t fully understand the danger. Nonetheless, it’s important not to get swept up in the crowd. I’ve been trying to get a grasp on why so many seem so hopeless, and these are the assumptions I believe they are making (trivial assumptions included, for completeness; there may be some overlap in this list):
1. AGI is possible to create.
2. AGI will be created within the next century or so, possibly even within the next few years.
3. If AGI is created by people who are not sufficiently educated (i.e., aware of a solution to the Alignment problem) and cautious, then it will almost certainly be unaligned.
4. Unaligned AGI will try to do something horrible to humans (not necessarily out of maliciousness; we could just be collateral damage), and will not display sufficiently convergent behavior to have anything resembling our values.
5. We will not be able to effectively stop an unaligned AGI once it is created (due to the Corrigibility problem).
6. We have not yet solved the Alignment problem (of which the Corrigibility problem is merely a subset), and there do not appear to be any likely avenues to success (or at least we should not expect success within the next few decades).
7. Even if we solved the Alignment problem, if an unaligned AGI arrives on the scene before we can implement ours, we are still doomed (due to first-mover advantage).
8. Our arguments for all of the above are not convincing or compelling enough for most AI researchers to take the threat seriously.
9. As such, unless some drastic action is taken soon, unaligned AGI will be created shortly, and that will be the end of the world as we know it.
First of all, is my list of seemingly necessary assumptions correct?
If so, it seems to me that most of these are far from proven statements of fact, and in fact are all heavily debated. Assumption 8 in particular seems to highlight this: if a strong enough case could be made for each of the previous assumptions, it would be fairly easy to convince most intelligent researchers, which we don’t seem to observe.
A historical example which bears some similarities to the current situation may be Gödel’s resolution to Hilbert’s program. He was able to show conclusively that no consistent, effectively axiomatized system containing arithmetic can prove all arithmetical truths, at which point the mathematical community was able to advance beyond the limitations of early formalism. As far as I am aware, no similarly strong argument exists for even one of the assumptions listed above.
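(For the curious, the modern form of Gödel’s result, as strengthened by Rosser, is roughly the following; the LaTeX sketch below is just for reference, and nothing in my argument depends on its details.)

```latex
\documentclass{article}
\usepackage{amssymb} % for \nvdash

\begin{document}
% Standard statement of the first incompleteness theorem
% (Gödel, as strengthened by Rosser), included only for reference.
\textbf{Theorem.} If $T$ is a consistent, effectively axiomatized theory
containing elementary arithmetic, then there is a sentence $G_T$ in the
language of $T$ such that
\[
  T \nvdash G_T \qquad\text{and}\qquad T \nvdash \lnot G_T ,
\]
i.e.\ $T$ is incomplete; in particular, no such theory proves every true
arithmetical sentence.
\end{document}
```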
Given all of this, and the fact that there are so many uncertainties here, I don’t understand why so many researchers (most prominently Eliezer Yudkowsky, but there are countless more) seem so certain that we are doomed. I find it hard to believe that all alignment ideas presented so far show no promise, considering I’ve yet to see a slam-dunk argument for why even a single modern alignment proposal can’t work. (Yes, I’ve seen proofs against straw-man proposals, but not against any proposal put forward by a current expert in the field.) This may very well be due to my own ignorance/relative newness, however, and if so, please correct me!
I’d like to hear the steelmanned argument for why alignment is hopeless; Yudkowsky’s announcement that “I’ve tried and couldn’t solve it,” without more details, doesn’t really impress me. My suspicion is that I’m simply missing some crucial context, so consider this thread a chance to share your best arguments for AGI-related pessimism. (Later in the week I’ll post a thread from the opposite direction, in order to balance things out.)
EDIT: Read the comments section if you have the time; there’s some really good discussion there, and I was successfully convinced of a few specifics that I’m not sure how to incorporate into the original text. 🙃