There are a bunch of things that differ between part I and part II; I believe they are correlated with each other, but not at all perfectly. In the post I’m intending to illustrate what I believe some plausible failures look like, in a way intended to capture a bunch of the probability space. I’m illustrating these kinds of bad generalizations and ways in which the resulting failures could be catastrophic. I don’t really know what “making the claim” means, but I would say that any ways in which the story isn’t realistic are interesting to me (and we’ve already discussed many, and my views have, unsurprisingly, changed considerably in the details over the last 2 years), whether they are about the generalizations or the impacts.
I do think that the “going out with a whimper” scenario may ultimately transition into something abrupt, unless people don’t have their act together enough to even put up a fight (which I do think is fairly likely conditioned on catastrophe, and may be the most likely failure mode).
It seems like you at least need to explain why in that situation we can’t continue to work on the alignment problem and replace the agents with better-aligned AI systems in the future.
We can continue to work on the alignment problem and continue to fail to solve it, e.g. because the problem is very challenging or impossible or because we don’t end up putting in a giant excellent effort (e.g. if we spent a billion dollars a year on alignment right now it seems plausible it would be a catastrophic mess of people working on irrelevant stuff, generating lots of noise while we continue to make important progress at a very slow rate).
The most important reason this is possible is that change is accelerating radically. For example, I believe it’s quite plausible that we will not have massive investment in these problems until we are 5-10 years away from a singularity, and so we just won’t have much time.
If you are saying “Well why not wait until after the singularity?” then yes, I do think that eventually it doesn’t look like this. But that can just look like people failing to get their act together, and then eventually when they try to replace deployed AI systems they fail. Depending on how generalization works, that may look like a failure (as in scenario 2), or everything may just look dandy from the human perspective because they are now permanently unable to effectively perceive or act in the real world (especially off of Earth). I basically think that all bets are off if humans just try to sit tight while an incomprehensible AI world-outside-the-gates goes through a growth explosion.
I think there’s a perspective where the post-singularity failure is still the important thing to talk about, and that’s an error I made in writing the post. I skipped it because there is no real action after the singularity—the damage is irreversibly done, all of the high-stakes decisions are behind us—but it still matters for people trying to wrap their heads around what’s going on. And moreover, the only reason it looks that way to me is because I’m bringing in a ton of background empirical assumptions (e.g. I believe that massive acceleration in growth is quite likely), and the story will justifiably sound very different to someone who isn’t coming in with those assumptions.
Fwiw I think I didn’t realize you weren’t making claims about what post-singularity looked like, and that was part of my confusion about this post. Interpreting it as “what’s happening until the singularity” makes more sense. (And I think I’m mostly fine with the claim that it isn’t that important to think about what happens after the singularity.)