While I agree that the COVID response was worse than it could have been, I think there are several important disanalogies between the COVID-19 pandemic and a soft AI takeoff scenario:
1. Many new problems arose during this pandemic for which we did not have historical experience, e.g. in supply chains. (Perhaps we had historical precedent in the Spanish flu, but that was sufficiently long ago that I don’t expect those lessons to generalize, or for us to remember those lessons.) In contrast, I expect that with AI alignment the problems will not change much as the AI systems become more powerful. Certainly the effects of misaligned powerful AI systems will change dramatically and be harder to mitigate, but I expect the underlying causes of misalignment will not change much, and that’s what we need to gain consensus about and find solutions for. EDIT: Note that with COVID we failed even at existing, known problems (see Raemon’s comment thread below), so this point doesn’t really explain away our failure with COVID.
2. It looks like most institutions took action in about 2 months (mid-Jan to mid-March). While this was (I assume) too slow for COVID, it seems more than sufficient for AI alignment under soft takeoff, where I expect we will have multiple years for the relevant decision-making. However, unlike COVID, there probably won’t be a specific crisis mode that leads to quick(er) decision-making: while there might be warning signs that suggest that action needs to be taken for AI systems, it may not lead to action specifically targeted at AI x-risk.
3. It seems that the model in this post is that we should have learned from past epidemics, and applied those lessons to solve this pandemic. However, in AI alignment, the hope is to learn from failures of narrow AI systems, and use that to prevent failures in more powerful AI systems. I would be pretty pessimistic if AI alignment was banking on noticing failures in powerful AI systems and then quickly mobilizing institutions to mitigate those failures, rather than preventing the failures in the first place. The analogous actions for COVID would be things we could have done before we knew about COVID to mitigate some unknown future pandemic. I don’t know enough about epidemiology to say whether or not there were cost-effective actions that should have been taken ahead of time, but note that any such argument should be evaluated ex ante (i.e. from the perspective where we didn’t know that COVID-19 would happen).
Separately, it seems like humanity has in fact significantly mitigated the effects of COVID (something like we reduced deaths to a fraction of what they “could have been”), so if you want to take an extremely outside view approach, you should predict that with AI alignment we’ll mitigate the worst effects but there will still be some pretty bad effects, which still argues for not-extinction. (I don’t personally buy this reasoning; I mention it as a response to people who say “look at all of our civilization’s failures, therefore we should predict failure at AI alignment too”.)
1. Many new problems arose during this pandemic for which we did not have historical experience, e.g. in supply chains. (Perhaps we had historical precedent in the Spanish flu, but that was sufficiently long ago that I don’t expect those lessons to generalize, or for us to remember those lessons.) In contrast, I expect that with AI alignment the problems will not change much as the AI systems become more powerful. Certainly the effects of misaligned powerful AI systems will change dramatically and be harder to mitigate, but I expect the underlying causes of misalignment will not change much, and that’s what we need to gain consensus about and find solutions for.
Wait… you think there will be fewer novel problems arising during AI (a completely unprecedented phenomenon) than in Covid? Even in my most relaxed, responsible slow-takeoff scenarios, that seems like an extremely surprising claim.
I’m also somewhat confused what facts you think we didn’t know about covid that prevented us from preparing – I don’t currently have examples of such facts in mind. (The fact that some countries seem to be doing just fine makes it look to me like it’s totally doable to have solved covid given the information we had at the time, or at least to have responded dramatically more adequately than many countries did.)
you think there will be fewer novel problems arising during AI (a completely unprecedented phenomenon) than in Covid?
Relative to our position now, there will be more novel problems from powerful AI systems than for COVID.
Relative to our position e.g. two years before the “point of no return” (perhaps the deployment of the AI system that will eventually lead to extinction), there will be fewer novel problems than for COVID, at least if we are talking about the underlying causes of misalignment.
(The difference is that with AI alignment we’re trying to prevent misaligned powerful AI systems from being deployed, whereas with pandemics we don’t have the option of preventing “powerful diseases” from arising; we instead have to mitigate their effects.)
I agree that powerful AI systems will lead to more novel problems in their effects on society than COVID did, but that’s mostly irrelevant if your goal is to make sure you don’t have a superintelligent AI system that is trying to hurt you.
I’m also somewhat confused what facts you think we didn’t know about covid that prevented us from preparing
I think it is plausible that we “could have” completely suppressed COVID, and that mostly wouldn’t have required facts we didn’t know, and the fact that we didn’t do that is at least a weak sign of inadequacy.
I think given that we didn’t suppress COVID, mitigating its damage probably involved new problems that we didn’t have solutions for before. As an example, I would guess that in past epidemics the solution to “we have a mask shortage” would have been “buy masks from <country without the epidemic>”, but that no longer works for COVID. But really the intuition is more like “life is very different in this pandemic relative to previous epidemics; it would be shocking if this didn’t make the problem harder in some way that we failed to foresee”.
I think given that we didn’t suppress COVID, mitigating its damage probably involved new problems that we didn’t have solutions for before.
Hmm. This just doesn’t seem like what was going on to me at all. I think I disagree a lot about this, and it seems less about “how things will shake out in Slow AI Takeoff” and more about “how badly and obviously-in-advance and easily-preventably did we screw up our covid response.”
(I expect we also disagree about how Slow Takeoff would look, but I don’t think that’s the cruxy bit for me here).
I’m sort of hesitant to jump into the “why covid obviously looks like mass institutional failure, given a very straightforward, well understood scenario” argument because I feel like it’s been hashed out a lot in the past 3 months and I’m not sure where to go with it – I’m assuming you’ve read the relevant arguments and didn’t find them convincing.
The sort of things I have in mind include:
FDA actively hampers efforts to scale up testing
Hospitals don’t start re-using PPE even when it’s clear they’re going to have to start doing so within a month
Everyone delays 3 weeks before declaring lockdowns, at a time when the simple math clearly indicated we needed to lock down promptly if we wanted to have a chance at squashing the virus (see the rough calculation sketched below)
Media actively downplays the risk and dismisses concern about it as racism
The CDC and WHO make actively misleading statements
These problems all seemed fairly straightforward and understood. There might also be novel problems going on, but we don’t need to hypothesize them to explain the above types of failure.
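To make “the simple math” in the lockdown point above concrete, here is a minimal back-of-the-envelope sketch in Python. The ~3.5-day doubling time is an assumed, illustrative figure, not a number from the thread; the point is only that under unchecked exponential growth a three-week delay multiplies the size of the problem many times over.

```python
# Rough cost of delaying a lockdown under unchecked exponential growth.
# The doubling time below is an illustrative assumption, not a measured value.
doubling_time_days = 3.5
delay_days = 21  # the ~3-week delay mentioned above

growth_factor = 2 ** (delay_days / doubling_time_days)
print(f"A {delay_days}-day delay multiplies cases by roughly {growth_factor:.0f}x")
# With these numbers: 2**(21/3.5) = 2**6 = 64x more cases to suppress.
```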
Ah, I see. I agree with this and do think it cuts against my point #1, but not points #2 and #3. Edited the top-level comment to note this.
I’m sort of hesitant to jump into the “why covid obviously looks like mass institutional failure, given a very straightforward, well understood scenario” argument because I feel like it’s been hashed out a lot in the past 3 months and I’m not sure where to go with it – I’m assuming you’ve read the relevant arguments and didn’t find them convincing.
To be clear, I find it quite likely that there was mass institutional failure with COVID; I’m mostly arguing that soft takeoff is sufficiently different from COVID that we shouldn’t necessarily expect the same mass institutional failure in the case of soft takeoff. (This is similar to Matthew’s argument that the pandemic shares more properties with fast takeoff than with slow takeoff.)

Ah, okay. I think I need to at least think a bit harder to figure out if I still disagree in that case.
I do definitely expect different institutional failure in the case of Soft Takeoff. But it sort of depends on what level of abstraction you’re looking at the institutional failure through. Like, the FDA won’t be involved. But there’s a decent chance that some other regulatory body will be involved, which is following the underlying FDA impulse of “Wield the one hammer we know how to wield to justify our jobs.” (In a large company, it’s possible that regulatory body could be a department inside the org, rather than a government agency.)
In reasonably good outcomes, the decisions are mostly being made by tech companies full of specialists who well understand the problem. In that case the institutional failures will look more like “what ways do tech companies normally screw up due to internal politics?”
There’s a decent chance the military or someone will try to commandeer the project, in which case more typical government institutional failures will become more relevant.
One thing that seems significant is that 2 years prior to The Big Transition, you’ll have multiple companies with similar-ish tech. And some of them will be appropriately cautious (like New Zealand and Singapore), and others will not have the political wherewithal to slow down and think carefully and figure out what inconvenient things they need to do and do them (like many other countries did with covid).
Yeah, these sorts of stories seem possible, and it also seems possible that institutions try some terrible policies, notice that they’re terrible, and then fix them. Like, this description:
But there’s a decent chance that some other regulatory body will be involved, which is following the underlying FDA impulse of “Wield the one hammer we know how to wield to justify our jobs.” (In a large company, it’s possible that regulatory body could be a department inside the org, rather than a government agency.)
just doesn’t seem to match my impression of non-EAs-or-rationalists working on AI governance. It’s possible that people in government are much less competent than people at think tanks, but this would be fairly surprising to me. In addition, while I can’t explain FDA decisions, I still pretty strongly penalize views that ascribe huge very-consequential-by-their-goals irrationality to small groups of humans working full time on something.
(Note I would defend the claim that institutions work well enough that in a slow takeoff world the probability of extinction is < 80%, and probably < 50%, just on the basis that if AI alignment turned out to be impossible, we can coordinate not to build powerful AI.)
Are you saying you think that wasn’t a fair characterization of the FDA, or that the hypothetical AI Governance bodies would be different from the FDA?
(The statement was certainly not very fair to the FDA, and I do expect there was more going on under the hood than that motivation. But I do broadly think governing bodies do what they are incentivized to do, which includes justifying themselves, especially after being around for a couple of decades and gradually being infiltrated by careerists.)

I am mostly confused, but I expect that if I learned more I would say that it wasn’t a fair characterization of the FDA.
However, in AI alignment, the hope is to learn from failures of narrow AI systems, and use that to prevent failures in more powerful AI systems.
This also jumped out at me as being only a subset of what I think of as “AI alignment”; like, ontological collapse doesn’t seem to have been a failure of narrow AI systems. [By ‘ontological collapse’, I mean the problem where the AI knows how to value ‘humans’, and then it discovers that ‘humans’ aren’t fundamental and ‘atoms’ are fundamental, and now it’s not obvious how its preferences will change.]
Perhaps you mean “AI alignment in the slow takeoff frame”, where ‘narrow’ is less a binary judgment and more of a continuous judgment; then it seems more compelling, but I still think the baseline prediction should be doom if we can only ever solve problems after encountering them.
Perhaps you mean “AI alignment in the slow takeoff frame”, where ‘narrow’ is less a binary judgment and more of a continuous judgment
I do mean this.
This also jumped out at me as being only a subset of what I think of as “AI alignment”; like, ontological collapse doesn’t seem to have been a failure of narrow AI systems.
I’d predict that either ontological collapse won’t be a problem, or we’ll notice it in AI systems that are less general than humans. (After all, humans have in fact undergone ontological collapse, so presumably AI systems will also have undergone it by the time they reach human level generality.)
I still think the baseline prediction should be doom if we can only ever solve problems after encountering them.
This depends on what you count as “encountering a problem”.
At one extreme, you might say that Faulty Reward Functions in the Wild counts as “encountering” the problem “if you train using PPO with such-and-such hyperparameters on the score reward function in the CoastRunners game, then on this specific level the boat might get into a cycle of getting turbo boosts instead of finishing the race”. If this is what it means to encounter a problem, then I agree the baseline prediction should be doom if we only solve problems after encountering them. (A toy sketch of this narrow sense of “encountering” is given below.)
At the other extreme, maybe the same incident counts as “encountering” the problem “sometimes AI systems are not beneficial to humans”. So, if you solve this problem (which we’ve already encountered), then almost tautologically you’ve solved AI alignment.
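As a concrete illustration of the narrow sense of “encountering” a problem, here is a minimal toy sketch in Python. The gridworld, reward values, and discount factor are all invented for illustration and are not taken from the linked post or the actual CoastRunners environment; it only shows the general pattern of an agent optimizing a proxy “score” and farming a respawning bonus instead of finishing the race.

```python
# Toy reward-hacking example: value iteration on a tiny "race track" MDP where
# the proxy reward (game score) diverges from the intended goal (finish the race).
# All numbers here are invented for illustration.

N = 6                          # positions 0..5; position 5 is the finish line
BONUS_POS, BONUS_R = 2, 1.0    # entering this tile gives +1 score; the bonus respawns
FINISH_R = 10.0                # crossing the finish line gives +10 and ends the episode
GAMMA = 0.95                   # with a high discount, farming the bonus beats finishing
ACTIONS = (-1, +1)             # move left / move right

def step(pos, action):
    """One environment step under the proxy (score) reward."""
    nxt = min(max(pos + action, 0), N - 1)
    if nxt == N - 1:
        return nxt, FINISH_R, True
    return nxt, (BONUS_R if nxt == BONUS_POS else 0.0), False

V = [0.0] * N                  # state values under the proxy reward (finish state stays 0)

def q_value(pos, action):
    nxt, reward, done = step(pos, action)
    return reward + (0.0 if done else GAMMA * V[nxt])

# Value iteration (the MDP is tiny, so a few hundred sweeps are plenty).
for _ in range(500):
    for pos in range(N - 1):
        V[pos] = max(q_value(pos, a) for a in ACTIONS)

policy = {pos: max(ACTIONS, key=lambda a: q_value(pos, a)) for pos in range(N - 1)}

# Roll out the proxy-optimal policy from the start line.
pos, trajectory = 0, [0]
for _ in range(12):
    pos, _, done = step(pos, policy[pos])
    trajectory.append(pos)
    if done:
        break
print("trajectory:", trajectory)
# Prints something like [0, 1, 2, 1, 2, ...]: the agent oscillates around the bonus
# tile forever and never reaches the finish line, even though "finish the race"
# was the intended objective.
```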
I’m not sure how to make further progress on this disagreement.