Realistically I think the core issue is that Eliezer is very skeptical about the possibility of competitive AI alignment. That said, I think that even on Eliezer’s pessimistic view he should probably just be complaining about competitiveness problems rather than saying pretty speculative stuff about what is needed for a pivotal act.
Maybe I’m not understanding what you mean by “competitive”. On my model, if counterfactually it were possible to align AGI systems that are exactly as powerful as the strongest unaligned AGI systems, e.g. five years after the invention of AGI, then you’d need to do a pivotal act with the aligned AGI system immediately or you die.
So competitiveness of aligned AGI systems doesn’t seem like the main crux to me; the crux is more like ‘if you muddle along and don’t do anything radical, do all the misaligned systems just happen to not be able to find any way to kill all humans?’. Equal capability doesn’t solve the problem when attackers are advantaged.
It sounds like your view is “given continued technological change, we need strong international coordination to avoid extinction, and that requires a ‘pivotal act.’”
But that “pivotal act” is a long time in the subjective future, the case for it being a single “act” is weak, the kinds of pivotal acts being discussed seem totally inappropriate in this regime, and the discussion overall seems to have involved very little serious thought by the participants.
For example, my sense from this discourse is that MIRI folks think a strong world government is more likely to come from an AI lab taking over the world than from a more boring-looking process of gradual political change or conflict amongst states (and that this is the large majority of how the pivotal acts being discussed address the problem you are mentioning). I disagree with that, don’t think it’s been argued for, and don’t think the surprisingness of the claim has even been acknowledged and engaged with.
I disagree with the whole spirit of the sentence “misaligned systems just happen not to be able to find a way to kill all humans”:
I don’t think it’s about misaligned AI. I agree with the mainstream opinion that if competitive alignment is solved, humans deliberately causing trouble represent a larger share of the problem than misaligned AI.
“Just happen not to be able to” is a construction with some strong presuppositions baked in. I could have written the same sentence about terrorists: they “just happen not to be able to” find a way to kill all humans. Yes, over time this will get easier, but it’s not like something magical happens; the offense-defense balance will gradually shift toward offense, and vulnerability to terrorism will gradually increase.
In a world where new technologies can destroy the world, I think the default response is a combination of:
We build technologies that improve robustness to particular destructive technologies (especially bioterrorism in the near term, but on the scale of subjective decades I agree that new technologies will arise).
States enforce laws and treaties limiting access to particular destructive technologies or otherwise making it harder for people to destroy the world (again, likely only a stopgap on the scale of subjective centuries, if not sooner).
For technologies where narrow agreements restricting access aren’t possible, we aim for stronger general agreements and maybe strong world government. (I do think this happens eventually. Here’s a post where I think through some of these issues for myself.)
Overall, these don’t seem like problems current humans need to deal with. I’m very excited for some people to be thinking through these problems, because I do think that helps put us in a better position to solve these problems in the future (and solving them pre-AI would remove the need for technical solutions to alignment!). But I don’t currently think they have a big effect on how we think about the alignment problem.
I don’t think it’s about misaligned AI. I agree with the mainstream opinion that if competitive alignment is solved, humans deliberately causing trouble represent a larger share of the problem than misaligned AI.
Why is this a mainstream opinion? Where does this “mainstream” label come from? I don’t think almost anyone in the broader world has any opinions on this scenario, and from the people I’ve talked to in AI Alignment, this really doesn’t strike me as a topic I’ve seen any kind of consensus on. This to me just sounds like you are labeling people you agree with as “mainstream”. I don’t currently see a point in using words like “mainstream” and (the implied) “fringe” in contexts like this.
I disagree with that, don’t think it’s been argued for, and don’t think the surprisingness of the claim has even been acknowledged and engaged with.
This also seems to me to randomly throw in an elevated burden of proof, claiming that this claim is surprising, but that your implied opposite claim is not surprising, without any evidence. I find your claims in this domain really surprising, and I also haven’t seen you “acknowledge the surprisingness of [your] claim”. And I wouldn’t expect you to, because to you your claims presumably aren’t surprising.
Claiming that someone “hasn’t acknowledged the surprisingness of their claim” feels like a weird double-counting of both trying to dock someone points for being wrong, and trying to dock them points for not acknowledging that they are wrong, which feel like the same thing to me (just the latter feels like it relies somewhat more on the absurdity heuristic, which seems bad to me in contexts like this).
I’d say “mainstream opinion” (in ML broadly, in “safety” or “ethics,” or in AI policy) is generally focused on misuse relative to alignment—even without conditioning on “competitive alignment solution.” I normally disagree with this mainstream opinion, and I didn’t mean to endorse the opinion in virtue of its mainstream-ness, but to identify it as the mainstream opinion. If you don’t like the word “mainstream” or view the characterization as contentious, feel free to ignore it; I think it’s pretty tangential to my post.
I’m happy to leave it up to the reader to decide if the claim (“world government likely to come from AI lab rather than boring political change”) is surprising. I’m also happy if people read my sentence as an expression of my opinion and explanation of why I’m engaging with other parts of Eliezer’s views rather than as an additional argument.
I agree some parts of my comment are just expressions of frustration rather than useful contributions.
I’d say “mainstream opinion” (in ML broadly, in “safety” or “ethics,” or in AI policy) is generally focused on misuse relative to alignment—even without conditioning on “competitive alignment solution.” I normally disagree with this mainstream opinion, and I didn’t mean to endorse the opinion in virtue of its mainstream-ness, but to identify it as the mainstream opinion. If you don’t like the word “mainstream” or view the characterization as contentious, feel free to ignore it; I think it’s pretty tangential to my post.
Thanks, that clarifies things. I did misunderstand that sentence to refer to something like the “AI Alignment mainstream”, which feels like a confusing abstraction to me, though I feel like I could have figured it out if I had thought a bit harder before commenting.
For the record, my current model is that “AI ethics” or “AI policy” doesn’t really have a consistent model here, so I am not really sure whether I agree with you that this is indeed the opinion of most of the AI ethics or AI policy community. E.g. I can easily imagine an AI ethics article saying that if we have really powerful AI, the most important thing is not misuse risk but the moral personhood of the AIs, or the “broader societal impact of the AIs”, both of which feel more misalignment-shaped. But I really don’t know (my model of AI ethics people is that they think whether the AI is misaligned affects whether it “deserves” moral personhood).
I do expect the AI policy community to be more focused on misuse, because they have a lot of influence from national security, which sure is generally focused on misuse and on “weapons” as an abstraction, but I again don’t really trust my models here. During the Cold War a lot of the policy community ended up in a weird virtue-signaling arms race that produced a strong consensus in favor of a weird flavor of cosmopolitanism, which I really didn’t expect when I first started looking into this. So I don’t really trust my models of what the actual consensus will be when it comes to transformative AI (and don’t really trust current local opinions on AI to be good proxies for that).
long time in the subjective future [...] subjective decades [...] subjective centuries
What is subjective time? Is the idea that human-imitating AI will be sufficiently faithful to what humans would do, such that if AI does something that humans would have done in ten years, we say it happened in a “subjective decade” (which could be much shorter in sidereal time, i.e., the actual subjective time of existing biological humans)?
… ah, I see you address this in the linked post on “Handling Destructive Technology”:
This argument implicitly measures developments by calendar time—how many years elapsed between the development of AI and the development of destructive physical technology? If we haven’t gotten our house in order by 2045, goes the argument, then what chance do we have of getting our house in order by 2047?
But in the worlds where AI radically increases the pace of technological progress, this is the wrong way to measure. In those worlds science isn’t being done by humans, it is being done by a complex ecology of interacting machines moving an order of magnitude faster than modern society. Probably it’s not just science: everything is getting done by a complex ecology of interacting machines at unprecedented speed.
If we want to ask about “how much stuff will happen”, or “how much change we will see”, it is more appropriate to think about subjective time: how much thinking and acting actually got done? It doesn’t really matter how many times the earth went around the sun.
I’m not thinking of AI that is faithful to what humans would do, just AI that at all represents human interests well enough that “the AI had 100 years to think” is meaningful. If you don’t have such an AI, then (i) we aren’t in the competitive AI alignment world, (ii) you are probably dead anyway.
If you think in terms of calendar time, then yes everything happens incredibly quickly. It’s weird to me that Rob is even talking about “5 years” (though I have no idea what AGI means, so maybe?). I would usually guess that 5 calendar years after TAI is probably post-singularity, so effectively many subjective millennia and so the world is unlikely to closely resemble our world (at least with respect to governance of new technologies).