This is a frequent disagreement I have with Eliezer and he seems to consistently find my view either perplexing or obviously misguided. So I guess this is as good a place as any to express that view:
I want AI to do a wide variety of things like:
Run factories, write software, manage militaries, etc., and do these things about as well as an unaligned AI, such that humans can reasonably expect to hold their own in a conflict with a smaller coalition using unaligned AI liberally.
Help make further progress on alignment, design/negotiate/enforce agreements between labs that reduce the risk of deploying unaligned AI, etc. and do these things about as well as possible given the underlying ML technology.
Generally make policy, enforce the law, forecast and respond to technological risks, make grants and run projects and do the whole EA thing, etc. in a way that helps us navigate alignment and also the other risks that will emerge rapidly in a faster-moving world. I think the good outcome is “like humans but faster.”
I think that the technical problem “build competitive AI that doesn’t disempower humanity” is generally better than a problem like “build AI that can build nanotech without disempowering humanity,” in the sense that I strongly think people should have “competitive AI alignment” in mind as a goal day to day, rather than trying to tell a story about how their AI does some particular pivotal act. This is a non-trivial methodological claim (though so is Eliezer’s).
Realistically I think the core issue is that Eliezer is very skeptical about the possibility of competitive AI alignment. That said, I think that even on Eliezer’s pessimistic view he should probably just be complaining about competitiveness problems rather than saying pretty speculative stuff about what is needed for a pivotal act.
This is partly because I think the kind of story that Eliezer tells about AI building nanotechnology or brain emulations (or whatever other pivotal act he is imagining) doesn’t reflect how automated R&D is likely to actually look. It looks like it’s totally plausible for many kinds of limited systems to greatly accelerate R&D, and when Eliezer starts making concrete claims about what different kinds of systems can and can’t do I think he’s on pretty shaky ground (and e.g. I think this is where it’s most likely he’s going to be wrong if he tries to cash out this view as predictions about what AI can and can’t do in the near term). This is probably the second big disagreement, and is extremely important for these discussions.
I am supportive of Eliezer’s general interest in pushing people to talk concretely, and I think that there is an important sense in which you really should be highly skeptical of any abstract story you can’t make concrete. I think that some proposals wouldn’t meaningfully reduce risk because proponents don’t have a realistic concrete scenario in mind, and their optimistic scenario is only able to seem realistic because it avoids being concrete. Unfortunately, I think that Eliezer jumps to this explanation way too quickly in general (I think this is an instance of Eliezer having a library of 10-100 cognitive errors that he attributes whenever possible as an explanation for a disagreement). I think this can make it really painful to talk with Eliezer.
When Eliezer talks concretely about possible futures it feels to me like he wants to have a very simplified story of the world, and is very unhappy when answers to “what does the AI do in 2030?” are anywhere near as complicated as “what do humans do in 2020?” For example, I think his methodology for talking about the world, and his practical method for diagnosing when someone has no concrete picture, would basically not work for someone living in 1800 who had a crystal ball looking at 2000. This would be easier to discuss in the context of more details about discussions with Eliezer.
This is exacerbated by Eliezer’s desire to focus on what you might call the “endgame,” asking about what’s happening in a world where AI greatly outstrips humanity. I suspect that this world is mostly shaped by AIs, whether things are going well or poorly, and so it really is more like living in 1600 and talking about 2000. It’s meaningful to talk concretely about such a wildly different world full of people who know things you don’t. But I think you need to be aware of that when thinking about how to decide whether a story is realistic or not.
As mentioned, I think those two things are likely downstream of Eliezer having high conviction that AI needs to do a narrow pivotal task in order to be safely alignable. But I think that the substance here is in a set of claims about technical alignment, AI capabilities, and policy, and that it’s burying the lede to frame this as being about “people don’t think concretely about what their AI needs to do.”
I think a fair number of people are confused by this kind of question from Eliezer because it’s so obvious or natural that aligned AI would be used for a very broad variety of tasks, and because it’s obviously hard to talk specifics without making predictably-false claims about the future. There’s still a game that Eliezer is inviting them to play, but he should better appreciate that this game is not a natural and simple one from other perspectives, so it’s going to take time to communicate it, especially when Eliezer is constantly making a bunch of background assumptions that other people aren’t into.
Realistically I think the core issue is that Eliezer is very skeptical about the possibility of competitive AI alignment. That said, I think that even on Eliezer’s pessimistic view he should probably just be complaining about competitiveness problems rather than saying pretty speculative stuff about what is needed for a pivotal act.
Maybe I’m not understanding what you mean by “competitive”. On my model, if counterfactually it were possible to align AGI systems that are exactly as powerful as the strongest unaligned AGI systems, e.g. five years after the invention of AGI, then you’d need to do a pivotal act with the aligned AGI system immediately or you die.
So competitiveness of aligned AGI systems doesn’t seem like the main crux to me; the crux is more like ‘if you muddle along and don’t do anything radical, do all the misaligned systems just happen to not be able to find any way to kill all humans?’. Equal capability doesn’t solve the problem when attackers are advantaged.
It sounds like your view is “given continued technological change, we need strong international coordination to avoid extinction, and that requires a ‘pivotal act.’”
But that “pivotal act” is a long time in the subjective future, the case for it being a single “act” is weak, the kinds of pivotal acts being discussed seem totally inappropriate in this regime, and the discussion overall feels pretty inappropriate with very little serious thought by the participants.
For example, my sense from this discourse is that MIRI folks think a strong world government is more likely to come from an AI lab taking over the world than from a more boring-looking process of gradual political change or conflict amongst states (and that this is the large majority of how discussed pivotal acts address the problem you are mentioning). I disagree with that, don’t think it’s been argued for, and don’t think the surprisingness of the claim has even been acknowledged and engaged with.
I disagree with the whole spirit of the sentence “misaligned systems just happen not to be able to find a way to kill all humans:”
I don’t think it’s about misaligned AI. I agree with the mainstream opinion that if competitive alignment is solved, humans deliberately causing trouble represent a larger share of the problem than misaligned AI.
“Just happen not to be able to” is a construction with some strong presuppositions baked in. I could have written the same sentence about terrorists, who “just happen not to be able to” find a way to kill all humans. Yes, over time this will get easier, but it’s not like something magical happens; the offense-defense balance will gradually shift toward offense and vulnerability to terrorism will gradually increase.
In the world where new technologies destroy the world, I think the default response is a combination of:
We build technologies that improve robustness to particular destructive technologies (especially bioterrorism in the near term, but on the scale of subjective decades I agree that new technologies will arise).
States enforce laws and treaties limiting access to particular destructive technology or making it harder for people to destroy the world (again, likely to be a stopgap over the scale of subjective centuries if not before).
For technologies where it’s impossible to make narrow agreements to restrict access to destructive technologies, then we aim for stronger general agreements and maybe strong world government. (I do think this happens eventually. Here’s a post where I think through some of these issues for myself.)
Overall, these don’t seem like problems current humans need to deal with. I’m very excited for some people to be thinking through these problems, because I do think that helps put us in a better position to solve these problems in the future (and solving them pre-AI would remove the need for technical solutions to alignment!). But I don’t currently think they have a big effect on how we think about the alignment problem.
I don’t think it’s about misaligned AI. I agree with the mainstream opinion that if competitive alignment is solved, humans deliberately causing trouble represent a larger share of the problem than misaligned AI.
Why is this a mainstream opinion? Where does this “mainstream” label come from? I don’t think almost anyone in the broader world has any opinions on this scenario, and from the people I’ve talked to in AI Alignment, this really doesn’t strike me as a topic I’ve seen any kind of consensus on. This to me just sounds like you are labeling people you agree with as “mainstream”. I don’t currently see a point in using words like “mainstream” and (the implied) “fringe” in contexts like this.
I disagree with that, don’t think it’s been argued for, and don’t think the surprisingness of the claim has even been acknowledged and engaged with.
This also seems to me to randomly throw in an elevated burden of proof, claiming that this claim is surprising, but that your implied opposite claim is not surprising, without any evidence. I find your claims in this domain really surprising, and I also haven’t seen you “acknowledge the surprisingness of [your] claim”. And I wouldn’t expect you to, because to you your claims presumably aren’t surprising.
Claiming that someone “hasn’t acknowledged the surprisingness of their claim” feels like a weird double-counting of both trying to dock someone points for being wrong, and trying to dock them points for not acknowledging that they are wrong, which feel like the same thing to me (just the latter feels like it relies somewhat more on the absurdity heuristic, which seems bad to me in contexts like this).
I’d say “mainstream opinion” (in either ML broadly, “safety” or “ethics,” AI policy) is generally focused on misuse relative to alignment—even without conditioning on “competitive alignment solution.” I normally disagree with this mainstream opinion, and I didn’t mean to endorse the opinion in virtue of its mainstream-ness, but to identify it as the mainstream opinion. If you don’t like the word “mainstream” or view the characterization as contentious, feel free to ignore it, I think it’s pretty tangential to my post.
I’m happy to leave it up to the reader to decide if the claim (“world government likely to come from AI lab rather than boring political change”) is surprising. I’m also happy if people read my sentence as an expression of my opinion and explanation of why I’m engaging with other parts of Eliezer’s views rather than as an additional argument.
I agree some parts of my comment are just expressions of frustration rather than useful contributions.
I’d say “mainstream opinion” (in either ML broadly, “safety” or “ethics,” AI policy) is generally focused on misuse relative to alignment—even without conditioning on “competitive alignment solution.” I normally disagree with this mainstream opinion, and I didn’t mean to endorse the opinion in virtue of its mainstream-ness, but to identify it as the mainstream opinion. If you don’t like the word “mainstream” or view the characterization as contentious, feel free to ignore it, I think it’s pretty tangential to my post.
Thanks, that clarifies things. I did misunderstand that sentence to refer to something like the “AI Alignment mainstream”, which feels like a confusing abstraction to me, though I feel like I could have figured it out if I had thought a bit harder before commenting.
For the record, my current model is that “AI ethics” or “AI policy” doesn’t really have a consistent model here, so I am not really sure whether I agree with you that this is indeed the opinion of most of the AI ethics or AI policy community. E.g., I can easily imagine an AI ethics article saying that if we have really powerful AI, the most important thing is not misuse risk but the moral personhood of the AIs, or the “broader societal impact of the AIs”, both of which feel more misalignment-shaped, but I really don’t know (my model of AI ethics people is that they think whether the AI is misaligned affects whether it “deserves” moral personhood).
I do expect the AI policy community to be more focused on misuse, because they have a lot of influence from national security, which sure is generally focused on misuse and “weapons” as an abstraction, but I again don’t really trust my models here. During the cold war a lot of the policy community ended up in a weird virtue signaling arms race that ended up having a strong consensus in favor of a weird flavor of cosmopolitanism, which I really didn’t expect when I first started looking into this, so I don’t really trust my models of what actual consensus will be when it comes to transformative AI (and don’t really trust current local opinions on AI to be good proxies for that).
long time in the subjective future [...] subjective decades [...] subjective centuries
What is subjective time? Is the idea that human-imitating AI will be sufficiently faithful to what humans would do, such that if AI does something that humans would have done in ten years, we say it happened in a “subjective decade” (which could be much shorter in sidereal time, i.e., the actual subjective time of existing biological humans)?
This argument implicitly measures developments by calendar time—how many years elapsed between the development of AI and the development of destructive physical technology? If we haven’t gotten our house in order by 2045, goes the argument, then what chance do we have of getting our house in order by 2047?
But in the worlds where AI radically increases the pace of technological progress, this is the wrong way to measure. In those worlds science isn’t being done by humans, it is being done by a complex ecology of interacting machines moving an order of magnitude faster than modern society. Probably it’s not just science: everything is getting done by a complex ecology of interacting machines at unprecedented speed.
If we want to ask about “how much stuff will happen”, or “how much change we will see”, it is more appropriate to think about subjective time: how much thinking and acting actually got done? It doesn’t really matter how many times the earth went around the sun.
I’m not thinking of AI that is faithful to what humans would do, just AI that at all represents human interests well enough that “the AI had 100 years to think” is meaningful. If you don’t have such an AI, then (i) we aren’t in the competitive AI alignment world, (ii) you are probably dead anyway.
If you think in terms of calendar time, then yes everything happens incredibly quickly. It’s weird to me that Rob is even talking about “5 years” (though I have no idea what AGI means, so maybe?). I would usually guess that 5 calendar years after TAI is probably post-singularity, so effectively many subjective millennia and so the world is unlikely to closely resemble our world (at least with respect to governance of new technologies).
So I guess this is as good a place as any to express that view
Meta point: seems sad to me if many arguments on topics are spread across many posts in a way that’d be hard for a person to track down e.g. all the arguments regarding generalization/not-generalization.
This makes me want something like the Arbital wiki vision where you can find not just settled facts, but also the list of arguments/considerations in either direction on disputed topics.
Plausibly the existing LW/AF wiki-tag system could do this as far as format/software goes, we just need to get people creating pages for all the concepts/disagreements and then properly tagging things and distilling things. This is an addition to better pages for relatively more settled ideas like “inner alignment”.
All of this is a plausible thing for the LW team to try to make happen. [Our focus for the next month or two is (a) ensuring that, amidst all the great discussion of AI, LessWrong doesn’t lose its identity as a site for Rationality/epistemics/pursuing truth/accurate models across all domains, and (b) fostering epistemics in the new wave of alignment researchers (and community builders), though I am quite uncertain about many aspects of this goal/plan.]
I have just recently been wondering where we stand on the very basic “describe the problem” criterion for productive conversations. Of late our conversations seem to have more of the flavor of proposal-for-solution → criticism-of-solution, which of course is fine if we have the problem described; but if that were the case, why do so many criticisms take the form of disagreements over the nature of the problem?
A very reasonable objection is that there are too many unknowns at work, so people are working on those. But this feels like one meta-problem, so the same reasoning should apply and we want a description of the meta-problem.
I suppose it might be fair to say we are currently working on competing descriptions of the meta-problem. Note to self: doing another survey of the recent conversations with this in mind might be clarifying.
Realistically I think the core issue is that Eliezer is very skeptical about the possibility of competitive AI alignment. That said, I think that even on Eliezer’s pessimistic view he should probably just be complaining about competitiveness problems rather than saying pretty speculative stuff about what is needed for a pivotal act.
Isn’t the core thing here that Eliezer expects that a local, hard takeoff is possible? He thinks that a single AI system can rapidly gain enormous power relative to the rest of the world (either by recursive self-improvement, or by seizing compute, or by just deploying on more computers).
If this is a possible thing for an AGI system to do, it seems like ensuring a human future requires that you’re able to prevent an unaligned AGI from undergoing a hard takeoff.
If you have aligned systems that are competitive in a number of different domains, that doesn’t matter if 1) local hard takeoff is on the table and 2) you aren’t able to produce systems whose alignment is robust to a hard takeoff.
It seems like the pivotal act ideology is a natural consequence of 1) expecting hard takeoff and 2) thinking that alignment is hard, full stop. Whether or not aligned systems will be competitive doesn’t come into it. Or by “competitive” do you mean specifically “competitive, even across the huge relative capability gain of a hard takeoff”?
It seems like Eliezer’s chain of argument is:
[Hard takeoff is likely]
=>
[You need a pivotal act to preempt unaligned superintelligence]
=>
[Your safe AI design needs to be able to do something concrete that can enable a pivotal act in order to be of strategic relevance.]
=>
[When doing AI safety work, you need to be thinking about the concrete actions that your system will do]
Even if competitiveness is likely tractable, we might have more influence over some worlds where competitiveness is intractable. I don’t think this overwhelms a large disagreement about the tractability of a fully competitive alignment solution, but I think there’s something valuable about thinking through how pivotal-act plans would work under the weakest assumptions about alignment difficulty.
If powerful AIs are deployed in worlds mostly shaped by slightly less powerful AIs, you basically need competitiveness to be able to take any “pivotal action” because all the free energy will have been eaten by less powerful AIs.
It looks like it’s totally plausible for many kinds of limited systems to greatly accelerate R&D
Do you have any concrete example from any current alignment work where it would be helpful to have some future AI technology? Or any other way in which such technologies would be useful for alignment? Would it be something like “we trained it to listen to humans; it wasn’t perfect, but we looked at its utility function with transparency tools and that gave us ideas”? Oh, and it should be more useful for alignment than it is for creating something that can defeat safety measures, right? Because I don’t get how, for example, better hardware or coding assistance or money wouldn’t just result in faster development of something misaligned. And so I don’t get how everyone competing on developing AI helps matters: wouldn’t the existence of a half-as-capable AI before the one that ended the world just make the world-ending AI appear earlier? Like, it could dramatically change things, but why would it change things for the better if no one planned it?
I think that future AI technology could automate my job. I think it could also automate capability researchers’ jobs. (It could also help in lots of other ways, but this point seems sufficient to highlight the difference between our views.)
I don’t think that being more useful for alignment is a necessary claim for my position. We are talking about what we want our aligned AIs to do for us, and hence what we should have in mind while doing AI alignment research. If we think AI accelerates technological progress across the board, then the answer “we want our AI to keep accelerating good stuff happening in the world at the same rate that it accelerates dangerous technology” seems like it’s valid.
And it will be okay to have unaligned capabilities, because governments will stop them, maybe using existing aligned AI technology, and they will do it in the future but not now because future AI technology will be better at demonstrating risk? Why do you think humanity’s default response to an increasing offense advantage and vulnerability to terrorism will be adequate? Why, for example, can’t capability detection be insufficient for regulators to stop multiple actors at the time they arrive at world-destroying capabilities?
I agree some parts of my comment are just expressions of frustration rather than useful contributions.
Thanks, that clarifies things. I did misunderstand that sentence to refer to something like the “AI Alignment mainstream”, which feels like a confusing abstraction to me, though I feel like I could have figured it out if I had thought a bit harder before commenting.
For the record, my current model is that “AI ethics” and “AI policy” don’t really have a consistent model here, so I am not really sure whether I agree with you that this is indeed the opinion of most of the AI ethics or AI policy community. E.g. I can easily imagine an AI ethics article saying that if we have really powerful AI, the most important thing is not misuse risk but the moral personhood of the AIs, or the “broader societal impact of the AIs”, both of which feel more misalignment-shaped; but I really don’t know (my model of AI ethics people is that they think whether the AI is misaligned bears on whether it “deserves” moral personhood).
I do expect the AI policy community to be more focused on misuse, because they have a lot of influence from national security, which sure is generally focused on misuse and “weapons” as an abstraction, but I again don’t really trust my models here. During the cold war a lot of the policy community ended up in a weird virtue signaling arms race that ended up having a strong consensus in favor of a weird flavor of cosmopolitanism, which I really didn’t expect when I first started looking into this, so I don’t really trust my models of what actual consensus will be when it comes to transformative AI (and don’t really trust current local opinions on AI to be good proxies for that).
What is subjective time? Is the idea that human-imitating AI will be sufficiently faithful to what humans would do, such that if AI does something that humans would have done in ten years, we say it happened in a “subjective decade” (which could be much shorter in sidereal time, i.e., the actual subjective time of existing biological humans)?
… ah, I see you address this in the linked post on “Handling Destructive Technology”:
I’m not thinking of AI that is faithful to what humans would do, just AI that at all represents human interests well enough that “the AI had 100 years to think” is meaningful. If you don’t have such an AI, then (i) we aren’t in the competitive AI alignment world, (ii) you are probably dead anyway.
If you think in terms of calendar time, then yes, everything happens incredibly quickly. It’s weird to me that Rob is even talking about “5 years” (though I have no idea what AGI means, so maybe?). I would usually guess that 5 calendar years after TAI is probably post-singularity, so effectively many subjective millennia have passed, and the world is unlikely to closely resemble our world (at least with respect to governance of new technologies).
Meta point: it seems sad if arguments on a topic are spread across many posts in a way that makes it hard for a person to track down, e.g., all the arguments regarding generalization/non-generalization.
This makes me want something like the Arbital wiki vision where you can find not just settled facts, but also the list of arguments/considerations in either direction on disputed topics.
Plausibly the existing LW/AF wiki-tag system could do this as far as format/software goes; we just need to get people creating pages for all the concepts/disagreements and then properly tagging and distilling things. This is in addition to better pages for relatively more settled ideas like “inner alignment”.
All of this is a plausible thing for the LW team to try to make happen. [Our focus for the next month or two is (a) ensuring that, amidst all the great discussion of AI, LessWrong doesn’t lose its identity as a site for Rationality/epistemics/pursuing truth/accurate models across all domains, and (b) fostering epistemics in the new wave of alignment researchers (and community builders), though I am quite uncertain about many aspects of this goal/plan.]
I have just recently been wondering where we stand on the very basic “description of the problem” criterion for productive conversations. Of late our conversations seem to have more of the flavor of proposal for solution → criticism of solution, which of course is fine if the problem is already described; but if that were the case, why do so many criticisms take the form of disagreements over the nature of the problem?
A very reasonable objection is that there are too many unknowns at work, so people are working on those. But this feels like one meta-problem, so the same reasoning should apply and we want a description of the meta-problem.
I suppose it might be fair to say we are currently working on competing descriptions of the meta-problem. Note to self: doing another survey of the recent conversations with this in mind might be clarifying.
Stampy’s QA format might be a reasonable fit, given that we’re aiming to become a single point of access for alignment.
Isn’t the core thing here that Eliezer expects that a local, hard takeoff is possible? He thinks that a single AI system can rapidly gain enormous power relative to the rest of the world (either by recursive self-improvement, or by seizing compute, or by just deploying on more computers).
If an AGI system can do this, it seems like ensuring a human future requires that you’re able to prevent an unaligned AGI from undergoing a hard takeoff.
If you have aligned systems that are competitive in a number of different domains, that doesn’t matter if 1) local hard takeoff is on the table and 2) you aren’t able to produce systems whose alignment is robust to a hard takeoff.
It seems like the pivotal act ideology is a natural consequence of 1) expecting hard takeoff and 2) thinking that alignment is hard, full stop. Whether or not aligned systems will be competitive doesn’t come into it. Or by “competitive” do you mean, specifically “competitive, even across the huge relative capability gain of a hard takeoff”?
It seems like Eliezer’s chain of argument is:
[Hard takeoff is likely]
=>
[You need a pivotal act to preempt unaligned superintelligence]
=>
[Your safe AI design needs to be able to do something concrete that can enable a pivotal act in order to be of strategic relevance.]
=>
[When doing AI safety work, you need to be thinking about the concrete actions that your system will do]
Even if competitiveness is likely tractable, we might have more influence over some worlds where competitiveness is intractable. I don’t think this overwhelms a large disagreement about tractability of a fully competitive alignment solution, but I think there’s something valuable about how pivotal act plans work under the weakest assumptions about alignment difficulty.
If powerful AIs are deployed in worlds mostly shaped by slightly less powerful AIs, you basically need competitiveness to be able to take any “pivotal action” because all the free energy will have been eaten by less powerful AIs.
Do you have any concrete example from any current alignment work where it would be helpful to have some future AI technology? Or any other way in which such technologies will be useful for alignment? Would it be something like “we trained it to listen to humans, it wasn’t perfect, but we looked at its utility function with transparency tools and that gave us ideas”? Oh, and it should be more useful for alignment than it is for creating something that can defeat safety measures, right? Because I don’t see how, for example, better hardware or coding assistance or money wouldn’t just result in faster development of something misaligned. And so I don’t see how everyone competing on developing AI helps matters: wouldn’t the existence of a half-as-capable AI before the one that ended the world just make the world-ending AI appear earlier? Like, it could dramatically change things, but why would it change things for the better if no one planned it?
I think that future AI technology could automate my job. I think it could also automate capability researchers’ jobs. (It could also help in lots of other ways, but this point seems sufficient to highlight the difference between our views.)
I don’t think that being more useful for alignment is a necessary claim for my position. We are talking about what we want our aligned AIs to do for us, and hence what we should have in mind while doing AI alignment research. If we think AI accelerates technological progress across the board, then the answer “we want our AI to keep accelerating good stuff happening in the world at the same rate that it accelerates dangerous technology” seems like it’s valid.
And it will be okay to have unaligned capabilities, because government will stop them, maybe using existing aligned AI technology, and it will do this in the future but not now, because future AI technology will be better at demonstrating risk? Why do you think that humanity’s default response to increasing offense-defense imbalance and vulnerability to terrorism will be correct? Why, for example, couldn’t capability detection be insufficient at the time when multiple actors arrive at world-destroying capabilities, before regulators can stop them?