I have a couple of object-level disagreements, including the relevance of evolution / the nature of the inner alignment problem, and the difficulty of attaining corrigibility. But leaving those aside, I wouldn’t have written exactly this kind of document myself, because I’m not quite sure what its purpose is. It seems to be trying to do a lot of different things for different audiences, where I think more narrowly-tailored documents would be better.
So, here are four useful things to do, and whether I’m personally doing them:
First, there is a mass of people who think that AGI risk is trivial and stupid, that p(doom) ≈ 0, that they can gleefully race to build AGI (or do other things that will speed AGI development, like improving PyTorch or studying the neocortex), that they can totally ignore the field of AGI safety, and that when they have AGI algorithms they can mess around with them without a care in the world.
It would be very good to convince those people that AGI control is a serious and hard and currently-unsolved (and interesting!) problem, and that p(doom) will remain high (say, >>10%) unless and until we solve it.
I think this is a specific audience that warrants its own narrowly-tailored document, one that avoids jargon and covers the basics very well.
That’s a big part of what I was going for in this post, for example. (And more generally, that whole series.)
Second, there are people who are thoughtful and well-informed about AGI risk in general, but not sold on the “pivotal act” idea. If they had an AGI, they would do things that pattern-match to “cautious scientists doing very careful experiments in a dangerous domain”, but they would not do things that pattern-match to “aggressively and urgently use their new tool to prevent the imminent end of the world, by any means necessary, even if it’s super-illegal and aggressive and somewhat dangerous and everyone will hate them”.
(I’m using “pivotal act” in a slightly broader sense that also includes “giving a human-level AGI autonomy to undergo recursive self-improvement and invent and deploy its own new technology”, since the latter has the same sort of dangerous properties and aggressive feel about it as a proper “pivotal act”.)
(Well, it’s possible that there are people sold on the “pivotal act” idea who wouldn’t say it publicly.)
Last week I did a little exercise of trying to guess p(doom), conditional on the two assumptions in this other comment. I got well over 99%, but I noted with interest that only a minority of my p(doom) was coming from “no one knows how to keep an AGI under control” (which I’m less pessimistic about than Eliezer; heck, maybe I’m even as high as 20% that we can keep an AGI under control :-P, and I’m hoping that further research will increase that). A majority of my p(doom) was instead coming from “there will be cautious, responsible actors who will follow the rules and be modest and not do pivotal acts, and there will also be some reckless actors who will create out-of-control omnicidal AGIs”.
So it seems extremely important to figure out whether a “pivotal act” is in fact necessary for a good future. And if it is (a big “if”!), then it likewise seems extremely important to get the relevant decision-makers on board with that.
I think it would be valuable to have a document narrowly tailored to this topic, laying out the cruxes, arguments, counter-arguments, etc. For example, I think this is a topic that looks very different in a Paul-Christiano-style future (gradual multipolar takeoff, near-misses, “corrigible AI assistants”, “strategy stealing assumption”, etc.) than in the world that I expect (decisive first-mover advantage).
But I don’t really feel qualified to write anything like that myself, at least not before talking to lots of people, and it also might be the kind of thing that’s better as a conversation than a blog post.
Third, there are people (e.g. leadership at OpenAI & DeepMind) making decisions that trade off between “AGI is invented soon” versus “AGI is invented by us, people who are at least trying to avoid catastrophe and be altruistic”. Insofar as I think they’re making bad tradeoffs, I would like to convince them of that.
Again, it would be useful to have a document narrowly tailored to this topic. I’m not planning to write one, but perhaps I’m sorta addressing it indirectly when I share my idiosyncratic models of exactly what technical work I think needs to be done before we can align an AGI.
Fourth, there are people who have engaged with the AGI alignment / safety literature / discourse but are pursuing directions that will not solve the problem. It would be very valuable to spread common knowledge that those approaches are doomed. But if I were going to do that, it would (again) be a separate narrowly-tailored document, perhaps organized either by the challenges that those approaches are not up to solving, or by the research programs I’m criticizing, naming names. I have dabbled in this kind of thing (example), but don’t have any immediate plan to do it more, let alone systematically. I think that would be extremely time-consuming.