I do have a lot of reservations about Leopold’s plan. But one positive feature it does have: it proposes to rely on a multitude of “limited weakly-superhuman artificial alignment researchers” and makes a reasonable case that those can be obtained in a form factor which is alignable and controllable. So his plan does seem to have a good chance of overcoming the fact that AI existential safety research has not been a particularly productive or fast-moving field in the past, and of overcoming other factors that would otherwise require over-reliance on humans and on current human ability.
I do have a lot of reservations about the “prohibition plan” as well. One of those reservations is as follows. You and Leopold seem to share the assumption that huge GPU farms or equivalently strong compute are necessary for superintelligence. Surely, huge GPU farms are the path of least resistance: they facilitate fast advances, and while this path is relatively open, people and orgs will mostly choose it and can be partially controlled via their reliance on it (one can impose various compliance requirements and such).
But what would happen if one effectively closes that path? There will be huge selection pressure to look for alternative routes, to invest more heavily in those algorithmic breakthroughs which can work with modest GPU power or even with CPUs. When one thinks about this kind of prohibition, one tends to look at the relatively successful history of control over nuclear proliferation, but the reality might end up looking more like the war on drugs (ultimately unsuccessful, pushing many drugs outside government regulation, and resulting in drugs that are both more dangerous and, in a number of cases, also more potent).
I am sure that a strong prohibition attempt would buy us some time, but I am not sure it would reduce the overall risk. The resulting situation does not look promising in the long run: half of AI practitioners would find themselves in opposition to the resulting “new world order” and would be looking for opportunities to circumvent the prohibition, while the mainstream imposing the prohibition would presumably not be arming itself with the next generations of stronger and stronger AI systems (if we are really talking about a full moratorium). I would expect the opposition to eventually succeed at building prohibited systems and to use them to upend the world order they dislike, while perhaps running a higher level of existential risk because of the lack of regulation and coordination.
I hope people will step back from solely focusing on advocating for policy-level prescriptions (as none of the existing policy-level prescriptions look particularly promising at the moment) and invest some of their time in continuing object-level discussions of AI existential safety without predefined political ends.
I don’t think we have discussed the object level of AI existential safety nearly enough. There might be overlooked approaches and overlooked ways of thinking, and if we split into groups, each of which has firmly made up its mind about its favored, presumably optimal set of policy-level prescriptions and about the assumptions underlying those prescriptions, we are unlikely to make much progress on the object level.
These discussions should probably be a mixture of public and private ones (it might be easier to talk frankly in more private settings these days, for a number of reasons).
one positive feature it does have: it proposes to rely on a multitude of “limited weakly-superhuman artificial alignment researchers” and makes a reasonable case that those can be obtained in a form factor which is alignable and controllable.
I don’t find this convincing. I think the target “dumb enough to be safe, honest, trustworthy, relatively non-agentic, etc., but smart enough to be super helpful for alignment” is narrow (or just nonexistent, using the methods we’re likely to have on hand).
Even if this exists, verification seems extraordinarily difficult: how do we know that the system is being honest? Separately, how do we verify that its solutions are correct? Checking answers is sometimes easier than generating them, but only to a limited degree, and alignment seems like a case where checking is particularly difficult.
You and Leopold seem to share the assumption that huge GPU farms or equivalently strong compute are necessary for superintelligence.
Nope! I don’t assume that.
I do think that it’s likely the first world-endangering AI is trained using more compute than was used to train GPT-4; but I’m certainly not confident of that prediction, and I don’t think it’s possible to make reasonable predictions (given our current knowledge state) about how much more compute might be needed.
(“Needed” for the first world-endangeringly powerful AI humans actually build, that is. I feel confident that you can in principle build world-endangeringly powerful AI with far less compute than was used to train GPT-4; but the first lethally powerful AI systems humans actually build will presumably be far from the limits of what’s physically possible!)
But what would happen if one effectively closes that path? There will be huge selection pressure to look for alternative routes, to invest more heavily in those algorithmic breakthroughs which can work with modest GPU power or even with CPUs.
Agreed. This is why I support humanity working on things like human enhancement and (plausibly) AI alignment, in parallel with working on an international AI development pause. I don’t think that a pause on its own is a permanent solution, though if we’re lucky and the laws are well-designed I imagine it could buy humanity quite a few decades.
I hope people will step back from solely focusing on advocating for policy-level prescriptions (as none of the existing policy-level prescriptions look particularly promising at the moment) and invest some of their time in continuing object-level discussions of AI existential safety without predefined political ends.
FWIW, MIRI does already think of “generally spreading reasonable discussion of the problem, and trying to increase the probability that someone comes up with some new promising idea for addressing x-risk” as a top organizational priority.
The usual internal framing is some version of “we have our own current best guess at how to save the world, but our idea is a massive longshot, and not the sort of basket humanity should put all its eggs in”. I think “AI pause + some form of cognitive enhancement” should be a top priority, but I also consider it a top priority for humanity to try to find other potential paths to a good future.
I don’t find this convincing. I think the target “dumb enough to be safe, honest, trustworthy, relatively non-agentic, etc., but smart enough to be super helpful for alignment” is narrow (or just nonexistent, using the methods we’re likely to have on hand).
Even if this exists, verification seems extraordinarily difficult: how do we know that the system is being honest? Separately, how do we verify that its solutions are correct? Checking answers is sometimes easier than generating them, but only to a limited degree, and alignment seems like a case where checking is particularly difficult.
It’s also important to keep in mind that on Leopold’s model (and my own), these problems need to be solved under a ton of time pressure. To maintain a lead, the USG in Leopold’s scenario will often need to figure out some of these “under what circumstances can we trust this highly novel system and believe its alignment answers?” issues in a matter of weeks or perhaps months, so that the overall alignment project can complete in a very short window of time. This is not a situation where we’re imagining having a ton of time to develop mastery and deep understanding of these new models. (Or mastery of the alignment problem sufficient to verify when a new idea is on the right track or not.)