It’s way too late for the kind of top-down capabilities regulation Yudkowsky and Bostrom fantasized about; Earth just doesn’t have the global infrastructure. I see no benefit to public alarm—EA already has plenty of funding.
We achieve marginal impact by figuring out concrete prosaic plans for friendly AI and doing outreach to leading AI labs/researchers about them. Make the plans obviously good ideas and they will probably be persuasive. Push for common-knowledge windfall agreements so that upside is shared and race dynamics are minimized.
Earth does have the global infrastructure; we just don’t have access to it because we have not yet persuaded a critical mass of experts. AWS can simply refuse to rent GPUs to anyone whose code hasn’t been checked, and beyond that, if you can create public consensus via iteratively refined messaging, you can make sure everyone knows the consequences of building this.
People should absolutely be figuring out prosaic plans, and core alignment researchers probably shouldn’t stop doing their work. However, it’s simply not true that all capable labs (or those that will be capable soon) will even take a meeting with AI safety people, given the current belief environment. E.g. who do you call at BAAI?
It does? What do you mean? The only thing I can think of is the UN, and recent events don’t make it very likely they’d engage in coordinated action on anything.
If you convince the CCP, the US government, and not that many other players that this is really serious, it becomes very difficult to source chips elsewhere.
The CCP and the US government both make their policy decisions based on whatever (a weirdly-sampled subset of) their experts tell them.
Those experts update primarily on their colleagues.
So we just need to get two superpowers that currently feel they are in a zero-sum competition with each other to stop advancing in an area that gives them a potentially infinite advantage? That seems like a classic case of the kind of coordination problem that is difficult to solve, with high rewards for defecting.
We have partially managed to do this for nuclear and biological weapons, but only with a massive oversight infrastructure that doesn’t exist for AI, and by relying on physical evidence and materials control that also don’t exist for AI. It’s not impossible, but it would require a level of concerted international effort similar to what was used for nuclear weapons, which took a long time, so it possibly doesn’t fit with your short timeline.
If we do as well at preventing AGI as we have with nuclear non-proliferation, we fail. And nuclear non-proliferation has been more effective than some other regimes (chemical weapons, drugs, trade in endangered animals, carbon emissions, etc.). In addition, because nuclear weapons require relatively scarce elements, controlling them is easier than controlling AI.
And, as others have noted, the incentives for developing AI are far stronger than for developing nuclear weapons.
What makes you think we fail if it looks like nukes? If everyone agrees on the difficulty of alignment and there are few actors, it is not unreasonable to expect that no one pushes the button, just as they don’t under MAD.
There are currently nine countries that have deployed nuclear weapons. At least four of those nine are countries that the non-proliferation regime would have preferred to keep from acquiring nuclear weapons.
An equivalent result in AGI would be four entities deploying AGI. (And in the AGI context, the problem is deployment itself, not using the AGI in any particular way.)
Note that eight of those countries have never used nukes, and none of the nine have used them since the IAEA was founded.
Most people think that if 500 entities had nukes, they would get used more often. But with only a few, MAD can work. AGI doesn’t have MAD, but it has a similar dynamic if you convince everyone of the alignment problem.
But… there isn’t a reward for defecting? Like, in a concrete, actual sense. The only basis for defection is incomplete information. If people think there is a reward, they’re in some literal sense incorrect, and the truth is ultimately easier to defend. Why not (wisely, concertedly, principledly) defend it?
And there are extremely concrete reasons to create that international oversight effort (e.g. over compute), given convergence on the truth. The justifications, conditioned on the truth, are at least as strong as in the nuclear case, if not stronger.
The reward is not the creation of uncontrolled AGI. The reward is the creation of powerful not-yet-AGI systems that can drastically accelerate a country’s technical, scientific, or military progress.
That’s a pretty huge potential upside, and the consequences of another superpower developing such technology could be catastrophic. So countries have both a reward for defecting and the risk of losing everything if the other country defects.
Yes, such an “AI race” is very dangerous. But so was the nuclear arms race, and countries ran it anyway.
Oh I don’t think anyone is going to be convinced not to build not-yet-AGI.
But it seems totally plausible to convince people not to build systems that they think have a real possibility of killing them, which, again, consequentialist systems will do because we don’t know how to build an off-switch.
Is there any concrete proposal that meets your specification of “don’t kill yourself with AGI, please”?
Prevent agglomerations of data-center-scale compute via supply-chain monitoring, do mass expert education, and create a massive social stigma (like with human bio-experimentation), and I think we buy ourselves a decade easily.
How does that distinguish between AGI and not-yet-AGI? How does that prevent an arms race?
An arms race to what? If we alignment-pill the arms-racers, they understand that pushing the button means certain death.
If your point is an arms race on not-unbounded-utility-maximizers, yeah afaict that’s inevitable… but not nearly as bad?
Pushing which button? They’re deploying systems and competing on how capable those systems are. How do they know the systems they’re deploying are safe? How do they define “not-unbounded-utility-maximizers” (and why is it not a solution to the whole alignment problem)? What about your “alignment-pilled” world is different from today’s world, wherein large institutions already prefer not to kill themselves?
Wait, there are lots of things that aren’t unbounded utility maximizers—just because they’re “uncompetitive” doesn’t mean that non-suicidal actors won’t stick to them. AlphaGo isn’t! The standard LessWrong critique is that such systems don’t provide pivotal acts, but the whole point of governance is not to need to rely on pivotal acts.
The difference is that in today’s world, large institutions are largely unaware of alignment failure modes and will thus likely deploy unbounded utility maximizers.
So you have a crisp concept called “unbounded utility maximizer” so that some AI systems are, some AI systems aren’t, and the ones that aren’t are safe. Your plan is to teach everyone where that sharp conceptual boundary is, and then what? Convince them to walk back over the line and stay there?
Do you think your mission is easier or harder than nuclear disarmament?
The alignment problem isn’t a political opinion; it’s a mathematical truth. If they understand it, they can and will want to work the line out for themselves, with the scientific community publicly working to help any who want it.
Nuclear disarmament is hard because if someone else defects, you die. But the point here is that if you defect, you also die. So the decision matrix on the value of defecting is different, especially if you know that other people also know their own cost of defection is high.
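A minimal payoff-matrix sketch of that difference (the specific numbers are my own illustrative assumptions, not anything from the discussion above; only their ordering matters):

% Illustrative ordinal payoffs (assumed); rows are A's choice, columns are B's.
% Nuclear-style race: if the other side builds and you don't, you lose badly,
% so building is the unilaterally best move.
\[
\begin{array}{c|cc}
 & \text{B restrains} & \text{B builds} \\ \hline
\text{A restrains} & (2,\,2) & (-10,\,3) \\
\text{A builds} & (3,\,-10) & (1,\,1)
\end{array}
\]
% AGI race under the "if you defect you also die" premise: any deployment of a
% misaligned system is catastrophic for everyone, so restraint weakly dominates.
\[
\begin{array}{c|cc}
 & \text{B restrains} & \text{B builds} \\ \hline
\text{A restrains} & (2,\,2) & (-10,\,-10) \\
\text{A builds} & (-10,\,-10) & (-10,\,-10)
\end{array}
\]

In the first matrix, building is the dominant strategy; in the second, mutual restraint is the only cell anyone survives, which is the sense in which the decision matrix is different.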
If you launch the nukes, you also die, and we spend a lot of time worrying about that. Why?
We actually don’t worry about that that much. Nothing close to the 1960s, before the IAEA and second-strike capabilities. These days we mostly worry about escalation cycles, i.e. unpredictable responses by counterparties to minor escalations, and continuously upping the ante to save face.
There isn’t an obvious equivalent escalation cycle for somebody debating with themselves whether to destroy themselves or not. (The closer we get to alignment, the less true this is, btw.)