Oh I don’t think anyone is going to be convinced not to build not-yet-AGI.
But it seems totally plausible to convince people not to build systems that they think have a real possibility of killing them, which, again, consequentialist systems will do because we don’t know how to build an off-switch.
Is there any concrete proposal that meets your specification of “don’t kill yourself with AGI, please”?
Prevent agglomerations of data-center-scale compute via supply-chain monitoring, do mass expert education, create a massive social stigma (like with human bio-experimentation), and I think we buy ourselves a decade easily.
How does that distinguish between AGI and not-yet-AGI? How does that prevent an arms race?
An arms race to what? If we alignment-pill the arms-racers, they understand that pushing the button means certain death.
If your point is an arms race on not-unbounded-utility-maximizers, yeah afaict that’s inevitable… but not nearly as bad?
Pushing which button? They’re deploying systems and competing on how capable those systems are. How do they know the systems they’re deploying are safe? How do they define “not-unbounded-utility-maximizers” (and why is it not a solution to the whole alignment problem)? How is your “alignment-pilled” world different from today’s world, in which large institutions already prefer not to kill themselves?
Wait, there are lots of things that aren’t unbounded utility maximizers; AlphaGo isn’t one! And just because they’re “uncompetitive” doesn’t mean that non-suicidal actors won’t stick to them. The standard LessWrong critique is that such systems don’t provide pivotal acts, but the whole point of governance is not to need to rely on pivotal acts.
The difference is that in today’s world large institutions are largely unaware of alignment failure modes and will thus likely deploy unbounded utility maximizers.
So you have a crisp concept called “unbounded utility maximizer,” such that some AI systems are one, some aren’t, and the ones that aren’t are safe. Your plan is to teach everyone where that sharp conceptual boundary is, and then what? Convince them to walk back over the line and stay there?
Do you think your mission is easier or harder than nuclear disarmament?
The alignment problem isn’t a political opinion; it’s a mathematical truth. If they understand it, they can and will want to work the line out for themselves, with the scientific community publicly working to help anyone who wants it.
Nuclear disarmament is hard because if someone else defects, you die. But the point here is that if you defect, you also die. So the decision matrix on the value of defecting is different, especially if you know that other people also know their cost of defection is high.
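To make that asymmetry concrete, here’s a minimal sketch with illustrative payoffs assumed purely for the example (they’re not derived from anything; −100 stands in for “you die”):

```python
# Toy payoff comparison (illustrative numbers, assumed for this sketch).
# Keys are (your_move, their_move); values are your payoff.

nuclear_disarmament = {
    # If the other side defects while you cooperate, you die unilaterally;
    # defecting yourself at least avoids being the only vulnerable party.
    ("cooperate", "cooperate"): 0,
    ("cooperate", "defect"): -100,
    ("defect", "cooperate"): 5,
    ("defect", "defect"): -50,
}

unaligned_agi = {
    # Here "defect" = deploy an unbounded utility maximizer.
    # Deploying kills you regardless of what anyone else does.
    ("cooperate", "cooperate"): 0,
    ("cooperate", "defect"): -100,
    ("defect", "cooperate"): -100,
    ("defect", "defect"): -100,
}

def best_response(payoffs, their_move):
    """Your payoff-maximizing move, given the other actor's move."""
    return max(("cooperate", "defect"),
               key=lambda mine: payoffs[(mine, their_move)])

for name, game in [("nuclear", nuclear_disarmament), ("AGI", unaligned_agi)]:
    print(name, {their: best_response(game, their)
                 for their in ("cooperate", "defect")})
# nuclear: defecting can look individually rational no matter what others do.
# AGI: deploying never improves your payoff, even if everyone else defects.
```

Under the second matrix, defecting is never a best response: you lose even if everyone else holds back. That’s the structural difference from the nuclear case.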
If you launch the nukes, you also die, and we spend a lot of time worrying about that. Why?
We actually don’t worry about that all that much. Nothing close to the ’60s, before the IAEA and second-strike capabilities. These days we mostly worry about escalation cycles, i.e., unpredictable responses by counterparties to minor escalations, and continuously upping the ante to save face.
There isn’t an obvious equivalent escalation cycle for somebody debating with themselves whether to destroy themselves or not. (The closer we get to alignment, the less true this is, btw.)