(Responding in a consolidated way just to this comment.)
Ok, got it. I don’t think the US government will be able and willing to coordinate and enforce a worldwide moratorium on superhuman TAI development, if we get to just-barely TAI, at least not without plans that leverage that just-barely TAI in unsafe ways which violate the safety invariants of this plan. It might become more willing than it is now (though I’m not hugely optimistic), but I currently don’t think it’s capable as an institution of executing on that kind of plan, and I don’t see why that would change in the next five years.
Another way to put this is that the story for needing much smarter AIs is presumably that you need to build crazy weapons/defenses to defend against someone else’s crazily powerful AI.
I think I disagree with the framing (“crazy weapons/defenses”) but it does seem like you need some kind of qualitatively new technology. This could very well be social technology, rather than something more material.
Building insane weapons/defenses requires US government consent (unless you’re committing massive crimes, which seems like a bad idea).
I don’t think this is actually true, except in the trivial sense where we have a legal system that allows the government to decide approximately arbitrary behaviors are post-facto illegal if it feels strongly enough about it. Most new things are not explicitly illegal. But even putting that aside[1], I think this is ignoring the legal routes a qualitatively superhuman TAI might find to end the Acute Risk Period, if it was so motivated.
(A reminder that I am not claiming this is Anthropic’s plan, nor would I endorse someone trying to build ASI to execute on this kind of plan.)
TBC, I don’t think there are plausible alternatives to at least some US government involvement which don’t require committing a bunch of massive crimes.
I think there’s a very large difference between plans that involve nominal US government signoff on private actors doing things, in order to avoid committing massive crimes (or to avoid the appearance of doing so); plans that involve the US government mostly just slowing things down or stopping people from doing things; and plans that involve the US government actually being the entity that makes high-context decisions about e.g. what values to optimize for, given a slot into which to put values.
I agree that stories which require building things that look very obviously like “insane weapons/defenses” seem bad, both for obvious deontological reasons and because I wouldn’t expect them to work well enough to be worth it even under “naive” consequentialist analysis.
if we get to just-barely TAI, at least not without plans that leverage that just-barely TAI in unsafe ways which violate the safety invariants of this plan
I’m basically imagining being able to use controlled AIs which aren’t qualitatively smarter than humans for whatever R&D purposes we want. (Though not applications like (e.g.) using smart AIs to pilot drone armies live.) Some of these applications will be riskier than others, but I think this can be done while managing risk to a moderate degree.
Bootstrapping to some extent should also be possible where you use the first controlled AIs to improve the safety of later deployments (both improving control and possibly alignment).
With (properly motivated) qualitatively wildly superhuman AI, you can end the Acute Risk Period using means which aren’t massive crimes despite not collaborating with the US government. This likely involves novel social technology. More minimally, if you did have a sufficiently aligned AI of this power level, you could just get it to work on ending the Acute Risk Period in a basically legal and non-norms-violating way. (Where e.g. super persuasion would clearly violate norms.)
I think that even having the ability to easily take over the world as a private actor is pretty norm-violating. I’m unsure about the claim that, if you put this aside, there is a way to end the acute risk period (edit: without US government collaboration and) without needing truly insanely smart AIs. I suppose that if you go smart enough this is possible, though pre-existing norms also just get more confusing in the regime where you can steer the world to whatever outcome you want.
So overall, I’m not sure I disagree with this perspective exactly. I think the overriding consideration for me is that this seems like a crazy and risky proposal at multiple levels.
To be clear, you are explicitly not endorsing this as a plan nor claiming this is Anthropic’s plan.
Something like that, though I’m much less sure about “non-norms-violating”, because many possible solutions seem like they’d involve something qualitatively new (and therefore de-facto norm-violating, like nearly all new technology). Maybe a very superhuman TAI could arrange matters such that things just seem to randomly end up going well rather than badly, without introducing any new[1] social or material technology, but that does seem quite a bit harder.
I’m pretty uncertain about whether, if something like that ended up looking norm-violating, it’d be norm-violating like Uber was[2], or like super-persuasion. That question seems very contingent on empirical questions that I think we don’t have much insight into right now.
I’m unsure about the claim that if you put this aside, there is a way to end the acute risk period without needing truly insanely smart AIs.
I didn’t mean to make the claim that there’s a way to end the acute risk period without needing truly insanely smart AIs (if you put aside centrally-illegal methods); rather, that only an AI relatively low on the “smarter than humans” scale would need to resort to obviously illegal methods to end the acute risk period.
[1] In ways that are obvious to humans.

[2] Minus the part where Uber was pretty obviously illegal in many places where it operated.