I agree with large parts of this comment, but am confused by this:
I think you should instead plan on not building such systems as there isn’t a clear reason why you need such systems and they seem super dangerous. That’s not to say that you shouldn’t also do research into aligning such systems, I just think the focus should instead be on measures to avoid needing to build them.
While I don’t endorse it due to disagreeing with some (stated and unstated) premises, I think there’s a locally valid line of reasoning that goes something like this:
if Anthropic finds itself in a world where it’s successfully built not-vastly-superhuman TAI, it seems pretty likely that other actors have also done so, or will do so relatively soon
it is now legible (to those paying attention) that we are in the Acute Risk Period
most other actors who have or will soon have TAI will be less safety-conscious than Anthropic
if nobody ends the Acute Risk Period, it seems pretty likely that one of those actors will do something stupid (like turn over their AI R&D efforts to their unaligned TAI), and then we all die
not-vastly-superhuman TAI will not be sufficient to prevent those actors from doing something stupid that ends the world
unfortunately, it seems like we have no choice but to make sure we’re the first to build superhuman TAI, to make sure the Acute Risk Period has a good ending
This seems like the pretty straightforward argument for racing, and if you have a pretty specific combination of beliefs about alignment difficulty, coordination difficulty, capability profiles, etc, I think it basically checks out.
I don’t know what set of beliefs implies that it’s much more important to avoid building superhuman TAI once you have just-barely TAI, than to avoid building just-barely TAI in the first place. (In particular, how does this end up with the world in a stable equilibrium that doesn’t immediately get knocked over by the second actor to reach TAI?)
I don’t know what set of beliefs implies that it’s much more important to avoid building superhuman TAI once you have just-barely TAI, than to avoid building just-barely TAI in the first place.
It seems plausible that AIs which aren’t qualitatively much smarter than humans could be used reasonably effectively while keeping risk decently low (though still unacceptably risky in objective/absolute terms). Keeping risk low seems to require substantial effort, though it seems maybe achievable. Even with token effort, I think risk is “only” around 25% with such AIs, because default methods likely avoid egregious misalignment (perhaps a 30% chance of egregious misalignment with token effort, and then some chance you get lucky, for roughly 25% risk overall).
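(A rough sketch of the implied arithmetic, assuming “some chance you get lucky” works out to about a 1-in-6 chance of a good outcome conditional on egregious misalignment; that conditional figure is inferred from the stated 30% and 25% numbers rather than given above:)

\[
P(\text{catastrophe}) \;\approx\; P(\text{egregious misalignment}) \times P(\text{not lucky} \mid \text{misaligned}) \;\approx\; 0.30 \times \tfrac{5}{6} = 0.25
\]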
Then given this, I have two objections to the story you seem to present:
AIs which aren’t qualitatively smarter than humans seem very useful, and with some US government support they could suffice to prevent proliferation. (They both greatly reduce the cost of non-proliferation and substantially increase willingness to pay for it, e.g. via demos.)
Plans that involve building crazy weapons/defenses with wildly superhuman AIs without US government support involve committing massive crimes, and I think we should have a policy against this.
Another way to put this is that the story for needing much smarter AIs is presumably that you need to build crazy weapons/defenses to defend against someone else’s crazily powerful AI. Building insane weapons/defenses requires US government consent (unless you’re committing massive crimes, which seems like a bad idea). Thus, you might as well go all the way to preventing much smarter AIs from being built (by anyone) for a while, which seems possible with some US government support and the use of these human-ish-level AIs.
(Responding in a consolidated way just to this comment.)
Ok, got it. I don’t think the US government will be able and willing to coordinate and enforce a worldwide moratorium on superhuman TAI development, if we get to just-barely TAI, at least not without plans that leverage that just-barely TAI in unsafe ways which violate the safety invariants of this plan. It might become more willing than it is now (though I’m not hugely optimistic), but I currently don’t think it’s capable as an institution of executing on that kind of plan, and I don’t see why that will change in the next five years.
Another way to put this is that the story for needing much smarter AIs is presumably that you need to build crazy weapons/defenses to defend against someone else’s crazily powerful AI.
I think I disagree with the framing (“crazy weapons/defenses”) but it does seem like you need some kind of qualitatively new technology. This could very well be social technology, rather than something more material.
Building insane weapons/defenses requires US government consent (unless you’re committing massive crimes, which seems like a bad idea).
I don’t think this is actually true, except in the trivial sense where we have a legal system that allows the government to decide that approximately arbitrary behaviors are post-facto illegal if it feels strongly enough about it. Most new things are not explicitly illegal. But even putting that aside[1], I think this is ignoring the legal routes a qualitatively superhuman TAI might find to end the Acute Risk Period, if it was so motivated.
(A reminder that I am not claiming this is Anthropic’s plan, nor would I endorse someone trying to build ASI to execute on this kind of plan.)
TBC, I don’t think there are plausible alternatives to at least some US government involvement which don’t require committing a bunch of massive crimes.
I think there’s a very large difference between plans that involve nominal US government signoff on private actors doing things, in order to avoid committing massive crimes (or to avoid the appearance of doing so); plans that involve the US government mostly just slowing things down or stopping people from doing things; and plans that involve the US government actually being the entity that makes high-context decisions about e.g. what values to optimize for, given a slot into which to put values.
I agree that stories which require building things that look very obviously like “insane weapons/defenses” seem bad, both for obvious deontological reasons and because I wouldn’t expect them to work well enough to be worth it even under “naive” consequentialist analysis.
if we get to just-barely TAI, at least not without plans that leverage that just-barely TAI in unsafe ways which violate the safety invariants of this plan
I’m basically imagining being able to use controlled AIs which aren’t qualitatively smarter than humans for whatever R&D purposes we want. (Though not applications like (e.g.) using smart AIs to pilot drone armies live.) Some of these applications will be riskier than others, but I think this can be done while managing risk to a moderate degree.
Bootstrapping to some extent should also be possible where you use the first controlled AIs to improve the safety of later deployments (both improving control and possibly alignment).
Is your perspective something like:
With (properly motivated) qualitatively wildly superhuman AI, you can end the Acute Risk Period using means which aren’t massive crimes despite not collaborating with the US government. This likely involves novel social technology. More minimally, if you did have a sufficiently aligned AI of this power level, you could just get it to work on ending the Acute Risk Period in a basically legal and non-norms-violating way. (Where e.g. super persuasion would clearly violate norms.)
I think that even having the ability to easily take over the world as a private actor is pretty norms-violating. I’m unsure about the claim that if you put this aside, there is a way to end the acute risk period (edit: without US government collaboration and) without needing truly insanely smart AIs. I suppose that if you go smart enough this is possible, though pre-existing norms also just get more confusing in the regime where you can steer the world to whatever outcome you want.
So overall, I’m not sure I disagree with this perspective exactly. I think the overriding consideration for me is that this seems like a crazy and risky proposal at multiple levels.
To be clear, you are explicitly not endorsing this as a plan nor claiming this is Anthropic’s plan.
Something like that, though I’m much less sure about “non-norms-violating”, because many possible solutions seem like they’d involve something qualitatively new (and therefore de facto norm-violating, like nearly all new technology). Maybe a very superhuman TAI could arrange matters such that things just seem to randomly end up going well rather than badly, without introducing any new[1] social or material technology, but that does seem quite a bit harder.
I’m pretty uncertain about whether, if something like that ended up looking norm-violating, it’d be norm-violating like Uber was[2] or like super-persuasion. That question seems very contingent on empirical questions that I think we don’t have much insight into right now.
I’m unsure about the claim that if you put this aside, there is a way to end the acute risk period without needing truly insanely smart AIs.
I didn’t mean to make the claim that there’s a way to end the acute risk period without needing truly insanely smart AIs (if you put aside centrally-illegal methods); rather, that an AI would probably need to be relatively low on the “smarter than humans” scale to need to resort to methods that were obviously illegal to end the acute risk period.
[1] In ways that are obvious to humans.
[2] Minus the part where Uber was pretty obviously illegal in many places where it operated.
My proposal would roughly be that the US government (in collaboration with allies etc.) enforces that no one builds AIs which are qualitatively smarter than humans; this should be the default plan.
(This might be doable without government support via coordination between multiple labs, but I basically doubt it.)
There could be multiple AI projects backed by the US+allies or just one; either could be workable in principle, though multiple seems tricky.
TBC, I don’t think there are plausible alternatives to at least some US government involvement which don’t require committing a bunch of massive crimes.
I have a policy against committing or recommending committing massive crimes.