[I agree with most of this, and think it’s a very useful comment; just pointing out disagreements]
For instance, I think that well implemented RSPs required by a regulatory agency can reduce risk to <5% (partially by stopping in worlds where this appears needed).
I assume this would be a crux with Connor/Gabe (and I think I’m at least much less confident in this than you appear to be).
We’re already in a world where stopping appears necessary.
It’s entirely possible we all die before stopping becomes clearly necessary.
What gives you confidence that RSPs would actually trigger a pause?
If a lab is stopping for reasons that aren’t based on objective conditions in an RSP, then what did the RSP achieve?
Absent objective tests that everyone has signed up for, a lab may well not stop, since there’ll always be the argument “Well we think that the danger is somewhat high, but it doesn’t help if only we pause”.
It’s far from clear that we’ll get objective and sufficient conditions for safety (or even for low risk). I don’t expect us to—though it’d obviously be nice to be wrong.
[EDIT: or rather, ones that allow scaling to continue safely—we already know sufficient conditions for safety: stopping]
Calling something a “pragmatic middle ground” doesn’t imply that there aren’t better options
I think the objection here is more about what is loosely suggested by the language used, and what is not said—not about logical implications. What is loosely suggested by the ARC Evals language is that it’s not sensible to aim for the more “extreme” end of things (pausing), and that this isn’t worthy of argument.
Perhaps ARC Evals have a great argument, but they don’t make one. I think it’s fair to say that they argue the middle ground is practical. I don’t think it can be claimed they argue it’s pragmatic until they address both the viability of other options and the risks of various courses. Doing a practical thing that would predictably lead to higher risk is not pragmatic.
It’s not clear what the right course is here, but making no substantive argument gives a completely incorrect impression. If they didn’t think it was the right place for such an argument, then it’d be easy to say that: that this is a complex question, that it’s unclear this course is best, and that RSPs vs Pause vs … deserves a lot more analysis.
The post presupposes a communication/advocacy norm and states violations of this norm should be labeled “lying”. I’m not sure I’m sold on this communication norm in the first place.
I’d agree with that, but I do think that in this case it’d be useful for people/orgs to state both a [here’s what we’d like ideally] and a [here’s what we’re currently pushing for]. I can imagine many cases where this wouldn’t hold, but I don’t see the argument here. If there is an argument, I’d like to hear it! (fine if it’s conditional on not being communicated further)
Thanks for the response; one quick clarification in case this isn’t clear.
On:
For instance, I think that well implemented RSPs required by a regulatory agency can reduce risk to <5% (partially by stopping in worlds where this appears needed).
I assume this would be a crux with Connor/Gabe (and I think I’m at least much less confident in this than you appear to be).
It’s worth noting here that I’m responding to this passage from the text:
In a saner world, all AGI progress should have already stopped. If we don’t, there’s more than a 10% chance we all die.
Many people in the AI safety community believe this, but they have not stated it publicly. Worse, they have stated different beliefs more saliently, which misdirect everyone else about what should be done, and what the AI safety community believes.
I’m responding to the “many people believe this” which I think implies that the groups they are critiquing believe this. I want to contest what these people believe, not what is actually true.
Like, many of these people think policy interventions other than a pause reduce X-risk below 10%.
Maybe I think something like (numbers not well considered):
P(doom) = 35%
P(doom | scaling pause by executive order in 2024) = 25%
P(doom | good version of regulatory agency doing something like RSP and safety arguments passed into law in 2024) = 5% (depends a ton on details and political buy-in!!!)
P(doom | full and strong international coordination around pausing all AI-related progress for 10+ years which starts by pausing hardware progress and current manufacturing) = 3%
Note that these numbers take into account evidential updates (e.g., probably other good stuff is happening if we have super strong international coordination around pausing AI).
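Spelled out as rough arithmetic against the 35% baseline (just a gloss on the numbers above, not an additional claim):

$$
\begin{aligned}
\text{executive-order scaling pause:}\quad & 35\% - 25\% = 10\ \text{pp} \quad (\approx 29\%\ \text{relative reduction})\\
\text{regulatory agency + RSP-style law:}\quad & 35\% - 5\% = 30\ \text{pp} \quad (\approx 86\%\ \text{relative reduction})\\
\text{strong international pause:}\quad & 35\% - 3\% = 32\ \text{pp} \quad (\approx 91\%\ \text{relative reduction})
\end{aligned}
$$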
Ah okay, thanks. That’s clarifying.
Agreed that the post is at the very least not clear. In particular, it’s obviously not true that [if we don’t stop today, there’s more than a 10% chance we all die], and I don’t think [if we never stop, under any circumstances...] is a case many people would be considering at all.
It’d make sense to be much clearer on the ‘this’ that “many people believe”.
(and I hope you’re correct on P(doom)!)
Calling something a “pragmatic middle ground” doesn’t imply that there aren’t better options
I think the objection here is more about what is loosely suggested by the language used, and what is not said—not about logical implications. What is loosely suggested by the ARC Evals language is that it’s not sensible to aim for the more “extreme” end of things (pausing), and that this isn’t worthy of argument.
Perhaps ARC Evals have a great argument, but they don’t make one. I think it’s fair to say that they argue the middle ground is practical. I don’t think it can be claimed they argue it’s pragmatic until they address both the viability of other options and the risks of various courses. Doing a practical thing that would predictably lead to higher risk is not pragmatic.
It’s not clear what the right course is here, but making no substantive argument gives a completely incorrect impression. If they didn’t think it was the right place for such an argument, then it’d be easy to say that: that this is a complex question, that it’s unclear this course is best, and that RSPs vs Pause vs … deserves a lot more analysis.
Yeah, I probably want to walk back my claim a bit. Maybe I want to say “doesn’t strongly imply”?
It would have been better if ARC Evals had noted that the conclusion isn’t entirely obvious. It doesn’t seem like a huge error to me, but maybe I’m underestimating the ripple effects, etc.