Holden, thanks for this public post.

I would love it if you could write something along the lines of what you wrote in “If it were all up to me, the world would pause now—but it isn’t, and I’m more uncertain about whether a “partial pause” is good” at the top of the ARC post. As we discussed, and as I wrote in my post, this would make RSPs more likely to be positive in my opinion, by making the distinction between policy commitments and voluntary safety commitments clearer.
Regarding
Responsible scaling policies (RSPs) seem like a robustly good compromise with people who have different views from mine
2. It seems empirically wrong, given the strong pushback RSPs have received, so at the very least you shouldn’t call it “robustly” good, unless you mean a modified version that would accommodate the most important parts of the pushback.
3. I feel like, overall, the way you discuss RSPs here is one of the many instances of people talking about idealized RSPs that are never specified, and then pointing to them in response to disagreement. See below, from my post:
And second, the coexistence of ARC’s RSP framework with the specific RSPs that labs implement allows slack for weak commitments within a framework that would, in theory, allow ambitious ones. It leads to many arguments of the form:
“That’s the V1. We’ll raise ambition over time.” I’d like to see evidence of that happening over a 5-year timeframe, in any field or industry. I can think of fields, like aviation, where it happened over the course of decades, crash after crash. But if the argument relies on the expectation that there will be large-scale accidents, that should be made clear. If it relies on the assumption that timelines are long, that should be explicit.
“It’s voluntary, we can’t expect too much, and it’s way better than what already exists.” Sure, but if the level of catastrophic risk is 1% (which several AI risk experts I’ve talked to believe to be the case for ASL-3 systems) and it gives the impression that the risks are covered, then the name “responsible scaling” heavily misleads policymakers. The accurate name for 1% catastrophic risk would be “catastrophic scaling”, which is less rosy.
Thanks for the post.
FWIW, my read here was that “people who have different views from mine” was in reference to these sets of people:
Some people think that the kinds of risks I’m worried about are far off, farfetched or ridiculous.
Some people think such risks might be real and soon, but that we’ll make enough progress on security, alignment, etc. to handle the risks—and indeed, that further scaling is an important enabler of this progress (e.g., a lot of alignment research will work better with more advanced systems).
Some people think the risks are real and soon, but might be relatively small, and that it’s therefore more important to focus on things like the U.S. staying ahead of other countries on AI progress.
That may be right, but then the claim is wrong. The true claim would be “RSPs seem like a robustly good compromise with people who are more optimistic than me”. And then the claim becomes not really relevant?
IDK man, this seems like nitpicking to me ¯\_(ツ)_/¯. Though I do agree that, on my read, it’s technically more accurate.
My sense here is that Holden is speaking from a place where he considers himself to be among the folks (like you and me) who put significant probability on AI posing a catastrophic/existential risk in the next few years, and “people who have different views from mine” is referring to folks who aren’t in that set.
(Of course, I don’t actually know what Holden meant. This is just what seemed like the natural interpretation to me.)
Why?

Because it’s meaningless to talk about a “compromise” while dismissing one entire side of the people who disagree with you (but only one side!).

Like I could say “global compute thresholds are a robustly good compromise with everyone* who disagrees with me”.

*Footnote: only those who’re more pessimistic than me.

Thanks for the thoughts!

#1: METR made some edits to the post in this direction (in particular see footnote 3).
On #2, Malo’s read is what I intended. I think compromising with people who want “less caution” is most likely to result in progress (given the current state of things), so it seems appropriate to focus on that direction of disagreement when making pragmatic calls like this.
On #3: I endorse the “That’s a V1” view. While industry-wide standards often take years to revise, I think individual company policies often (maybe usually) update more quickly and frequently.