My understanding is that their commitment is to stop once their ASL-3 evals are triggered.
Ok, we agree. By “beyond ASL-3” I thought you meant “stuff that’s outside the category ASL-3” instead of “the first thing inside the category ASL-3”.
For the Anthropic RSP in particular, I think it’s accurate & helpful to say
Yep, that summary seems right to me. (I also think the “concrete commitments” statement is accurate.)
But I want to see RSP advocates engage more with the burden of proof concerns.
Yeah, I also think putting the burden of proof on scaling (instead of on pausing) is safer and probably appropriate. I am hesitant about it on process grounds; it seems to me like evidence of safety might require the scaling that we’re not allowing until we see evidence of safety. On net, it seems like the right decision on the current margin, but the same lock-in concerns (if we do the right thing now for the wrong reasons, perhaps we will do the wrong thing for the same reasons in the future) make me worry about simply switching the burden of proof (instead of coming up with a better system to evaluate risk).