I think there are two paths, roughly, that RSPs could send us down:

1. RSPs are a good starting point. Over time we make them more concrete, build out the technical infrastructure to measure risk, and enshrine them in regulation or binding agreements between AI companies. They reduce risk substantially, and provide a mechanism whereby we can institute a global pause if necessary, which seems otherwise infeasible right now.
2. RSPs are a type of safety-washing. They provide the illusion of a plan, but as written they are so vague as to be meaningless. They let companies claim they take safety seriously but don't meaningfully reduce risk, and in fact may increase it by letting companies skate by without doing real work, rather than forcing companies to act responsibly by just not developing a dangerous, uncontrollable technology.
If you think that Anthropic and other labs that adopt these are fundamentally well meaning and trying to do the right thing, you’ll assume that we are by default heading down path #1. If you are more cynical about how companies are acting, then #2 may seem more plausible.
My feeling is that Anthropic et al. are clearly trying to do the right thing, and that it's on us to do the work to ensure we stay on the good path here: working to deliver the concrete pieces we need, keeping the pressure on AI labs to take these ideas seriously, and asking regulators to take concrete steps of their own to give RSPs teeth and enforce the right outcomes.
But I also suspect that people on the more cynical side aren’t going to be persuaded by a post like this. If you think that companies are pretending to care about safety but really are just racing to make $$, there’s probably not much to say at this point other than, let’s see what happens next.
> If you think that Anthropic and other labs that adopt these are fundamentally well meaning and trying to do the right thing, you'll assume that we are by default heading down path #1. If you are more cynical about how companies are acting, then #2 may seem more plausible.
I disagree that what you think about a lab's internal motivations should be very relevant here. For any particular lab or government adopting any particular RSP, you can just ask: does having this RSP make it easier or harder to implement good legislation in the future? My sense is that the answer to that question should mostly depend on whether the substance of the RSP is actually better than nothing, and on your general models of politics, rather than on any facts about people's internal motivations, especially since trying to judge from the outside the motivations of a company with huge PR resources is a fundamentally fraught exercise.
Furthermore, my sense is that, most of the time, the crux here is really about models of how politics works. If you think that there's only a very narrow policy window, such that getting the wrong policy in means you've missed your shot, then you won't be willing to accept an RSP that is good but insufficient on its own. I tend to refer to this as the "resource mindset": you're treating political influence, policy windows, etc. as a limited resource to be spent wisely. My sense, though, is that the resource mindset is just wrong when applied to politics. The right mindset, I think, is a positive-sum mindset, where small better-than-nothing policy actions yield larger, even-better-than-nothing policy actions, until eventually you build up to something sufficient.
Certainly I could imagine situations where an RSP is crafted in such a way as to try to stymie future regulation, though I think doing so is actually quite hard:
- Governments have sovereignty, so you can't just restrict what they'll do in the future.
- Once a regulatory organization exists for something, it's very easy to give it more tasks, make it stricter, etc., and much harder to get rid of it, so the existence of previous regulation generally makes new regulation easier, not harder.
- At least in democracies, leaders regularly come and go, and tend to like to get their new thing passed without caring that much about repealing the old thing, so different overlapping regulations can easily pile up.
Of course, that’s not to say that we shouldn’t still ask for RSPs that make future regulation even more likely to be good, e.g. by:
- Not overclaiming about what sort of stuff is measurable (e.g. not trying to formalize simple metrics for alignment that will be insufficient).
- Leaving open clear and obvious holes to be filled later by future regulation.
- Etc.
> But I also suspect that people on the more cynical side aren't going to be persuaded by a post like this. If you think that companies are pretending to care about safety but really are just racing to make $$, there's probably not much to say at this point other than, let's see what happens next.
This seems wrong to me. We can say all kinds of things, like:
- Are these RSPs actually effective if implemented? How could they be better? (Including aspects like: how will this policy be updated in the future? What will happen given disagreements?)
- Is there external verification that they are implemented well?
- Which developers have and have not implemented effective and verifiable RSPs?
- How could employees, the public, and governments push developers to do better?
I don't think we're just sitting here rolling a die about which will happen, path #1 or path #2. Maybe that's right if you're only asking how much companies will do voluntarily, but I don't think that should be the exclusive focus (and if it were, there wouldn't be much purpose to this more meta discussion). One of my main points is that external stakeholders can look at what companies are doing, discuss the ways in which it is or isn't adequate, and then actually push them to do better (and build support for government action to demand better). That process can start immediately, not at some hypothetical future time.
I agree with all of this. It’s what I meant by “it’s up to all of us.”
It will be a signal of how things are going if, in a year, we still have only vague policies, or if there has been real progress in operationalizing the safety levels, detection, what the right reactions are, etc.
That’s fair, I think I misread you.
I guess our biggest differences are (i) I don't think the takeaway depends so strongly on whether AI developers are trying to do the right thing (either way it's up to all of us), and (ii) I think it's already worth talking about ways in which Anthropic's RSP is good or bad or could be better, and so I disagree with "there's probably not much to say at this point."