But I also suspect that people on the more cynical side aren’t going to be persuaded by a post like this. If you think that companies are pretending to care about safety but really are just racing to make $$, there’s probably not much to say at this point other than, let’s see what happens next.
This seems wrong to me. We can say all kinds of things, like:
Are these RSPs actually effective if implemented? How could they be better? (Including aspects like: how will this policy be updated in the future? What will happen given disagreements?)
Is there external verification that they are implemented well?
Which developers have and have not implemented effective and verifiable RSPs?
How could employees, the public, and governments push developers to do better?
I don’t think we’re just sitting here and rolling a die about which is going to happen, path #1 or path #2. Maybe that’s right if you’re just asking how much companies will do voluntarily, but I don’t think that should be the exclusive focus (and if it were, there wouldn’t be much purpose to this more meta discussion). One of my main points is that external stakeholders can look at what companies are doing, discuss ways in which it is or isn’t adequate, and then actually push them to do better (and build support for government action to demand better). That process can start immediately, not at some hypothetical future time.
I agree with all of this. It’s what I meant by “it’s up to all of us.”
It will be a signal of how things are going if in a year we still have only vague policies, or if there has been real progress in operationalizing the safety levels, detection, what the right reactions are, etc.
I guess our biggest differences are (i) I don’t think the takeaway depends so strongly on whether AI developers are trying to do the right thing—either way it’s up to all of us, and (ii) I think it’s already worth talking about ways in which Anthropic’s RSP is good or bad or could be better, and so I disagree with “there’s probably not much to say at this point.”
That’s fair, I think I misread you.