I’m sympathetic to how this process might be exhausting, but at an institutional level I think Anthropic (and all labs) owe humanity a much clearer picture of how they would approach a potentially serious and dangerous situation with their models. Especially so, given that the RSP is fairly silent on this point, leaving the response to evaluations up to the discretion of Anthropic. In other words, the reason I want to hear more from employees is in part because I don’t know what the decision process inside of Anthropic will look like if an evaluation indicates something like “yeah, it’s excellent at inserting backdoors, and also, the vibe is that it’s overall pretty capable.” And given that Anthropic is making these decisions on behalf of everyone, Anthropic (like all labs) really owes it to humanity to be more upfront about how it’ll make these decisions (imo).
I will also note what I feel is a somewhat concerning trend. It’s happened many times now that I’ve critiqued something about Anthropic (its RSP, advocating to eliminate pre-harm from SB 1047, the silent reneging on the commitment to not push the frontier), and someone has said something to the effect of: “this wouldn’t seem so bad if you knew what was happening behind the scenes.” They of course cannot tell me what the “behind the scenes” information is, so I have no way of knowing whether that’s true. And, maybe I would in fact update positively about Anthropic if I knew. But I do think the shape of “we’re doing something which might be incredibly dangerous, many external bits of evidence point to us not taking the safety of this endeavor seriously, but actually you should think we are based on me telling you we are” is pretty sketchy.
I will also note what I feel is a somewhat concerning trend. It’s happened many times now that I’ve critiqued something about Anthropic (its RSP, advocating to eliminate pre-harm from SB 1047, the silent reneging on the commitment to not push the frontier), and someone has said something to the effect of: “this wouldn’t seem so bad if you knew what was happening behind the scenes.”
I just wanted to +1 that I am also concerned about this trend, and I view it as one of the things that I think has pushed me (as well as many others in the community) to lose a lot of faith in corporate governance (especially of the “look, we can’t make any tangible commitments but you should just trust us to do what’s right” variety) and instead look to governments to get things under control.
I don’t think Anthropic is solely to blame for this trend, of course, but I think Anthropic has performed less well on comms/policy than I [and IMO many others] would’ve predicted if you had asked me [or us] in 2022.