Things like their RSP rely on being upheld in spirit, not only in letter.
This is something I’m worried about. I think that Anthropic’s current RSP is vague and/or undefined on many crucial points. For instance, I feel pretty nervous about Anthropic’s proposed response to an evaluation threshold triggering. One of the first steps is that they will “conduct a thorough analysis to determine whether the evaluation was overly conservative,” without describing what this “thorough analysis” involves, or who is doing it.
In other words, they will undertake some currently undefined process involving undefined people to decide whether it was a false alarm. Given how much is riding on this decision—like, you know, all of the potential profit they’d be losing if they concluded that the model was in fact dangerous—it seems pretty important to be clear about how these things will be resolved.
Instituting a policy like this is only helpful insomuch as it meaningfully constrains the company’s behavior. But when the responses to evaluations are this loosely and vaguely defined, it’s hard for me to trust that the RSP cashes out to more than a vague hope that Anthropic will be careful. It would be nice to feel like the Long Term Benefit Trust provided some kind of assurance against this. But even this seems difficult to trust when they’ve added “failsafe provisions” that allow a “sufficiently large” supermajority of stockholders to make changes to the Trust’s powers (without the Trustees’ consent), without saying what counts as “sufficiently large.”