Making safety commitments conditional on risks is nice
Making commitments conditional on risks would be nice. Making commitments conditional on [risks that we notice] is clearly inadequate.
That this distinction isn’t made in giant red letters by ARC Evals is disappointing.
Evals might well be great—conditional on clarity that, absent fundamental breakthroughs in our understanding, they can only tell us [model is dangerous], not [model is safe]. Without that clarity, both Evals generally, and RSPs specifically, seem likely to engender dangerous overconfidence.
Another thing I’d like to see made clearer is that both of the following can be true:
(1) RSPs are a sensible approach for labs/nations to adopt unilaterally.
(2) RSPs are inadequate as a target for global coordination.
I’m highly uncertain about (1). (2) is obviously true (obvious because the risk is much too high, not because it’s impossible that we get extremely lucky).