Oh, I’m certainly not claiming that no-one should attempt to make the estimates.
Ah my bad if I lost the thread there
I’d want regulators to push for safer strategies, not to run checks on unsafe strategies—at best that seems likely to get a local minimum (and, as ever, overconfidence).
Seems like checks on unsafe strategies, done well, encourage safer strategies. I agree overconfidence is an issue though.
More [evaluate the plan to get through the minefield], and less [estimate whether we’ll get blown up on the next step]
Seems true in an ideal world, but in practice I’d imagine it’s much easier to get consensus when you have more concrete evidence of danger / misalignment. Seems like there’s lots of disagreement even within the current alignment field, and I don’t expect that to change absent more evidence of danger/misalignment and perhaps credible estimates.
To be clear I think if we could push a button for an international pause now it would be great, and I think it’s good to advocate for that to shift the Overton Window if nothing else, but in terms of realistic plans it seems good to aim for stuff a bit closer to evaluating the next step than overall policies, for which there is massive disagreement.
(Of course there’s a continuum between just looking at the next step and the overall plan; there totally should be people doing both, and there are, so it’s a question at the margin, etc.)
The other portions of your comment I think I’ve already given my thoughts on previously, but overall I’d say I continue to think it depends a lot on the particulars of the regulation and the group doing the risk assessment; done well I think it could set up incentives well but yes if done poorly it will get Goodharted. Anyway, I’m not sure it’s particularly likely to get enshrined into regulation anytime soon, so hopefully we will get some evidence as to how feasible it is and how it’s perceived via pilots and go from there.