ryan_greenblatt comments on Catching AIs red-handed

ryan_greenblatt 11 Feb 2024 2:38 UTC
3 points
Ok, well the most immediate next step is to try to ensure that some sort of commitment along these lines gets included in RSPs (or preparedness frameworks, etc.).

This sort of commitment is technically a special case of an RSP commitment like “we’ll make a safety case (approved by an independent review board) that we’ll be safe from risks due to models autonomously causing harm”, but I think having a more specific incident response policy is pretty likely to be good.

I’m unsure what specific actions along these lines can be taken to influence RSPs, but insofar as you already have some influence over some such process, pushing for this sort of commitment seems probably good (as it’s extremely common sense and also moderately useful).

I’m uncertain about what the space looks like as far as executive orders or other US/UK/etc. government policy. But insofar as any such policy might include something like incident reporting, specifically highlighting this sort of case seems pretty good.