The charitable interpretation here is that we’ll compute E[harm|action] (or more generally E[utility|action]) using our posterior over hypotheses and then choose which action to execute based on this. (Or at least we’ll pause and refer actions to humans if E[harm|action] is too high.)
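A minimal sketch of that decision rule, purely for illustration (all the names and the finite-hypothesis approximation are my assumptions, not anything from the original discussion): approximate the posterior with a weighted set of hypotheses, average their harm/utility predictions per action, and defer to a human when no candidate action is below the harm threshold.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    weight: float                    # posterior probability of this hypothesis (weights sum to 1)
    harm: Callable[[str], float]     # predicted harm of an action under this hypothesis
    utility: Callable[[str], float]  # predicted utility of an action under this hypothesis


def expected_harm(action: str, posterior: List[Hypothesis]) -> float:
    # E[harm | action], averaged over the posterior.
    return sum(h.weight * h.harm(action) for h in posterior)


def expected_utility(action: str, posterior: List[Hypothesis]) -> float:
    # E[utility | action], averaged over the posterior.
    return sum(h.weight * h.utility(action) for h in posterior)


def choose_action(actions: List[str], posterior: List[Hypothesis],
                  harm_threshold: float) -> str:
    # Keep only actions whose expected harm is acceptably low.
    safe = [a for a in actions if expected_harm(a, posterior) <= harm_threshold]
    if not safe:
        # No acceptable action: pause and refer to humans.
        return "refer-to-human"
    # Otherwise take the acceptable action with the highest expected utility.
    return max(safe, key=lambda a: expected_utility(a, posterior))
```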
I think “ruling out the possibility” isn’t really a good frame for thinking about this; it’s much more natural to just think about this as an estimation procedure which is trying hard to avoid overconfidence in out-of-distribution contexts (or more generally, in contexts where the training data doesn’t pin down predictions well enough).
ETA: realistically, I think this isn’t going to perform better than ensembling in practice.
See also the discussion here.