You’re right, I’ve reread the section and that was a slight misunderstanding on my part.
Even so, I still think it falls at a 7 on my scale, since it's a way of experimentally validating oversight processes that gives you some evidence about how they'll work in unseen situations.
I'd say the main point here is that I don't want to rely on my ability to extrapolate how the model behaves in "unseen situations"; I want to run this eval in every situation where I'm deploying my model.