You’re right, I’ve reread the section and that was a slight misunderstanding on my part.
Even so, I still think it falls at a 7 on my scale, since it's a way of experimentally validating oversight processes that gives you some evidence about how they'll work in unseen situations.
I'd say the main point here is that I don't want to rely on my ability to extrapolate how the model behaves in "unseen situations"; I want to run this eval in every situation where I'm deploying my model.