David Rein comments on When can we trust model evaluations?

David Rein 17 Jan 2024 21:17 UTC
“the existence of predicates on the world that are easier to evaluate than generate examples of (in the same way that verifying the answer to a problem in NP can be easier than generating it) guarantees that the model should be better at distinguishing between evaluation and deployment than any evaluator can be at tricking it into thinking it’s in deployment”
Where does the guarantee come from? How do we know that for this specific problem (generating deployment-like inputs vs. evaluating whether the model is deployed) the evaluating side is easier? For many problems the two are equally difficult, right?
- evhub 17 Jan 2024 21:28 UTC
  We know that there exist predicates that are easier to evaluate than to generate examples of (e.g. whether the input contains a factorization of RSA-2048). So, given that the model is actively looking for such predicates, it will be able to use them to make evaluation easier than generation.
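
To make the asymmetry concrete, here is a minimal sketch of a predicate that is cheap to check but expensive to produce an example for. It is only an illustration: `TOY_N` is a small made-up semiprime standing in for RSA-2048 (not the real modulus), and the names `is_valid_factorization` and `find_factorization` are hypothetical helpers, not anything from the post.

```python
from math import isqrt

# Toy stand-in for RSA-2048 (assumption: a small semiprime for illustration only).
TOY_N = 10007 * 10009


def is_valid_factorization(n: int, p: int, q: int) -> bool:
    """Evaluate the predicate: one multiplication and a few comparisons."""
    return 1 < p < n and 1 < q < n and p * q == n


def find_factorization(n: int):
    """Generate an example satisfying the predicate by naive trial division.

    Returns (p, q, steps), or None if n has no nontrivial factor.
    The step count grows with sqrt(n), which is hopeless at 2048 bits.
    """
    steps = 0
    for d in range(2, isqrt(n) + 1):
        steps += 1
        if n % d == 0:
            return d, n // d, steps
    return None


if __name__ == "__main__":
    p, q, steps = find_factorization(TOY_N)
    print(f"generation took {steps} trial divisions")
    print("verification passes:", is_valid_factorization(TOY_N, p, q))
```

In this picture, a model that conditions on "the input contains a valid factorization" only ever has to run the cheap check, while an evaluator who wants to trigger that condition has to do the expensive search. The sketch shows that such predicates exist; it does not by itself show that every evaluation-vs-deployment distinction has this shape, which is the point the question above is pressing on.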