Buck comments on You’re Measuring Model Complexity Wrong

Buck 5 Nov 2023 21:16 UTC
LW: 3 AF: 2
“(some of which we’d rather not expose our model to).”
Why do you want to avoid exposing your model to some inputs?
- Jesse Hoogland 16 Nov 2023 9:17 UTC
  LW: 3 AF: 1
  I think there’s some chance of models executing treacherous turns in response to a particular input, and I’d rather not trigger those if the model hasn’t been sufficiently sandboxed.