(some of which we’d rather not expose our model to).
Why do you want to avoid exposing your model to some inputs?
I think there’s some chance of models executing treacherous turns in response to a particular input, and I’d rather not trigger those if the model hasn’t been sufficiently sandboxed.
Why do you want to avoid exposing your model to some inputs?
I think there’s some chance of models executing treacherous turns in response to a particular input, and I’d rather not trigger those if the model hasn’t been sufficiently sandboxed.