Do you think models like this need to be kept static? For example, if you allow the model to learn from its mistakes, etc., after this evaluation, or add additional networks to its architecture, doesn’t that negate any safety evals?
I ask because this is the obvious way to use AI in a business: subscribe to a base model with unlocked weights and add additional networks optimized for RL or domain-specific estimations.
No; see Appendix C of our paper for a discussion of how to handle online training while being conservative about inductive biases.
(We can also approximate this as needed if that exact scheme is too complex or expensive.)