Gerald Monroe comments on The case for ensuring that powerful AIs are controlled

Gerald Monroe 26 Jan 2024 6:00 UTC
2 points
0
Do you think models like this need to be kept static? For example, if you allow the model to learn from its mistakes, etc., after this evaluation, or add additional networks to its architecture, doesn’t this negate any safety evals?

I ask because this is the obvious way to use AI in a business: subscribe to a base model with unlocked weights and add additional networks optimized for RL or domain-specific estimations.
- ryan_greenblatt 26 Jan 2024 6:44 UTC
  2 points
  0
  No, see Appendix C in our paper for a discussion of how to handle online training while being conservative about inductive biases.
  
  (We can also approximate this as needed if that exact scheme is too complex or expensive.)