I plan to spend more time thinking about AI model security. The main reasons I’m not spending a lot of time on it now are:
I’m excited about the interpretability project/agenda we’ve started working on, and about my team/org more generally, and I think (or at least hope) that I have a non-trivial positive influence on it.
I haven’t thought through what the best things to do would be. Some ideas (takes welcome):
Help create RAND or RAND-style reports like Securing AI Model Weights (a report I think is really great). For example:
Make forecasts about how much interest certain models are likely to attract from adversaries, and then how likely a model is to be stolen/compromised given that level of interest and the developer’s level of defense (see the toy sketch after this list). I expect this to be much more speculative than a typical RAND report. It might also require a bunch of non-public information on both offense and defense capabilities.
(not my idea) Make forecasts about how long a lab would take to implement certain levels of security.
Make demos that convince natsec people that AI is, or soon will be, very capable, and that models will become top-priority targets.
Improve security at a lab (probably requires becoming a full-time employee).
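To make the forecasting idea above concrete, here is a purely illustrative sketch in Python. It decomposes P(model stolen) into P(adversary takes serious interest) × P(compromise | targeted), using the OC (attacker operational capacity) and SL (security level) scales from the RAND report. Every number below is an invented placeholder to show the shape of the calculation, not a real estimate.

```python
# Purely illustrative: all probabilities below are made-up placeholders.
# OC (attacker "operational capacity") and SL (defender "security level")
# are the 1-5 scales used in RAND's Securing AI Model Weights report.

# Hypothetical P(at least one attempt succeeds in a year) for an attacker
# of a given OC level against a defender at SL1.
BASE_P_COMPROMISE = {1: 0.10, 2: 0.25, 3: 0.45, 4: 0.70, 5: 0.90}

# Hypothetical multiplicative risk reduction per defender security level.
DEFENSE_FACTOR = {1: 1.00, 2: 0.60, 3: 0.30, 4: 0.10, 5: 0.03}

def p_model_stolen(p_targeted: float, attacker_oc: int, defender_sl: int) -> float:
    """Rough P(model stolen) = P(targeted) * P(compromise | targeted).

    p_targeted: forecasted probability that an adversary at this OC level
        takes serious interest in the model (e.g. because of capability
        demos or strategic relevance).
    attacker_oc, defender_sl: integers on the 1-5 OC/SL scales.
    """
    p_compromise = BASE_P_COMPROMISE[attacker_oc] * DEFENSE_FACTOR[defender_sl]
    return p_targeted * min(p_compromise, 1.0)

# Example: a frontier model that a state actor (OC5) is 80% likely to
# target, held by a developer at roughly SL3 security.
print(f"{p_model_stolen(0.8, attacker_oc=5, defender_sl=3):.2f}")  # ~0.22
```

Even a toy model like this makes the non-public-information problem obvious: the base rates and defense factors are exactly the inputs that would require insider knowledge of both offensive capabilities and lab security postures.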