I think that even after your edit, your argument still applies more broadly than you’re giving it credit for: if computer security is going to go poorly, then we’re facing pretty serious AI risk even if the safety techniques require trivial effort during deployment.
If your AI is stolen, you face substantial risk even if you had been able to align it (e.g. because you might immediately get into an AI-enabled war, and you might be forced to proceed with building more powerful and less-likely-to-be-aligned models because of competitive pressure).
So I think your argument also pushes against working on alignment techniques.
I’m curious, @Dan Braun: why don’t you work on computer security (assuming I correctly understand that you don’t)?
I plan to spend more time thinking about AI model security. The main reasons I’m not spending a lot of time on it now are:
- I’m excited about the interpretability project/agenda we’ve started working on, and about my team/org more generally, and I think (or at least hope) that I have a non-trivial positive influence on it.
- I haven’t thought through what the best things to do would be. Some ideas (takes welcome):
  - Help create RAND or RAND-style reports like Securing AI Model Weights (I think this report is really great). E.g.:
    - Make forecasts about how much adversary interest certain models are likely to attract, and then how likely a model is to be stolen/compromised given that level of interest and the developer’s level of defense (see the sketch after this list for the rough shape of such a forecast). I expect this to be much more speculative than a typical RAND report. It might also require a bunch of non-public info on both offense and defense capabilities.
    - (not my idea) Make forecasts about how long a lab would take to implement certain levels of security.
  - Make demos that convince natsec people that AI is (or will be) very capable and will become a top-priority target.
  - Improve security at a lab (probably requires becoming a full-time employee).
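To give a sense of the shape of the forecasting idea above, here’s a minimal sketch in Python. Everything in it is a placeholder for illustration: the OC (adversary operational capacity) and SL (developer security level) buckets loosely borrow the framing of the RAND report, and all the numbers are made up rather than actual estimates.

```python
# Minimal sketch of the forecast structure (all numbers are made-up
# placeholders, not real estimates). Adversaries are bucketed by
# operational capacity (OC3-OC5) and the developer by security level
# (SL1-SL5), loosely following the RAND report's framing.

# Hypothetical P(an adversary of this capacity takes a serious interest
# in the model over some fixed time window).
p_interest = {"OC3": 0.8, "OC4": 0.5, "OC5": 0.2}

# Hypothetical P(theft succeeds | that adversary is interested), for a
# developer operating at security level SL3.
p_theft_given_interest = {
    ("OC3", "SL3"): 0.10,
    ("OC4", "SL3"): 0.45,
    ("OC5", "SL3"): 0.90,
}

def p_weights_stolen(security_level: str) -> float:
    """P(at least one adversary class steals the weights), assuming
    independence across adversary classes (a strong simplification)."""
    p_not_stolen = 1.0
    for oc, p_int in p_interest.items():
        p_theft = p_int * p_theft_given_interest[(oc, security_level)]
        p_not_stolen *= 1.0 - p_theft
    return 1.0 - p_not_stolen

print(f"P(weights stolen | SL3): {p_weights_stolen('SL3'):.2f}")
```

Most of the value of doing this for real would come from grounding the conditional probabilities in non-public information about offense and defense capabilities, which is exactly the hard part mentioned above.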