I’d be on board with at least a very long delay on the AI safety equivalent of “not writing in C,” which would be “not building AGI.”
Unfortunately, that does not seem to be a serious option on the table. Even if it were, we could still hope for duct-tape patches/Swiss-cheese security layers to mitigate, slow, or reduce the chance of an AI security failure. It seems to me that the possibility of a reasonably robust combined AI safety solution is something we'd want to encourage. If not, why not?
The equivalent of not using C for AGI development is not using machine learning techniques. You're right that that seems to be what DM et al. are gearing up to do, and I agree that developing such compiler guardrails might be better than nothing and that we should encourage people to come up with more of them where they can be stacked neatly. I'm not that pessimistic: these compiler-level security features really do help prevent bugs. They're just not generally sufficient when stacked against overwhelming optimization pressure and large attack surfaces.
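To make the analogy concrete, here's a minimal sketch (a hypothetical example of mine, not something from the discussion above) of the kind of bug those mitigations target. Compiled with something like `-fstack-protector`, the compiler places a canary value between the local buffer and the saved return address; an overflow that clobbers it aborts the process at function return instead of handing control to the attacker. The mitigation downgrades the failure mode; it doesn't remove the bug.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration: a classic stack buffer overflow.
 * strcpy writes past `name` whenever the input is 16+ chars. */
static void greet(const char *input) {
    char name[16];
    strcpy(name, input);          /* no bounds check */
    printf("hello, %s\n", name);
}                                 /* with -fstack-protector, a clobbered
                                     canary aborts here instead of
                                     returning to attacker-chosen code */

int main(int argc, char **argv) {
    if (argc > 1)
        greet(argv[1]);           /* attacker-controlled length */
    return 0;
}
```

Against a determined attacker with a big enough attack surface and enough tries, canaries, DEP, and ASLR get worked around (info leaks, ROP, and so on), which is the sense in which stacked guardrails are "not generally sufficient" against overwhelming optimization pressure.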
My (probably wrong) layman’s read of the AGI safety field is that people will still need to either come up with a “new abstraction”, or start cataloging the situations in which they will actually face overwhelming optimization pressure and desperately avoid those situations, instead of trying to do the DEP+ASLR+Stack Canaries thing. AGI safety is not, actually, a security problem. You get to build your dragon, and your task is to “box” the dragon you choose. Remove the parts where you let the dragon think about how to fuck up its training process, and you remove the places where it can design these exploits.