The equivalent of not using C for AGI development is not using machine learning techniques. You are right that that seems to be what DM et al. are gearing us up to do, and I agree that developing such compiler guardrails might be better than nothing, and that we should encourage people to come up with more of them when they can be stacked neatly. I’m not that pessimistic: these compiler-level security features do help prevent bugs. They’re just not generally sufficient when stacked against overwhelming optimization pressure and large attack surfaces.
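To make the analogy concrete, here is a minimal sketch of what one of those guardrails buys you, assuming GCC or Clang and their -fstack-protector-strong flag (the file name and build line are just illustrative). The canary turns a silent control-flow hijack into a detected crash, which is exactly the "better than nothing, but not a fix for the underlying bug" property I have in mind:

```c
/* Sketch: compiling with -fstack-protector-strong places a canary value
 * between the local buffer and the saved return address, so overflowing
 * `buf` aborts the process ("*** stack smashing detected ***" under glibc)
 * instead of silently corrupting control flow.
 *
 *   cc -fstack-protector-strong canary_demo.c -o canary_demo
 *   ./canary_demo "some string much longer than sixteen characters"
 */
#include <stdio.h>
#include <string.h>

static void greet(const char *name) {
    char buf[16];
    /* The bug the guardrail catches (but does not remove): no bounds check. */
    strcpy(buf, name);
    printf("hello, %s\n", buf);
}

int main(int argc, char **argv) {
    if (argc > 1)
        greet(argv[1]);
    return 0;
}
```

The overflow is still there; the mitigation just changes what an attacker can do with it, and a sufficiently resourced attacker looks for the paths the canary doesn't cover.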
My probably-wrong layman’s read of the AGI safety field is that people will still need to either come up with a “new abstraction” or start cataloging the situations in which they will actually face overwhelming optimization pressure and avoid those situations desperately, instead of trying to do the DEP+ASLR+Stack Canaries thing. AGI safety is not, actually, a security problem. You get to build your dragon, and your task is to “box” the dragon you choose. Remove the parts where you let the dragon think about how to fuck up its training process and you remove the places where it can design these exploits.