I’m not really clear on what you mean by not buying the example. You certainly seem to understand the distinction I’m drawing: mechanistic interpretability is definitely not what I mean by “mathematical AI safety,” though I agree there is math involved.
And I think the work on goal misgeneralization was conceptualized in ways directly related to Goodhart, and this type of problem inspired a number of research projects, including quantilizers, which is certainly agent-foundations work. I’ll point here for more places where the agent foundations people think it is relevant.
Ah OK, I think I’ve worked out where some of my confusion is coming from: I don’t really see any argument for why mathematical work may be useful, relative to other kinds of foundational conceptual work. e.g. you write (with my emphasis): “Current mathematical research could play a similar role in the coming years...” But why might it? Isn’t that where you need to be arguing?
The examples seem to be of cases where people have done some kind of conceptual foundational work which has later gone on to influence or inspire ML work. But early work on deception or Goodhart was not mathematical work; that’s why I don’t understand how these count as examples.
I think the dispute here is that you’re interpreting “mathematical” too narrowly, and almost all of the work happening in agent foundations and similar areas is exactly what was being worked on by “mathematical AI research” 5-7 years ago. The argument was that those approaches have been fruitful, and we should expect them to continue to be so. If you want to call that “foundational conceptual research” instead of “mathematical AI research,” that’s fine.