Davidmanheim comments on A Defense of Work on Mathematical AI Safety

Davidmanheim 7 Sep 2023 12:50 UTC
2 points
−1
I’m not really clear what you mean by not buying the example. You certainly seem to understand the distinction I’m drawing—mechanistic interpretability is definitely not what I mean by “mathematical AI safety,” though I agree there is math involved.
And I think the work on goal misgeneralization was conceptualized in ways directly related to Goodhart, and this type of problem inspired a number of research projects, including quantilizers, which is certainly agent-foundations work. I’ll point here for more places the agent foundations people think it is relevant.
- carboniferous_umbraculum 14 Sep 2023 14:19 UTC
  2 points
  0
  Parent
  Ah OK, I think I’ve worked out where some of my confusion is coming from: I don’t really see any argument for why mathematical work may be useful, relative to other kinds of foundational conceptual work. For example, you write (with my emphasis): “Current mathematical research could play a similar role in the coming years...” But why might it? Isn’t that where you need to be arguing?
  
  The examples seem to be of cases where people have done some kind of conceptual foundational work which has later gone on to influence/inspire ML work. But early work on deception or Goodhart was not mathematical work, which is why I don’t understand how these are examples.
  - Davidmanheim 24 Dec 2023 8:05 UTC
    2 points
    0
    Parent
    I think the dispute here is that you’re interpreting “mathematical” too narrowly, and almost all of the work happening in agent foundations and similar is exactly what was being worked on by “mathematical AI research” 5-7 years ago. The argument was that those approaches have been fruitful, and we should expect them to continue to be so. If you want to call that “foundational conceptual research” instead of “Mathematical AI research,” that’s fine.