I will also add a point re “just do AI alignment math”:
Math studies the structures of things. A solution to our AI alignment problem has to be something we can use, in this universe. The structure of this problem is laden with stuff like agents and deception, and in order to derive anything relevant for us, our AI is going to need to understand all that.
Most of the work of solving AI alignment does not look like proving things that are hard to prove. It involves puzzling over the structure of agents trying to build agents, and searching for a promising angle on how we might build an agent that will help us get what we want. If you want your AI to solve alignment, it has to be able to do this.
This sketch of the problem puts “solve AI alignment” in a dangerous capability reference class for me. I do remain hopeful that we can find places where AI can help us along the way. But I personally don’t know of current avenues where we could use non-scary AI to meaningfully help.