Remember that a gears-level model is an explanation of some particular phenomenon that is solid enough to causally intervene on, not an understanding of everything to do with ML. You don’t need the latter to make useful alignment progress. John gives the example of Bengio and vanishing gradients; Bengio didn’t need to understand every important phenomenon relevant to ML to form that gears-level model, nor did he go beyond this narrow model when writing the unitary evolution paper. With this in mind, I think the gears-level models required to make alignment progress can be highly specific to the area, and maybe not very enlightening to write out in one big list. With 1000 papers trying to solve 100 different problems, my guess is you’d have 10 different theories of the dynamics of machine learning and 300 different models of the problems, and the latter would be at least as important to the success of the papers.
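To make the Bengio example concrete: the gears-level model there is that backprop through a deep chain multiplies each layer's local derivative, and for saturating nonlinearities those factors are typically well below 1, so the gradient shrinks geometrically with depth. A minimal numpy sketch (my own illustration, not from the original) of that mechanism for a chain of sigmoid units:

```python
import numpy as np

rng = np.random.default_rng(0)
depth = 50

# Backprop through a chain of scalar sigmoid "layers" multiplies the
# per-layer derivatives together. sigmoid'(z) <= 0.25 everywhere, so
# the product tends to shrink geometrically as depth grows.
x = 0.0
grad = 1.0
grads = []
for _ in range(depth):
    w = rng.normal()                # hypothetical random weight
    z = w * x
    s = 1.0 / (1.0 + np.exp(-z))   # sigmoid activation
    x = s
    grad *= w * s * (1.0 - s)      # chain rule: one layer's local derivative
    grads.append(abs(grad))

print(f"|gradient| after layer 1:  {grads[0]:.3e}")
print(f"|gradient| after layer {depth}: {grads[-1]:.3e}")
```

Note that this narrow model says nothing about, e.g., generalization or loss-landscape geometry; it was still enough to motivate and evaluate interventions like unitary/orthogonal parameterizations that keep the per-layer factors near 1.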