The story you have is “the developers build a few small neural net modules that do one thing, mechanistically understand those modules, then use those modules to build newer modules that do ‘bigger’ things, and mechanistically understand those, and keep iterating this until they have an AGI”. Does that sound right to you? If so, I agree that by following such a process the developer team could get mechanistic transparency into the neural net the same way that laptop-making companies have mechanistic transparency into laptops.
The story I took away from this post is “we do end-to-end training with regularization for modularity, and then we get out a neural net with modular structure. We then need to understand this neural net mechanistically to ensure it isn’t dangerous”. This seems much more analogous to needing to mechanistically understand a laptop that “fell out of the sky one day” before we had ever made a laptop.
My critiques are primarily about the second story. My critique of the first story would be that it seems like you’re sacrificing a lot of competitiveness by having to develop the modules one at a time, instead of using end-to-end training.
You could imagine a synthesis of the two stories: train a medium-level smart thing end-to-end, look at what all the modules are doing, and use those modules when training a smarter thing.
Okay, I think I see the miscommunication.
The story you have is “the developers build a few small neural net modules that do one thing, mechanistically understand those modules, then use those modules to build newer modules that do ‘bigger’ things, and mechanistically understand those, and keep iterating this until they have an AGI”. Does that sound right to you? If so, I agree that by following such a process the developer team could get mechanistic transparency into the neural net the same way that laptop-making companies have mechanistic transparency into laptops.
The story I took away from this post is “we do end-to-end training with regularization for modularity, and then we get out a neural net with modular structure. We then need to understand this neural net mechanistically to ensure it isn’t dangerous”. This seems much more analogous to needing to mechanistically understand a laptop that “fell out of the sky one day” before we had ever made a laptop.
My critiques are primarily about the second story. My critique of the first story would be that it seems like you’re sacrificing a lot of competitiveness by having to develop the modules one at a time, instead of using end-to-end training.
You could imagine a synthesis of the two stories: train a medium-level smart thing end-to-end, look at what all the modules are doing, and use those modules when training a smarter thing.