Oops, yes. That’s the weaker claim, which I agree with. The stronger claim is that because we can’t understand something “all at once”, mechanistic transparency is too hard and so we shouldn’t take Daniel’s approach. But the way we understand laptops is also in a mechanistic sense. No one argues that because laptops are too hard to understand all at once, we shouldn’t try to understand them mechanistically.
This seems to be assuming that we have to be able to take any complex trained AGI-as-a-neural-net and determine whether or not it is dangerous. Under that assumption, I agree that the problem is itself very hard, and mechanistic transparency is not uniquely bad relative to other possibilities.
I didn’t assume that. I objected to the specific example of a laptop as an instance of mechanistic transparency being too hard. Laptops are normally understood well because the understanding can be broken into components and built up from abstractions. But our understanding of each component and abstraction is pretty mechanistic, and this understanding is useful.
Furthermore, because laptops did not fall out of the sky one day, but were instead built slowly over successive years of research and development, they seem like a great example of how Daniel’s mechanistic transparency approach does not rely on us having to understand arbitrary systems. Just as we built up an understanding of laptops, presumably we could do the same with neural networks. This was my interpretation of why he uses Zoom In as an example.
All of the other stories for preventing catastrophe that I mentioned in the grandparent are tackling a hopefully easier problem than “detect whether an arbitrary neural net is dangerous”.
Indeed, but I don’t think this was the crux of my objection.
Okay, I think I see the miscommunication.

The story you have is “the developers build a few small neural net modules that each do one thing, mechanistically understand those modules, then use those modules to build newer modules that do ‘bigger’ things, mechanistically understand those, and keep iterating until they have an AGI”. Does that sound right to you? If so, I agree that by following such a process the developer team could get mechanistic transparency into the neural net, the same way that laptop-making companies have mechanistic transparency into laptops.
The story I took away from this post is “we do end-to-end training with regularization for modularity, and then we get out a neural net with modular structure. We then need to understand this neural net mechanistically to ensure it isn’t dangerous”. This seems much more analogous to needing to mechanistically understand a laptop that “fell out of the sky one day” before we had ever made a laptop.
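For concreteness, one hypothetical form that “regularization for modularity” could take is an extra loss term penalizing weights that connect different candidate modules, so that end-to-end training is pushed toward block-diagonal (modular) structure. This is just an illustrative sketch with made-up names, not the specific regularizer the post proposes:

```python
import numpy as np

def modularity_penalty(weight, num_modules):
    """L1 penalty on cross-module (off-block-diagonal) entries of a
    weight matrix, encouraging it to decompose into independent modules.
    The module assignment here is just an even partition of the units."""
    n_out, n_in = weight.shape
    # assign each output/input unit to one of num_modules contiguous blocks
    out_module = np.arange(n_out)[:, None] * num_modules // n_out
    in_module = np.arange(n_in)[None, :] * num_modules // n_in
    cross_module = out_module != in_module  # weights linking different modules
    return np.abs(weight[cross_module]).sum()

# toy example: a perfectly block-diagonal weight matrix incurs zero penalty
w = np.zeros((4, 4))
w[:2, :2] = 1.0  # within module 0
w[2:, 2:] = 1.0  # within module 1
print(modularity_penalty(w, num_modules=2))  # 0.0
```

In a real training loop this term would be added to the task loss, so the question in the second story is whether the modules that emerge from such a pressure are ones we can then understand mechanistically after the fact.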
My critiques are primarily about the second story. My critique of the first story would be that it seems like you’re sacrificing a lot of competitiveness by having to develop the modules one at a time, instead of using end-to-end training.
You could imagine a synthesis of the two stories: train a medium-level smart thing end-to-end, look at what all the modules are doing, and use those modules when training a smarter thing.