Overall I haven’t thought about it that much, but it seems interesting. (I thought your NTK summary was good.)
With respect to alignment, the main lesson I’ve taken away is to be careful about intuitions that come from “building up structure slowly”: you should at least check that all of your methods work fine in the local linear regime, where in some sense everything is there at the start and you are just perturbing the weights a tiny bit. I think this has been useful for perspective. In some sense it’s something you think about automatically when focusing on the worst case, but it’s still nice to know which parts of the worst case are actually real, and I think I used to overlook some of these issues more.
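To make the “local linear regime” concrete, here is a toy sketch (my own illustration with an arbitrary small MLP and made-up data, not anything from the discussion above) of the linearization behind the NTK picture: the network is replaced by its first-order Taylor expansion around the initial weights, so training only ever moves within the function span that is already there at initialization.

```python
# Toy sketch of the linearized ("lazy") regime; shapes, widths, and data are
# made up for illustration.
import jax
import jax.numpy as jnp

def mlp(w, x):
    h = jnp.tanh(x @ w["W1"])
    return (h @ w["W2"]).squeeze(-1)

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
width = 2048
w0 = {"W1": jax.random.normal(k1, (1, width)),
      "W2": jax.random.normal(k2, (width, 1)) / jnp.sqrt(width)}
x = jax.random.uniform(k3, (16, 1), minval=-1.0, maxval=1.0)

# Linearized model: f_lin(x; w) = f(x; w0) + J(x) (w - w0), computed as a JVP.
def f_lin(w, x):
    dw = jax.tree_util.tree_map(lambda a, b: a - b, w, w0)
    y0, jvp_out = jax.jvp(lambda p: mlp(p, x), (w0,), (dw,))
    return y0 + jvp_out

# For a small weight perturbation, the full network and its linearization
# agree up to an error that is second order in the size of the perturbation.
w = jax.tree_util.tree_map(lambda a: a + 1e-2, w0)
print(jnp.max(jnp.abs(mlp(w, x) - f_lin(w, x))))
```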
In practice it seems like the number of datapoints is large relative to the width, and in fact it’s quite valuable to take multiple gradient descent steps even if your initialization is quite careful. So it doesn’t seem like you can actually make the NTK simplification, i.e. you still have to deal with the additional challenges posed by long optimization paths. I’d want to think about this much more if there were a proposal that appeared to apply to the NTK but not to general neural networks (and I think that alignment for the NTK is a reasonable thing for people to think about, though I don’t see a way to get more traction on it than on the general case); in that case it feels unlikely that the proposal would apply directly, but it would still be a suggestive hint.
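As a rough way to see when the simplification fails, here is a hypothetical diagnostic (again my own toy sketch, with a small synthetic regression task, not an experiment from this exchange): take many gradient steps with more datapoints than width, and watch how far the weights and the empirical tangent kernel drift from initialization. In the lazy regime both stay nearly fixed; large drift suggests the linear picture is breaking down.

```python
# Rough diagnostic for distance from the lazy/NTK regime; the task, width,
# learning rate, and step count are arbitrary choices for illustration.
import jax
import jax.numpy as jnp

def mlp(w, x):
    h = jnp.tanh(x @ w["W1"])
    return (h @ w["W2"]).squeeze(-1)

def empirical_ntk(w, x):
    # K[i, j] = <grad_w f(x_i), grad_w f(x_j)>, flattened over all parameters.
    jac = jax.jacrev(lambda p: mlp(p, x))(w)
    flat = jnp.concatenate(
        [j.reshape(x.shape[0], -1) for j in jax.tree_util.tree_leaves(jac)], axis=1)
    return flat @ flat.T

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
width, n = 64, 512                      # many more datapoints than width
w = {"W1": jax.random.normal(k1, (1, width)),
     "W2": jax.random.normal(k2, (width, 1)) / jnp.sqrt(width)}
x = jax.random.uniform(k3, (n, 1), minval=-1.0, maxval=1.0)
y = jnp.sin(3.0 * x).squeeze(-1)

w0, K0 = w, empirical_ntk(w, x)
loss = lambda p: jnp.mean((mlp(p, x) - y) ** 2)
grad = jax.jit(jax.grad(loss))
for _ in range(2000):                   # many gradient steps
    w = jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, w, grad(w))

rel_move = jnp.linalg.norm(w["W1"] - w0["W1"]) / jnp.linalg.norm(w0["W1"])
ntk_drift = jnp.linalg.norm(empirical_ntk(w, x) - K0) / jnp.linalg.norm(K0)
print(rel_move, ntk_drift)  # both stay small in the lazy regime; large values suggest it has broken down
```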
More broadly, I do also think that understanding how neural networks behave is helpful for alignment (in the same ballpark as empirical work trying to, e.g., more deeply understand how neural networks generalize in practice). I’m less excited about it than about trying to resolve the problem for our current understanding of neural networks. Part of the reason is that my current conception of the alignment problem for neural networks seems extremely similar to my conception of the same problem for, e.g., random program search, suggesting that a lot of what we are dealing with are pretty fundamental issues that probably won’t change qualitatively unless we have a giant shift in our understanding of neural networks (though I think this might change as we make further progress on alignment).