I think it very clearly advantages alignment over capabilities: understanding SGD’s inductive biases is one of the primary bottlenecks for inner alignment, imo.
The material linked here is pretty old, though; for example, it predates Mingard et al.
Thanks very much for the link!
P.S. The main thing I have taken from the link you posted so far is that the important part is not exactly the biases of SGD but the structure of the DNN itself: the algorithm used to find a (local) optimum plays less of a role than the overall architecture. But I’m probably reading too much into your precise phrasing.