Huh, interesting! So the way I’m thinking about this is, your loss landscape determines the attractor/repellor structure of your phase space (= network parameter space). For a (reasonable) optimization algorithm to have chaotic behavior on that landscape, it seems like the landscape would either have to have 1) a positive-measure flat region, on which the dynamics were ergodic, or 2) a strange attractor, which seems more plausible.
I’m not sure how that relates to the above link; it mentions the parameters “diverging”, but it’s not clear to me how neural network weights can diverge; aren’t they bounded?
Sometimes!
https://sohl-dickstein.github.io/2024/02/12/fractal.html
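On the "aren't they bounded?" question, here is a toy sketch of how weights can run off to infinity under plain gradient descent. It is not taken from the linked post; the function name `gradient_descent_1d`, the quadratic loss, and the specific learning rates are all assumptions chosen for illustration. On f(w) = 0.5 * curvature * w**2 each step multiplies w by (1 - lr * curvature), so the iterates shrink when lr < 2 / curvature and grow without bound when lr > 2 / curvature.

```python
def gradient_descent_1d(curvature, lr, w0=1.0, steps=50):
    """Plain gradient descent on the toy loss f(w) = 0.5 * curvature * w**2.

    Hypothetical helper for illustration only; nothing here clips or
    otherwise bounds the weight.
    """
    w = w0
    trajectory = [w]
    for _ in range(steps):
        grad = curvature * w   # f'(w) for the quadratic loss
        w = w - lr * grad      # equivalent to w *= (1 - lr * curvature)
        trajectory.append(w)
    return trajectory


# lr below 2 / curvature: |1 - lr * curvature| < 1, so w shrinks toward 0.
stable = gradient_descent_1d(curvature=1.0, lr=0.5)

# lr above 2 / curvature: |1 - lr * curvature| > 1, so |w| grows every step.
unstable = gradient_descent_1d(curvature=1.0, lr=2.5)

print("final |w|, lr=0.5:", abs(stable[-1]))
print("final |w|, lr=2.5:", abs(unstable[-1]))
```

So unless something like weight clipping, normalization, or strong weight decay is in play, nothing in the update rule itself keeps the parameters bounded. In practice the growth ends in float overflow (inf/NaN).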