This is only one step toward a correct theory of inductive bias. I would say that “clear and important implications” will only come weeks from now, when we are much less confused and have run more experiments. The main audience for this post is researchers whose work is directly adjacent to inductive bias and training dynamics. If you don’t need gears-level insights on this topic, I would say the tl;dr is: “Circuit simplicity seems kind of wrong; there’s a cool connection between information loss and basin flatness which is probably better but maybe still very predictive; experiments are surprising so far; stay posted for more in ~2 weeks.”
Ah okay—I have updated positively in terms of the usefulness based on that description, and have updated positively on the hypothesis “I am missing a lot of important information that contextualizes this project,” though still confused.
Would be interested to know the causal chain from understanding circuit simplicity to the future being better, but maybe I should just stay posted (or maybe there is a different post I should read that you can link me to; or maybe the impact is diffuse and talking about any particular path doesn’t make that much sense [though even in this case my guess is that it is still helpful to have at least one possible impact story]).
Also, just want to make clear that I made my original comment because I figured sharing my user-experience would be helpful (e.g. via causing a sentence about the ToC), and hopefully not with the effect of being discouraging / being a downer.