I don’t remember the exact quote, but some sculptor (I’ve heard it attributed to Michelangelo, though that may be apocryphal) described their art as “the statue is already there inside the stone, I just remove the extra pieces”.
And I guess we can agree that this statement is in some technical sense true, but of course it completely misses the point (intentionally, to sound deep). More precisely, the essence of the statue, i.e. the information that makes it different from a random block of stone (or whatever material), was all added by the act of “removing the extra pieces”; none of it was there at the beginning, except for trivial constraints such as that the original block must be larger than the intended statue.
My question is: to what extent is “the network is already there, ML just removes the extra pieces” a statement of this type?
Assuming that the “start with lots of models, then remove the bad ones” picture is accurate, the fact that NNs work in practice implies that the initial set of models matters a lot. The vast majority of models consistent with the training data would not generalize well to future data points, so the initial set must be biased towards models-which-generalize-well (simple models, for instance) from the start.
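To make the counting concrete, here is a toy sketch (my own illustration with made-up numbers, not anything from the argument above): on 3-bit inputs there are 2^8 = 256 Boolean functions, and fixing 4 training points leaves 2^4 = 16 functions consistent with the data, only one of which matches the target everywhere. Picking uniformly among the consistent models almost never recovers the target, so the bias of the initial set is doing the heavy lifting.

```python
from itertools import product

# Toy counting experiment: how many Boolean functions on 3 bits fit a small
# training set, and how many of those actually generalize?
inputs = list(product([0, 1], repeat=3))  # all 8 possible 3-bit inputs
train, test = inputs[:4], inputs[4:]      # 4 points observed, 4 held out


def target(x):
    """The 'true' simple rule: AND of the first two bits."""
    return x[0] & x[1]


# Every Boolean function on 3 bits is one of 2^8 = 256 truth tables.
consistent = []
for outputs in product([0, 1], repeat=8):
    table = dict(zip(inputs, outputs))
    if all(table[x] == target(x) for x in train):
        consistent.append(table)

print(len(consistent))  # 2^4 = 16 functions fit the training data...
generalizing = [t for t in consistent
                if all(t[x] == target(x) for x in test)]
print(len(generalizing))  # ...but only 1 matches the target everywhere
```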
Imagine starting with 100x as much stone, quickly quitting any piece that doesn’t seem to be going well when you work on it half-heartedly, and not being too picky about the final result. That’s a rough, rough estimate.