Eventually it makes sense, I promise. “Bayesianism” in the sense of keeping track of every hypothesis is very computationally expensive—modern algorithms only keep track of a very small number of hypotheses (only those representable by a neural network [or what have you], and even then only those required to do gradient descent). This fact opens you up to the overfitting problem, where the simplest perfect hypothesis in your space actually has very little information about the true external reality. You need some way of throwing away the parts of the signal that your model wasn’t going to figure out anyhow.
For this reason among others, modern machine learning algorithms often have a lot of settings that have to be chosen by smarter systems (humans) before the algorithm can actually learn a novel domain. These settings reflect how the properties of the domain interact with the properties of your algorithm (e.g. how many resources the algorithm has to commit before it can expect to have found something good, or what degree of noise it has to learn to throw away). These are those “hyperparameter” things. Cross-validation is just an empirical tool that helps humans figure out the right settings. You can probably figure out why it’s expected to work.
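To make that concrete, here is a minimal sketch of the idea in NumPy (the target function, noise level, and degree range are all invented for illustration): polynomial degree plays the role of a hyperparameter, high degrees fit the training noise perfectly, and held-out validation error is what exposes the overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying function (illustrative choice).
x = rng.uniform(-1, 1, 40)
y = np.sin(2 * x) + rng.normal(0, 0.2, 40)

# Hold some data out: the model never trains on it,
# so good performance there can't come from memorized noise.
x_train, y_train = x[:30], y[:30]
x_val, y_val = x[30:], y[30:]

def val_error(degree):
    """Fit a polynomial of the given degree on the training split,
    then measure mean squared error on the held-out split."""
    coeffs = np.polyfit(x_train, y_train, degree)
    preds = np.polyval(coeffs, x_val)
    return np.mean((preds - y_val) ** 2)

# Sweep the hyperparameter and pick the degree that
# generalizes best to the held-out data.
errors = {d: val_error(d) for d in range(1, 15)}
best = min(errors, key=errors.get)
```

Training error alone would always prefer the highest degree, since more parameters can only fit the training points better; the held-out split is the cheap empirical proxy for "information about the true external reality."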
I upvoted because I understand the rationale, I understand the explanation, I just rather wish that a book whose purpose is to teach the subject wouldn’t be so… ad hoc.