Second, I don’t believe you. I say it’s always smarter to use the partitioned data than the aggregate data. If you have a data set that includes the gender of the subject, you’re always better off building two models (one for each gender) instead of one big model. Why throw away information?
Because, as von Neumann is supposed to have said, “with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” Unless your data is good enough to support the existence of the other factors, or you have other data available that does so, a model you fit to the lowest-level data is likely to capture more noise than reality.
Right, so the challenge is to incorporate as much auxiliary information as possible without overfitting. That’s what AdaBoost does: if you run it for T rounds, the complexity of the model you get is linear in T, not exponential in the number of factors, as it would be if you fit a separate model to each of the finest partitions.
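To make the "linear in T" point concrete, here is a minimal from-scratch sketch of AdaBoost with one-dimensional decision stumps. It is an illustration only, not anyone's production code: the function names (`fit_stump`, `adaboost`, `predict`), the toy data, and the clipping constant are all assumptions made for this example. The thing to notice is that the fitted model is just a list with one (weight, threshold, polarity) triple per round.

```python
import math

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best 1-D stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Fit `rounds` stumps; labels in ys must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        # Clip the error so the log below stays finite (illustrative choice).
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Up-weight the examples this stump got wrong, then renormalize.
        preds = [polarity if x >= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data, made up for this example.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [-1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
```

Running for T rounds always yields exactly T stumps, so the number of fitted parameters grows with T and not with the number of cells you could carve the data into.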
This is one of the general advantages of Bayesian statistics: you can strike a balance between aggregate and separated data with techniques that automatically include partial pooling and information sharing between the levels of the analysis. (See pretty much anything written by Andrew Gelman; Bayesian Data Analysis is a great book that covers his whole perspective.)
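As a rough illustration of what partial pooling does, the sketch below shrinks each group's mean toward the grand mean, with less shrinkage for groups that have more data. This is a simplified shrinkage estimator, not a full hierarchical Bayesian model: in a real analysis the amount of pooling would be estimated from the data, whereas here `prior_strength` is a hypothetical constant chosen by hand, and the function name and toy data are likewise assumptions for this example.

```python
from statistics import mean

def partial_pool(groups, prior_strength):
    """Shrink each group's mean toward the grand mean.

    groups: dict mapping a group label to a list of observations.
    prior_strength: hand-picked constant (an assumption of this sketch);
        0 gives fully separate estimates, large values approach the
        single aggregate estimate.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    estimates = {}
    for label, values in groups.items():
        n = len(values)
        weight = n / (n + prior_strength)  # trust in the group's own mean
        estimates[label] = weight * mean(values) + (1 - weight) * grand_mean
    return estimates

# Toy data: group "a" has eight observations, group "b" has only one.
data = {"a": [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 10.0, 12.0], "b": [20.0]}
pooled = partial_pool(data, prior_strength=2.0)
```

With these numbers the grand mean is 12. The well-sampled group "a" barely moves (from 11 to about 11.2), while the single-observation group "b" is pulled most of the way toward the grand mean (from 20 to about 14.67), which is the behavior that sits between the fully aggregated and fully separated models.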