Does that actually work better than just setting a higher bar for significance? My gut says that data is data and chopping it up cleverly can’t work magic.
Cross-validation is hugely useful for predictive models. For a simple correlation like this, it matters less. But if you are fitting a locally weighted linear regression line, for instance, chopping the data up is absolutely standard operating procedure.
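To make the regression case concrete, here is a minimal sketch of using k-fold cross-validation to pick the bandwidth of a locally weighted linear fit. Everything in it is an assumption for illustration: the Gaussian kernel, the function names `local_linear_predict` and `cv_error`, the candidate bandwidths, and the synthetic sine data. It is not taken from the analysis under discussion.

```python
import numpy as np


def local_linear_predict(x_train, y_train, x_query, bandwidth):
    """Locally weighted linear regression with a Gaussian kernel (illustrative choice)."""
    preds = np.empty(len(x_query))
    for i, x0 in enumerate(x_query):
        # Weight each training point by its distance to the query point.
        w = np.exp(-0.5 * ((x_train - x0) / bandwidth) ** 2)
        # Design matrix centred on x0, so the intercept is the prediction at x0.
        A = np.column_stack([np.ones_like(x_train), x_train - x0])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y_train * sw, rcond=None)
        preds[i] = coef[0]
    return preds


def cv_error(x, y, bandwidth, k=5, seed=0):
    """Mean squared prediction error on held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for fold in folds:
        train = np.ones(len(x), dtype=bool)
        train[fold] = False  # fit only on points outside the held-out fold
        pred = local_linear_predict(x[train], y[train], x[fold], bandwidth)
        errors.append(np.mean((pred - y[fold]) ** 2))
    return float(np.mean(errors))


# Synthetic data (an assumption): a sine curve plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=200)
y = np.sin(x) + rng.normal(scale=0.3, size=200)

for h in (0.1, 0.5, 5.0):
    print(f"bandwidth={h}: held-out MSE={cv_error(x, y, h):.3f}")
```

The point of the held-out folds is that a very small bandwidth can always fit the training points closely, so in-sample error cannot tell you which bandwidth to use. Only the error on data the fit has not seen can.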
Does that actually work better than just setting a higher bar for significance? My gut says that data is data and chopping it up cleverly can’t work magic.
How do you decide how high to hang your bar for significance? It is very hard to estimate how high it has to be, because that depends on how you went fishing in your data.
The advantage of the two-step procedure is that you are completely free to fish however you want in the first step, because the second step tests what you found on data the fishing never touched. There are even cases where you might want a three-step procedure.
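A rough sketch of the two-step idea, under assumptions of my own: fish for the best-correlated predictor on one half of the data, then test only that one predictor on the other half. The function name `two_step_correlation`, the 50/50 split, the permutation test, and the pure-noise data are all illustrative choices, not anything specified in this thread.

```python
import numpy as np


def abs_corr(cols, target):
    """Absolute Pearson correlation of each column of `cols` with `target`."""
    cols = cols - cols.mean(axis=0)
    target = target - target.mean()
    denom = np.sqrt((cols ** 2).sum(axis=0) * (target ** 2).sum())
    return np.abs(cols.T @ target) / denom


def two_step_correlation(X, y, rng, n_perm=2000):
    """Fish on one half, confirm on the other (hypothetical helper)."""
    idx = rng.permutation(len(y))
    explore, confirm = idx[: len(y) // 2], idx[len(y) // 2:]

    # Step 1: fish freely. Pick whichever predictor looks best on the exploration half.
    explore_corr = abs_corr(X[explore], y[explore])
    best = int(np.argmax(explore_corr))

    # Step 2: a single pre-chosen test on untouched data, so no
    # multiple-comparison correction is needed for this p-value.
    x_c = X[confirm][:, [best]]
    y_c = y[confirm]
    observed = abs_corr(x_c, y_c)[0]
    perm = np.array([abs_corr(x_c, rng.permutation(y_c))[0] for _ in range(n_perm)])
    p_value = (1 + np.sum(perm >= observed)) / (1 + n_perm)
    return best, float(explore_corr[best]), float(observed), float(p_value)


# Pure-noise data (an assumption): 50 predictors, none related to y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.normal(size=200)

best, r_explore, r_confirm, p = two_step_correlation(X, y, rng)
print(f"picked predictor {best}: |r| explore={r_explore:.2f}, "
      f"|r| confirm={r_confirm:.2f}, p={p:.3f}")
```

The cost is that each step only sees half the data, so a real but weak effect is harder to detect than it would be in the full sample. What you get in return is a second-step p-value that means what it says, no matter how the first step was done.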