Sure, I can always offer my own interpretations, but the whole idea was to minimize that as much as possible. I can rationalize anything. Watch: Milk consumption is negatively correlated with income inequality. Drinking less milk leads to stunted intelligence, resulting in a rise in income inequality. Or income inequality leads to a drop in milk consumption among poor families. Or the alien warlord Thon-Gul hates milk and equal incomes.
What conditions must my goal satisfy in order to qualify as a “well-defined goal”? Have I made any actual (meaning technical) mistakes so far? (Anyway, thanks for reminding me to check for temporal stability. I should write a script to scrape the data off PDFs. (Never mind, I found a library.))
the whole idea was to minimize that as much as possible
I believe this idea to be misguided. The point of the process is to understand. You can’t understand without “interpretation”—looking for just the biggest numbers inevitably leads you astray.
The issue isn’t what you can rationalize—“don’t be stupid” is still the baseline, level zero criterion.
What conditions must my goal satisfy in order to qualify as a “well-defined goal”?
A specification of what kind of answers will be acceptable and what kind will not.
Have I made any actual (meaning technical) mistakes so far?
Are you asking whether your spaghetti factory mixes flour and water in the right ratio?
Not being stupid is an admirable goal, but it’s not well-defined. I tried Googling “spaghetti factory analysis” and “spaghetti factory analysis statistics” for more information, but it’s not turning up anything. Is there a standard term for the error you are referring to?
Can’t I have my common sense, but make all possible comparisons anyway just to inform my common sense as to the general directions in which the winds of evidence are blowing?
I don’t see how informing myself of correlations harms my common sense in any way. The only alternative I can think of is to stick to my prejudices and, whenever some doubt arises as to which of them has the stronger claim, thoroughly investigate real-world data to settle the dispute between the two. As soon as that process is over, I should stop immediately because nothing else matters. Is that the course of action you recommend?
Not being stupid is an admirable goal, but it’s not well-defined.
It’s not a goal. It is a criterion you should apply to the steps which you intend to take. I admit to it not being well-defined :-)
Is there a standard term for the error you are referring to?
In statistics that used to be called “data mining” and was a bad thing. Data science repurposed the term and it’s now a good thing :-/ Andrew Gelman calls a similar phenomenon the “garden of forking paths” (see e.g. here).
Basically the problem is paying attention to noise.
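To make “paying attention to noise” concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration and none of it comes from your data: 20 made-up variables, 30 observations each, all drawn independently at random, so any correlation it finds is pure noise. It correlates every pair and reports the strongest one.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def strongest_noise_correlation(n_vars=20, n_obs=30, seed=0):
    """Correlate every pair of pure-noise variables.

    Returns (number of pairs compared, largest absolute correlation).
    The variables are independent by construction (hypothetical data).
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    rs = [abs(pearson(data[i], data[j]))
          for i, j in itertools.combinations(range(n_vars), 2)]
    return len(rs), max(rs)


n_pairs, max_r = strongest_noise_correlation()
print(f"{n_pairs} pairs of unrelated variables, strongest |r| = {max_r:.2f}")
```

With 190 pairs to choose from, the top of the list tends to look like a respectable correlation even though nothing is related to anything. That is the sense in which sorting by the biggest numbers leads you astray.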
Can’t I have my common sense, but make all possible comparisons anyway
You can. It’s just that you shouldn’t attach undue importance to which comparison came first and which came second. You’re generating estimates, and at the very minimum you should also be generating what you think are the errors of those estimates. These should help establish how meaningful your ranking of all the pairs is.
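One way to attach an error to each estimate, sketched under assumptions: a percentile bootstrap, which is my choice for illustration and not the only option. The data below are synthetic stand-ins, and `bootstrap_interval` is a hypothetical helper name, not something from a library.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_interval(xs, ys, n_boot=2000, seed=0):
    """Return (r, lo, hi): the correlation and a rough 95% percentile interval.

    Observations are resampled with replacement, keeping each (x, y) pair together.
    """
    rng = random.Random(seed)
    n = len(xs)
    rs = []
    while len(rs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            rs.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample with zero variance, draw again
    rs.sort()
    lo = rs[int(0.025 * n_boot)]
    hi = rs[int(0.975 * n_boot) - 1]
    return pearson(xs, ys), lo, hi


# Synthetic example: y depends weakly on x, 30 observations.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(30)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]

r, lo, hi = bootstrap_interval(x, y)
print(f"r = {r:.2f}, rough 95% interval [{lo:.2f}, {hi:.2f}]")
```

If the intervals of two pairs overlap heavily, their relative order in your ranking is not telling you much.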
And you still need to define a goal. For example, a goal of explanation/understanding is different from a goal of forecasting.
I’m not telling you to ignore the data. I’m telling you to be sceptical of what the data is telling you.
Thank you! Those data mining algorithms are exactly what I was looking for.
(Personally, I would describe the situation you are warning me against as reducing interpretation “more than is possible” rather than “as much as possible”. I am definitely in favor of using common sense.)