statistical correlations aren’t worth making anything of unless they indicate mechanisms.
tell it to the statistics establishment. Methinks I can make better predictions using not-causally-explained statistics than I can without. For example, if I learn of a person who is black and American, I can predict that he is 5x (or whatever it is) more likely to be in prison. I can predict that he is more likely to be a part of that awful antisocial gangsta culture.
Of course, if I then learn that at this very moment, he is wearing a cardigan, a lot of that goes away.
If you restrict yourself to causal models, you do very poorly. I might even be tempted to say “I guess you’re fucked then.”
If you throw out information you have reason to believe is true but can’t explain the mechanism for, your model is more coherent but less powerful. Does that make sense?
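A minimal sketch of the kind of update being described, in Python, in odds form; every number here is invented purely for illustration (the base rate, the “5x (or whatever it is)” figure treated as a likelihood ratio, and the cardigan likelihood ratio are all assumptions, not real statistics):

```python
# Odds-form Bayesian update: start from a base rate, then condition on
# further observations. All numbers below are made up for illustration.

def update_odds(prior_prob, likelihood_ratio):
    """Convert a probability to odds, multiply by the likelihood ratio,
    and convert back to a probability."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

base_rate = 0.01        # invented base rate of imprisonment in the population
group_ratio = 5.0       # stand-in for the "5x (or whatever it is)" figure
cardigan_ratio = 0.1    # invented: the cardigan observation cuts the odds sharply

p_given_group = update_odds(base_rate, group_ratio)
p_given_group_and_cardigan = update_odds(p_given_group, cardigan_ratio)

print(f"base rate:                    {base_rate:.3f}")
print(f"after conditioning on group:  {p_given_group:.3f}")
print(f"after also seeing a cardigan: {p_given_group_and_cardigan:.3f}")
```

The point of the sketch is only that conditioning on the later, more specific observation largely screens off the earlier one, with no causal story required.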
No. How exactly are you defining a causal vs. a statistical model? What I find confusing is that in the Newtonian-physics limit of what you can know, I don’t think you can do better than a causal model, in some sense. I understand that non-causal models can predict better when knowledge is incomplete; I am just trying to find a way to state that formally.
Let’s talk about fluid dynamics. In FD, we have many equations that were determined by measuring things and approximating their relationship. For example, the Darcy-Weisbach equation for frictional pressure loss in a pipe: dP = f_D * (L/D) * (rho * v^2 / 2). This equation (and others like it) is called a correlation, or an empirical equation, as opposed to a theoretical model. To demonstrate the power of correlations, consider that we still can’t predict f_D from theory (except for laminar flow). At this point, it’s just a lack of computing power, the use of which would be essentially the same as measurement anyway. There were times in the past, though, where we didn’t know even in principle how to get that from theory.
Basically, you need to be able to look at the world and describe what you see, even if you can’t explain it. If we’d taken the policy of ignoring correlations that couldn’t be understood causally, we still wouldn’t have airplanes, plumbing, engines, etc.
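To make the “empirical correlation” point concrete, here is a minimal Python sketch, assuming water in a smooth pipe; the friction factor comes from measurement-derived correlations (64/Re for laminar flow, the Blasius fit for turbulent flow in smooth pipes), not from first-principles theory:

```python
# Darcy-Weisbach pressure drop with an empirically correlated friction factor.
# Assumed scenario: room-temperature water in a smooth pipe.

def friction_factor(reynolds):
    """Darcy friction factor from correlations."""
    if reynolds < 2300:
        return 64.0 / reynolds            # laminar: derivable from theory
    return 0.316 * reynolds ** -0.25      # Blasius: pure curve fit (smooth pipes, Re up to ~1e5)

def pressure_drop(length, diameter, velocity, density=998.0, viscosity=1.0e-3):
    """dP = f_D * (L/D) * (rho * v^2 / 2), in pascals."""
    reynolds = density * velocity * diameter / viscosity
    f_d = friction_factor(reynolds)
    return f_d * (length / diameter) * (density * velocity ** 2 / 2)

# Example: 10 m of 25 mm pipe at 2 m/s -> roughly 17 kPa of loss.
print(f"{pressure_drop(length=10.0, diameter=0.025, velocity=2.0):.0f} Pa")
```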
I don’t think these sorts of equations are good examples of what you are trying to say, since laws of physics and related equations are counterfactual and thus causal. That is, if I were to counterfactually change the length of the pipe in your equation, it would still predict the loss correctly. Invariance to change is precisely what makes these kinds of equations useful and powerful, and this invariance is causal. The fact that the equation is ‘ad hoc’ rather than deduced from a theory is irrelevant to whether the equation is causal or not. Causality has to do with counterfactual invariance (see also Hume’s counterfactual definition).
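A toy illustration of that distinction (not from the thread; the structural equations and numbers are invented): below, X and Y are both driven by a hidden common cause, so a regression of Y on X predicts observational data fine but breaks under intervention, while the structural equation stays invariant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Structural model: hidden common cause Z drives both X and Y.
# X has no causal effect on Y at all.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = 2.0 * z + 0.5 * rng.normal(size=n)

# Observational regression of Y on X: a large, genuinely predictive slope.
slope = np.cov(x, y)[0, 1] / np.var(x)
print(f"observational slope of Y on X: {slope:.2f}")      # about 1.6, not 0

# Intervention do(X = x): we set X ourselves, independently of Z.
x_do = rng.normal(size=n) + 3.0              # forced values of X
y_do = 2.0 * z + 0.5 * rng.normal(size=n)    # Y is unchanged by what we did to X

print(f"regression prediction under do(X): {np.mean(slope * x_do):.2f}")  # ~4.8
print(f"actual mean of Y under do(X):      {np.mean(y_do):.2f}")          # ~0
```

The Darcy-Weisbach relation is on the other side of this divide: intervene on the pipe length and the predicted loss is still right, which is the invariance being claimed above.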
I think a better example would be something like the crazy “expert voting” algorithm that won the Netflix prize. I think in that case, though, given sufficient knowledge, a causal model would do better. Not because it was causal, mind you, but just because observing enough about the domain gives you as a side effect causal knowledge of the domain. In the Netflix prize case, which was about movie recommendations, ‘sufficient knowledge’ would entail having detailed knowledge of decision and preference algorithms of all potential users of the system. At that point, the model becomes so detailed it inevitably encodes causal information.
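The winning Netflix-prize entry blended the outputs of hundreds of specialized predictors; the sketch below is only a minimal stand-in for that idea, with two made-up “experts” and linear blend weights learned on a held-out split:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_ratings = rng.uniform(1, 5, size=n)

# Two imaginary experts: each a noisy, biased predictor of the true rating.
expert_a = true_ratings + rng.normal(0.0, 0.8, size=n)
expert_b = 0.5 * true_ratings + 2.0 + rng.normal(0.0, 0.5, size=n)

# Learn blend weights (plus an intercept) on the first half, evaluate on the second.
train, test = slice(0, n // 2), slice(n // 2, n)
features = np.column_stack([expert_a, expert_b, np.ones(n)])
weights, *_ = np.linalg.lstsq(features[train], true_ratings[train], rcond=None)
blend = features[test] @ weights

def rmse(pred):
    return np.sqrt(np.mean((pred - true_ratings[test]) ** 2))

print(f"expert A alone: {rmse(expert_a[test]):.3f}")
print(f"expert B alone: {rmse(expert_b[test]):.3f}")
print(f"learned blend:  {rmse(blend):.3f}")   # lower error than either expert
```

The blend predicts better than either expert without containing any model of why users rate movies the way they do, which is the sense in which it is statistical rather than causal.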
The people who supply statistics to people who are looking for causal mechanisms.
For example, if I learn of a person who is black and American,
“American” isn’t a race. An American of any race has an enhanced likelihood of being in jail, because the US imprisons a lot of people. Have I converted you to Americanism?
tell it to the statistics establishment. Methinks I can make better predictions using not-causally-explained statistics than I can without. For example, if I learn of a person who is black and American, I can predict that he is 5x (or whatever it is) more likely to be in prison. I can predict that he is more likely to be a part of that awful antisocial gangsta culture.
Of course, if I then learn that at this very moment, he is wearing a cardigan, a lot of that goes away.
If you restrict yourself to causal models, you do very poorly. I might even be tempted to say “I guess you’re fucked then.”
I don’t like this. Not sure why.
Could you clarify what you mean, here?
“American” isn’t a race. An American of any race has an enhanced likelihood of being in jail, because the US imprisons a lot of people. Have I converted you to Americanism?
Culture is culture, not race.