Two relevant things.
First, the Epsilon Fallacy: the idea that effects are the result of many tiny causes adding up. In practice, 80⁄20 is a thing, and most things most of the time do have a small number of “main” root causes which account for most of the effect. So it’s not necessarily wrong to look for “exactly one cause”—as in e.g. optimizing runtime of a program, there’s often one cause which accounts for most of the effect. In the “logical-and” case you mention, I’d usually expect to see either
most of the things in the and-clause don’t actually vary much in the population (i.e. most of them are almost always true or almost always false), and just one or two account for most of the variance, OR
a bunch of the things in the and-clause are highly correlated due to some underlying cause.
Of course there are exceptions to this, in particular for traits under heavy selection pressure—if we always hammer down the nail that sticks out, then all the nails end up at around the same height. If we repeatedly address bottlenecks/limiting factors in a system, then all limiting factors will end up roughly equally limiting, and 80⁄20 doesn’t happen.
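A toy simulation of the first of the two cases above (all numbers mine, purely illustrative): the effect formally requires five conditions, but four of them barely vary in the population, so a single condition accounts for nearly all of the variance.

```python
import random

random.seed(0)

# Effect formally requires all five conditions (a pure and-clause),
# but four of them hold for ~95% of the population, so almost all of
# the variation in the effect is driven by condition "a".
def sample_person():
    a = random.random() < 0.5                       # the one condition that actually varies
    b, c, d, e = (random.random() < 0.95 for _ in range(4))
    effect = a and b and c and d and e
    return a, effect

samples = [sample_person() for _ in range(100_000)]
p_effect = sum(eff for _, eff in samples) / len(samples)
p_effect_given_a = (sum(eff for a, eff in samples if a)
                    / sum(1 for a, _ in samples if a))

print(f"P(effect)     ≈ {p_effect:.2f}")          # ~0.41
print(f"P(effect | a) ≈ {p_effect_given_a:.2f}")  # ~0.81: "a" does most of the work
```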
Second: the right “language” in which to think about this sort of thing is not flat boolean logic (i.e. “effect = (A or B) and C and D”) but rather causal diagrams. The sort of medical studies you mention—i.e. “saliva is a risk factor for cancer but only if taken orally in small doses over a long period of time”—are indeed pretty dumb, but the fix is not to look for a giant and-clause of conditions which result in the effect. The fix is to build a gears-level model of the system, figure out the whole internal cause-and-effect graph.
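To make the contrast concrete, here is a minimal sketch (node names and structure are mine, not from the comment): the same toy system written once as a flat boolean clause and once as a causal graph. Only the graph can record internal structure, such as one condition being downstream of another, which is exactly what matters when deciding what to intervene on.

```python
# Flat boolean view: only the final input -> output rule is represented.
def effect_flat(a, b, c, d):
    return (a or b) and c and d

# Causal-graph view (node -> direct causes): two gears-level hypotheses
# about internal structure that the flat clause cannot distinguish.
graph_1 = {"A": [], "B": [], "C": [],    "D": [], "effect": ["A", "B", "C", "D"]}
graph_2 = {"A": [], "B": [], "C": ["A"], "D": [], "effect": ["A", "B", "C", "D"]}
# In graph_2, intervening on A also moves C; in graph_1 it does not.
# That difference matters for what to fix, and only the graph records it.

print(effect_flat(True, False, True, True))  # True under either hypothesis
```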
Right, one could expand the clause indefinitely; that is kind of what I meant by “can only find what you are looking for”. But that only means it is hard, not that it is bad to think that way.
I think of it neither as logic, nor as causal diagrams, nor as Bayesian or Markov diagrams, but simply as sets of some member type that may have any number of features/properties/attributes which make them a member of some subset.
When I wrote “A AND B” I wanted you to understand it as a plain two-valued (boolean) logic clause, but only for simplicity.
The way I really think about it is: map each attribute’s magnitude through an impact function, and then combine the impacts with some form of interaction function that is neither only AND nor only OR but possibly both to some degree. We have to deal with negative correlation in some way, and I do not see how that is possible if the interaction is always OR.
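One possible way to make that concrete (my reading of “neither only AND nor OR”, not necessarily the intended one) is an interaction function that interpolates between min, which behaves like AND, and max, which behaves like OR:

```python
def soft_interaction(impacts, mix):
    """Combine per-attribute impact scores in [0, 1].

    mix = 0.0 behaves like AND (the weakest factor dominates),
    mix = 1.0 behaves like OR  (the strongest factor dominates),
    values in between blend the two.
    """
    return (1 - mix) * min(impacts) + mix * max(impacts)

impacts = [0.9, 0.8, 0.1]                # one factor is strongly "against"
print(soft_interaction(impacts, 0.0))    # 0.1 -- AND-like
print(soft_interaction(impacts, 1.0))    # 0.9 -- OR-like
print(soft_interaction(impacts, 0.5))    # 0.5 -- a blend of both
```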
right “language” in which to think [is] causal diagrams
They are nice on paper, but I cannot see how they are useful. To me they seem like some synthetic, made-up way to get the result, unfit to model the world. “If the world were not as it is, it would be mathematically correct to do this” is so academic. As far as I understand it, the graph cannot be cyclic. Since you do not know whether the graph is cyclic, or which factors are in the cycle, you do not know which factors you must treat as an aggregate. The only directions known are the ones you put into the graph yourself.
There is only one joint probability for cases where there are multiple causal paths to one feature/property.
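One way to read this: the data only pin down a single joint distribution, and more than one causal structure can reproduce it exactly. A minimal sketch (numbers invented) with two binary variables:

```python
# Structure 1: A -> B.
p_a = 0.3
p_b_given_a = {1: 0.9, 0: 0.2}
joint_1 = {(a, b): (p_a if a else 1 - p_a) *
                   (p_b_given_a[a] if b else 1 - p_b_given_a[a])
           for a in (0, 1) for b in (0, 1)}

# Structure 2: B -> A, with its parameters chosen by Bayes' rule.
p_b = sum(p for (a, b), p in joint_1.items() if b)
p_a_given_b = {bv: sum(p for (a, b), p in joint_1.items() if b == bv and a) /
                   sum(p for (a, b), p in joint_1.items() if b == bv)
               for bv in (0, 1)}
joint_2 = {(a, b): (p_b if b else 1 - p_b) *
                   (p_a_given_b[b] if a else 1 - p_a_given_b[b])
           for a in (0, 1) for b in (0, 1)}

# Both structures produce exactly the same joint distribution.
print(all(abs(joint_1[k] - joint_2[k]) < 1e-12 for k in joint_1))  # True
```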
Think of a hospital. Sick people go to hospitals, but sometimes people in a hospital will catch an infection that is only typical in hospitals.
A = person is sick
B = person is in hospital
C = person has hospital infection
C is a subset of A
A causes B
B causes C
How do you work with that?
“the fix is not to look for a giant and-clause of conditions [but] to build a gears-level model of the system, figure out the whole internal cause-and-effect graph”
I thought that was what I was suggesting. Instead of stopping at “It has to do with gears,” keep going to get more specific, find subsets of things with gears: “gear AND oval-shape AND a sprocket is missing AND there is a cardan shaft AND …” But if indeed only things with gears are affected, do not expand with “gears AND needs oil”, because that already follows from gears.
I think of “gears-level model” and “causal DAG” as usually synonymous. There are some arguable exceptions—e.g. some non-DAG Markov models are arguably gears-level—but DAGs are the typical use case.
The obvious objection to this idea is “what about feedback loops?”, and the answer is “it’s still a causal DAG when you expand over time”—and that’s exactly what gears-level understanding of a feedback loop requires. Same with undirected Markov models: they typically arise from DAG models with some of the nodes unobserved; a gears-level model hypothesizes what those hidden factors are. The hospital example includes both of these: a feedback loop, with some nodes unobserved. But if you expand out the actual gears-level model, distinguishing between different people with different diseases at different times, then it all looks DAG-shaped; the observed data just doesn’t include most of those nodes.
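A sketch of what “expand over time” could look like for the hospital story (node names are mine): each time step gets its own copies of the variables, and the feedback loop turns into ordinary forward edges, so the graph is acyclic.

```python
# Time-unrolled causal graph for the hospital story: node -> direct causes.
# The loop (sickness -> hospital -> hospital infection -> sickness) only
# looks cyclic when time is collapsed; written out per time step it is a DAG.
unrolled_dag = {
    "sick[t]": [],
    "in_hospital[t]": ["sick[t]"],
    "hospital_infection[t+1]": ["in_hospital[t]"],
    "sick[t+1]": ["sick[t]", "hospital_infection[t+1]"],
    "in_hospital[t+1]": ["sick[t+1]"],
}

# Quick acyclicity check via depth-first search over the parent relation.
def is_dag(graph):
    visiting, done = set(), set()
    def visit(node):
        if node in done:
            return True
        if node in visiting:
            return False          # back-edge found: a cycle
        visiting.add(node)
        ok = all(visit(parent) for parent in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return ok
    return all(visit(node) for node in graph)

print(is_dag(unrolled_dag))   # True
```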
This generalizes: the physical world is always DAG-shaped, on a fundamental level. Everything else is an abstraction on top of that, and it can always be grounded in DAGs if needed.
Instead of stopping at “It has to do with gears,” keep going to get more specific, find subsets of things with gears: “gear AND oval-shape AND a sprocket is missing AND there is a cardan shaft AND …” But if indeed only things with gears are affected, do not expand with “gears AND needs oil”, because that already follows from gears.
The advantage of using causal DAGs for our model, even when most of the nodes are not observed, is that it tells us which things need to be included in the AND-clauses and which do not. For instance, “gear AND oval-shaped” vs “gear AND needs oil”—the idea that the second can be ignored “because that already follows from gears” is a fact which derives from DAG structure. For a large model, there’s an exponential number of logical clauses which we could form; a DAG gives formal rules for which clauses are relevant to our analysis.
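A toy numerical version of that point (all structure and probabilities invented for illustration): if “needs oil” is downstream of “gears” and not itself on the path to the failure, then once “gears” is known, conditioning on “needs oil” changes nothing, while “oval shape” still does.

```python
from itertools import product

# Toy model (made-up conditional probabilities):
#   gears -> needs_oil            (an effect of having gears, off the causal path)
#   gears, oval_shape -> failure
p_gears = 0.5
p_oval = 0.3
p_oil_given_gears = {True: 0.9, False: 0.05}
p_fail = {(True, True): 0.8, (True, False): 0.1,
          (False, True): 0.0, (False, False): 0.0}

def joint(gears, oval, oil, fail):
    p = p_gears if gears else 1 - p_gears
    p *= p_oval if oval else 1 - p_oval
    p *= p_oil_given_gears[gears] if oil else 1 - p_oil_given_gears[gears]
    p *= p_fail[(gears, oval)] if fail else 1 - p_fail[(gears, oval)]
    return p

def p_fail_given(**evidence):
    worlds = [dict(zip(("gears", "oval", "oil", "fail"), w))
              for w in product([True, False], repeat=4)]
    match = [w for w in worlds if all(w[k] == v for k, v in evidence.items())]
    num = sum(joint(**w) for w in match if w["fail"])
    den = sum(joint(**w) for w in match)
    return num / den

print(f"{p_fail_given(gears=True):.2f}")             # 0.31
print(f"{p_fail_given(gears=True, oil=True):.2f}")   # 0.31 -- "needs oil" adds nothing
print(f"{p_fail_given(gears=True, oval=True):.2f}")  # 0.80 -- "oval shape" does matter
```

The DAG structure is what licenses dropping “needs oil” from the clause: given its only parent, it is d-separated from the failure node, which is exactly the kind of formal rule the paragraph above describes.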