What I want to know is: why is it necessary to come up with ad-hoc solutions for these types of problems instead of using Bayesian networks, which were invented precisely for this purpose (and have been wildly successful at it)?
It’s not necessary. The sort of reasoning Steve is doing here looks to me like putting informative prior distributions on the causal effect parameters. If we run pcalg on the dataset and it tells us “nut consumption causes age,” we’ll probably say “well, something went wrong.” But with a study size this large, it’s not clear to me that informative priors will push the final estimate around by much. Hopefully the DAG discovery methods will discover common causes. The priors may be most useful in establishing correlations between the parameters: “if nuts don’t help with X, we don’t expect them to help with Y, because of an underlying similarity between X and Y.” But I don’t know whether that’s better accomplished by just adding another node to the system.
My addition was eyeballing what the DAG results would look like, which is nowhere near as good as getting the data and running pcalg on it.
> What I want to know is: why is it necessary to come up with ad-hoc solutions for these types of problems instead of using Bayesian networks, which were invented precisely for this purpose (and have been wildly successful at it)?
The usual way is to create a set of DAGs with various latent variables and find the maximum-likelihood estimate, as Judea Pearl did when analyzing the relationship between smoking and lung cancer: http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/
It’s unlikely you will find any good objections to this here even if they exist.
In most cases where it’s possible, why not just use RCTs? They’re simple to arrange.
An RCT for nut consumption with enough power to pick up an effect like this is probably prohibitively expensive to set up in the current funding environment.
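To give a rough sense of why power is so expensive here, a back-of-the-envelope sample-size sketch using the standard normal-approximation formula for comparing two proportions (the event rates below are made-up illustrative numbers, not figures from any actual trial):

```python
import math

def two_proportion_sample_size(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-arm sample size needed to detect a difference
    between event rates p1 and p2 (two-sided test, normal approximation)."""
    z_alpha = 1.96   # z for two-sided alpha = 0.05
    z_beta = 0.8416  # z for power = 0.80
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical rates: 3% mortality in controls vs 2.5% in the nut arm.
n = two_proportion_sample_size(0.03, 0.025)
print(n)  # roughly 17,000 participants per arm under these assumed rates
```

Small absolute differences in rare outcomes like all-cause mortality drive the required sample size into the tens of thousands, which is exactly why a well-powered food trial is so hard to fund.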
Here is an RCT that got A LOT of press six months ago. Roughly, the three arms were: (1) give the person a liter of olive oil per week; (2) give the person 30g of nuts per day; (3) nothing. Nuts and oil equally reduced heart attacks, but only olive oil reduced deaths.
If you look at the table of results, the nuts did have a slightly lower all-cause mortality result than the control group (which brings us back to the power issue).
I think financing is the most constraining part of “where it’s possible”. Drug trials are far more expensive, but easier to finance. Food trials should be cheaper, but more difficult to finance.
Have any researchers tried Kickstarter? I think there would be a lot of people who would be willing to pay for this kind of stuff rather than drug research.
> We can’t have X causing Y causing Z causing X! At least, not without a time machine. Because of this we constrain the graph to be a directed acyclic graph

The many components of life don’t seem acyclic in their interdependencies. Anyone care to explain?
Sure. In a lot of cases where there seems to be a cyclic dependency, it’s actually an acyclic dependency when unrolled over time. So X1 causes Y1, which causes Z1, which causes X2 (the number denotes the timestep).
The simplest model of this type is one where X1 directly causes X2, which directly causes X3, and so on. This is called a Markov chain, and it is a type of Bayesian network.
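A minimal sketch of what unrolling such a loop looks like (the linear mechanisms and coefficients are toy values, purely illustrative): each node depends only on its parent at the previous step, so the unrolled graph is acyclic and can be sampled forward in topological order.

```python
import random

random.seed(0)

def step(x):
    """One timestep of the unrolled loop X[t] -> Y[t] -> Z[t] -> X[t+1]."""
    y = x + random.gauss(0, 0.1)            # Y_t is caused by X_t
    z = 0.5 * y + random.gauss(0, 0.1)      # Z_t is caused by Y_t
    x_next = 0.9 * z + random.gauss(0, 0.1) # X_{t+1} is caused by Z_t
    return y, z, x_next

x = 1.0
trajectory = [x]
for t in range(5):
    _, _, x = step(x)
    trajectory.append(x)
print(trajectory)  # X_1 ... X_6, sampled in causal order with no cycles
```

The apparent X→Y→Z→X "loop" never closes within a single timestep; it only connects each step to the next.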
Yes; by introducing time steps and indexing by “at time tn” you can “always” do away with loops, transforming them into spirals going down in time, from the loops they once were. But do people not mind that the endless spiral still resembles a loop? “X at time t1” and “X at time t2” may technically be different nodes, but only in a trivial and uninteresting sense.

You can work DAG magic on such a “linearized” causal structure; I just wonder whether it’s actually cutting reality at its joints, and whether there are (commonly used) alternatives.
You should definitely look at the book “Probabilistic Graphical Models” by Daphne Koller; these concepts and more are explained in depth, with numerous examples. I kind of consider it the spiritual successor to Jaynes’s work.
The reason having a DAG is important is that it makes inference much easier. There are efficient, exact algorithms for inference over DAGs, whereas inference over general graphs can be extremely hard (often taking exponential time). Once you unroll a loop in this fashion, you eventually reach a point where you can assume that X1 has very little influence over, say, X100, due to mixing: http://en.wikipedia.org/wiki/Markov_chain#Ergodicity Thus the network can be ‘grounded’ at X1, breaking the cycle.
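A toy demonstration of that mixing argument (the two-state transition matrix is made up): after enough steps, an ergodic chain's distribution is essentially the same no matter which state it started in, which is what justifies cutting the influence of X1 on X100.

```python
# A made-up 2-state transition matrix: P[i][j] = P(next = j | current = i).
P = [[0.9, 0.1],
     [0.2, 0.8]]

def evolve(dist, P, steps):
    """Push a distribution over states forward `steps` timesteps."""
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]
    return dist

# Start from each of the two states with certainty and run 100 steps.
from_state0 = evolve([1.0, 0.0], P, 100)
from_state1 = evolve([0.0, 1.0], P, 100)
print(from_state0, from_state1)  # both converge to the stationary distribution [2/3, 1/3]
```

The initial state's influence decays geometrically (here by a factor of 0.7 per step), so by step 100 it is numerically invisible.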
If you can convert a cyclic dependency to a hundred acyclic dependencies, it makes sense to do so, and not just because of computational concerns.
Often real-world events really do unroll over time, and we need some way of modelling time dependencies. Hidden Markov models and the Viterbi algorithm are a good example of this in action.
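A minimal Viterbi decoder for a toy HMM (the states and probabilities are standard textbook toy values, not anything from this discussion): given a sequence of observations, it recovers the most likely sequence of hidden states by dynamic programming over the time-unrolled graph.

```python
states = ("Rainy", "Sunny")
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def viterbi(obs):
    """Return the most likely hidden state sequence for `obs`."""
    # V[s] = (probability of the best path ending in state s, that path)
    V = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        # For each state, extend the best predecessor path.
        V = {s: max(((p * trans[prev][s] * emit[s][o], path + [s])
                     for prev, (p, path) in V.items()), key=lambda t: t[0])
             for s in states}
    return max(V.values(), key=lambda t: t[0])[1]

print(viterbi(["walk", "shop", "clean"]))  # -> ['Sunny', 'Rainy', 'Rainy']
```

Note that the algorithm only works because the unrolled model is a DAG: each timestep depends only on the previous one, so the best path can be built incrementally.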
I will (… put it on my “do() at some indeterminable time in the near to far future”-list. Sigh.). Thanks.
If you can wait, she may teach it again.