In traditional decision theory as proposed by Bayesians such as Jaynes, you always condition on all observed data. The thing that tells you whether any of this observed data is actually relevant is your model, and it does this by outputting a joint probability distribution for your situation conditional on all that data. (What I mean by “model” here is expressed in the language of probability as a prior joint distribution P(your situation × dataset | model), or equivalently a conditional distribution P(your situation | dataset, model) if you don’t care about computing the prior probabilities of your data.)
Option 2 is what I call “blindly importing related historical data as if it were a true description of your situation”. Clearly any model that says that the joint probability for your situation is identically equal to the empirical frequencies in whatever data set you happen to have is wrong.
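To make that concrete, here’s a minimal sketch (toy numbers of my own, nothing from the thread) of two models conditioning on the same historical dataset: a blind-import model and a Beta-Bernoulli model. The model, not the data, is what decides relevance:

```python
# A toy sketch: ten historical coin flips, seven heads. "Your situation"
# is your next flip. The model decides how relevant that dataset is.

n_heads, n_flips = 7, 10

# Model A: blind import. Your situation's probability is declared to be
# the empirical frequency of the historical dataset, full stop.
p_blind = n_heads / n_flips  # 0.7, however unrepresentative the data is

# Model B: Beta(a, b) prior over the bias, conditioned on the same data.
# Small a, b encode "this dataset probably describes my situation";
# large a, b encode "my prior mostly overrides it".
def posterior_mean(a, b):
    # P(next flip = heads | dataset, model) for a Beta-Bernoulli model
    return (a + n_heads) / (a + b + n_flips)

print(p_blind)                 # 0.7
print(posterior_mean(1, 1))    # ~0.667: data treated as highly relevant
print(posterior_mean(50, 50))  # ~0.518: data treated as barely relevant
```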
From the English names, we can probably get those right. If they’re unlabeled columns in a matrix or nodes in a graph, we’ll have trouble.
The point is, it’s not about figuring stuff out from English names. It’s about having a model that correctly generalises from observed data to predictions. Unlabeled columns in a matrix are no trouble at all if your model relates them to the nodes in your personal situation in the right way.
The CDT solution of turning the problem into a causal graph and calculating probabilities with do(·) is effectively just such a model, which admittedly happens to be an elegant and convenient one. Here the information that allows you to generalise from observed data to make personal predictions is introduced when you use your human intelligence to figure out a causal graph for the situation.
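As an illustration, here’s a small sketch of what the do(·) calculation buys you, on a toy confounded setup of my own (C confounds action A and outcome Y): naive conditioning and intervention come apart, and the causal graph is exactly the model information that lets you generalise:

```python
# Toy confounded setup: C -> A, C -> Y, A -> Y, all binary.
# The graph is the model; given it, do(.) generalises correctly.

P_C = {0: 0.5, 1: 0.5}                       # P(C = c)
P_A1_given_C = {0: 0.1, 1: 0.9}              # P(A = 1 | C = c)
P_Y1 = {(0, 0): 0.2, (1, 0): 0.6,            # P(Y = 1 | A = a, C = c)
        (0, 1): 0.4, (1, 1): 0.8}

def p_y1_given_a(a):
    # Observational P(Y = 1 | A = a): condition the joint on A = a.
    pa_c = lambda c: P_A1_given_C[c] if a == 1 else 1 - P_A1_given_C[c]
    num = sum(P_C[c] * pa_c(c) * P_Y1[(a, c)] for c in (0, 1))
    den = sum(P_C[c] * pa_c(c) for c in (0, 1))
    return num / den

def p_y1_given_do_a(a):
    # Interventional P(Y = 1 | do(A = a)): back-door adjustment over C.
    return sum(P_C[c] * P_Y1[(a, c)] for c in (0, 1))

print(p_y1_given_a(1), p_y1_given_do_a(1))  # 0.78 vs 0.7
print(p_y1_given_a(0), p_y1_given_do_a(0))  # 0.22 vs 0.3
```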
Still, none of this addresses the issue that the problem itself is underspecified.
ETA: Lest you think I’ve just said that CDT is better than EDT, the point I’m trying to make here is that if you want a decision theory to generalise from data, you need to provide a model. “Your situation has the same probabilities as a causal intervention on this causal graph on that dataset, where nodes {A, B, C, …} match up to nodes {X, Y, Z, …}” is as good a model as any, and can certainly be used in EDT. The fact that EDT doesn’t come “model included” is a feature, not a bug.
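To illustrate, a minimal sketch (same toy confounded numbers as above, all mine): run ordinary EDT expected-utility maximisation, but with a model whose conditional distribution for your situation is the interventional one, and it picks the same action CDT would:

```python
# EDT with the interventional distribution plugged in as the model's
# conditional P(your situation | action). Same toy numbers as above.

U = {1: 10.0, 0: 0.0}                        # utility of outcome Y

def p_y1_do(a):
    # P(Y = 1 | do(A = a)) via back-door adjustment over confounder C
    P_C = {0: 0.5, 1: 0.5}
    P_Y1 = {(0, 0): 0.2, (1, 0): 0.6, (0, 1): 0.4, (1, 1): 0.8}
    return sum(P_C[c] * P_Y1[(a, c)] for c in (0, 1))

def expected_utility(a):
    # Standard EDT expected utility, just with a causal model supplied
    return p_y1_do(a) * U[1] + (1 - p_y1_do(a)) * U[0]

print(max((0, 1), key=expected_utility))     # 1, same action CDT picks
```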
Option 2 is what I call “blindly importing related historical data as if it were a true description of your situation”. Clearly any model that says that the joint probability for your situation is identically equal to the empirical frequencies in whatever data set you happen to have is wrong.
Agreed that this is a bad idea. I think where we disagree is that I don’t see EDT as discouraging this. It doesn’t even throw a type error when you give it blindly imported related historical data! CDT encourages you to actually think about causality before making any decisions.
It’s about having a model that correctly generalises from observed data to predictions.
Note that decision theory does actually serve a slightly different role from a general prediction module, because it should be built specifically for counterfactual reasoning. The five-and-ten argument seems to be an example of this: if, while observing another agent, you see them choose $5 over $10, it could be reasonable to update towards them preferring $5 to $10. But when considering the hypothetical situation where you yourself choose $5 instead of $10, it does not make sense to update towards preferring $5 to $10, nor to draw whatever conclusion you like by the principle of explosion.
which admittedly happens to be an elegant and convenient one.
Given that you can emulate one system using the other, I think that elegance and convenience are the criteria we should use to choose between them. Note that emulating a joint probability without causal knowledge using a causal network is trivial (you just use undirected edges for any correlations), but emulating a causal network using a joint probability is difficult.
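To illustrate the hard direction, a toy example of my own: two causal networks that induce exactly the same joint over (X, Y) but disagree about interventions, so the joint alone cannot recover either network’s causal content:

```python
# Two causal networks over binary X, Y with identical joints:
#   Network 1: X -> Y,  P(X=1) = 0.5,  P(Y=1 | X=x) = 0.8 if x else 0.2
#   Network 2: Y -> X,  P(Y=1) = 0.5,  P(X=1 | Y=y) = 0.8 if y else 0.2

def joint_net1(x, y):
    # P(X = x) * P(Y = y | X = x) under Network 1
    p_y1 = 0.8 if x == 1 else 0.2
    return 0.5 * (p_y1 if y == 1 else 1 - p_y1)

def joint_net2(x, y):
    # P(Y = y) * P(X = x | Y = y) under Network 2
    p_x1 = 0.8 if y == 1 else 0.2
    return 0.5 * (p_x1 if x == 1 else 1 - p_x1)

# The joints agree everywhere...
assert all(abs(joint_net1(x, y) - joint_net2(x, y)) < 1e-12
           for x in (0, 1) for y in (0, 1))

# ...but the networks disagree about P(Y = 1 | do(X = 1)):
print(0.8)  # Network 1: X is Y's parent, so do(X=1) pushes Y to 0.8
print(0.5)  # Network 2: X is Y's child, so do(X=1) leaves Y at 0.5
```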
“Your situation has the same probabilities as a causal intervention on this causal graph on that dataset, where nodes {A, B, C, …} match up to nodes {X, Y, Z, …}” is as good a model as any, and can certainly be used in EDT. The fact that EDT doesn’t come “model included” is a feature, not a bug.
Precisely.