Simulations (and computer programs in general: think about how debuggers for computer programs work) are causal models, not purely predictive models. Your answer does no work, because being able to simulate at that level of fidelity means we are already Done with the science of what we are simulating. In particular, our simulator will contain a very detailed causal model with answers to everything we might want to know. The question is what we do when our information isn’t very good, not when we can just say “let’s ask God.”
This is a quote from a working ML researcher, talking about what is done today. And what is done today for purely predictive modeling are those crazy deep learning networks or support vector machines they have in ML. Those are algorithms specifically tailored to answering p(Y | X) kinds of questions (i.e. prediction questions), not causal questions.
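To make that contrast concrete, here is a standard identity (textbook material, not the quoted researcher’s words): suppose a confounder Z affects both X and Y and blocks all back-door paths between them. Then

$$
p(y \mid \mathrm{do}(x)) = \sum_z p(y \mid x, z)\, p(z),
\qquad
p(y \mid x) = \sum_z p(y \mid x, z)\, p(z \mid x).
$$

A p(Y | X) learner estimates the second quantity. Nothing in the observational data alone tells it to reweight by p(z) rather than p(z | x); that substitution is licensed only by causal assumptions about Z.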
edit: to add to this a little more. I think there is a general mathematical principle at play here, which is similar in spirit to Occam’s razor. This principle is: “try to use the weakest assumptions needed to get the right answer.” It is this principle that makes “Omega-style simulations” an unsatisfactory answer. It’s a kind of overfitting of the entire scientific process.
A good enough prediction engine can substitute, to a degree, for a causal model. Obviously not always, and once you get outside its competency domain it will break, but still: if you can forecast very well what effects an intervention will produce, your need for a causal model is diminished.
I see. So then if I were to give you a causal decision problem, can you tell me what the right answer is using only a prediction engine? I have a list of them right here!
The general form of these problems is: “We have a causal model in which the outcome is death. We only have observational data obtained from this causal model. We are interested in whether a given intervention will reduce the death rate. Should we do the intervention?”
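For concreteness, here is a minimal sketch of one such problem, with made-up numbers (the confounder Z, treatment X, and outcome Y are my illustration, not anything from this thread): doctors preferentially treat severe cases, so in the observational data the treated die more often, even though the treatment helps every patient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

def simulate(force_treatment=None):
    """Hypothetical data-generating process: Z = severity, X = treatment, Y = death."""
    z = rng.random(n) < 0.5                        # half the patients are severely ill
    if force_treatment is None:
        x = rng.random(n) < np.where(z, 0.9, 0.1)  # doctors mostly treat the severe cases
    else:
        x = np.full(n, force_treatment)            # do(X = x): override the doctors
    y = rng.random(n) < 0.15 + 0.4 * z - 0.1 * x   # treatment cuts death risk by 0.1 for everyone
    return x, y

# Observational data: conditioning makes the treatment look harmful,
# because the treated group is mostly severe cases.
x, y = simulate()
print("p(death | X=1) =", y[x == 1].mean())   # ~0.41
print("p(death | X=0) =", y[x == 0].mean())   # ~0.19

# Interventional data: forcing the treatment on everyone shows it helps.
_, y1 = simulate(force_treatment=True)
_, y0 = simulate(force_treatment=False)
print("p(death | do(X=1)) =", y1.mean())      # ~0.25
print("p(death | do(X=0)) =", y0.mean())      # ~0.35
```

A p(Y | X) predictor trained on the observational rows faithfully reports the 0.41-versus-0.19 gap and so recommends withholding treatment; getting the do() answer right requires knowing, or assuming, the role of Z.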
Observational data is enough for the predictor, right? (But the predictor doesn’t get to see what the causal model is; after all, it just works on observational data and is agnostic about how it came about.)

A good enough prediction engine, yes.

Huh? You don’t obtain observational data from a model, you obtain it from reality.
Right, the data comes from the territory, but we assume the map is correct.
That depends. I think I understand “prediction models” more broadly than you do. A prediction model can use any kind of input it likes if it finds it useful.
The point is, if your ‘prediction model’ has a rich enough language to incorporate the causal model, it’s no longer purely a prediction model as everyone in the ML field understands it, because it can then also answer counterfactual questions. In particular, if your prediction model only uses the language of probability theory, it cannot incorporate any causal information because it cannot talk about counterfactuals.
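A minimal illustration of that last point (a standard example, nothing beyond binary variables assumed): consider two structural models,

$$
\text{Model A: } X \sim \mathrm{Bernoulli}(1/2),\ Y := X;
\qquad
\text{Model B: } Y \sim \mathrm{Bernoulli}(1/2),\ X := Y.
$$

Both induce exactly the same joint distribution p(X, Y), so no predictor trained on observational data can tell them apart. Yet p(Y = 1 | do(X = 1)) is 1 under Model A and 1/2 under Model B. The joint distribution, which is all probability theory hands you, underdetermines the interventional answer.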
So are you willing to take me up on my offer of solving causal problems with a prediction algorithm?
> the data comes from the territory, but we assume the map is correct.
You don’t need any assumptions about the model to get observational data. Well, you need some to recognize what you are looking at, but certainly you don’t need to assume the correctness of a causal model.
> no longer purely a prediction model as everyone in the ML field understands it
We may be having some terminology problems. Normally I call a “prediction model” anything that outputs testable forecasts about the future; causal models are a subset of prediction models. Within the context of this thread I understand “prediction model” as a model which outputs forecasts and which does not depend on simulating the mechanics of the underlying process.

It seems you’re thinking of “pure prediction models” as something akin to “technical” models in finance which look at price history, only at price history, and nothing but the price history. So a “pure prediction model” would be to you something like a neural network into which you dump a lot of more or less raw data but you do not tweak the NN structure to reflect your understanding of how the underlying process works.
Yes, I would agree that a prediction model cannot talk about counterfactuals. However, I would not agree that a prediction model can’t successfully forecast on the basis of inputs it never saw before.
> So are you willing to take me up on my offer of solving causal problems with a prediction algorithm?
Good prediction algorithms are domain-specific. I am not defending an assertion that you can get some kind of a Universal Problem Solver out of ML techniques.