I disagree with your last point. Because we are agents, we can develop a much better intuitive understanding of what causality is, how it works, and how to apply it during childhood. As babies, we run lots and lots of experiments. They are not exactly randomized controlled trials, so they won't fully remove confounders, but they get close when we try something different in a relatively similar situation. Doing lots of gymnastics, dropping stuff, testing our parents' limits, etc. is what allows us to learn causality quickly.
LLMs, as they are currently trained, don't have this privilege of experimentation. They also miss many potential confounders because they can only look at text, which is why I think systems like Flamingo and Gato are important (even though the latter was a bit disappointing).
I agree my last point is more speculative. The question is whether vast amounts of pre-training data plus a smaller amount of finetuning via online RL can substitute for the human experience. Given the success of pre-training so far, I think it probably will.
Note that the modern understanding of causality in stats/analytic philosophy/Pearl took centuries of intellectual progress—even if it seems straightforward. Spurious causal inference seems ubiquitous among humans unless they have learned—by reading/explicit training—about the modern understanding. Your examples from human childhood (dropping stuff) seem most relevant to basic physics experiments and less to stochastic relationships between 3 or more variables.
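To make the three-or-more-variable point concrete, here is a minimal simulation sketch (the variables and numbers are purely illustrative, not anything from this discussion): a confounder Z drives both X and Y, so observational data show a strong X-Y correlation even though X has no effect on Y, while randomizing X, as a controlled experiment would, makes the correlation vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Confounder Z causes both X and Y; X has no causal effect on Y.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)

# Observational data: X and Y are strongly correlated, spuriously.
print(round(np.corrcoef(x, y)[0, 1], 2))          # ~0.5

# Intervention: set X at random, independently of Z (like an RCT).
x_rct = rng.normal(size=n)
y_rct = z + rng.normal(size=n)                    # Y still depends only on Z
print(round(np.corrcoef(x_rct, y_rct)[0, 1], 2))  # ~0.0
```

The childhood experiments you describe are good at the two-variable, deterministic version of this; it's the noisy, multi-variable case where untrained humans routinely get fooled.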
Well, maybe LLMs can "experiment" on their dataset by assuming something about it and then being modified if they encounter a counterexample.
I think it vaguely counts as experimenting.
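Something like this toy loop, a deliberately hypothetical sketch (real training only does this implicitly, through gradient updates): keep an assumption about the data and revise it whenever the stream contradicts it.

```python
# Hypothetical illustration of "assume something, get modified by a counterexample".
# The dataset and hypothesis here are made up; this is just the falsification loop
# I have in mind, not a claim about how LLM training actually works.
dataset = [("A", "B"), ("A", "B"), ("A", "C")]

hypothesis = {"A": "B"}  # initial assumption: feature A always goes with label B
for feature, label in dataset:
    if hypothesis.get(feature) != label:
        hypothesis[feature] = label  # revised on hitting a counterexample
print(hypothesis)  # {'A': 'C'}
```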