If both networks give the same answers to marginal and conditional probability queries, that amounts to them making the same predictions about the world.
Does it? A bunch of probability distributions on opaque variables like X1 and X2 seems like it is missing something in terms of making predictions about any world. Even if you relabel the variables with more suggestive names like “rain” and “wet”, that’s a bit like manually programming in a bunch of IS-A() and HAS-A() relationships and if-then statements into a 1970s AI system.
Bayes nets are one component for understanding and formalizing causality, and they capture something real and important about its nature. The remaining pieces involve concepts that are harder to encode in simple, traditional algorithms, but that doesn’t make them any less real or ontologically special, nor does it make Bayes nets useless or flawed.
Without all the knowledge about what words like rain and wetness and slippery mean, you might be better off replacing these labels with things like “bloxor” and “greeblic”. You could then still do interventions on the network to learn something about whether the data you have suggests that “bloxors cause greeblic-ness” is a simpler hypothesis than “greeblic-ness causes bloxors”. Without Bayes nets (or something isomorphic to them in conceptspace), you’d be totally lost in an unfamiliar world of bloxors and greeblic-ness. But there’s still a missing piece for explaining causality that involves using physics (or higher-level domain-specific knowledge) about what bloxors and greeblic-ness represent to actually make predictions about them.
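To make the intervention idea concrete, here’s a minimal sketch in Python. Everything in it is invented for illustration: the probabilities, the world() simulator, and the hidden fact that bloxors really do cause greeblic-ness. The point is just that forcing one variable and watching the other is what separates the two causal hypotheses, even when the labels mean nothing to you:

```python
# Toy sketch (invented numbers; "bloxor"/"greeblic" are stand-in labels) of using
# interventions to tell apart "bloxors cause greeblic-ness" from the reverse.
import random

random.seed(0)

def world(do_bloxor=None, do_greeblic=None):
    """Draw one (bloxor, greeblic) sample, optionally forcing either variable."""
    bloxor = do_bloxor if do_bloxor is not None else (random.random() < 0.3)
    greeblic = do_greeblic if do_greeblic is not None else (
        random.random() < (0.8 if bloxor else 0.1))
    return bloxor, greeblic

def rate(samples, index):
    """Fraction of samples in which the variable at `index` is True."""
    return sum(s[index] for s in samples) / len(samples)

n = 100_000
# Forcing bloxor on/off shifts the rate of greeblic-ness (~0.8 vs. ~0.1)...
g_on = rate([world(do_bloxor=True) for _ in range(n)], 1)
g_off = rate([world(do_bloxor=False) for _ in range(n)], 1)
# ...but forcing greeblic-ness leaves the bloxor rate alone (~0.3 either way),
# which favors "bloxors cause greeblic-ness" over the reverse.
b_on = rate([world(do_greeblic=True) for _ in range(n)], 0)
b_off = rate([world(do_greeblic=False) for _ in range(n)], 0)

print(g_on, g_off, b_on, b_off)
```

Observational samples alone (calling world() with no interventions) would only show that the two variables are correlated, which both hypotheses predict equally well.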
(I read this post as claiming implicitly that the original post (or its author) is missing or forgetting some of the points above. I did find this post useful and interesting as an exploration and explanation of the nuts and bolts of Bayes nets, but I don’t think I was left confused or misled by the original piece.)
seems like it is missing something in terms of making predictions about any world
I mean, you’re right, but that’s not what I was going for with that sentence. Suppose we were talking about a tiny philosophical “world” of opaque variables, rather than the real physical universe in all its richness and complexity. If you’re just drawing samples from the original joint distribution, both networks will tell you exactly what you should predict to see. But if we suppose that there are “further facts” about some underlying mechanisms that generate that distribution, the two networks are expressing different beliefs about those further facts (e.g., whether changing X1 will change X4, which you can’t tell if you don’t change it).
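To spell out what those different beliefs look like, here’s a toy two-variable version (made-up numbers; the example in question involves X1 and X4, but two variables are enough to show the asymmetry). Both networks below encode the exact same joint distribution, so they answer every marginal and conditional query identically, yet they make different predictions about an intervention:

```python
# Two toy binary-variable networks (numbers invented for illustration) that are
# observationally indistinguishable but make different interventional predictions.

# Network A: X1 -> X2
p_x1 = {0: 0.7, 1: 0.3}                                   # P(X1)
p_x2_given_x1 = {0: {0: 0.9, 1: 0.1},                     # P(X2 | X1)
                 1: {0: 0.2, 1: 0.8}}

def joint_a(x1, x2):
    return p_x1[x1] * p_x2_given_x1[x1][x2]

# Network B: X2 -> X1, with parameters chosen via Bayes' rule to match A's joint.
p_x2 = {x2: sum(joint_a(x1, x2) for x1 in (0, 1)) for x2 in (0, 1)}
p_x1_given_x2 = {x2: {x1: joint_a(x1, x2) / p_x2[x2] for x1 in (0, 1)}
                 for x2 in (0, 1)}

def joint_b(x1, x2):
    return p_x2[x2] * p_x1_given_x2[x2][x1]

# Same joint, so every marginal and conditional query gets the same answer.
assert all(abs(joint_a(x1, x2) - joint_b(x1, x2)) < 1e-12
           for x1 in (0, 1) for x2 in (0, 1))

# But the "further facts" differ. Under A (X1 -> X2), forcing X1 feeds X2's
# mechanism; under B (X2 -> X1), X1 is downstream and forcing it changes nothing.
p_a = p_x2_given_x1[1][1]   # P(X2=1 | do(X1=1)) under A = 0.8
p_b = p_x2[1]               # P(X2=1 | do(X1=1)) under B = P(X2=1) = 0.31
print(p_a, p_b)             # different predictions about the same intervention
```

Only actually setting X1 (or bringing in outside knowledge about the underlying mechanisms) can tell you which of those two predictions is right, which is what the “further facts” are about.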
claiming implicitly that the original post (or its author) is missing [...] some of the points above
Not the author, but some readers (not necessarily you). This post is trying to fill in a gap: something that people who have Actually Read The Serious Textbook and Done The Exercises already know, but that people who have only read one or two intro blog posts maybe don’t. (It’s no one’s fault; any one blog post can only say so much.)