I think the motivation for representing some sets of conditional independences with a DAG is pretty clear: people already use probability distributions all the time, those distributions sometimes have conditional independences, and visuals are nice.
On the other hand, the fundamental theorem relates orthogonality to independences in a family of distributions generated in a particular way. Neither of these things is a natural property of probability distributions in the way that conditional independence is. If I am using probability distributions, it seems to me I’d rather avoid introducing them if I can. Even if the reasons are mysterious, it might be useful to work with models of this type; I was just wondering whether there are reasons for doing so that are apparent before you derive any useful results.
Alternatively, is it plausible that you could derive the same results just using probability + whatever else you need anyway? For example, you could perhaps define X to be prior to Y if, relative to some ordering of functions by “naturalness”, there is a more natural f(X,Y) such that X⊥f(X,Y) and X⊥̸f(X,Y)|Y than any g(X,Y) such that Y⊥g(X,Y) etc. I have no idea if that actually works!
However, I’m pretty sure you’ll need something like a naturalness ordering in order to separate “true orthogonality” from “merely apparent orthogonality”, which is why I think it’s fair to posit it as an element of “whatever else you need anyway”. Maybe not.
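To make the pair of conditions in the tentative definition above concrete, here is a minimal sketch that checks them by brute force on a toy distribution. Everything in it is my own assumption for illustration: X and Y are independent fair bits, f is taken to be XOR, and the helper names (`joint_xor`, `marginal`, `independent`) are hypothetical. It does not model the naturalness ordering at all.

```python
from fractions import Fraction
from itertools import product


def joint_xor():
    """Joint distribution over (X, Y, F): X, Y independent fair bits, F = X xor Y.

    The choice of XOR for f is an illustrative assumption, not part of the proposal.
    """
    return {(x, y, x ^ y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}


def marginal(p, idxs):
    """Marginalize the joint table p onto the coordinates listed in idxs."""
    out = {}
    for outcome, pr in p.items():
        key = tuple(outcome[i] for i in idxs)
        out[key] = out.get(key, 0) + pr
    return out


def independent(p, a, b, given=()):
    """Check coordinate a independent of coordinate b given the coordinates in `given`.

    Uses the identity P(a, b, c) * P(c) == P(a, c) * P(b, c) for all values.
    """
    pc = marginal(p, given)
    pac = marginal(p, (a,) + given)
    pbc = marginal(p, (b,) + given)
    pabc = marginal(p, (a, b) + given)
    for (av, *c1), p_ac in pac.items():
        for (bv, *c2), p_bc in pbc.items():
            if c1 != c2:
                continue  # only compare entries with the same conditioning value
            c = tuple(c1)
            if pabc.get((av, bv) + c, 0) * pc[c] != p_ac * p_bc:
                return False
    return True


p = joint_xor()
X, Y, F = 0, 1, 2
print(independent(p, X, F))              # X independent of f(X,Y)
print(independent(p, X, F, given=(Y,)))  # X independent of f(X,Y) given Y
```

In this toy case the first check holds and the second fails, which is the pattern the definition asks for. The same is true with X and Y swapped, so the two conditions alone do not pick out a direction here; any asymmetry would have to come from the ordering over functions.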