I was considering this, but the problem is that in your setup S is supposed to be derived from X (that is, S is a deterministic function of X), which is not true when X = training data and S = that which we want to predict.
That’s an (implicit) assumption in Conant & Ashby’s setup; I explicitly remove that constraint in the “Minimum Entropy → Maximum Expected Utility and Imperfect Knowledge” section. (That’s the “imperfect knowledge” part.)
If S is derived from X, then “information in S” = “information in X relevant to S”
Same here. Once we relax the “S is a deterministic function of X” constraint, the “information in X relevant to S” is exactly the posterior distribution (s↦P[S=s|X]), which is why that distribution comes up so much in the later sections.
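To make that concrete, here is a minimal sketch, assuming a small discrete joint distribution over (X, S) stored as a plain dict. The `posterior` helper and both toy tables are made up for illustration; they are not from the post. When S is a deterministic function of X, the posterior collapses to a point mass; once that constraint is relaxed, it is a genuine distribution over S.

```python
from fractions import Fraction


def posterior(joint, x):
    """Return the map s -> P[S=s | X=x] for a joint table {(x, s): probability}."""
    # Marginal probability of the observed value of X.
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    if p_x == 0:
        raise ValueError(f"X={x!r} has zero probability")
    # Condition on X=x by renormalizing the matching entries.
    return {s: p / p_x for (xi, s), p in joint.items() if xi == x and p > 0}


# Hypothetical toy joint: S is a deterministic function of X (S = X mod 2).
deterministic = {(x, x % 2): Fraction(1, 4) for x in range(4)}

# Hypothetical toy joint: X is a noisy reading of S, correct 3/4 of the time.
noisy = {
    (0, 0): Fraction(3, 8), (1, 0): Fraction(1, 8),
    (0, 1): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

print(posterior(deterministic, 3))  # point mass on S=1
print(posterior(noisy, 0))          # spread over both values of S
```

In the deterministic table, knowing X pins down S exactly, so “information in S” and “information in X relevant to S” coincide. In the noisy table, the posterior is all that X can tell you about S.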
(In general I struggled with keeping the summary short vs. staying true to the details of the causal model.)
Yeah, the number of necessary nontrivial pieces is… just a little too high to not have to worry about inductive distance.
… That’ll teach me to skim through the math in posts I’m trying to summarize. I’ve edited the summary, lmk if it looks good now.
Good enough. I don’t love it, but I also don’t see easy ways to improve it without making it longer and more technical (which would mean it’s not strictly an improvement). Maybe at some point I’ll take the time to make a shorter and less math-dense writeup.