This is a great post in a humanistic sense, which I only just read today thanks to curatorial cleverness, giving me appreciation for author and curators both :-)
I will now proceed to ignore the many redeeming qualities and geek out on math quibbling <3
Something jumped out at me related to the presumed equivalence among predicate logic, Bayesian probability, and subjective observations.
Consider how this is presented:
P(H)=P(H|obs(E))P(obs(E))+P(H|¬obs(E))P(¬obs(E))
Then also this:
P(H)=P(H|obs(E))P(obs(E))+P(H|obs(¬E))P(obs(¬E))
I instantly and intuitively “get what this is pointing to” within the essay as a way of phenomenologically directing my thoughts towards certain experiences of looking at things (or not), and trying experiments (or not), or even bringing things up in a discussion (or not)…
...and yet also some part of me feels that this is a sort of “abuse of notation” that might have massive cascading consequences?
Like… it is nearly axiomatic that:
1.0=P(obs(E))+P(¬obs(E))
But it does NOT seem obviously “nearly axiomatic” that:
1.0=P(obs(E))+P(obs(¬E))
Also, I feel like “the predictable failure of anticipated observations to logically sum to unity” is close to the core of many confusions?
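To make that failure concrete, here is a toy numeric sketch in Python (all numbers invented): a single observational session where “never looked” is a live outcome, so obs(E) and obs(¬E) don’t exhaust the space between them.

```python
# Toy numbers (invented) for one observational session with three
# mutually exclusive, exhaustive outcomes: "never looked" is a live
# option, so obs(E) and obs(not-E) do not exhaust the space.
p = {
    "obs(E)":     0.40,  # looked, and saw E
    "obs(not-E)": 0.30,  # looked, and saw not-E
    "no_obs":     0.30,  # never looked at all
}

p_not_obs_E = 1.0 - p["obs(E)"]  # ¬obs(E) = "anything but obs(E)"

print(p["obs(E)"] + p_not_obs_E)      # 1.0 -- the "nearly axiomatic" identity
print(p["obs(E)"] + p["obs(not-E)"])  # 0.7 -- fails to sum to unity
```

The complement ¬obs(E) absorbs both “looked and saw ¬E” and “never looked”, which is exactly why the first identity is safe and the second isn’t.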
Until it is formalized, I find that I just do not know for sure how that “obs” operator is supposed to work, mechanically and syntactically, or whether adding that operator to the rest of the Bayesian toolkit would maintain any particular properties like soundness or completeness!
Like maybe it is possible to “obs(E & not-E)” by two different methods, during a single observational session, and maybe that’s just… in the range of formally allowed (sometimes empirically faulty?) “observations”?
Is this intended as syntactic sugar somehow for… sequential updating? (Surely the main point of Bayesian notation is to turn “pre-OBServational prior probabilities” into “post-OBServational posterior probabilities”?)
Or maybe you are trying to evoke SQL’s “three valued logic” where NULL sort of “contaminates” the output of boolean logical operators applied to it? Like in SQL you say “(True AND NULL) is NULL” while “(True OR NULL) is True” and “(False OR NULL) is NULL”.
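For reference, here is a minimal Python sketch of those Kleene-style truth tables, with None standing in for SQL’s NULL (not a real SQL engine, just the semantics I mean):

```python
# SQL-style (Kleene) three-valued logic, with None standing in for NULL.
def and3(a, b):
    if a is False or b is False:
        return False  # False annihilates AND, even against NULL
    if a is None or b is None:
        return None   # otherwise NULL contaminates the result
    return True

def or3(a, b):
    if a is True or b is True:
        return True   # True annihilates OR, even against NULL
    if a is None or b is None:
        return None   # otherwise NULL contaminates the result
    return False

print(and3(True, None))  # None -- "(True AND NULL) is NULL"
print(or3(True, None))   # True -- "(True OR NULL) is True"
print(or3(False, None))  # None -- "(False OR NULL) is NULL"
```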
Or is this intended to be evocative of something like the Pearlian DO() operator?
((
Maybe the OBS() operator is literally syntactic sugar for DO() in a special case?
Maybe you could build a larger belief network that has a boolean “observed_the_thing” as a “belief” node with a causal arrow to a different node that represents “observation_of_the_thing” and then P(observation_of_the_thing=SQLNULL|observed_the_thing=False) > 99% as an epistemic belief in a larger table of conditional probabilities that spell out exactly how “observing CAUSES observations”?
Then maybe OBS() is literally just DO() in the restricted case of DO(observed_the_thing=True)? (Toy sketch just below.)
))
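Here is a hypothetical sketch of that two-node network in Python (node names and probabilities invented, with None again playing the role of SQLNULL):

```python
# Hypothetical two-node network: observed_the_thing -> observation_of_the_thing.
# cpt gives P(observation_of_the_thing | observed_the_thing); None plays SQLNULL.
cpt = {
    True:  {"E": 0.60, "not-E": 0.39, None: 0.01},    # looking usually yields a reading
    False: {"E": 0.004, "not-E": 0.004, None: 0.992}, # P(NULL | didn't look) > 99%
}

def do(observed_the_thing):
    # Pearl-style intervention: clamp the parent node and read off the child's
    # distribution (trivial here, since the parent itself has no parents).
    return cpt[observed_the_thing]

print(do(True))   # OBS() as DO(observed_the_thing=True): "observing CAUSES observations"
print(do(False))  # mostly None: no look, no observation
```

In this toy picture, P(obs(E)) + P(obs(¬E)) = 0.99 under DO(observed_the_thing=True), with the missing mass sitting on NULL, which ties back to the failure-to-sum-to-unity point above.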
My hope, given that you authored this in 2019 and I’m commenting in 2023, is that you already noticed something proximate to these possibilities, and can just point me to some other essay or textbook chapter or something <3
Have you explored much of this in other ways since writing it?
It feels like it could be an evocative call to “invent new notation to formalize something not-yet-formalized” <3
Until it is formalized, I find that I just do not know for sure how that “obs” operator is supposed to work, mechanically and syntactically, or whether adding that operator to the rest of the Bayesian toolkit would maintain any particular properties like soundness or completeness!
I entirely agree here. Since December, I have been writing a megapost about interpreting “obs” as “box” in modal logic, so that it’s like a concept of necessity or proof, but mostly in a critical way where I question every assumption that anyone has made in the literature. Hopefully that post will see the light of day sometime this year.
Currently I think of Obs(X) as the proposition that you have observed X. The idea that you should update on Obs(X) when you observe X is related to the idea that Obs(X) → Obs(Obs(X)); i.e., the idea that if you observe X at all, you always observe that you observe X. I think this principle is faulty, so we can’t necessarily update on Obs(X) as most Bayesians recommend, even though we would be better off doing so.[1]
So I don’t particularly think the “do” operator is involved here.
Like maybe it is possible to “obs(E & not-E)” by two different methods, during a single observational session, and maybe that’s just… in the range of formally allowed (sometimes empirically faulty?) “observations”?
An example which appears in the philosophical literature: you observe that it’s 6pm, and later you observe that it’s 7pm, which contradicts the 6pm observation.
(The contradiction seems pretty dumb, and should be easy to take care of, but the important question is how exactly we take care of it.)
My hope, given that you authored this in 2019 and I’m commenting in 2023, is that you already noticed something proximate to these possibilities, and can just point me to some other essay or textbook chapter or something <3
I would point you to Novelty, Information, and Surprise, with the caveat that I don’t entirely buy the approach therein, since it firmly assumes Obs(X) → Obs(Obs(X)). However, it still yields an interesting generalization of information theory by rejecting the critical ‘partition assumption’, an assumption about how Obs() can work (due to Aumann, afaict) which I briefly argued against in my recent post on common knowledge. I think re-reading Aumann’s classic paper ‘Agreeing to Disagree’ and thinking carefully about what’s going on is a good place to start. Also, Probabilistic Reasoning in Intelligent Systems by Judea Pearl has a careful, thoughtful defense of conditioning on Obs(X) instead of X somewhere in the early chapters.
I’ll construct a counterexample.
Define the “evidence relation” R so that w1 R w2 means: when I’m in w1, I think I might be in w2. Define Obs(X) to mean that I think I’m within the set of worlds X. (That is to say, X is equal to, or a superset of, the set of worlds I think I might be in.) The “information set at a world w” is the set of worlds I think I might be in at w. To “know” a proposition X is to Obs(X); that is, to rule out worlds where X is not the case.
Obs(X) → Obs(Obs(X)) implies that R is transitive: if I think I might be in a world w2, then I must think I might be in any world that, at w2, I would think I might be in. Otherwise, I think I might be in a world w2 at which I would think I might be in some world w3, which I don’t currently think I might be in. In other words, the information set at w2 contains something which the information set at w1 does not contain. Setting X to be the information set at w1 (true everywhere I currently think I might be, but false at w3), we see that Obs(X) but not Obs(Obs(X)), contradicting our initial assumption.
Indeed, Obs(X) → Obs(Obs(X)) is equivalent to transitivity of R.
But transitivity of R is implausible for an agent embedded in the physical world. For example, if I observe a coffee cup on a table, I can only measure its location to within some measurement error. For every possible location of the coffee cup, the information set includes a small region around that precise location. Transitivity then says that if we can always slide the coffee cup by 1mm and get an indistinguishable observation, we must be able to slide it by n mm for any n. So, if we accept both transitivity and realistic imprecision of measurement, we are forced to conclude that the coffee cup could be anywhere.
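If it helps, here is that counterexample as a toy finite model in Python (discrete 1mm grid, invented range):

```python
# A world is the cup's true position in mm; the information set at w is
# everything within 1mm measurement error, so w1 R w2 iff |w1 - w2| <= 1.
positions = set(range(101))  # 0..100 mm along the table

def info_set(w):
    return {v for v in positions if abs(v - w) <= 1}

def obs(X, w):
    """Obs(X) at w: the information set at w is contained in X."""
    return info_set(w) <= X

# Failure of Obs(X) -> Obs(Obs(X)): take X = "position <= 1".
X = {0, 1}
print(obs(X, 0))                            # True: Obs(X) holds at w = 0
print(all(obs(X, v) for v in info_set(0)))  # False: Obs(Obs(X)) fails at w = 0

# Transitive closure of R reaches everywhere: chaining 1mm slides,
# "the coffee cup could be anywhere".
reach = {0}
while True:
    bigger = reach | {v for w in reach for v in info_set(w)}
    if bigger == reach:
        break
    reach = bigger
print(reach == positions)                   # True
```

(Here Obs(Obs(X)) at w is cashed out as “Obs(X) holds at every world in the information set at w”, matching the Kripke semantics for box.)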