Until it is formalized, I just don't know how that “obs” operator is supposed to work, mechanically and syntactically, or whether adding it to the rest of the Bayesian toolkit would preserve properties like soundness or completeness!
I entirely agree here. Since December, I have been writing a megapost about interpreting “obs” as “box” in modal logic, so that it’s like a concept of necessity or proof, but mostly in a critical way where I question every assumption that anyone has made in the literature. Hopefully that post will see the light of day some time this year.
Currently I think of Obs(X) as the proposition that you have observed X. The idea that you should update on Obs(X) when you observe X is related to the idea that Obs(X) → Obs(Obs(X)); i.e., the idea that if you observe X at all, you always observe that you observe X. I think this principle is faulty, so we can’t necessarily update on Obs(X), as most Bayesians recommend, even though we would be better off doing so.[1]
So I don’t particularly think the “do” operator is involved here.
Like maybe it is possible to “Obs(E & not-E)” by two different methods, during a single observational session, and maybe that’s just… in the range of formally allowed (sometimes empirically faulty?) “observations”?
An example which appears in the philosophical literature: you observe that it’s 6pm, and later you observe that it’s 7pm, which contradicts the 6pm observation.
(The contradiction seems pretty dumb, and should be easy to take care of, but the important question is how exactly we take care of it.)
My hope, given that you authored this in 2019 and I’m commenting in 2023, is that you already noticed something proximate to these possibilities, and can just point me to some other essay or textbook chapter or something <3
I would point you to Novelty, Information, and Surprise with the caveat that I don’t entirely buy the approach therein, since it firmly assumes Obs(X) → Obs(Obs(X)). However, it still yields an interesting generalization of information theory, by rejecting the critical ‘partition assumption’, an assumption about how Obs() can work (due to Aumann, afaict) which I briefly argued against in my recent post on common knowledge. I think re-reading Aumann’s classic paper ‘Agreeing to Disagree’ and thinking carefully about what’s going on is a good place to start. Also, Probabilistic Reasoning in Intelligent Systems by Judea Pearl has a careful, thoughtful defense of conditioning on Obs(X) instead of X somewhere in the early chapters.
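To see why that distinction matters, here is a minimal sketch of the kind of case at issue, using the Monty Hall setup (structurally the same as the three-prisoners problem Pearl discusses, though this rendering is mine, not his): conditioning on the proposition you learned and conditioning on the event of having observed it can give different posteriors.

```python
from fractions import Fraction

# Monty Hall: the car is uniformly behind door 1, 2, or 3; you pick door 1;
# the host opens a goat door, choosing at random if both 2 and 3 have goats.
third = Fraction(1, 3)

# P(host opens door 3 | car at 1) = 1/2; | car at 2) = 1; | car at 3) = 0.
p_open3 = third * Fraction(1, 2) + third * 1 + third * 0  # = 1/2

# Conditioning on the bare proposition X = "the car is not behind door 3":
print(third / (2 * third))                  # 1/2

# Conditioning on the event by which you observed it,
# "the host opened door 3":
print((third * Fraction(1, 2)) / p_open3)   # 1/3
```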
I’ll construct a counterexample.

Define the “evidence relation” R so that w1 R w2 means: when I’m in w1, I think I might be in w2. Define Obs(X) to mean that I think I’m within the set of worlds X. (That is to say, X is equal to, or a superset of, the set of worlds I think I might be in.) The “information set at a world w” is the set of worlds I think I might be in at w. To “know” a proposition X is to Obs(X); that is, to rule out all worlds where X is not the case.
Obs(X) → Obs(Obs(X)) implies that R is transitive: if I think I might be in a world w2, then I must also think I might be in any world w2 thinks it might be in. Otherwise, I think I might be in a world w2 which thinks it might be in some world w3, which I don’t currently think I might be in. In other words, the information set at w2 contains a world (w3) which the information set at w1 does not. Now set X to be the information set at w1, so that X is true at w1 and w2 but false at w3. Then Obs(X) holds at w1, but Obs(X) fails at w2 (w2’s information set contains w3, which lies outside X), so Obs(Obs(X)) fails at w1, contradicting our initial assumption.
Indeed, Obs(X) → Obs(Obs(X)) is equivalent to transitivity of R.
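Here is a minimal sketch of that failure in Python, with an invented three-world model (nothing here is from the literature): R is reflexive but not transitive, and Obs(X) holds at w1 while Obs(Obs(X)) does not.

```python
# A reflexive but NON-transitive evidence relation R, as information sets:
# at w1 I think I might be in w1 or w2; at w2, in w2 or w3; at w3, just w3.
# Note that w1 R w2 and w2 R w3, but not w1 R w3.
R = {
    "w1": {"w1", "w2"},
    "w2": {"w2", "w3"},
    "w3": {"w3"},
}

def obs(X, w):
    """Obs(X) holds at w iff the information set at w is contained in X."""
    return R[w] <= set(X)

# Take X to be the information set at w1: true at w1 and w2, false at w3.
X = {"w1", "w2"}

print(obs(X, "w1"))   # True: I observe X at w1.
print(obs(X, "w2"))   # False: at w2 I can't rule out w3.

# Obs(Obs(X)) at w1 requires Obs(X) to hold everywhere I think I might be.
obs_X = {w for w in R if obs(X, w)}  # worlds where Obs(X) holds: {"w1"}
print(obs(obs_X, "w1"))              # False: Obs(Obs(X)) fails at w1.
```

Repairing transitivity (say, by removing w3 from w2’s information set) restores Obs(X) → Obs(Obs(X)) in this model, matching the equivalence above.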
But transitivity of R is implausible for an agent embedded in the physical world. For example, if I observe a coffee cup on a table, I can only measure its location to within some measurement error. For every possible location of the coffee cup, the information set includes a small region around that precise location. Transitivity then runs into a sorites problem: if we can always slide the coffee cup by 1mm and get an indistinguishable observation, then by chaining those 1mm slides we can slide it any distance at all. So, if we accept both transitivity and realistic imprecision of measurement, we are forced to conclude that the coffee cup could be anywhere.
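A quick numerical rendering of that sorites dynamic (a sketch with made-up numbers: cup positions on a 1mm grid, a 1mm indistinguishability radius):

```python
# Cup positions in mm; x R y iff x and y are within measurement error,
# so R is reflexive and symmetric but not transitive.
positions = range(0, 101)  # 0mm .. 100mm

def related(x, y, tolerance=1):
    return abs(x - y) <= tolerance

# Force transitivity by taking the transitive closure of R, starting
# from a true cup position of 50mm:
reachable, frontier = {50}, {50}
while frontier:
    frontier = {y for x in frontier for y in positions
                if related(x, y)} - reachable
    reachable |= frontier

print(len(reachable))  # 101 -- under transitivity, the cup could be anywhere
```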