Theorem 2: Acausal Commutative Square: The following diagram commutes for any belief function Θ. Any infradistribution D where D=⊤Π⋉(λπ.u1{π}(D)) also makes this diagram commute and induces a belief function.
Here’s how the proof proceeds. In order to show that the full diagram commutes, we first select properties on the four corners such that for each morphism, if the source fulfills the relevant property, then the target fulfills the relevant property. Obviously, the property for the bottom-right corner must be “is a belief function”. There are eight phases for this, some of which are quite easy, but the two most difficult are at the start, corresponding to the two morphisms on the bottom side, because we must exhaustively verify the belief function conditions and infradistribution conditions. As usual, lower-semicontinuity is the trickiest part to show. Then, since we have four sides of the square to show are isomorphisms, we have another eight phases where we verify going forward to a different corner and back is identity. Finally, there’s an easy step where we rule out that going around in a loop produces a nontrivial automorphism, by showing that both paths from first-person static to third-person dynamic are equal.
First, let’s define our properties.
The property for third-person static is D=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D))), and D is an infradistribution.
The property for third-person dynamic is ψT=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(ψT))), and →T(π,aoh)=δa,o,(πao,h), and ψT is an infradistribution.
The property for first-person static is that Θ is a belief function.
The property for first-person dynamic is ψF=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(ψF))), and →F((π,aoh),a′)=δo,(πao,h) when a′=a, and ⊥ when a′≠a, and ψF is an infradistribution.
T2.1 Our first step is showing that all 8 morphisms induce the relevant property in the target if the source has the property.
T2.1.1 3-static to 1-static. Our definition is
Θ(π):=pr(A×O)ω∗(u1{π}(D))
and we must check the five conditions of a belief function for Θ, assuming D is an infradistribution on PD and it has the property D=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D)))
T2.1.1.1 First, bounded Lipschitz constant. Our proof goal is
∃λ⊙<∞∀π′,U,U′:|Θ(π′)(U)−Θ(π′)(U′)|≤λ⊙d(U,U′)
Let λ⊙ be the Lipschitz constant of D, which is finite because D is an infradistribution. Then let π′,U,U′ be arbitrary (U and U′ must be bounded and continuous, though). We have
|Θ(π′)(U)−Θ(π′)(U′)|=|pr(A×O)ω∗(u1{π′}(D))(U)−pr(A×O)ω∗(u1{π′}(D))(U′)|
and then unpack the projection to get
=|u1{π′}(D)(λπ,h.U(h))−u1{π′}(D)(λπ,h.U′(h))|
and unpack the update to get
=|D(λπ,h.1π=π′U(h)+1π≠π′)−D(λπ,h.1π=π′U′(h)+1π≠π′)|
and then by Lipschitzness for D,
≤λ⊙supπ,h∼π|1π=π′U(h)+1π≠π′−(1π=π′U′(h)+1π≠π′)|
Now, when π≠π′, the two functions are equal, so we have
=λ⊙suph∼π′|U(h)−U′(h)|≤λ⊙suph|U(h)−U′(h)|=λ⊙d(U,U′)
We’ll leave lower-semicontinuity for the end, it’s hard.
T2.1.1.2 Now, normalization, ie infπΘ(π)(1)=1∧infπΘ(π)(0)=0. To do this, we have
infπΘ(π)(1)=infπpr(A×O)ω∗(u1{π}(D))(1)
Then this can be rewritten as
=⊤Π(λπ.pr(A×O)ω∗(u1{π}(D))(1))
and then as a semidirect product.
=(⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D))))(1)
and then, from our defining property for D, and normalization for D since it’s an infradistribution, we have
=D(1)=1
The same proof works for 0 as well.
T2.1.1.3 Now, sensible supports. Pick a π′,U,U′ where U and U′ are identical on histories compatible with π′. We want to show
Θ(π′)(U)=Θ(π′)(U′)
To start,
Θ(π′)(U)=pr(A×O)ω∗(u1{π′}(D))(U)
and undoing the projection and update, we have
=D(λπ,h.1π=π′U(h)+1π≠π′)
Remember that actually, D is supported over PD, ie, the subset of Π×(A×O)ω where the destiny is compatible with the policy, so when π=π′, we have h∼π′, and so U and U′ behave identically. Making this substitution in, we have
=D(λπ,h.1π=π′U′(h)+1π≠π′)
which then packs up as Θ(π′)(U′) and we’re done.
T2.1.1.4 Now for agreement on max value. For the [0,1] type signature on inframeasures, we want to show
∀π′,π′′:Θ(π′)(1)=Θ(π′′)(1)
Let π′,π′′ be arbitrary. We have
Θ(π′)(1)=pr(A×O)ω∗(u1{π′}(D))(1)
and then the projection and update unpack to
D(λπ,h.1π=π′1+1π≠π′)=D(1)=D(λπ,h.1π=π′′1+1π≠π′′)
and then, working in reverse from there, we get =Θ(π′′)(1) and we’re done. For the R type signature, we want to show
∀π′,π′′:Θ(π′)(∞)=Θ(π′′)(∞)
Let π′,π′′ be arbitrary. We have
Θ(π′)(∞)=pr(A×O)ω∗(u∞{π′}(D))(∞)
which then unpacks as
D(λπ,h.1π=π′∞+1π≠π′∞)=D(∞)=D(λπ,h.1π=π′′∞+1π≠π′′∞)
and then, working in reverse, we get =Θ(π′′)(∞) and we’re done.
T2.1.1.5 Time to revisit lower-semicontinuity. We want to show that
liminfn→∞Θ(πn)(U)≥Θ(π∞)(U)
This is going to take some rather intricate work with Propositions 2 and 3, and Theorem 1 from Inframeasures and Domain Theory. Namely, the supremum of continuous functions is lower-semicontinuous, any lower-semicontinuous lower-bounded function has an ascending sequence of lower-bounded continuous functions which limits to it pointwise, and for any ascending sequence which limits to a lower-semicontinuous function pointwise, you can shuffle the limit outside of an inframeasure. Fix some continuous bounded U, either in [0,1] or in R, and a sequence of policies πn which converge to π∞.
Our first task is to establish that, for each m, the subset of PD where π=πm or πm+1 or πm+2… or π∞ is closed. This can be viewed as the following subset of Π×(A×O)ω:
({πn}n≥m×(A×O)ω)∩{π,h|h∼π}
(A×O)ω is compact, and {πn}n≥m is compact as well, since it limits to a point and the limit point is included. So that product is a compact set, and must be closed. The latter set is PD itself. And so, since this set is the intersection of a closed set with PD, it’s closed in PD with the subspace topology.
Now, define the functions u≥m:PD→R⊔{∞} as follows. If (π,h) has π=πm or πm+1 or πm+2 or… or π∞, then return U(h). Otherwise, return 1 (or ∞ for the other type signature).
u∞ is the function where, if π=π∞, return U(h), otherwise return 1 (or ∞).
Our next task is to show that all the u≥m, and u∞, are lower-semicontinuous. This is pretty easy to do. They are 1 (or ∞) outside of a closed set, and less than that inside the closed set.
Lower-semicontinuity happens because we have one of three cases holding. In the first case, the limit point is outside of the closed set (in the complement open set). Then, at some finite stage and forever afterwards, the sequence is outside the closed set of interest, and the sequence becomes just a constant value of 1 (or ∞), and so the liminf equals the value at the limit.
In the second case, the limit point is inside the closed set, and there’s a subsequence of policy-tagged destinies which remains within the closed set of interest. The liminf of the original sequence of policy-tagged destinies evaluated by u≥m equals the liminf of the subsequence which stays in the “safe” closed set, because all other points in the sequence get maximal value according to u≥m (because they’re outside the closed set), so they’re irrelevant for evaluating the liminf. Then, inside the closed set, u≥m acts like U, which is continuous, so the liminf of u≥m inside the closed set equals the limit of U inside the closed set, which equals u≥m applied to the limit point (in the closed set).
In the third case, the limit point is inside the closed set, but there’s no subsequence of policy-tagged destinies which remains within the closed set of interest. So, after some finite stage, the value of u≥m is 1 or infinity, so the liminf is high and the value of the limit point is low, and again we get lower-semicontinuity.
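As a purely illustrative aside (not part of the proof), here is a small Python sketch of this pattern: a toy closed subset of [0,1] stands in for the closed set of policy-tagged destinies, U is continuous and bounded away from the maximal value, and u is “U on the closed set, 1 off it”. The tail-minimum along a convergent sequence respects the lower-semicontinuity inequality, while the tail-maximum shows that upper-semicontinuity genuinely fails. All names and numbers below are made up for the illustration.

    # Toy stand-ins for the u_{>=m} pattern: continuous U on a closed set, 1 elsewhere.
    C = {0.0} | {1.0 / n for n in range(1, 200)}     # a closed (compact) subset of [0,1]
    U = lambda x: 0.25 + x / 2                       # continuous, bounded away from 1

    def u(x, tol=1e-12):
        """Return U(x) on the closed set, the maximal value 1 off it."""
        return U(x) if any(abs(x - c) < tol for c in C) else 1.0

    # A sequence converging to 0 (a point of the closed set), alternating inside/outside C.
    seq = []
    for k in range(1, 100):
        seq.append(1.0 / k)        # lands inside C
        seq.append(1.001 / k)      # lands just outside C

    tail_values = [u(x) for x in seq[100:]]                      # crude stand-in for a tail
    print("liminf-style check:", min(tail_values) >= u(0.0))     # lower-semicontinuity holds
    print("limsup-style check:", max(tail_values) <= u(0.0))     # upper-semicontinuity fails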
Finally, we need to show that u≥m is an ascending sequence of functions which has u∞ as the pointwise limit. The “ascending sequence” part is easy, because as m increases, u≥m maps a smaller and smaller set (fewer policies) to the value U reports, and more points to 1 (or infinity).
If π,h has π=π∞, then regardless of n, u≥n(π,h)=U(h)=u∞(π,h), so that works out. If π,h has π≠π∞, then the sequence πn can’t hit π infinitely often, otherwise π=π∞, so after some finite n, π,h drops out of the closed set associated with u≥n, and we have limn→∞(u≥n(π,h))=1=u∞(π,h).
Just one more piece of setup. For each u≥m, since it’s lower-semicontinuous, we can apply our Proposition 3 from the domain theory post to construct an ascending sequence of bounded continuous functions fm,n where limn→∞fm,n=u≥m pointwise.
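Proposition 3 itself isn’t restated here, so as a hedged illustration of the sort of approximating sequence it provides, the Python sketch below uses one standard construction on a metric space, the inf-convolutions fn(x)=infy(u(y)+n⋅d(x,y)): these are n-Lipschitz, sit below u, ascend in n, and converge pointwise to u whenever u is lower-semicontinuous and bounded below. This is only meant to show the phenomenon on a toy example, and is not claimed to be the construction used in the domain theory post.

    import numpy as np

    xs = np.linspace(0.0, 1.0, 401)                  # toy compact metric space [0,1]
    u = np.where(xs <= 0.5, 0.25 + xs / 2, 1.0)      # lower-semicontinuous: U on [0,0.5], 1 above

    def f_n(n):
        # inf-convolution on the grid: f_n(x) = min_y ( u(y) + n * |x - y| )
        return np.array([np.min(u + n * np.abs(xs - x)) for x in xs])

    samples = [80, 200, 220, 320]                    # grid points x = 0.2, 0.5, 0.55, 0.8
    prev = None
    for n in (1, 4, 16, 64, 256):
        approx = f_n(n)
        assert np.all(approx <= u + 1e-9)            # each approximation sits below u
        if prev is not None:
            assert np.all(prev <= approx + 1e-9)     # and the sequence ascends in n
        print(n, [round(float(approx[i]), 3) for i in samples],
              "target", [round(float(u[i]), 3) for i in samples])
        prev = approx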
NOW we can begin showing lower-semicontinuity of Θ.
liminfn→∞Θ(πn)(U)=liminfn→∞pr(A×O)ω∗(u1{πn}(D))(U)
by how Θ was defined. And then we unpack the projection and update to get
=liminfn→∞D(λπ,h.1π=πnU(h)+1π≠πn)
First, we observe that the function inside D is greater than u≥n, because u≥n copies U(h) when π=πn or πn+1 or πn+2… and is maximal otherwise, while the interior function just copies U(h) when π=πn and is maximal otherwise. So, by monotonicity, we can go
≥liminfn→∞D(u≥n)
And then, we can use monotonicity again to get
≥liminfn→∞D(supm≤nfm,n)
Wait, what’s this? Well, if m≤n, then fm,n≤u≥m≤u≥n, because fm,n was constructed to be an ascending sequence of functions (in n) limiting to u≥m. And u≥m is an ascending sequence of functions (in m), so it’s below u≥n. Thus, the function supm≤nfm,n lies below u≥n, we just applied monotonicity.
Our next claim is that supm≤nfm,n is an ascending sequence of continuous functions which limits pointwise to u∞. Here’s how to show it. The functions are continuous because they’re all the supremum of finitely many continuous functions fm,n. They ascend because
supm≤n+1fm,n+1≥supm≤nfm,n+1≥supm≤nfm,n
The first inequality was from considering one less function in the sup. the second inequality is because, for all m, fm,n is an ascending sequence of functions in n.
For showing that the sequence supm≤nfm,n limits (in n) pointwise to u∞, we fix an arbitrary m∗ and do
limn→∞maxm≤nfm,n(π,h)≥limn→∞fm∗,n(π,h)=u≥m∗(π,h)
The inequality is because eventually n goes past the fixed number m∗, and then we’re selecting a single function out of the supremum of functions. The equality is because fm∗,n limits pointwise in n to u≥m∗. Now, with this, we can go:
u∞(π,h)=limn→∞u≥n(π,h)=limn→∞maxm≤nu≥m(π,h)
≥limn→∞maxm≤nfm,n(π,h)≥supm∗u≥m∗(π,h)=u∞(π,h)
Let’s break that down. The first equality was because we showed that u∞ is the pointwise limit of the u≥n. The second equality is because u≥m makes an ascending sequence of functions, so the supremum of the functions would just be u≥n itself. The first inequality is because fm,n≤u≥m, it’s one of the approximating continuous functions. The second inequality was the thing we showed earlier, about how the limit exceeds u≥m∗(π,h) regardless of m∗. Finally, we use again that u∞ is the pointwise limit of the u≥m∗.
We just showed a quantity is greater than or equal to itself, so all these inequalities must be equalities, and we have
u∞(π,h)=limn→∞maxm≤nfm,n(π,h)
So, the ascending sequence of continuous functions supm≤nfm,n limits pointwise to u∞. Now that we have that, let’s recap what we’ve got so far. We have
liminfn→∞Θ(πn)(U)≥liminfn→∞D(supm≤nfm,n)
Since supm≤nfm,n is an ascending sequence of continuous functions in n, then by monotonicity of D, that sequence must be increasing, so we have
=limn→∞D(supm≤nfm,n)
and then, we can apply the monotone convergence theorem for infradistributions, that a sequence of continuous functions ascending pointwise to a lower-semicontinuous function has the limit of the infradistribution expectations equaling the expectation of the lower-semicontinuous supremum, so we have
=D(u∞)=D(λπ,h.1π=π∞U(h)+1π≠π∞)
and then we pack this back up as an update and projection
=pr(A×O)ω∗(u1{π∞}(D))(U)
and back up to get
=Θ(π∞)(U)
and lower-semicontinuity of Θ has been shown. So translating over an infradistribution makes a belief function.
T2.1.2 1-static to 3-static.
D:=⊤Π⋉Θ
Now, we’ll show the five infradistribution conditions on D, along with the sixth condition on D that it equals ⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D))).
First, to show it’s even well-defined, we have
D(f)=(⊤Π⋉Θ)(f)=⊤Π(λπ.Θ(π)(λh.f(π,h)))
That inner function is lower-semicontinuous in π, because, starting with
liminfn→∞Θ(πn)(λh.f(πn,h))
then we have that, because f is continuous and Π×(A×O)ω is compact, f is uniformly continuous. So, as n increases, the function λh.f(πn,h) uniformly limits to λh.f(π∞,h). Since all the Θ(πn) have a uniform upper bound on the Lipschitz constant (from Θ being a belief function), the values Θ(πn)(λh.f(πn,h)) and Θ(πn)(λh.f(π∞,h)) get arbitrarily close, so we get
=liminfn→∞Θ(πn)(λh.f(π∞,h))
and then, by lower-semicontinuity for Θ,
≥Θ(π∞)(λh.f(π∞,h))
So, since the inner function is lower-semicontinuous, it can be evaluated.
T2.1.2.1 Now for the condition that
D=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D)))
We start with D(f) and unpack the definition of D, to get
=(⊤Π⋉Θ)(f)
And unpack the definition of semidirect product and what ⊤Π means, to get
=infπΘ(π)(λh.f(π,h))
=infπinfπ′Θ(π′)(λh.1π′=πf(π,h)+1π′≠π)
This step happens because, if π′≠π, then Θ(π′) will be assessing the value of the constant-1 function (or constant-infinity function), and will return a maximal value (by condition 5 on belief functions, that they all agree on the value of 1 (or ∞)). So the only way to minimize that line via a choice of π′ is to make it equal to π, as that means that we’re evaluating Θ(π)(λh.f(π,h)) instead, which may be less than the maximal value.
Now we pack up what ⊤Π means, and the semidirect product, to get
=infπ(⊤Π⋉Θ)(λπ′,h.1π′=πf(π,h)+1π′≠π)
Pack up the definition of D to get
=infπD(λπ′,h.1π′=πf(π,h)+1π′≠π)
Pack up the definition of “1-update on π” to get
=infπu1{π}(D)(λπ′,h.f(π,h))
Pack up how projection mappings work, to get:
=infπpr(A×O)ω∗(u1{π}(D))(λh.f(π,h))
Pack up what ⊤Π means and the semidirect product to get
=(⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D))))(f)
So we have
D=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D)))
as desired, since f was arbitrary.
T2.1.2.2 Now to verify the infradistribution conditions on D. First, monotonicity. Let f′≥f
D(f′)=(⊤Π⋉Θ)(f′)
And unpack the definition of semidirect product and what ⊤Π means, to get
=infπΘ(π)(λh.f′(π,h))
Then, by monotonicity for all the Θ(π), we have
≥infπΘ(π)(λh.f(π,h))
which packs back up in reverse as D(f).
T2.1.2.3 Now for concavity. Start with D(pf+(1−p)f′) and unpack in the usual way to get
=infπΘ(π)(λh.pf(π,h)+(1−p)f′(π,h))
Then, by concavity for the Θ(π), as they’re inframeasures, we have
≥infπ(pΘ(π)(λh.f(π,h))+(1−p)Θ(π)(λh.f′(π,h)))
and distribute the inf
≥infπ(pΘ(π)(λh.f(π,h)))+infπ((1−p)Θ(π)(λh.f′(π,h)))
Pulling the constants out, we have
=pinfπ(Θ(π)(λh.f(π,h)))+(1−p)infπ(Θ(π)(λh.f′(π,h)))
and packing back up in the usual way, we have
=pD(f)+(1−p)D(f′)
Concavity is shown.
T2.1.2.4 For Lipschitzness, we start with |D(f)−D(f′)| and then partially unpack in the usual way to get
=|⊤Π(λπ.Θ(π)(λh.f(π,h)))−⊤Π(λπ.Θ(π)(λh.f′(π,h)))|
And, since ⊤Π is a sharp infradistribution, it has a Lipschitz constant of 1, so we have
≤supπ|Θ(π)(λh.f(π,h))−Θ(π)(λh.f′(π,h))|
Since all the Θ(π) have a uniform bound on their Lipschitz constants (one of the belief function conditions), we then have
≤supπλ⊙suph|f(π,h)−f′(π,h)|
which rearranges as
=λ⊙supπ,h|f(π,h)−f′(π,h)|=λ⊙d(f,f′)
and we’re done, we got a Lipschitz constant.
T2.1.2.5 The CAS property for D is trivial because PD is a closed subset of Π×(A×O)ω, which is a compact space, and so you can just take all of PD as a compact support.
T2.1.2.6 That just leaves normalization. We have
D(1)=infπΘ(π)(1)=1
D(0)=infπΘ(π)(0)=0
by unpacking D and applying normalization for a belief function.
So now, D is an infradistribution.
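As a sanity check on this construction (purely illustrative, with a made-up finite history space and made-up credal sets standing in for the Θ(π)), here is a small Python model of D:=⊤Π⋉Θ where each Θ(π) is an infimum of expectations over a few probability vectors; the spot-checks exercise normalization, monotonicity, and concavity numerically.

    import random

    H = [0, 1, 2]                                    # toy "histories"
    policies = ["pi0", "pi1"]                        # toy "policies"
    credal = {                                       # hypothetical credal sets for Θ(π)
        "pi0": [[0.2, 0.3, 0.5], [0.5, 0.3, 0.2]],
        "pi1": [[1 / 3, 1 / 3, 1 / 3], [0.1, 0.8, 0.1]],
    }

    def Theta(pi, g):
        """Θ(π)(g): infimum of expectations of g over the credal set attached to π."""
        return min(sum(p[h] * g(h) for h in H) for p in credal[pi])

    def D(f):
        """D(f) = inf_π Θ(π)(λh. f(π, h)), the semidirect product with ⊤_Π."""
        return min(Theta(pi, lambda h: f(pi, h)) for pi in policies)

    print("normalization:", D(lambda pi, h: 1.0), D(lambda pi, h: 0.0))

    random.seed(0)
    for _ in range(500):
        tf = {(pi, h): random.random() for pi in policies for h in H}
        tg = {(pi, h): random.random() for pi in policies for h in H}
        p = random.random()
        f, g = (lambda pi, h: tf[(pi, h)]), (lambda pi, h: tg[(pi, h)])
        bigger = lambda pi, h: max(tf[(pi, h)], tg[(pi, h)])       # pointwise >= f
        mix = lambda pi, h: p * tf[(pi, h)] + (1 - p) * tg[(pi, h)]
        assert D(bigger) >= D(f) - 1e-9                            # monotonicity
        assert D(mix) >= p * D(f) + (1 - p) * D(g) - 1e-9          # concavity
    print("monotonicity and concavity spot-checks passed")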
T2.1.3 For 3-static to 3-dynamic, we have
→T(π,aoh):=δa,o,(πao,h) (which fulfills one of the 3-dynamic conditions), and ψT:=D
Since D is an infradistribution, ψT is too, and for the condition on ψT, we have:
ψT=D=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D)))=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(ψT)))
and we’re done, since D had the relevant condition.
T2.1.4 For 3-dynamic to 3-static, we have
D:=prΠ×(A×O)ω∗(ψT⋉→T:∞)
First, we must show that the niceness conditions are fulfilled for the infinite semidirect product to be defined at all. →T maps π,aoh to a,o,(πao,h). This is clearly a continuous mapping, so in particular, it’s lower-semicontinuous. Dirac-delta distributions are 1-Lipschitz when interpreted as inframeasures (all probability distributions have that property, actually). The compact-shared-CAS condition is redundant because A×O×PD is already a compact space. And all probability distributions (dirac-delta distributions are probability distributions) map constants to the same constant, so we get the increase-in-constants property and the 1 maps to 1 property. So we can indeed take the infinite semidirect product of the →T.
Now, we’ll show that unrolling like this just produces ψT exactly. We’ll use s0 for the initial state (pair of policy and destiny, also written as (π,h)0), and aos1:∞ for the successive unrolled history of states, actions, and observations. First, by unpacking D’s definition,
D(f)=prΠ×(A×O)ω∗(ψT⋉→T:∞)(f)
Then we unpack the projection and semidirect product
=(ψT⋉→T:∞)(λ(π,h)0,aos1:∞.f(π,ao1:∞))=ψT(λ(π,h)0.pr(A×O)ω∗(→T:∞((π,h)0))(λao1:∞.f(π,ao1:∞)))
And then we can observe that unrolling the initial state forever via “advance the destiny 1 step, popping actions and observations off, and advance the policy”, when you project down to just the actions and observations (not the intermediate states), yields just the point distribution on that destiny, so we get:
=ψT(λ(π,h)0.δh(λh′.f(π,h′)))
And then substitute the value of that dirac-delta in to get
=ψT(λπ,h.f(π,h))=ψT(f)
and we’re done, showing D had the relevant condition from ψT having it. Since ψT is an infradistribution, D is too.
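The “unrolling yields the point distribution on the destiny” observation used above can be made concrete with a small Python sketch: a state is a policy together with a destiny, the transition pops an action-observation pair off the destiny and advances the policy, and projecting the emitted pairs recovers the destiny prefix. The toy policy and destiny below are assumptions made up for the sketch, and finite prefixes stand in for infinite histories.

    def advance(policy, a, o):
        """The advanced policy π_{ao}: behaves like π after prepending the pair (a, o)."""
        return lambda history: policy([(a, o)] + history)

    def unroll(policy, destiny, steps):
        """Apply the deterministic transition →_T to (π, h) repeatedly, recording (a, o) pairs."""
        emitted = []
        for (a, o) in destiny[:steps]:
            # →_T(π, aoh) = δ_{a, o, (π_{ao}, h)}: emit (a, o), advance both policy and destiny.
            assert policy([]) == a, "the destiny must be compatible with the policy"
            emitted.append((a, o))
            policy = advance(policy, a, o)
        return emitted

    # Toy deterministic policy: play action (length of history) mod 2.
    toy_policy = lambda history: len(history) % 2
    toy_destiny = [(i % 2, obs) for i, obs in enumerate([1, 0, 1, 1, 0, 0, 1, 0])]

    # Projecting the unrolled trajectory to its action/observation coordinates gives back h.
    assert unroll(toy_policy, toy_destiny, 8) == toy_destiny
    print("unrolling and projecting recovers the original destiny prefix")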
T2.1.5 For 3-dynamic to 1-dynamic, we have
→F(π,h,a):=prO×PD∗(u1{a}(→T(π,h)))
For the action π() (which is compatible with the start of the history), we have
→F(π,π()oh,π())(f)
Applying our definition, we have
=prO×PD∗(u1{π()}(→T(π,π()oh)))(f)
Unpacking the projection, we have (where s is a state)
=u1{π()}(→T(π,π()oh))(λa,o,s.f(o,s))
Unpacking the update, we have
=→T(π,π()oh)(λa,o,s.1a=π()f(o,s)+1a≠π())
And then we consider that →T(π,π()oh) is just the dirac-delta distribution on (π(),o,(ππ()o,h)), so we have
=δπ(),o,(ππ()o,h)(λa,o,s.1a=π()f(o,s)+1a≠π())
Substitute the values in, and we have
=f(o,(ππ()o,h))
and then pack up the dirac-delta and we get
=δo,(ππ()o,h)(f)
So, →F(π,π()oh,π()) is just the dirac-delta distribution on o and the pair of the policy advanced one step and h, as it should be. For a′ that aren’t π(), we have
→F(π,π()oh,a′)(f)
Applying our definition, we have
=prO×PD∗(u1{a′}(→T(π,π()oh)))(f)
Unpacking the projection, we have (where s is a state)
=u1{a′}(→T(π,π()oh))(λa,o,s.f(o,s))
Unpacking the update, we have
=→T(π,π()oh)(λa,o,s.1a=a′f(o,s)+1a≠a′)
And then we consider that →T(π,π()oh) is just the dirac-delta distribution on (π(),o,(ππ()o,h)), so we have
=δπ(),o,(ππ()o,h)(λa,o,s.1a=a′f(o,s)+1a≠a′)
Substituting the values in, and remembering that a′ and π() are assumed to be different, we have (regardless of f)
=1
Or infinity for the other type signature. Therefore,
→F(π,π()oh,a′)=⊥
So our transition kernel works out. For the other conditions on ψF, just observe that ψF:=ψT, and use the extreme similarity of their defining conditions.
T2.1.6 For 1-dynamic to 3-dynamic, we have ψT:=ψF, so we can use our defining property of ψF again to show that ψT has the relevant defining properties, which just leaves cleaning up the defining property for the →T infrakernel. We remember that
→T(π,h):=⊤A⋉(λa.→F(π,h,a))
Therefore, we have:
→T(π,π()oh)(f)=(⊤A⋉(λa.→F(π,π()oh,a)))(f)
This occurs because the history must be compatible with the policy. Then, we can unpack as:
=⊤A(λa.→F(π,π()oh,a)(λo,s.f(a,o,s)))
We unpack our ⊤A, getting
=infa(→F(π,π()oh,a)(λo,s.f(a,o,s)))
Then, we observe that for any a≠π(), →F(π,π()oh,a)=⊥. So, a must be π(), and we have
=→F(π,π()oh,π())(λo,s.f(π(),o,s))
Now, →F(π,π()oh,π()) is the dirac-delta distribution on (o,(ππ()o,h)), so making that substitution in, we have
=δo,(ππ()o,h)(λo,s.f(π(),o,s))
Ie
=f(π(),o,(ππ()o,h))=δπ(),o,(ππ()o,h)(f)
And this holds for all f, so we get our desired result that
→T(π,π()oh)=δπ(),o,(ππ()o,h)
T2.1.7 For 1-static to 1-dynamic, we just have
ψF:=⊤Π⋉Θ
→F(π,aoh,a):=δo,(πao,h)
→F(π,aoh,a′):=⊥ when a′≠a
So this automatically makes the infrakernel have the desired properties. For showing the relevant properties of ψF, we just copy the proof at the start of getting the properties for D from Θ being a belief function, since D and ψF are both defined in the same way.
T2.1.8 For 1-dynamic to 1-static, we have
Θ(π′):=pr(A×O)ω∗((u1{π′}(ψF))⋉→π′F:∞)
We’ll work on getting that latter quantity into a better form, but first we have to verify that →πF:∞ is even well-defined at all; we need the →πF:n to fulfill the niceness conditions. They’re defined as
→πF:n(s0,aos1:n):=δπ(ao1:n)⋉(λa.→F(sn,a))
For lower-semicontinuity, →F is lower-semicontinuous, because it returns ⊥ on every action but one (this is a clopen set), and for the action that pairs up with the initial state, the policy-tagged history just has the history and policy advanced, which is continuous. →πF:n behaves similarly, continuous variation in input leads to continuous variation in output, because eventually the behavior of the policy settles down to a fixed action as the history of actions and observations stabilizes amongst one of the finitely many options.
For making inframeasures, →πF:n always returns a dirac-delta distribution or ⊥, so we’re good there. Similarly, for 1-Lipschitzness, both ⊥ and dirac-delta distributions are 1-Lipschitz. For compact-shared CAS, as the target space is A×O×PD, it’s compact and you don’t need to worry about that. Finally, both dirac-delta distributions and ⊥ map constants to either the same constant or higher, and in the [0,1] type signature, both ⊥ and dirac-delta distributions map 1 to 1. So all niceness conditions are fulfilled.
Let’s work on unpacking the value
pr(A×O)ω∗((u1{π′}(ψF))⋉→πF:∞)(U)
unpacking the projection and semidirect product, we get
=u1{π′}(ψF)(λπ,h.→π′F:∞(π,h)(λaos1:∞.U(ao1:∞)))
and unpacking the update as well, we get
=ψF(λπ,h.1π=π′→π′F:∞(π,h)(λaos1:∞.U(ao1:∞))+1π≠π′)
And then we can realize that when we unroll →π′F:∞(π′,h), it just always deterministically unrolls the history, along with the policy, since for π′,h to be an input state, π′ must be compatible with h, so picking actions from π′ means there’s never any action mismatches. Projecting this down to the actions and observations just makes the history h exactly. So we have
=ψF(λπ,h.1π=π′δh(λao1:∞.U(ao1:∞))+1π≠π′)
Substituting the dirac-delta value in, we have
=ψF(λπ,h.1π=π′U(h)+1π≠π′)
And we pack the update and projection back up to get
=u1{π′}(ψF)(λπ,h.U(h))=pr(A×O)ω∗(u1{π′}(ψF))(U)
We can run through the same proof that Θ is indeed a belief function, back from the 3-static to 1-static case, because ψF fulfills all the same properties that D did.
Alright, now we have, for all four corners, a condition that’s basically “came from an acausal belief function” that is preserved under morphisms. Now we need to show that all 8 back-and-forth directions are identity.
T2.2.1 For 1-static to 3-static back to 1-static, we want to show
Θ(π′)=pr(A×O)ω∗(u1{π′}(⊤Π⋉Θ))
This is (we assume the hard version first)
pr(A×O)ω∗(u1{π′}(⊤Π⋉Θ))(U)
Then unpack the projection and update
=(⊤Π⋉Θ)(λπ,h.1π=π′U(h)+1π≠π′)
Then unpack the semidirect product and ⊤Π to get
=infπΘ(π)(λh.1π=π′U(h)+1π≠π′)
Then realize that the minimizer is picking π′ exactly, otherwise you’d just get maximal value and all the Θ(π) agree on what a maximal input maps to.
=Θ(π′)(λh.U(h))=Θ(π′)(U)
And we’re done.
T2.2.2 For 3-static to 1-static to 3-static being identity, we want to show
D=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D)))
This is just exactly the condition we’re assuming on D, so we trivially fulfill this (the nontrivial part was shoved into showing that going belief-function to infradistribution over policy-tagged states produced this property).
T2.2.3 For 3-static to 3-dynamic back to 3-static, we need to show that
D=prΠ×(A×O)ω∗(D⋉→T:∞)
For this, we abbreviate states beyond the first one as s, so (π,h)0 is the initial state, and aos1:∞ is an element of (A×O×PD)ω. Taking the complicated part, it’s
prΠ×(A×O)ω∗(D⋉→T:∞)(f)
=(D⋉→T:∞)(λ(π,h)0,aos1:∞.f(π,ao1:∞))
And then we unpack the semidirect product a bit, to get
=D(λ(π,h)0.→T:∞((π,h)0)(λaos1:∞.f(π,ao1:∞)))
And then we can write this as a projection, to get
=D(λ(π,h)0.pr(A×O)ω∗(→T:∞((π,h)0))(λao1:∞.f(π,ao1:∞)))
Now, we observe that because →T, when repeatedly unrolled, is just emitting the actions and observations from the starting destiny/history, this projection is just the point distribution on the action-observation sequence that is h.
=D(λ(π,h)0.δh(λao1:∞.f(π,ao1:∞)))
Then we evaluate the expectation, yielding
=D(λπ,h.f(π,h))=D(f)
And we’re done.
T2.2.4 For going 3-dynamic to 3-static to 3-dynamic, we need that
ψT=prΠ×(A×O)ω∗(ψT⋉→T:∞)
To show this, just reuse the exact same proof from above, just with ψT instead of D. Also, →T works out, since the 3-static to 3-dynamic translation defines →T(π,aoh):=δa,o,(πao,h), which is exactly the form the 3-dynamic property requires.
At this point, we’ve established isomorphism for two sides of the square. We’ve just got two more sides left to address, then showing that going around the square results in identity instead of a nontrivial automorphism.
T2.2.5,6 For translating back and forth between 3-dynamic and 1-dynamic, we observe that both translations keep the initial infradistribution over policy-tagged destinies the same, and our 3-dynamic to 1-dynamic, and 1-dynamic to 3-dynamic proofs verified that the →F and →T behave as they should when they go back and forth, so we don’t need to worry about this.
T2.2.7 Next is 1-dynamic to 1-static to 1-dynamic. The infrakernel is guaranteed to have the right form, we just need to show that ψF is unchanged. So, we must show
ψF=⊤Π⋉(λπ.pr(A×O)ω∗((u1{π}(ψF))⋉→πF:∞))
But, since our condition on 1-dynamic (that we showed is a property of applying the morphism to go from any belief function to first-person dynamic) was
ψF=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(ψF)))
The only thing we need to show is that, regardless of π,
pr(A×O)ω∗((u1{π}(ψF))⋉→πF:∞)(U)=pr(A×O)ω∗(u1{π}(ψF))(U)
Taking the left-hand side and unpacking the projection, semidirect product, and update, we get
=ψF(λπ′,h.1π′=π⋅pr(A×O)ω∗(→πF:∞(π′,h))(U)+1π′≠π)
At this point, we can realize something interesting. pr(A×O)ω∗(→π′F:∞(π′,h)) is “start with an initial policy of π′ and h compatible with π′, then repeatedly feed in actions as if they were created by π′ forever, and forget about the states, leaving just the action-observation sequence”. Now, since the action (being produced by π′) always lines up with what the π′ encoded in the state would do, this process never hits the ⊥ infradistribution, it keeps going on and on and advancing the history with no issue. In particular, the history unrolls to completion, and the resulting action-observation sequence you get would just be the original destiny packed up in the state. So, this infradistribution is just δh. Making this substitution in, we get
=ψF(λπ′,h.1π′=π⋅δh(U)+1π′≠π)
=ψF(λπ′,h.1π′=πU(h)+1π′≠π)
Now we pack up the update again, to get
=u1{π}(ψF)(λπ′,h.U(h))
And then realize that this is a projection, to get
=pr(A×O)ω∗(u1{π}(ψF))(U)
And we’re done, we showed our desired result to establish that going first-person dynamic to static back to dynamic is identity.
T2.2.8 Time for the last leg of the square, that going first-person static to first-person dynamic and first-person static is identity. We want to show that, regardless of π′,
Θ(π′)=pr(A×O)ω∗((u1{π′}(⊤Π⋉Θ))⋉→π′F:∞)
Again, like usual, we’ll take the complicated thing and repeatedly rewrite it to reach the simpler thing, for an arbitrary function.
pr(A×O)ω∗((u1{π′}(⊤Π⋉Θ))⋉→π′F:∞)(U)
Using s for states, we can rewrite the projection, to get
=((u1{π′}(⊤Π⋉Θ))⋉→π′F:∞)(λs0,aos1:∞.U(ao1:∞))
Then we rewrite the semidirect product, and unpack the initial state into a policy π and destiny h compatible with π, to get
=u1{π′}(⊤Π⋉Θ)(λπ,h.→π′F:∞(π,h)(λaos1:∞.U(ao1:∞)))
We can then rewrite the interior as a projection to get
=u1{π′}(⊤Π⋉Θ)(λπ,h.pr(A×O)ω∗(→π′F:∞(π,h))(U))
Now, it’s time to rewrite the update. It rewrites as:
=(⊤Π⋉Θ)(λπ,h.1π=π′⋅pr(A×O)ω∗(→π′F:∞(π,h))(U)+1π≠π′)
And use our usual argument from before that pr(A×O)ω∗(→π′F:∞(π′,h)) is just the dirac-delta distribution on h, to get
=(⊤Π⋉Θ)(λπ,h.1π=π′U(h)+1π≠π′)
Now, we can unpack the semidirect product as well as what ⊤Π means, to get:
=infπΘ(π)(λh.1π=π′U(h)+1π≠π′)
Now, if π≠π′, then that inner function turns into 1 (or infinity) which is assigned maximum value by all Θ(π), so it’s minimized when π=π′, so we get:
=Θ(π′)(λh.U(h))=Θ(π′)(U)
And we’re done with the last “doing this path is identity” result. All that remains in our proof is just showing that taking both paths from one corner to another corner makes the same result, to show that going around in a loop is identity, ruling out nontrivial automorphisms.
T2.3 The easiest pair of corners for this is going from first-person static to third-person dynamic in both ways. Obviously, the transition kernel would be the same no matter which path you took, which just leaves verifying that the starting infradistribution is the same. Down one path, we have
D:=⊤Π⋉Θ,ψT:=D
Down the other path, we have
ψF:=⊤Π⋉Θ,ψT:=ψF
Obviously, both these paths produce the same result when you try to define ψT in both ways from Θ. And we’re done!
Theorem 3: Pseudocausal Commutative Square: The following diagram commutes for any choice of pseudocausal belief function Θ. Any choice of infradistribution D where D=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D))) also makes this diagram commute.
Again, the work we need to put in is getting conditions in the four corners that are “came from a pseudocausal belief function”, then, for phase 1, verifying that all 8 morphisms preserve the relevant property in the target if the source had it, then for phase 2, verifying that all 8 ways of going from a corner to 1 away and back result in identity to get 4 isomorphisms, then finally showing that starting in one corner and going to the other corner by two different paths make the same result, to rule out nontrivial automorphisms.
The property for third-person static is D=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D)))
The property for third-person dynamic is ψT=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(ψT))) and →T(aoh)=δa,o,h
The property for first-person static is that Θ is a belief function satisfying pseudocausality, ∀π′,π,U:u1∼π′(Θ(π))(U)≥Θ(π′)(U)
The property for first-person dynamic is ψF=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(ψF))) and →F(aoh,a′)=δo,h when a′=a, and ⊥ when a′≠a.
Time to show preservation of these properties by all the morphisms, as well as that the translations make a belief function/infradistribution.
Our first step is showing that all 8 morphisms induce the relevant property in the target if the source has the property.
T3.1.1 3-static to 1-static. Our definition is
Θ(π):=u1∼π(D)
and we must check the five conditions of a belief function, as well as pseudocausality.
T3.1.1.1 Checking the five conditions of a belief function proceeds in almost exactly the same way as checking the five conditions of a belief function in the “Acausal Commutative Square” theorem. We leave it as an exercise to the reader, and will just elaborate on the only nontrivial new argument needed.
The only nontrivial modification is in our proof of lower-semicontinuity. In order for it to work, we need that for any m, the subset of (A×O)ω where h∼πm or h∼πm+1 or πm+2… or π∞ is closed.
We can rephrase the set {h|∃n≥m:h∼πn} as the projection down to the (A×O)ω coordinate of the set
({πn}n≥m×(A×O)ω)∩{π,h|h∼π}
(A×O)ω is compact, and {πn}n≥m is compact as well, since it limits to a point and the limit point is included. So that product is a compact set. The set {π,h|h∼π} is closed. So ({πn}n≥m×(A×O)ω)∩{π,h|h∼π} is the intersection of a compact and a closed set, and so is compact. And projections of compact sets are compact, and compactness implies closure. So, the set of histories {h|∃n≥m:h∼πn} is closed regardless of m, even if m=∞.
The rest of the proof works out with no issue, so translating over an infradistribution makes a belief function.
T3.1.1.2 We still have to check pseudocausality however. Our translation is
Θ(π):=u1∼π(D)
and we want to show
∀π′,π,U:u1∼π′(Θ(π))(U)≥Θ(π′)(U)
Reexpressing this desired statement in terms of D, we have
∀π′,π,U:u1∼π′(u1∼π(D))(U)≥u1∼π′(D)(U)
Let π′ and π be arbitrary. Then the left side is
u1∼π′(u1∼π(D))(U)
We unpack the update
=u1∼π(D)(λh.1h∼π′U(h)+1h≁π′)
And unpack the update again, to get
=D(λh.1h∼π⋅(1h∼π′U(h)+1h≁π′)+1h≁π)
Looking at this, the function is U when h is compatible with both π′ and π, and 1 otherwise. This is a greater function than U when h is compatible with just π′, and 1 otherwise, so by monotonicity for infradistributions, we can get:
≥D(λh.1h∼π′U(h)+1h≁π′)
and pack up the update to get
=u1∼π′(D)(U)
And we’re done, we get pseudocausality.
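The key step here is just monotonicity applied to a pointwise inequality between the doubly-updated and singly-updated functions. Here is a small Python spot-check on a made-up example (a four-element history space, made-up compatibility sets, and a three-element credal set standing in for D); it is an illustration under those assumptions, not part of the proof.

    import random

    H = range(4)                                      # toy histories
    compat = {"pi0": {0, 1, 2}, "pi1": {1, 2, 3}}     # which histories each policy allows
    credal = [[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4], [0.25, 0.25, 0.25, 0.25]]

    def D(g):
        """A toy infradistribution: infimum of expectations over a finite credal set."""
        return min(sum(p[h] * g(h) for h in H) for p in credal)

    def update(pi, g):
        """u1_{∼π}: keep g on histories compatible with π, maximal value 1 elsewhere."""
        return lambda h: g(h) if h in compat[pi] else 1.0

    random.seed(1)
    for _ in range(1000):
        table = [random.random() for _ in H]
        U = lambda h: table[h]
        for pi in compat:
            for pi2 in compat:
                lhs = D(update(pi2, update(pi, U)))   # u1_{∼π'}(u1_{∼π}(D))(U)
                rhs = D(update(pi2, U))               # u1_{∼π'}(D)(U)
                assert lhs >= rhs - 1e-12             # the pseudocausality-style inequality
    print("doubly-updated values dominate singly-updated values on the toy example")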
T3.1.2 1-static to 3-static. Our translation is
D:=pr(A×O)ω∗(⊤Π⋉Θ)
We’ll show the infradistribution conditions on D. First, to show it’s even well-defined, we have
D(U)=pr(A×O)ω∗(⊤Π⋉Θ)(U)=⊤Π(λπ.Θ(π)(λh.U(h)))
That inner function is lower-semicontinuous in π, because Θ was assumed to have that property in its first argument. Since the inner function is lower-semicontinuous, it can be evaluated.
Now for the condition that
D=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D)))
We start with D(U) and unpack the definition of D, to get
=pr(A×O)ω∗(⊤Π⋉Θ)(U)
And then undo the projection and semidirect product and what ⊤Π means, to get
=infπΘ(π)(λh.U(h))
And then, by pseudocausality for Θ (for all π′, u1∼π(Θ(π′))(U)≥Θ(π)(U) and also u1∼π(Θ(π))(U)=Θ(π)(U) because Θ(π) is supported entirely on histories compatible with π), we can swap Θ(π)(λh.U(h)) out for
=infπinfπ′u1∼π(Θ(π′))(λh.U(h))
And then unpack the update, to get
=infπinfπ′Θ(π′)(λh.1h∼πU(h)+1h≁π)
Now we pack up what ⊤Π means, and the semidirect product, to get
=infπ(⊤Π⋉Θ)(λπ′,h.1h∼πU(h)+1h≁π)
Write this as a projection
=infπpr(A×O)ω∗(⊤Π⋉Θ)(λh.1h∼πU(h)+1h≁π)
Pack up the definition of D to get
=infπD(λh.1h∼πU(h)+1h≁π)
Pack up the definition of “1-update on π” to get
=infπu1∼π(D)(λh.U(h))
Pack up what ⊤Π means and the semidirect product and the projection to get
=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D)))(U)
So we have
D=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D)))
as desired, since U was arbitrary.
Now to verify the infradistribution conditions on D. This proof is pretty much identical to the proof in the “Acausal Commutative Square” theorem, interested readers can fill it in.
T3.1.3 For 3-static to 3-dynamic, we have →T(aoh):=δa,o,h (which fulfills one of the 3-dynamic conditions), and ψT:=D
Since D is an infradistribution, ψT is too, and for the condition on ψT, we have:
ψT=D=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D)))=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(ψT)))
and we’re done, since D had the relevant condition.
T3.1.4 For 3-dynamic to 3-static, we have
D:=pr(A×O)ω∗(ψT⋉→T:∞)
First, we must show that the niceness conditions are fulfilled for the infinite semidirect product to be defined. →T maps aoh to a,o,h. This is clearly a continuous mapping, so in particular, it’s lower-semicontinuous. Dirac-delta distributions are 1-Lipschitz when interpreted as inframeasures (all probability distributions have that property, actually). The compact-shared-CAS condition is redundant because A×O×(A×O)ω is already a compact space. And all probability distributions (dirac-delta distributions are probability distributions) map constants to the same constant, so we get the increase-in-constants property and the 1 maps to 1 property. So we can indeed take the infinite semidirect product of the →T.
Now, we’ll show that unrolling like this just produces ψT exactly. We’ll use h0 for the initial destiny state, and aoh1:∞ for the successive unrolled history of states, actions, and observations. First, by unpacking D’s definition,
D(U)=pr(A×O)ω∗(ψT⋉→T:∞)(U)
Then we unpack the projection and semidirect product
=ψT(λh0.→T:∞(h0)(λaoh1:∞.U(ao1:∞)))
And pack things up into a projection
=ψT(λh0.pr(A×O)ω∗(→T:∞(h0))(λao1:∞.U(ao1:∞)))
And then we can observe that unrolling the initial destiny forever via “advance the destiny 1 step, popping actions and observations off”, when you project down to just the actions and observations, yields just the point distribution on that destiny, so we get:
=ψT(λh.δh(λao1:∞.U(ao1:∞)))
And then substitute the value of that dirac-delta in to get
=ψT(λh.U(h))=ψT(U)
and we’re done, showing D had the relevant condition from ψT having it. Since ψT is an infradistribution, D is too.
T3.1.5 For 3-dynamic to 1-dynamic, we have
→F(h,a′):=prO×(A×O)ω∗(u1{a′}(→T(h)))
Let’s write h as aoh (unpacking the destiny a bit). Now, for the action a (which is compatible with the start of the history), we have
→F(aoh,a)(f)
Applying our definition, we have
=prO×(A×O)ω∗(u1{a}(→T(aoh)))(f)
Unpacking the projection, we have
=u1{a}(→T(aoh))(λa′,o′,h′.f(o′,h′))
Unpacking the update, we have
=→T(aoh)(λa′,o′,h′.1a′=af(o′,h′)+1a′≠a)
And then we consider that →T(aoh) is just the dirac-delta distribution on (a,o,h), so we have
=δa,o,h(λa′,o′,h′.1a′=af(o′,h′)+1a′≠a)
Substitute the values in, and we have
=f(o,h)=δo,h(f)
So, →F(aoh,a) is just the dirac-delta distribution on o and h, as it should be.
For a′ that aren’t a, we have
→F(aoh,a′)(f)
Applying our definition, we have
=prO×(A×O)ω∗(u1{a′}(→T(aoh)))(f)
Unpacking the projection, we have
=u1{a′}(→T(aoh))(λa′′,o′,h′.f(o′,h′))
Unpacking the update, we have
=→T(aoh)(λa′′,o′,h′.1a′′=a′f(o′,h′)+1a′′≠a′)
And then we consider that →T(aoh) is just the dirac-delta distribution on (a,o,h), so we have
=δa,o,h(λa′′,o′,h′.1a′′=a′f(o′,h′)+1a′′≠a′)
and because a′≠a, substituting the dirac-delta in produces 1 (or infinity), ie, ⊥(f). Therefore,
→F(aoh,a′)=⊥
So our transition kernel works out. For the other conditions, just observe that ψF:=ψT, and use the extreme similarity of their defining conditions.
T3.1.6 For 1-dynamic to 3-dynamic, we have ψT:=ψF, so we can use our defining property again to show that the resulting 3-dynamic starting infradistribution has the relevant property, which just leaves cleaning up the defining property for the →T infrakernel. We remember that
→T(h):=⊤A⋉(λa.→F(h,a))
Therefore, we have:
→T(aoh)(f)=(⊤A⋉(λa′.→F(aoh,a′)))(f)
Then, we can unpack the semidirect product as:
=⊤A(λa′.→F(aoh,a′)(λo′,h′.f(a′,o′,h′)))
We unpack our ⊤A, getting
=infa′(→F(aoh,a′)(λo′,h′.f(a′,o′,h′)))
Then, we observe that for any a′≠a, →F(aoh,a′)=⊥. So, a′ must be a, and we have
=→F(aoh,a)(λo′,h′.f(a,o′,h′))
Now, →F(aoh,a) is the dirac-delta distribution on (o,h), so making that substitution in, we have
=δo,h(λo′,h′.f(a,o′,h′))=f(a,o,h)=δa,o,h(f)
And this holds for all f, so we get our desired result that
→T(aoh)=δa,o,h
T3.1.7: For 1-static to 1-dynamic, we just have
ψF:=pr(A×O)ω∗(⊤Π⋉Θ)
→F(aoh,a):=δo,h
→F(aoh,a′):=⊥ when a′≠a
So this automatically makes the infrakernel have the desired properties. For showing the relevant property of ψF, we just copy the proof at the start of getting the properties for D from Θ being a belief function, since D and ψF are both defined in the same way.
T3.1.8 For 1-dynamic to 1-static, we have
Θ(π′):=pr(A×O)ω∗(ψF⋉→π′F:∞)
We’ll work on getting that latter quantity into a better form, but first we have to verify that →πF:∞ is well-defined, ie that the →πF:n fulfill the niceness conditions. They’re defined as
→πF:n(h0,aoh1:n):=δπ(ao1:n)⋉(λa.→F(hn,a))
For lower-semicontinuity, →F is lower-semicontinuous, because it returns ⊥ on every action but one (this is a clopen set), and for the action that pairs up with the initial destiny, it just advances the destiny, which is continuous. →πF:n behaves similarly, continuous variation in input leads to continuous variation in output, because eventually the behavior of the policy settles down to a fixed action as the history of actions and observations stabilizes amongst one of the finitely many options.
For making inframeasures, →πF:n always returns a dirac-delta distribution or ⊥, so we’re good there. Similarly, for 1-Lipschitzness, both ⊥ and dirac-delta distributions are 1-Lipschitz. For compact-shared CAS, as the target space is A×O×(A×O)ω, it’s compact and you don’t need to worry about that. Finally, both dirac-delta distributions and ⊥ map constants to either the same constant or higher, and in the [0,1] type signature, both ⊥ and dirac-delta distributions map 1 to 1. So all niceness conditions are fulfilled.
Let’s work on unpacking the value
pr(A×O)ω∗(ψF⋉→πF:∞)(U)
unpacking the projection and semidirect product, we get
=ψF(λh0.→πF:∞(h0)(λaoh1:∞.U(ao1:∞)))
And then we can write this as a projection in the inside
=ψF(λh0.pr(A×O)ω∗(→πF:∞(h0))(λao1:∞.U(ao1:∞)))
One of two things can happen. In the first case, h is compatible with π, so playing π against it never hits ⊥, and it unrolls to completion, and projecting down just yields the dirac-delta distribution on h itself. In the second case, h is incompatible with π, so eventually we hit ⊥, which maps everything to maximal value (either 1 or infinity). Thus, we can write this as an indicator function
=ψF(λh.1h∼πδh(λao1:∞.U(ao1:∞))+1h≁π)
Substituting the dirac-delta value in, we have
=ψF(λh.1h∼πU(h)+1h≁π)
And we pack the update back up to get
=u1∼π(ψF)(U)
Since U was arbitrary, we have
pr(A×O)ω∗(ψF⋉→πF:∞)=u1∼π(ψF)
And so, an alternate way to define Θ(π) is as
Θ(π)=u1∼π(ψF)
We can then run through the same proof that Θ is indeed a belief function, back from the 3-static to 1-static case, because ψF fulfills all the same properties that D did.
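To make the compatible/incompatible dichotomy in this argument concrete, here is a tiny Python sketch (toy policy, toy destinies, all made up): unrolling →F against a destiny while feeding in π’s actions either reproduces the destiny, or hits ⊥ (maximal value 1) at the first action mismatch, which is exactly the 1h∼πU(h)+1h≁π pattern.

    def unrolled_value(policy, destiny, U):
        """Toy stand-in for the unrolled-and-projected value of U on a finite destiny prefix."""
        history = []
        for (a, o) in destiny:
            if policy(history) != a:       # →_F(h, a') = ⊥ for a' ≠ a: maximal value
                return 1.0
            history.append((a, o))         # otherwise advance the destiny deterministically
        return U(destiny)                  # compatible case: the point distribution on h

    alternating = lambda history: len(history) % 2            # toy policy: 0, 1, 0, 1, ...
    compatible = [(0, "x"), (1, "y"), (0, "x")]
    incompatible = [(0, "x"), (0, "y"), (0, "x")]             # second action mismatches
    U = lambda h: 0.25                                        # an arbitrary bounded utility

    assert unrolled_value(alternating, compatible, U) == 0.25     # the 1_{h∼π} U(h) branch
    assert unrolled_value(alternating, incompatible, U) == 1.0    # the 1_{h≁π} branch
    print("unrolling matches the indicator form 1_{h∼π} U(h) + 1_{h≁π}")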
Alright, now we have, for all four corners, a condition that’s basically “came from a pseudocausal belief function” that is preserved under morphisms. Now we need to show that all 8 back-and-forth directions are identity.
T3.2.1 For 1-static to 3-static back to 1-static, we want to show
Θ(π′)=u1∼π′(pr(A×O)ω∗(⊤Π⋉Θ))
This is (we assume the hard version first)
u1∼π′(pr(A×O)ω∗(⊤Π⋉Θ))(U)
Then unpack the update
=pr(A×O)ω∗(⊤Π⋉Θ)(λh.1h∼π′U(h)+1h≁π′)
And the projection
=(⊤Π⋉Θ)(λπ,h.1h∼π′U(h)+1h≁π′)
Then unpack the semidirect product and ⊤Π to get
=infπΘ(π)(λh.1h∼π′U(h)+1h≁π′)
then rewrite this as an update
=infπu1∼π′(Θ(π))(λh.U(h))
And then we can apply pseudocausality of Θ to get that the minimizer must pick π′ exactly because u1∼π′(Θ(π))(U)≥Θ(π′)(U)=u1∼π′(Θ(π′))(U) and so we get
=Θ(π′)(U)
and we’re done.
T3.2.2 For 3-static to 1-static to 3-static, we must show
D=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D)))
Well, the property for a third-person static D is exactly that.
T3.2.3 For 3-static to 3-dynamic to 3-static, we want that
D=pr(A×O)ω∗(D⋉→T:∞)
Taking the complicated part, it’s
pr(A×O)ω∗(D⋉→T:∞)(U)
We unpack the projection
=(D⋉→T:∞)(λh0,aoh1:∞.U(ao1:∞))
And then we unpack the semidirect product a bit, to get
=D(λh0.→T:∞(h0)(λaoh1:∞.U(ao1:∞)))
And then we can write this as a projection, to get
=D(λh0.pr(A×O)ω∗(→T:∞(h0))(λao1:∞.U(ao1:∞)))
Now, we observe that because →T, when repeatedly unrolled, is just emitting the actions and observations from the starting destiny, this projection is just the point distribution on the action-observation sequence that is h0.
=D(λh0.δh0(λao1:∞.U(ao1:∞)))
Then we evaluate the expectation, yielding
=D(λh.U(h))=D(U)
And we’re done.
T3.2.4 For going 3-dynamic to 3-static to 3-dynamic, we need that
ψT=pr(A×O)ω∗(ψT⋉→T:∞)
To show this, just reuse the exact same proof from above, just with ψT instead of D. Also, →T works out as it should, since the 3-static to 3-dynamic translation defines →T(aoh):=δa,o,h, which is exactly the form the 3-dynamic property requires.
At this point, we’ve established isomorphism for two sides of the square. We’ve just got two more sides left to address, then showing that going around the square results in identity instead of a nontrivial automorphism.
T3.2.5,6 For translating back and forth between 3-dynamic and 1-dynamic, we observe that both translations keep the initial infradistribution over destinies the same, and our 3-dynamic to 1-dynamic, and 1-dynamic to 3-dynamic proofs verified that the →F and →T behave as they should when they go back and forth, so we don’t need to worry about this.
T3.2.7 Next is 1-dynamic to 1-static to 1-dynamic. The infrakernel is guaranteed to have the right form, we just need to show that ψF is unchanged. So, we must show
ψF=pr(A×O)ω∗(⊤Π⋉(λπ.pr(A×O)ω∗(ψF⋉→πF:∞)))
But, since our condition on 1-dynamic (that we showed is a property of applying the morphism to go from any belief function to first-person dynamic) was
ψF=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(ψF)))
The only thing we need to show is that, regardless of π,
pr(A×O)ω∗(ψF⋉→πF:∞)(U)=u1∼π(ψF)(U)
and we’d be done. However, from the 1-dynamic to 1-static proof of property preservation, we already showed that
pr(A×O)ω∗(ψF⋉→πF:∞)(U)=u1∼π(ψF)(U)
back then, so we’re just done.
T3.2.8 Time for the last leg of the square, that going first-person static to dynamic and back to static is identity. We want to show that, regardless of π′,
Θ(π′)=pr(A×O)ω∗((pr(A×O)ω∗(⊤Π⋉Θ))⋉→π′F:∞)
To do this, we use the result from the 1-dynamic to 1-static proof of property preservation, that
pr(A×O)ω∗(ψF⋉→πF:∞)=u1∼π(ψF)
Applying this fact with ψF as an abbreviation for pr(A×O)ω∗(⊤Π⋉Θ), we can rewrite our proof goal equivalently with this fact as
Θ(π′)=pr(A×O)ω∗(u1∼π′(pr(A×O)ω∗(⊤Π⋉Θ)))
But we showed this exact result when we showed that 1-static to 3-static to 1-static was identity, so we’re done.
And we’re done with the last “doing this path is identity” result. All that remains in our proof is just showing that taking both paths from one corner to another corner makes the same result, to show that going around in a loop is identity, ruling out nontrivial automorphisms.
T3.3 The easiest pair of corners for this is going from first-person static to third-person dynamic in both ways. Obviously, the transition kernel would be the same no matter which path you took, which just leaves verifying that the starting infradistribution is the same. Down one path, we have
D:=pr(A×O)ω∗(⊤Π⋉Θ),ψT:=D
Down the other path, we have
ψF:=pr(A×O)ω∗(⊤Π⋉Θ),ψT:=ψF
Obviously, both these paths produce the same result when you try to define ψT in both ways from Θ. And we’re done!
Proofs Theorems 2,3
Theorem 2: Acausal Commutative Square: The following diagram commutes for any belief function Θ. Any infradistribution D where D=⊤Π⋉(λπ.u1{π}(D)) also makes this diagram commute and induces a belief function.
Here’s how the proof proceeds. In order to show that the full diagram commutes, we first select properties on the four corners such that for each morphism, if the source fulfills the relevant property, then the target fulfills the relevant property. Obviously, the property for the bottom-right corner must be “is a belief function”. There are eight phases for this, some of which are quite easy, but the two most difficult are at the start, corresponding to the two morphisms on the bottom side, because we must exhaustively verify the belief function conditions and infradistribution conditions. As usual, lower-semicontinuity is the trickiest part to show. Then, since we have four sides of the square to show are isomorphisms, we have another eight phases where we verify going forward to a different corner and back is identity. Finally, there’s an easy step where we rule out that going around in a loop produces a nontrivial automorphism, by showing that both paths from first-person static to third-person dynamic are equal.
First, let’s define our properties.
The property for third-person static is D=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D))), and D is an infradistribution.
The property for third-person dynamic is ψT=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(ψT))), and →T(π,aoh)=δa,o,(πao,h), and ψT is an infradistribution.
The property for first-person static is that Θ is a belief function.
The property for first-person dynamic is ψF=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(ψF))), and →F((π,aoh),a′)=δo,(πao,h) when a′=a, and ⊥ when a′≠a, and ψF is an infradistribution.
T2.1 Our first step is showing that all 8 morphisms induce the relevant property in the target if the source has the property.
T2.1.1 3-static to 1-static. Our definition is
Θ(π):=pr(A×O)ω∗(u1{π}(D))
and we must check the five conditions of a belief function for Θ, assuming D is an infradistribution on PD and it has the property D=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D)))
T2.1.1.1 First, bounded Lipschitz constant. Our proof goal is
∃λ⊙<∞∀π′,U,U′:|Θ(π)(U)−Θ(π)(U′)|≤λ⊙d(U,U′)
Let λ⊙ be the Lipschitz constant of D, which is finite because D is an infradistribution. Then let π′,U,U′ be arbitrary (they must be bounded-continuous, though). We have
|Θ(π′)(U)−Θ(π′)(U′)|=|pr(A×O)ω∗(u1{π′}(D))(U)−pr(A×O)ω∗(u1{π′}(D))(U′)|
and then unpack the projection to get
=|u1{π′}(D))(λπ,h.U(h))−u1{π′}(D))(λπ,h.U′(h))|
and unpack the update to get
=|D(λπ,h.1π=π′U(h)+1π≠π′)−D(λπ,h.1π=π′U′(h)+1π≠π′)|
and then by Lipschitzness for D,
≤λ⊙supπ,h∼π|1π=π′U(h)+1π≠π′−(1π=π′U′(h)+1π≠π′)|
Now, when π≠π′, the two functions are equal, so we have
=λ⊙suph∼π′|U(h)−U′(h)|≤λ⊙suph|U(h)−U′(h)|=λ⊙d(U,U′)
We’ll leave lower-semicontinuity for the end, it’s hard.
T2.1.1.2 Now, normalization, ie infπΘ(π)(1)=1∧infπΘ(π)(0)=0 To do this, we have
infπΘ(π)(1)=infπpr(A×O)ω∗(u1{π}(D))(1)
Then this can be rewritten as
=⊤Π(λπ.pr(A×O)ω∗(u1{π}(D))(1))
and then as a semidirect product.
=(⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D)))(1)
and then, from our defining property for D, and normalization for D since it’s an infradistribution, we have
=D(1)=1
The same proof works for 0 as well.
T2.1.1.3 Now, sensible supports. Pick a π′,U,U′ where U and U′ are identical on histories compatible with π′. We want to show
Θ(π′)(U)=Θ(π′)(U′)
To start,
Θ(π′)(U)=pr(A×O)ω∗(u1{π′}(D))(U)
and undoing the projection and update, we have
=D(λπ,h.1π=π′U(h)+1π≠π′)
Remember that actually, D is supported over PD, ie, the subset of Π×(A×O)ω where the destiny is compatible with the policy, so when π=π′, h∼π=π′ and so U and U′ behave identically. Making this substitution in, we have
=D(λπ,h.1π=π′U′(h)+1π≠π′)
which then packs up as Θ(π′)(U′) and we’re done.
T2.1.1.4 Now for agreement on max value. For the [0,1] type signature on inframeasures, we want to show
∀π′,π′′:Θ(π′)(1)=Θ(π′′)(1)
Let π′,π′′ be arbitrary. We have
Θ(π′)(1)=pr(A×O)ω∗(u1{π′}(D))(1)
and then the projection and update unpack to
D(λπ,h.1π=π′1+1π≠π′)=D(1)=D(λπ,h.1π=π′′1+1π≠π′′)
and then, working in reverse from there, we get =Θ(π′′)(1) and we’re done. For the R type signature, we want to show
∀π′,π′′:Θ(π′)(∞)=Θ(π′′)(∞)
Let π′,π′′ be arbitrary. We have
Θ(π′)(∞)=pr(A×O)ω∗(u∞{π′}(D))(∞)
which then unpacks as
D(λπ,h.1π=π′∞+1π≠π′∞)=D(∞)=D(λπ,h.1π=π′′∞+1π≠π′′∞)
and then, working in reverse, we get =Θ(π′′)(∞) and we’re done.
T2.1.1.5 Time to revisit lower-semicontinuity. We want to show that
liminfn→∞Θ(πn)(U)≥Θ(π∞)(U)
This is going to take some rather intricate work with Propositions 2 and 3, and Theorem 1 from Inframesures and Domain Theory. Namely, the supremum of continuous functions is lower-semicontinuous, any lower-semicontinuous lower-bounded function has an ascending sequence of lower-bounded continuous functions which limits to it pointwise, and for any ascending sequence which limits to a lower-semicontinuous function pointwise, you can shuffle the limit outside of an inframeasure. Fix some continuous bounded U, either in [0,1] or in R, and sequence of policies πn which converge to π∞.
Our first task is to establish that the subset of PD where π=πn or πn+1 or πn+2… or π∞ is closed. This can be viewed as the following subset of Π×(A×O)ω:
({πn}n≥m×(A×O)ω)∩{π,h|h∼π}
(A×O)ω is compact, and {πn}n≥m is compact as well, since it limits to a point and the limit point is included. So that product is a compact set, and must be closed. The latter set is PD itself. And so, since this set is the intersection of a closed set with PD, it’s closed in PD with the subspace topology.
Now, define the functions u≥m:PD→R⊔{∞} as follows. If (π,h) has π=πm or πm+1 or πm+2 or… or π∞, then return U(h). Otherwise, return 1 (or ∞ for the other type signature).
u∞ is the function where, if π=π∞, return U(h), otherwise return 1 (or ∞).
Our first task is to show that all the u≥m, and u∞ are lower-semicontinuous. This is pretty easy to do. They are 1 (or ∞) outside of a closed set, and less than that inside the closed set.
Lower-semicontinuity happens because we have one of three cases holding. In the first case, the limit point is outside of the closed set (in the complement open set). Then, at some finite stage and forever afterwards, the sequence is outside closed set of interest, and the sequence becomes just a constant value of 1 (or ∞), and so liminf equals the value of the limit.
In the second case, the limit point is inside the closed set, and there’s a subsequence of policy-tagged destinies which remains within the closed set of interest. The liminf of the original sequence of policy-tagged destinies evaluated by u≥m equals the liminf of the subsequence which stays in the “safe” closed set, because all other points in the sequence get maximal value according to u≥m (because they’re outside the closed set), so they’re irrelevant for evaluating the liminf. Then, inside the closed set, u≥m acts like U, which is continuous, so the liminf of u≥m inside the closed set equals the limit of U inside the closed set, which equals u≥m applied to the limit point (in the closed set)
In the third case, the limit point is inside the closed set, but there’s no subsequence of policy-tagged destinies which remains within the closed set of interest. So, after some finite stage, the value of u≥m is 1 or infinity, so the liminf is high and the value of the limit point is low, and again we get lower-semicontinuity.
Finally, we need to show that u≥m is an ascending sequence of functions which has u∞ as the pointwise limit. The “ascending sequence” part is easy, because as m increases, u≥m maps a smaller and smaller set (fewer policies) to the value U reports, and more points to 1 (or infinity).
If π,h has π=π∞, then regardless of n, u≥n(π,h)=U(h)=u∞(π,h), so that works out. If π,h has π≠π∞, then the sequence πn can’t hit π infinitely often, otherwise π=π∞, so after some finite n, π,h drops out of the closed set associated with u≥n, and we have limn→∞(u≥n(π,h))=1=u∞(π,h).
Just one more piece of setup. For each u≥m, since it’s lower-semicontinuous, we can apply our Proposition 3 from the domain theory post to construct an ascending sequence of bounded continuous functions fm,n where limn→∞fm,n=u≥m pointwise.
NOW we can begin showing lower-semicontinuity of Θ.
liminfn→∞Θ(πn)(U)=liminfn→∞pr(A×O)ω∗(u1{πn}(D))(U)
by how Θ was defined. And then we unpack the projection and update to get
=liminfn→∞D(λπ,h.1π=πnU(h)+1π≠πn)
First, we observe that the function inside D is greater than u≥n, because u≥n copies U(h) when π=πn or πn+1 or πn+2… and is maximal otherwise, while the interior function just copies U(h) when π=πn and is maximal otherwise. So, by monotonicity, we can go
≥liminfn→∞D(u≥n)
And then, we can use monotonicity again to get
≥liminfn→∞D(supm≤nfm,n)
Wait, what’s this? Well, if m≤n, then fm,n≤u≥m≤u≥n, because fm,n was constructed to be an ascending sequence of functions (in n) limiting to u≥m. And u≥m is an ascending sequence of functions (in m), so it’s below u≥n. Thus, the function supm≤nfm,n lies below u≥n, we just applied monotonicity.
Our next claim is that supm≤nfm,n is an ascending sequence of continuous functions which limits pointwise to u∞. Here’s how to show it. The functions are continuous because they’re all the supremum of finitely many continuous functions fm,n. They ascend because
supm≤n+1fm,n+1≥supm≤nfm,n+1≥supm≤nfm,n
The first inequality was from considering one less function in the sup. the second inequality is because, for all m, fm,n is an ascending sequence of functions in n.
For showing that the sequence supm≤nfm,n limits (in n) pointwise to u∞, we fix an arbitrary m∗ and do
limn→∞maxm≤nfm,n(π,h)≥limn→∞fm∗,n(π,h)=u≥m∗(π,h)
The inequality is because eventually n goes past the fixed number m∗, and then we’re selecting a single function out of the supremum of functions. The equality is because fm∗,n limits pointwise in n to u≥m∗. Now, with this, we can go:
u∞(π,h)=limn→∞u≥n(π,h)=limn→∞maxm≤nu≥m(π,h)
≥limn→∞maxm≤nfm,n(π,h)≥supm∗u≥m∗(π,h)=u∞(π,h)
Let’s break that down. The first equality was because we showed that u∞ is the pointwise limit of the u≥n. The second equality is because u≥m makes an ascending sequence of functions, so the supremum of the functions would just be u≥n itself. The first inequality is because fm,n≤u≥m, it’s one of the approximating continuous functions. The second inequality was the thing we showed earlier, about how the limit exceeds u≥m∗(π,h) regardless of m∗. Finally, we use again that u∞ is the pointwise limit of the u≥m∗.
We just showed a quantity is greater than or equal to itself, so all these inequalities must be equalities, and we have
u∞(π,h)=limn→∞maxm≤nfm,n(π,h)
So, the ascending sequence of continuous functions supm≤nfm,n limits pointwise to u∞. Now that we have that, let’s recap what we’ve got so far.We have
liminfn→∞Θ(πn)(U)≥liminfn→∞D(supm≤nfm,n)
Since supm≤nfm,n is an ascending sequence of continuous functions in n, then by monotonicity of D, that sequence must be increasing, so we have
=limn→∞D(supm≤nfm,n)
and then, we can apply the monotone convergence theorem for infradistributions, that a sequence of continuous functions ascending pointwise to a lower-semicontinuous function has the limit of the infradistribution expectations equaling the expectation of the lower-semicontinuous supremum, so we have
=D(u∞)=D(λπ,h.1π=π∞U(h)+1π≠π∞)
and then we pack this back up as an update and projection
=pr(A×O)ω∗(u1{π∞}(D))(U)
and back up to get
=Θ(π∞)(U)
and lower-semicontinuity of Θ has been shown. So translating over an infradistribution makes a belief function.
T2.1.2 1-static to 3-static.
D:=⊤Π⋉Θ
Now, we’ll show the five infradistribution conditions on D, along with the sixth condition on D that it equals ⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D))).
First, to show it’s even well-defined, we have
D(f)=(⊤Π⋉Θ)(f)=⊤Π(λπ.Θ(π)(λh.f(π,h)))
That inner function is lower-semicontinuous in π, because, starting with
liminfn→∞Θ(πn)(λh.f(πn,h))
then we have that, because f is continuous and Π×(A×O)ω is compact, f is uniformly continuous. So, as n increases, the function λh.f(πn,h) uniformly limits to λh.f(π∞,h). Since all the Θ(πn) have a uniform upper bound on the Lipschitz constant (from Θ being a belief function) they converge to agreeing that λh.f(πn,h) has similar value as λh.f(π∞,h), so we get
=liminfn→∞Θ(πn)(λh.f(π∞,h))
and then, by lower-semicontinuity for Θ,
≥Θ(π∞)(λh.f(π∞,h))
So, since the inner function is lower-semicontinuous, it can be evaluated.
T2.1.2.1 Now for the condition that
D=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D)))
We start with D(f) and unpack the definition of D, to get
=(⊤Π⋉Θ)(f)
And unpack the definition of semidirect product and what ⊤Π means, to get
=infπΘ(π)(λh.f(π,h))
=infπinfπ′Θ(π′)(λh.1π′=πf(π,h)+1π′≠π)
This step happens because, if π′≠π, then Θ(π′) will be assessing the value of the constant-1 function (or constant-infinity function), and will return a maximal value (by condition 5 on belief functions, that they all agree on the value of 1 (or ∞)). So the only way to minimize that line via a choice of π′ is to make it equal to π, as that means that we’re evaluating Θ(π)(λh.f(π,h)) instead, which may be less than the maximal value.
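Spelled out for a fixed π (in the [0,1] type signature; the other type signature is identical with ∞ in place of 1), the inner infimum collapses as
infπ′Θ(π′)(λh.1π′=πf(π,h)+1π′≠π)=min(Θ(π)(λh.f(π,h)),infπ′≠πΘ(π′)(1))=Θ(π)(λh.f(π,h))
because Θ(π)(λh.f(π,h))≤Θ(π)(1) by monotonicity, and all the Θ(π′)(1) equal that same maximal value by agreement on max value.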
Now we pack up what ⊤Π means, and the semidirect product, to get
=infπ(⊤Π⋉Θ)(λπ′,h.1π′=πf(π,h)+1π′≠π)
Pack up the definition of D to get
=infπD(λπ′,h.1π′=πf(π,h)+1π′≠π)
Pack up the definition of “1-update on π” to get
=infπu1{π}(D)(λπ′,h.f(π,h))
Pack up how projection mappings work, to get:
=infπpr(A×O)ω∗(u1{π}(D))(λh.f(π,h))
Pack up what ⊤Π means and the semidirect product to get
=(⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D))))(f)
So we have
D=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D)))
as desired, since f was arbitrary.
T2.1.2.2 Now to verify the infradistribution conditions on D. First, monotonicity. Let f′≥f
D(f′)=(⊤Π⋉Θ)(f′)
And unpack the definition of semidirect product and what ⊤Π means, to get
=infπΘ(π)(λh.f′(π,h))
Then, by monotonicity for all the Θ(π), we have
≥infπΘ(π)(λh.f(π,h))
which packs back up in reverse as D(f).
T2.1.2.3 Now for concavity. Start with D(pf+(1−p)f′) and unpack in the usual way to get
=infπΘ(π)(λh.pf(π,h)+(1−p)f′(π,h))
Then, by concavity for the Θ(π), as they’re inframeasures, we have
≥infπ(pΘ(π)(λh.f(π,h))+(1−p)Θ(π)(λh.f′(π,h)))
and distribute the inf, using that infπ(g(π)+g′(π))≥infπg(π)+infπg′(π), to get
≥infπ(pΘ(π)(λh.f(π,h)))+infπ((1−p)Θ(π)(λh.f′(π,h)))
Pulling the constants out, we have
=pinfπ(Θ(π)(λh.f(π,h)))+(1−p)infπ(Θ(π)(λh.f′(π,h)))
and packing back up in the usual way, we have
=pD(f)+(1−p)D(f′)
Concavity is shown.
T2.1.2.4 For Lipschitzness, we start with |D(f)−D(f′)| and then partially unpack in the usual way to get
=|⊤Π(λπ.Θ(π)(λh.f(π,h)))−⊤Π(λπ.Θ(π)(λh.f′(π,h)))|
And, since ⊤Π is a sharp infradistribution, it has a Lipschitz constant of 1, so we have
≤supπ|Θ(π)(λh.f(π,h))−Θ(π)(λh.f′(π,h))|
Since all the Θ(π) have a uniform bound on their Lipschitz constants (one of the belief function conditions), we then have
≤supπλ⊙suph|f(π,h)−f′(π,h)|
which rearranges as
=λ⊙supπ,h|f(π,h)−f′(π,h)|=λ⊙d(f,f′)
and we’re done, we got a Lipschitz constant.
T2.1.2.5 The CAS property for D is trivial because PD is a closed subset of Π×(A×O)ω, which is a compact space, and so you can just take all of PD as a compact support.
T2.1.2.6 That just leaves normalization. We have
D(1)=infπΘ(π)(1)=1
D(0)=infπΘ(π)(0)=0
by unpacking D and applying normalization for a belief function.
So now, D is an infradistribution.
T2.1.3 For 3-static to 3-dynamic, we have
→T(π,aoh):=δa,o,(πao,h) (which fulfills one of the 3-dynamic conditions), and ψT:=D
Since D is an infradistribution, ψT is too, and for the condition on ψT, we have:
ψT=D=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D)))=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(ψT)))
and we’re done, since D had the relevant condition.
T2.1.4 For 3-dynamic to 3-static, we have
D:=prΠ×(A×O)ω∗(ψT⋉→T:∞)
First, we must show that the niceness conditions are fulfilled for the infinite semidirect product to be defined at all. →T maps π,aoh to a,o,(πao,h). This is clearly a continuous mapping, so in particular, it’s lower-semicontinuous. Dirac-delta distributions are 1-Lipschitz when interpreted as inframeasures (all probability distributions have that property, actually). The compact-shared-CAS condition is redundant because A×O×PD is already a compact space. And all probability distributions (dirac-delta distributions are probability distributions) map constants to the same constant, so we get the increase-in-constants property and the 1 maps to 1 property. So we can indeed take the infinite semidirect product of the →T.
Now, we’ll show that unrolling like this just produces ψT exactly. We’ll use s0 for the initial state (pair of policy and destiny, also written as (π,h)0), and aos1:∞ for the successive unrolled history of states, actions, and observations. First, by unpacking D’s definition,
D(f)=prΠ×(A×O)ω∗(ψT⋉→T:∞)(f)
Then we unpack the projection and semidirect product
=ψT(λ(π,h)0.→T:∞((π,h)0)(λaos1:∞.f(π,ao1:∞)))
And pack things up into a projection
=ψT(λ(π,h)0.pr(A×O)ω∗(→T:∞((π,h)0))(λao1:∞.f(π,ao1:∞)))
And then we can observe that unrolling the initial state forever via “advance the destiny 1 step, popping actions and observations off, and advance the policy”, when you project down to just the actions and observations (not the intermediate states), yields just the point distribution on that destiny, so we get:
=ψT(λ(π,h)0.δh(λh′.f(π,h′)))
And then substitute the value of that dirac-delta in to get
=ψT(λπ,h.f(π,h))=ψT(f)
So, our D=ψT. Now we can go
D=ψT=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(ψT)))=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D)))
and we’re done, showing D had the relevant condition from ψT having it. Since ψT is an infradistribution, D is too.
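The key computation here, which gets reused when we show the loops through third-person static and dynamic are identity, is that for any policy-tagged destiny (π,h) in PD,
pr(A×O)ω∗(→T:∞(π,h))=δh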
T2.1.5 For 3-dynamic to 1-dynamic, we have
→F(π,h,a):=prO×PD∗(u1{a}(→T(π,h)))
For the action π() (which is compatible with the start of the history), we have
→F(π,π()oh,π())(f)
Applying our definition, we have
=prO×PD∗(u1{π()}(→T(π,π()oh)))(f)
Unpacking the projection, we have (where s is a state)
=u1{π()}(→T(π,π()oh))(λa,o,s.f(o,s))
Unpacking the update, we have
=→T(π,π()oh)(λa,o,s.1a=π()f(o,s)+1a≠π())
And then we consider that →T(π,π()oh) is just the dirac-delta distribution on (π(),o,(ππ()o,h)), so we have
=δπ(),o,(ππ()o,h)(λa,o,s.1a=π()f(o,s)+1a≠π())
Substitute the values in, and we have
=f(o,(ππ()o,h))
and then pack up the dirac-delta and we get
=δo,(ππ()o,h)(f)
So, →F(π,π()oh,π()) is just the dirac-delta distribution on o and the pair of the policy advanced one step and h, as it should be. For a′ that aren’t π(), where we should get ⊥, we have
→F(π,π()oh,a′)(f)
Applying our definition, we have
=prO×PD∗(u1{a′}(→T(π,π()oh)))(f)
Unpacking the projection, we have (where s is a state)
=u1{a′}(→T(π,π()oh))(λa,o,s.f(o,s))
Unpacking the update, we have
=→T(π,π()oh)(λa,o,s.1a=a′f(o,s)+1a≠a′)
And then we consider that →T(π,π()oh) is just the dirac-delta distribution on (π(),o,(ππ()o,h)), so we have
=δπ(),o,(ππ()o,h)(λa,o,s.1a=a′f(o,s)+1a≠a′)
Substituting the values in, and remembering that a′ and π() are assumed to be different, we have (regardless of f)
=1
Or infinity for the other type signature. Therefore,
→F(π,π()oh,a′)=⊥
So our transition kernel works out. For the other conditions on ψF, just observe that ψF:=ψT, and use the extreme similarity of their defining conditions.
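Spelled out, since ψF:=ψT and the two defining conditions have the same shape, we get
ψF=ψT=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(ψT)))=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(ψF)))
and ψF is an infradistribution because ψT is.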
T2.1.6 For 1-dynamic to 3-dynamic, we have ψT:=ψF, so we can use our defining property of ψF again to show that ψT has the relevant defining properties; that just leaves cleaning up the defining property for the →T infrakernel. We remember that
→T(π,h):=⊤A⋉(λa.→F(π,h,a))
Therefore, we have:
→T(π,π()oh)(f)=(⊤A⋉(λa.→F(π,π()oh,a)))(f)
This occurs because the history must be compatible with the policy. Then, we can unpack as:
=⊤A(λa.→F(π,π()oh,a)(λo,s.f(a,o,s)))
We unpack our ⊤A, getting
=infa(→F(π,π()oh,a)(λo,s.f(a,o,s)))
Then, we observe that for any a≠π(), →F(π,π()oh,a)=⊥. So, a must be π(), and we have
=→F(π,π()oh,π())(λo,s.f(π(),o,s))
Now, →F(π,π()oh,π()) is the dirac-delta distribution on (o,(ππ()o,h)), so making that substitution in, we have
=δo,(ππ()o,h)(λo,s.f(π(),o,s))
Ie
=f(π(),o,(ππ()o,h))=δπ(),o,(ππ()o,h)(f)
And this holds for all f, so we get our desired result that
→T(π,π()oh)=δπ(),o,(ππ()o,h)
T2.1.7 For 1-static to 1-dynamic, we just have
ψF:=⊤Π⋉Θ
→F(π,aoh,a):=δo,(πao,h)
→F(π,aoh,a′):=⊥
So this automatically makes the infrakernel have the desired properties. For showing the relevant properties of ψF, we just copy the proof at the start of getting the properties for D from Θ being a belief function, since D and ψF are both defined in the same way.
T2.1.8 For 1-dynamic to 1-static, we have
Θ(π′):=pr(A×O)ω∗((u1{π′}(ψF))⋉→π′F:∞)
We’ll work on getting that latter quantity into a better form, but first we have to verify that
→πF:∞ is even well-defined at all; we need the →πF:n to fulfill the niceness conditions. They’re defined as
→πF:n(s0,aos1:n):=δπ(ao1:n)⋉(λa.→F(sn,a))
For lower-semicontinuity, →F is lower-semicontinuous, because it returns ⊥ on every action but one (this is a clopen set), and for the action that pairs up with the initial state, the policy-tagged history just has the history and policy advanced, which is continuous. →πF:n behaves similarly, continuous variation in input leads to continuous variation in output, because eventually the behavior of the policy settles down to a fixed action as the history of actions and observations stabilizes amongst one of the finitely many options.
For making inframeasures, →πF:n always returns a dirac-delta distribution or ⊥, so we’re good there. Similarly, for 1-Lipschitzness, both ⊥ and dirac-delta distributions are 1-Lipschitz. For compact-shared CAS, as the target space is A×O×PD, it’s compact and you don’t need to worry about that. Finally, both dirac-delta distributions and ⊥ map constants to either the same constant or higher, and in the [0,1] type signature, both ⊥ and dirac-delta distributions map 1 to 1. So all niceness conditions are fulfilled.
Let’s work on unpacking the value
pr(A×O)ω∗((u1{π′}(ψF))⋉→π′F:∞)(U)
unpacking the projection and semidirect product, we get
=u1{π′}(ψF)(λπ,h.→π′F:∞(π,h)(λaos1:∞.U(ao1:∞)))
And unpacking the update, we have
=ψF(λπ,h.1π=π′→π′F:∞(π,h)(λaos1:∞.U(ao1:∞))+1π≠π′)
We can make a substitution which changes nothing (in the scope of the indicator function where π=π′), that
=ψF(λπ,h.1π=π′→π′F:∞(π′,h)(λaos1:∞.U(ao1:∞))+1π≠π′)
and then we can write this as a projection in the inside
=ψF(λπ,h.1π=π′pr(A×O)ω∗(→π′F:∞(π′,h))(λao1:∞.U(ao1:∞))+1π≠π′)
And then we can realize that when we unroll →π′F:∞(π′,h), it just always deterministically unrolls the history, along with the policy, since for π′,h to be an input state, π′ must be compatible with h, so picking actions from π′ means there’s never any action mismatches. Projecting this down to the actions and observations just makes the history h exactly. So we have
=ψF(λπ,h.1π=π′δh(λao1:∞.U(ao1:∞))+1π≠π′)
Substituting the dirac-delta value in, we have
=ψF(λπ,h.1π=π′U(h)+1π≠π′)
And we pack the update and projection back up to get
=pr(A×O)ω∗(u1{π′}(ψF))(U)
Since U was arbitrary, we have
pr(A×O)ω∗((u1{π′}(ψF))⋉→π′F:∞)=pr(A×O)ω∗(u1{π′}(ψF))
And so, an alternate way to define Θ(π′) is as
=pr(A×O)ω∗(u1{π′}(ψF))
We can run through the same proof that Θ is indeed a belief function, back from the 3-static to 1-static case, because ψF fulfills all the same properties that D did.
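The key fact in the above computation, which reappears when we show the loops through first-person static and dynamic are identity, is that whenever h is compatible with π′,
pr(A×O)ω∗(→π′F:∞(π′,h))=δh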
Alright, now we have, for all four corners, a condition that’s basically “came from an acausal belief function” that is preserved under morphisms. Now we need to show that all 8 back-and-forth directions are identity.
T2.2.1 For 1-static to 3-static back to 1-static, we want to show
Θ(π′)=pr(A×O)ω∗(u1{π′}(⊤Π⋉Θ))
Starting with the more complicated side, this is
pr(A×O)ω∗(u1{π′}(⊤Π⋉Θ))(U)
Then unpack the projection and update
=(⊤Π⋉Θ)(λπ,h.1π=π′U(h)+1π≠π′)
Then unpack the semidirect product and ⊤Π to get
=infπΘ(π)(λh.1π=π′U(h)+1π≠π′)
Then realize that the minimizer is picking π′ exactly, otherwise you’d just get maximal value and all the Θ(π) agree on what a maximal input maps to.
=Θ(π′)(λh.U(h))=Θ(π′)(U)
And we’re done.
T2.2.2 For 3-static to 1-static to 3-static being identity, we want to show
D=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(D)))
This is just exactly the condition we’re assuming on D, so we trivially fulfill this (the nontrivial part was shoved into showing that going belief-function to infradistribution over policy-tagged states produced this property).
T2.2.3 For 3-static to 3-dynamic back to 3-static, we need to show that
D=prΠ×(A×O)ω∗(D⋉→T:∞)
For this, we abbreviate states beyond the first one as s, so (π,h)0 is the initial state, and aos1:∞ is an element of (A×O×PD)ω. Taking the complicated part, it’s
prΠ×(A×O)ω∗(D⋉→T:∞)(f)
=(D⋉→T:∞)(λ(π,h)0,aos1:∞.f(π,ao1:∞))
And then we unpack the semidirect product a bit, to get
=D(λ(π,h)0.→T:∞((π,h)0)(λaos1:∞.f(π,ao1:∞)))
And then we can write this as a projection, to get
=D(λ(π,h)0.pr(A×O)ω∗(→T:∞((π,h)0))(λao1:∞.f(π,ao1:∞)))
Now, we observe that because →T, when repeatedly unrolled, is just emitting the actions and observations from the starting destiny/history, this projection is just the point distribution on the action-observation sequence that is h.
=D(λ(π,h)0.δh(λao1:∞.f(π,ao1:∞)))
Then we evaluate the expectation, yielding
=D(λπ,h.f(π,h))=D(f)
And we’re done.
T2.2.4 For going 3-dynamic to 3-static to 3-dynamic, we need that
ψT=prΠ×(A×O)ω∗(ψT⋉→T:∞)
To show this, just reuse the exact same proof from above, just with ψT instead of D. Also, →T(π,h) works out as it should.
At this point, we’ve established isomorphism for two sides of the square. We’ve just got two more sides left to address, then showing that going around the square results in identity instead of a nontrivial automorphism.
T2.2.5,6 For translating back and forth between 3-dynamic and 1-dynamic, we observe that both translations keep the initial infradistribution over policy-tagged destinies the same, and our 3-dynamic to 1-dynamic, and 1-dynamic to 3-dynamic proofs verified that the →F and →T behave as they should when they go back and forth, so we don’t need to worry about this.
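In symbols, the two kernel translations are
→F(π,h,a)=prO×PD∗(u1{a}(→T(π,h))) and →T(π,h)=⊤A⋉(λa.→F(π,h,a))
and T2.1.5 and T2.1.6 verified that each of these recovers the dirac-delta/⊥ form demanded by the corresponding corner, while ψF:=ψT and ψT:=ψF leave the starting infradistribution untouched.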
T2.2.7 Next is 1-dynamic to 1-static to 1-dynamic. The infrakernel is guaranteed to have the right form, we just need to show that ψF is unchanged. So, we must show
ψF=⊤Π⋉(λπ.pr(A×O)ω∗((u1{π}(ψF))⋉→πF:∞))
But, since our condition on 1-dynamic (that we showed is a property of applying the morphism to go from any belief function to first-person dynamic) was
ψF=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(ψF)))
The only thing we need to show is that, regardless of π,
pr(A×O)ω∗((u1{π}(ψF))⋉→πF:∞)=pr(A×O)ω∗(u1{π}(ψF))
For if we could show that, then we could go:
⊤Π⋉(λπ.pr(A×O)ω∗((u1{π}(ψF))⋉→πF:∞))=⊤Π⋉(λπ.pr(A×O)ω∗(u1{π}(ψF)))=ψF
and we’d be done. So, we’ll show that
pr(A×O)ω∗((u1{π}(ψF))⋉→πF:∞)=pr(A×O)ω∗(u1{π}(ψF))
instead, as that’s all we need. Again, we’ll massage the more complicated side until we get it into the simple form for the other side.
pr(A×O)ω∗((u1{π}(ψF))⋉→πF:∞)(U)
We undo the projection, and abbreviate states as s0, and action-observation-state sequences as aos1:∞, to yield
=((u1{π}(ψF))⋉→πF:∞)(λs0,aos1:∞.U(ao1:∞))
Then we reexpress the semidirect product, to yield
=u1{π}(ψF)(λs0.→πF:∞(s0)(λaos1:∞.U(ao1:∞)))
We unpack the initial state a fuzz to yield
=u1{π}(ψF)(λπ′,h.→πF:∞(π′,h)(λaos1:∞.U(ao1:∞)))
We apply the update to yield
=ψF(λπ′,h.1π′=π⋅(→πF:∞(π′,h)(λaos1:∞.U(ao1:∞)))+1π′≠π)
Then, we realize that if π′=π, we can swap π′ out for π in the relevant function associated with it.
=ψF(λπ′,h.1π′=π⋅(→π′F:∞(π′,h)(λaos1:∞.U(ao1:∞)))+1π′≠π)
Now, we can partially pack this up as a projection, to get
=ψF(λπ′,h.1π′=π⋅(pr(A×O)ω∗(→π′F:∞(π′,h))(U))+1π′≠π)
At this point, we can realize something interesting. pr(A×O)ω∗(→π′F:∞(π′,h)) is “start with an initial policy of π′ and h compatible with π′, then repeatedly feed in actions as if they were created by π′ forever, and forget about the states, leaving just the action-observation sequence”. Now, since the action (being produced by π′) always lines up with what the π′ encoded in the state would do, this process never hits the ⊥ infradistribution, it keeps going on and on and advancing the history with no issue. In particular, the history unrolls to completion, and the resulting action-observation sequence you get would just be the original destiny packed up in the state. So, this infradistribution is just δh. Making this substitution in, we get
=ψF(λπ′,h.1π′=π⋅δh(U)+1π′≠π)
=ψF(λπ′,h.1π′=πU(h)+1π′≠π)
Now we pack up the update again, to get
=u1{π}(ψF)(λπ′,h.U(h))
And then realize that this is a projection, to get
=pr(A×O)ω∗(u1{π}(ψF))(U)
And we’re done, we showed our desired result to establish that going first-person dynamic to static back to dynamic is identity.
T2.2.8 Time for the last leg of the square, that going first-person static to first-person dynamic and first-person static is identity. We want to show that, regardless of π′,
Θ(π′)=pr(A×O)ω∗((u1{π′}(⊤Π⋉Θ))⋉→π′F:∞)
Again, like usual, we’ll take the complicated thing and repeatedly rewrite it to reach the simpler thing, for an arbitrary function.
pr(A×O)ω∗((u1{π′}(⊤Π⋉Θ))⋉→π′F:∞)(U)
Using s for states, we can rewrite the projection, to get
=((u1{π′}(⊤Π⋉Θ))⋉→π′F:∞)(λs0,aos1:∞.U(ao1:∞))
Then we rewrite the semidirect product, and unpack the initial state into a policy π and destiny h compatible with π, to get
=u1{π′}(⊤Π⋉Θ)(λπ,h.→π′F:∞(π,h)(λaos1:∞.U(ao1:∞)))
We can then rewrite the interior as a projection to get
=u1{π′}(⊤Π⋉Θ)(λπ,h.pr(A×O)ω∗(→π′F:∞(π,h))(U))
Now, it’s time to rewrite the update. It rewrites as:
=(⊤Π⋉Θ)(λπ,h.1π=π′pr(A×O)ω∗(→π′F:∞(π,h))(U)+1π≠π′)
Now, since π=π′ inside the scope of the indicator function, we can rewrite as
=(⊤Π⋉Θ)(λπ,h.1π=π′pr(A×O)ω∗(→π′F:∞(π′,h))(U)+1π≠π′)
And use our usual argument from before that pr(A×O)ω∗(→π′F:∞(π′,h)) is just the dirac-delta distribution on h, to get
=(⊤Π⋉Θ)(λπ,h.1π=π′U(h)+1π≠π′)
Now, we can unpack the semidirect product as well as what ⊤Π means, to get:
=infπΘ(π)(λh.1π=π′U(h)+1π≠π′)
Now, if π≠π′, then that inner function turns into 1 (or infinity) which is assigned maximum value by all Θ(π), so it’s minimized when π=π′, so we get:
=Θ(π′)(λh.U(h))=Θ(π′)(U)
And we’re done with the last “doing this path is identity” result. All that remains in our proof is just showing that taking both paths from one corner to another corner makes the same result, to show that going around in a loop is identity, ruling out nontrivial automorphisms.
T2.3 The easiest pair of corners for this is going from first-person static to third-person dynamic in both ways. Obviously, the transition kernel would be the same no matter which path you took, which just leaves verifying that the starting infradistribution is the same. Down one path, we have
D:=⊤Π⋉Θ,ψT:=D
Down the other path, we have
ψF:=⊤Π⋉Θ,ψT:=ψF
Obviously, both these paths produce the same result when you try to define ψT in both ways from Θ. And we’re done!
Theorem 3: Pseudocausal Commutative Square: The following diagram commutes for any choice of pseudocausal belief function Θ. Any choice of infradistribution D where D=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D))) also makes this diagram commute.
Again, the work we need to put in is getting conditions in the four corners that are “came from a pseudocausal belief function”, then, for phase 1, verifying that all 8 morphisms preserve the relevant property in the target if the source had it, then for phase 2, verifying that all 8 ways of going from a corner to 1 away and back result in identity to get 4 isomorphisms, then finally showing that starting in one corner and going to the other corner by two different paths make the same result, to rule out nontrivial automorphisms.
The property for third-person static is D=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D)))
The property for third-person dynamic is ψT=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(ψT))) and →T(aoh)=δa,o,h
The property for first-person static is pseudocausality (the update here is unpacked just below): ∀π′,π,U:u1∼π′(Θ(π))(U)≥Θ(π′)(U)
The property for first-person dynamic is ψF=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(ψF))) and →F(aoh,a′)=δo,h when a′=a, and ⊥ when a′≠a.
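Unpacking the update, the pseudocausality condition says that, for all π′, π, and U,
Θ(π)(λh.1h∼π′U(h)+1h≁π′)≥Θ(π′)(U)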
Time to show preservation of these properties by all the morphisms, as well as that the translations make a belief function/infradistribution.
Our first step is showing that all 8 morphisms induce the relevant property in the target if the source has the property.
T3.1.1 3-static to 1-static. Our definition is
Θ(π):=u1∼π(D)
and we must check the five conditions of a belief function, as well as pseudocausality.
T3.1.1.1 Checking the five conditions of a belief function proceeds in almost exactly the same way as checking the five conditions of a belief function in the “Acausal Commutative Square” theorem. We leave it as an exercise to the reader, and will just elaborate on the only nontrivial new argument needed.
The only nontrivial modification is in our proof of lower-semicontinuity. In order for it to work, we need that for any m, the subset of (A×O)ω consisting of the h with h∼πn for some n≥m (or h∼π∞) is closed.
We can rephrase the set {h|∃n≥m:h∼πn} as the projection down to the (A×O)ω coordinate of the set
({πn}n≥m×(A×O)ω)∩{π,h|h∼π}
(A×O)ω is compact, and {πn}n≥m is compact as well, since it limits to a point and the limit point is included. So that product is a compact set. The set {π,h|h∼π} is closed. So ({πn}n≥m×(A×O)ω)∩{π,h|h∼π} is the intersection of a compact and a closed set, and so is compact. And projections of compact sets are compact, and compactness implies closure. So, the set of histories {h|∃n≥m:h∼πn} is closed regardless of m, even if m=∞.
The rest of the proof works out with no issue, so translating over an infradistribution makes a belief function.
T3.1.1.2 We still have to check pseudocausality however. Our translation is
Θ(π):=u1∼π(D)
and we want to show
∀π′,π,U:u1∼π′(Θ(π))(U)≥Θ(π′)(U)
Reexpressing this desired statement in terms of D, we have
∀π′,π,U:u1∼π′(u1∼π(D))(U)≥u1∼π′(D)(U)
Let π′ and π be arbitrary. Then the left side is
u1∼π′(u1∼π(D))(U)
We unpack the update
=u1∼π(D)(λh.1h∼π′U(h)+1h≁π′)
And unpack the update again, to get
=D(λh.1h∼π⋅(1h∼π′U(h)+1h≁π′)+1h≁π)
Looking at this, the inner function is U(h) when h is compatible with both π′ and π, and 1 otherwise. Since U≤1 in the [0,1] type signature (the other type signature is analogous), this is greater than or equal to the function which is U(h) when h is compatible with just π′ and 1 otherwise, so by monotonicity for infradistributions, we can get:
≥D(λh.1h∼π′U(h)+1h≁π′)
and pack up the update to get
=u1∼π′(D)(U)
And we’re done, we get pseudocausality.
T3.1.2 1-static to 3-static. Our translation is
D:=pr(A×O)ω∗(⊤Π⋉Θ)
We’ll show the infradistribution conditions on D. First, to show it’s even well-defined, we have
D(U)=pr(A×O)ω∗(⊤Π⋉Θ)(U)=⊤Π(λπ.Θ(π)(λh.U(h)))
That inner function is lower-semicontinuous in π, because Θ was assumed to have that property in its first argument. Since the inner function is lower-semicontinuous, it can be evaluated.
Now for the condition that
D=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D)))
We start with D(U) and unpack the definition of D, to get
=pr(A×O)ω∗(⊤Π⋉Θ)(U)
And then undo the projection and semidirect product and what ⊤Π means, to get
=infπΘ(π)(λh.U(h))
And then, by pseudocausality for Θ (for all π′, u1∼π(Θ(π′))(U)≥Θ(π)(U) and also u1∼π(Θ(π))(U)=Θ(π)(U) because Θ(π) is supported entirely on histories compatible with π), we can swap Θ(π)(λh.U(h)) out for
=infπinfπ′u1∼π(Θ(π′))(λh.U(h))
And then unpack the update, to get
=infπinfπ′Θ(π′)(λh.1h∼πU(h)+1h≁π)
Now we pack up what ⊤Π means, and the semidirect product, to get
=infπ(⊤Π⋉Θ)(λπ′,h.1h∼πU(h)+1h≁π)
Write this as a projection
=infπpr(A×O)ω∗(⊤Π⋉Θ)(λh.1h∼πU(h)+1h≁π)
Pack up the definition of D to get
=infπD(λh.1h∼πU(h)+1h≁π)
Pack up the definition of “1-update on π” to get
=infπu1∼π(D)(λh.U(h))
Pack up what ⊤Π means and the semidirect product and the projection to get
=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D)))(U)
So we have
D=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D)))
as desired, since U was arbitrary.
Now to verify the infradistribution conditions on D. This proof is pretty much identical to the proof in the “Acausal Commutative Square” theorem, interested readers can fill it in.
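For instance, normalization goes through just as it did before: projections and semidirect products leave constant functions alone, so
D(1)=pr(A×O)ω∗(⊤Π⋉Θ)(1)=infπΘ(π)(1)=1
D(0)=pr(A×O)ω∗(⊤Π⋉Θ)(0)=infπΘ(π)(0)=0
by normalization for the belief function Θ.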
T3.1.3 For 3-static to 3-dynamic, we have →T(aoh):=δa,o,h (which fulfills one of the 3-dynamic conditions), and ψT:=D
Since D is an infradistribution, ψT is too, and for the condition on ψT, we have:
ψT=D=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D)))=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(ψT)))
and we’re done, since D had the relevant condition.
T3.1.4 For 3-dynamic to 3-static, we have
D:=pr(A×O)ω∗(ψT⋉→T:∞)
First, we must show that the niceness conditions are fulfilled for the infinite semidirect product to be defined. →T maps aoh to a,o,h. This is clearly a continuous mapping, so in particular, it’s lower-semicontinuous. Dirac-delta distributions are 1-Lipschitz when interpreted as inframeasures (all probability distributions have that property, actually). The compact-shared-CAS condition is redundant because A×O×(A×O)ω is already a compact space. And all probability distributions (dirac-delta distributions are probability distributions) map constants to the same constant, so we get the increase-in-constants property and the 1 maps to 1 property. So we can indeed take the infinite semidirect product of the →T.
Now, we’ll show that unrolling like this just produces ψT exactly. We’ll use h0 for the initial destiny state, and aoh1:∞ for the successive unrolled history of states, actions, and observations. First, by unpacking D’s definition,
D(U)=pr(A×O)ω∗(ψT⋉→T:∞)(U)
Then we unpack the projection and semidirect product
=ψT(λh0.→T:∞(h0)(λaoh1:∞.U(ao1:∞)))
And pack things up into a projection
=ψT(λh0.pr(A×O)ω∗(→T:∞(h0))(λao1:∞.U(ao1:∞)))
And then we can observe that unrolling the initial destiny forever via “advance the destiny 1 step, popping actions and observations off”, when you project down to just the actions and observations, yields just the point distribution on that destiny, so we get:
=ψT(λh.δh(λao1:∞.U(ao1:∞)))
And then substitute the value of that dirac-delta in to get
=ψT(λh.U(h))=ψT(U)
So, our D=ψT. Now we can go
D=ψT=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(ψT)))=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D)))
and we’re done, showing D had the relevant condition from ψT having it. Since ψT is an infradistribution, D is too.
T3.1.5 For 3-dynamic to 1-dynamic, we have
→F(h,a′):=prO×(A×O)ω∗(u1{a′}(→T(h)))
Let’s write h as aoh (unpacking the destiny a bit). Now, for the action a (which is compatible with the start of the history), we have
→F(aoh,a)(f)
Applying our definition, we have
=prO×(A×O)ω∗(u1{a}(→T(aoh)))(f)
Unpacking the projection, we have
=u1{a}(→T(aoh))(λa′,o′,h′.f(o′,h′))
Unpacking the update, we have
=→T(aoh)(λa′,o′,h′.1a′=af(o′,h′)+1a′≠a)
And then we consider that →T(aoh) is just the dirac-delta distribution on (a,o,h), so we have
=δa,o,h(λa′,o′,h′.1a′=af(o′,h′)+1a′≠a)
Substitute the values in, and we have
=f(o,h)=δo,h(f)
So, →F(aoh,a) is just the dirac-delta distribution on o and h, as it should be.
For a′ that aren’t a, where we should get ⊥, we have
→F(aoh,a′)(f)
Applying our definition, we have
=prO×(A×O)ω∗(u1{a′}(→T(aoh)))(f)
Unpacking the projection, we have
=u1{a′}(→T(aoh))(λa′′,o′,h′.f(o′,h′))
Unpacking the update, we have
=→T(aoh)(λa′′,o′,h′.1a′′=a′f(o′,h′)+1a′′≠a′)
And then we consider that →T(aoh) is just the dirac-delta distribution on (a,o,h), so we have
=δa,o,h(λa′′,o′,h′.1a′′=a′f(o′,h′)+1a′′≠a′)
and because a′≠a, substituting the dirac-delta in produces 1 (or infinity), ie, ⊥(f). Therefore,
→F(aoh,a′)=⊥
So our transition kernel works out. For the other conditions, just observe that ψF:=ψT, and use the extreme similarity of their defining conditions.
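Spelled out, since ψF:=ψT and the two defining conditions have the same shape, we get
ψF=ψT=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(ψT)))=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(ψF)))
and ψF is an infradistribution because ψT is.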
T3.1.6 For 1-dynamic to 3-dynamic, we have ψT:=ψF, so we can use our defining property of ψF again to show that the resulting 3-dynamic starting infradistribution has the relevant property; that just leaves cleaning up the defining property for the →T infrakernel. We remember that
→T(h):=⊤A⋉(λa.→F(h,a))
Therefore, we have:
→T(aoh)(f)=(⊤A⋉(λa′.→F(aoh,a′)))(f)
Then, we can unpack the semidirect product as:
=⊤A(λa′.→F(aoh,a′)(λo′,h′.f(a′,o′,h′)))
We unpack our ⊤A, getting
=infa′(→F(aoh,a′)(λo′,h′.f(a′,o′,h′)))
Then, we observe that for any a′≠a, →F(aoh,a′)=⊥. So, a′ must be a, and we have
=→F(aoh,a)(λo′,h′.f(a,o′,h′))
Now, →F(aoh,a) is the dirac-delta distribution on (o,h), so making that substitution in, we have
=δo,h(λo′,h′.f(a,o′,h′))=f(a,o,h)=δa,o,h(f)
And this holds for all f, so we get our desired result that
→T(aoh)=δa,o,h
T3.1.7: For 1-static to 1-dynamic, we just have
ψF:=pr(A×O)ω∗(⊤Π⋉Θ)
→F(aoh,a):=δo,h
→F(aoh,a′):=⊥
So this automatically makes the infrakernel have the desired properties. For showing the relevant property of ψF, we just copy the proof at the start of getting the properties for D from Θ being a belief function, since D and ψF are both defined in the same way.
T3.1.8 For 1-dynamic to 1-static, we have
Θ(π′):=pr(A×O)ω∗(ψF⋉→π′F:∞)
We’ll work on getting that latter quantity into a better form, but first we have to verify that
→πF:∞ is even well-defined at all; we need the →πF:n to fulfill the niceness conditions. They’re defined as
→πF:n(h0,aoh1:n):=δπ(ao1:n)⋉(λa.→F(hn,a))
For lower-semicontinuity, →F is lower-semicontinuous, because it returns ⊥ on every action but one (this is a clopen set), and for the action that pairs up with the initial destiny, it just advances the destiny, which is continuous. →πF:n behaves similarly, continuous variation in input leads to continuous variation in output, because eventually the behavior of the policy settles down to a fixed action as the history of actions and observations stabilizes amongst one of the finitely many options.
For making inframeasures, →πF:n always returns a dirac-delta distribution or ⊥, so we’re good there. Similarly, for 1-Lipschitzness, both ⊥ and dirac-delta distributions are 1-Lipschitz. For compact-shared CAS, as the target space is A×O×(A×O)ω, it’s compact and you don’t need to worry about that. Finally, both dirac-delta distributions and ⊥ map constants to either the same constant or higher, and in the [0,1] type signature, both ⊥ and dirac-delta distributions map 1 to 1. So all niceness conditions are fulfilled.
Let’s work on unpacking the value
pr(A×O)ω∗(ψF⋉→πF:∞)(U)
unpacking the projection and semidirect product, we get
=ψF(λh0.→πF:∞(h0)(λaoh1:∞.U(ao1:∞)))
And then we can write this as a projection in the inside
=ψF(λh0.pr(A×O)ω∗(→πF:∞(h0))(λao1:∞.U(ao1:∞)))
One of two things can happen. In the first case, h is compatible with π, so playing π against it never hits ⊥, and it unrolls to completion, and projecting down just yields the dirac-delta distribution on h itself. In the second case, h is incompatible with π, so eventually we hit ⊥, which maps everything to maximal value (either 1 or infinity). Thus, we can write this as an indicator function
=ψF(λh.1h∼πδh(λao1:∞.U(ao1:∞))+1h≁π)
Substituting the dirac-delta value in, we have
=ψF(λh.1h∼πU(h)+1h≁π)
And we pack the update back up to get
=u1∼π(ψF)(U)
Since U was arbitrary, we have
pr(A×O)ω∗(ψF⋉→πF:∞)=u1∼π(ψF)
And so, an alternate way to define Θ(π) is as
=u1∼π(ψF)
We can then run through the same proof that Θ is indeed a belief function, back from the 3-static to 1-static case, because ψF fulfills all the same properties that D did.
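The key fact in the above computation, reused when we show the loops through first-person static and dynamic are identity, is the case split:
pr(A×O)ω∗(→πF:∞(h))=δh if h∼π
pr(A×O)ω∗(→πF:∞(h))(U)=1 (or ∞) for every U if h≁π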
Alright, now we have, for all four corners, a condition that’s basically “came from a pseudocausal belief function” that is preserved under morphisms. Now we need to show that all 8 back-and-forth directions are identity.
T3.2.1 For 1-static to 3-static back to 1-static, we want to show
Θ(π′)=u1∼π′(pr(A×O)ω∗(⊤Π⋉Θ))
Starting with the more complicated side, this is
u1∼π′(pr(A×O)ω∗(⊤Π⋉Θ))(U)
Then unpack the update
=pr(A×O)ω∗(⊤Π⋉Θ)(λh.1h∼π′U(h)+1h≁π′)
And the projection
=(⊤Π⋉Θ)(λπ,h.1h∼π′U(h)+1h≁π′)
Then unpack the semidirect product and ⊤Π to get
=infπΘ(π)(λh.1h∼π′U(h)+1h≁π′)
then rewrite this as an update
=infπu1∼π′(Θ(π))(λh.U(h))
And then we can apply pseudocausality of Θ to get that the minimizer must pick π′ exactly, because u1∼π′(Θ(π))(U)≥Θ(π′)(U)=u1∼π′(Θ(π′))(U), and so we get
=Θ(π′)(U)
and we’re done.
T3.2.2 For 3-static to 1-static to 3-static, we must show
D=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(D)))
Well, the property for a third-person static D is exactly that.
T3.2.3 For 3-static to 3-dynamic to 3-static, we want that
D=pr(A×O)ω∗(D⋉→T:∞)
Taking the complicated part, it’s
pr(A×O)ω∗(D⋉→T:∞)(U)
We unpack the projection
=(D⋉→T:∞)(λh0,aoh1:∞.U(ao1:∞))
And then we unpack the semidirect product a bit, to get
=D(λh0.→T:∞(h0)(λaoh1:∞.U(ao1:∞)))
And then we can write this as a projection, to get
=D(λh0.pr(A×O)ω∗(→T:∞(h0))(λao1:∞.U(ao1:∞)))
Now, we observe that because →T, when repeatedly unrolled, is just emitting the actions and observations from the starting destiny, this projection is just the point distribution on the action-observation sequence that is h0.
=D(λh0.δh0(λao1:∞.U(ao1:∞)))
Then we evaluate the expectation, yielding
=D(λh.U(h))=D(U)
And we’re done.
T3.2.4 For going 3-dynamic to 3-static to 3-dynamic, we need that
ψT=pr(A×O)ω∗(ψT⋉→T:∞)
To show this, just reuse the exact same proof from above, just with ψT instead of D. Also, →T(h) works out as it should.
At this point, we’ve established isomorphism for two sides of the square. We’ve just got two more sides left to address, then showing that going around the square results in identity instead of a nontrivial automorphism.
T3.2.5,6 For translating back and forth between 3-dynamic and 1-dynamic, we observe that both translations keep the initial infradistribution over destinies the same, and our 3-dynamic to 1-dynamic, and 1-dynamic to 3-dynamic proofs verified that the →F and →T behave as they should when they go back and forth, so we don’t need to worry about this.
T3.2.7 Next is 1-dynamic to 1-static to 1-dynamic. The infrakernel is guaranteed to have the right form, we just need to show that ψF is unchanged. So, we must show
ψF=pr(A×O)ω∗(⊤Π⋉(λπ.pr(A×O)ω∗(ψF⋉→πF:∞)))
But, since our condition on 1-dynamic (that we showed is a property of applying the morphism to go from any belief function to first-person dynamic) was
ψF=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(ψF)))
The only thing we need to show is that, regardless of π,
u1∼π(ψF)=pr(A×O)ω∗(ψF⋉→πF:∞)
For if we could show that, then we could go:
pr(A×O)ω∗(⊤Π⋉(λπ.pr(A×O)ω∗(ψF⋉→πF:∞)))=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π(ψF)))=ψF
and we’d be done. However, from the 1-dynamic to 1-static proof of property preservation, we already showed that
pr(A×O)ω∗(ψF⋉→πF:∞)(U)=u1∼π(ψF)(U)
back then, so we’re just done.
T3.2.8 Time for the last leg of the square, that going first-person static to dynamic and back to static is identity. We want to show that, regardless of π′,
Θ(π′)=pr(A×O)ω∗((pr(A×O)ω∗(⊤Π⋉Θ))⋉→π′F:∞)
To do this, we use the result from the 1-dynamic to 1-static proof of property preservation, that
pr(A×O)ω∗(ψF⋉→πF:∞)=u1∼π(ψF)
Applying this fact with ψF as an abbreviation for pr(A×O)ω∗(⊤Π⋉Θ), we can rewrite our proof goal equivalently with this fact as
Θ(π′)=pr(A×O)ω∗(u1∼π′(pr(A×O)ω∗(⊤Π⋉Θ)))
But we showed this exact result when we showed that 1-static to 3-static to 1-static was identity, so we’re done.
And we’re done with the last “doing this path is identity” result. All that remains in our proof is just showing that taking both paths from one corner to another corner makes the same result, to show that going around in a loop is identity, ruling out nontrivial automorphisms.
T3.3 The easiest pair of corners for this is going from first-person static to third-person dynamic in both ways. Obviously, the transition kernel would be the same no matter which path you took, which just leaves verifying that the starting infradistribution is the same. Down one path, we have
D:=pr(A×O)ω∗(⊤Π⋉Θ),ψT:=D
Down the other path, we have
ψF:=pr(A×O)ω∗(⊤Π⋉Θ),ψT:=ψF
Obviously, both these paths produce the same result when you try to define ψT in both ways from Θ. And we’re done!