Infra-Bayesian physicalism: proofs part II

This post is an appendix to “Infra-Bayesian physicalism: a formal theory of naturalized induction”.

Proposition 2.16: Consider some Γ0, Γ1, Φ and Θ∈□c(Γ0×Γ1×Φ). Define π:elΓ0×Γ1→elΓ0 by π(y,z,α):=(y,{y′∈Γ0∣(y′,z)∈α}). Then,
(π×idΦ)∗BrΓ0×Γ1(Θ)⊆prelΓ0×ΦBrΓ0(Θ)⊆BrΓ0(prΓ0×ΦΘ)
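As a quick aside before the proof, here's a minimal computational sketch of π on finite sets, under a hypothetical encoding where an element of elΓ0×Γ1 is a pair ((y,z),α) with α a finite set of pairs; none of these names come from the post itself.

```python
# Sketch (hypothetical encoding, finite sets assumed): an element of
# el(Gamma0 x Gamma1) is ((y, z), alpha), with alpha a frozenset of
# (y', z') pairs containing (y, z). The map pi keeps y and slices alpha
# along the given z coordinate, landing in el(Gamma0).
def pi(y, z, alpha):
    return (y, frozenset(yp for (yp, zp) in alpha if zp == z))

# Usage: since (y, z) is in alpha, y always lands in the sliced set,
# which is the support condition used later in the proof.
alpha = frozenset({(0, 'a'), (1, 'a'), (1, 'b')})
assert pi(1, 'a', alpha) == (1, frozenset({0, 1}))
assert pi(1, 'b', alpha) == (1, frozenset({1}))
```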
We can prove the second subset inclusion directly as a corollary of Proposition 2.10, just let the t of Proposition 2.10 be the projection function Γ1×Φ→Φ, so that just leaves the first subset inclusion direction. If you’ve seen the proofs so far, you know we do a thing where we try to show subset inclusion with expectations of functions and inequalities instead. And that the proofs all proceed by transforming the expectations until we get a maximum over contribution expectation values, and that’s always where the hard part of proving the inequalities shows up. So, let’s just get that part over with, an interested reader can work it out with previous proofs as a guide. Unusually, we’ll be keeping track of identity functions here.
Plugging in some f, and doing our usual activities to get every term into the appropriate form, we can get this result if we manage to show that
maxθ′∈BrΓ0×Γ1(Θ)(π×idΦ)∗θ′(λy0α0x.f(y0,x,α0))≤maxθ∈BrΓ0(Θ)prelΓ0×Φθ(λy0α0x.f(y0,x,α0))
So, to establish this, we’ll show that, given some θ′∈BrΓ0×Γ1(Θ), we have
(π×idΓ1×Φ)∗(θ′)∈BrΓ0(Θ), and that
prelΓ0×Φ((π×idΓ1×Φ)∗θ′)=(π×idΦ)∗θ′
because, if we show that, then it means that BrΓ0(Θ) is a rich enough set for the right-hand side of the equation to match anything the left-hand-side can put out.
First off,
prelΓ0×Φ((π×idΓ1×Φ)∗θ′)=(π×idΦ)∗θ′
is pretty trivial to show. The only difference between the two processes is that the Γ1 coordinate of θ′ is discarded immediately on the right-hand-side, and it’s preserved for one step and then discarded on the second step for the left-hand-side.
Now for our inequality of interest. Let θ′∈BrΓ0×Γ1(Θ), and we’re trying to show that
(π×idΓ1×Φ)∗(θ′)∈BrΓ0(Θ)
First off, we show the support condition for (π×idΓ1×Φ)∗(θ′), which is somewhat nontrivial this time around. We start off with a guarantee that (y0,y1)∈α. This happens iff
y0∈{y′0|(y′0,y1)∈α}
and that set is exactly the 2Γ0 component of π(y0,y1,α), ie α0. And so, we get that y0∈α0 is guaranteed for that pushforward, support condition established.
Endofunction condition time. It’s important to remember that we want to treat elΓ0 as the computation side of things, and Γ1×Φ as the environment side of things, for the bridge transform we’re working with. Fix s:Γ0→Γ0 and g:Γ0×Γ1×Φ→[0,1]. Begin.
(π×idΓ1×Φ)∗θ′(λy0y1α0x.χs(y0)∈α0g(s(y0),y1,x))=θ′(λy0y1αx.χs(y0)∈π(y0,y1,α)2Γ0g(s(y0),y1,x))
Let’s unpack precisely what that set is.
=θ′(λy0y1αx.χs(y0)∈{y′0|(y′0,y1)∈α}g(s(y0),y1,x))=θ′(λy0y1αx.χ(s(y0),y1)∈αg(s(y0),y1,x))
And we can rewrite the endofunction a little bit
=θ′(λy0y1αx.χ(s×idΓ1)(y0,y1)∈αg((s×idΓ1)(y0,y1),x))
And finally apply our endofunction condition, since we’ve now got the function in a form that’s treating y0,y1 as part of the computational universe…
≤Θ(λy0y1x.g(y0,y1,x))
And we’re done, this establishes our desired result. ■
Proposition 2.17: Br(Θ) is a continuous function
of Θ.
The way this proof will work is by describing a composition of functions that makes Br(Θ) from Θ, and then showing that each of these functions is continuous, if elΓ×Φ is a finite set.
Claim: The bridge transform of some Θ is equal to (using χelΓ to denote restricting an ultradistribution to the event y∈α and χ−1elΓ to denote the inverse of said function, mapping an ultradistribution on elΓ to the largest ultradistribution that could have produced it via restriction)
χelΓ(⋂s:Γ→Γs∗(χ−1elΓ(ι∗(pr∗(Θ)))))
Breaking down the unfamiliar notation, the type of pr is elΓ×Φ→Γ×Φ, just the usual projection. That asterisk up top is pullback along that function. The type of ι is elΓ×Φ→Γ×2Γ×Φ. And s∗ is pullback along the function Γ×2Γ×Φ→Γ×2Γ×Φ given by (s,id2Γ,idΦ).
Let’s unpack the exact conditions that cause a θ to lie in the set
χelΓ(⋂s:Γ→Γs∗(χ−1elΓ(ι∗(pr∗(Θ)))))
First off, a θ is in this set iff it is supported over the event y∈α, and it lies in the set ⋂s:Γ→Γs∗(χ−1elΓ(ι∗(pr∗(Θ))))
Which occurs iff θ is supported over the event y∈α, and for all s:Γ→Γ, θ lies in the set
s∗(χ−1elΓ(ι∗(pr∗(Θ))))
Which occurs iff θ is supported over the event y∈α, and for all s:Γ→Γ, s∗(θ) lies in the set
χ−1elΓ(ι∗(pr∗(Θ)))
Which occurs iff θ is supported over the event y∈α, and for all s:Γ→Γ, χelΓ(s∗(θ)) lies in the set ι∗(pr∗(Θ))
Now, ι is just doing a little bit of type conversion, so we’re justified in ignoring it. Anyways, the previous thing occurs iff θ is supported over the event y∈α, and for all s:Γ→Γ, pr∗(χelΓ(s∗(θ)))∈Θ.
Which happens iff θ is supported over the event y∈α and for all s:Γ→Γ and g:Γ×Φ→[0,1],
pr∗(χelΓ(s∗(θ)))(λyx.g(y,x))≤Θ(λyx.g(y,x))
However, unpacking the left-hand side, we get
pr∗(χelΓ(s∗(θ)))(λyx.g(y,x))=χelΓ(s∗(θ))(λyαx.g(y,x))=s∗(θ)(λyαx.χy∈αg(y,x))=θ(λyαx.χs(y)∈αg(s(y),x))
Which is the exact condition for θ to lie in the bridge transform. So, we have an equivalence.
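Since this membership condition is what all the proofs in this section lean on, here's a small randomized spot-check of it for finite sets. It's only a sketch under hypothetical encodings (a contribution as a dict from (y,α,x) triples to masses, Θ as a black-box evaluation function), and sampling random s and g is no substitute for the universal quantifier, but it makes the condition concrete.

```python
import random

# Sketch: spot-check the two defining conditions for theta in Br(Theta).
# theta: dict mapping (y, alpha, x) -> mass, with alpha a frozenset of y's.
# Theta_eval: callable taking g (a dict (y, x) -> [0,1]) to Theta's value.
# Gamma, Phi: lists of the finite sets' elements.
def bridge_spot_check(theta, Theta_eval, Gamma, Phi, trials=1000):
    # Support condition: mass only where y is in alpha.
    if any(m > 0 and y not in alpha for (y, alpha, x), m in theta.items()):
        return False
    # Endofunction condition: theta(chi_{s(y) in alpha} g(s(y), x)) <= Theta(g)
    # for all s: Gamma -> Gamma and g; here checked on random samples only.
    for _ in range(trials):
        s = {y: random.choice(Gamma) for y in Gamma}
        g = {(y, x): random.random() for y in Gamma for x in Phi}
        lhs = sum(m * g[(s[y], x)]
                  for (y, alpha, x), m in theta.items() if s[y] in alpha)
        if lhs > Theta_eval(g) + 1e-9:
            return False
    return True
```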
Now, since we’ve phrased the bridge transform as
χelΓ(⋂s:Γ→Γs∗(χ−1elΓ(ι∗(pr∗(Θ)))))
We just need to establish that when all the sets are finite, then pullbacks are continuous, pushforwards are continuous, un-restrictions are continuous, intersections are continuous, and restrictions are continuous. Then, this would just be a particularly fancy continuous function, and accordingly, if Θn limits to Θ, then Br(Θn) limits to Br(Θ).
Let’s establish that when the sets are finite, pullbacks are continuous. Let g:X→Y, and Y and X be finite sets, and ψ∈□Y. Then, we have
g∗(ψ)(λx.f(x)):=ψ(λy.maxx∈g−1(y)f(x))
With the convention that maximizing over the empty set produces a value of 0. That is an alternate phrasing of pullback. We can then go
limn→∞d(g∗(ψn),g∗(ψ))=limn→∞supf:X→[0,1]|g∗(ψn)(f)−g∗(ψ)(f)|=limn→∞supf:X→[0,1]|ψn(λy.maxx∈g−1(y)f(x))−ψ(λy.maxx∈g−1(y)f(x))|≤limn→∞suph:Y→[0,1]|ψn(h)−ψ(h)|=limn→∞d(ψn,ψ)=0
Admittedly, this isn’t quite what our usual modified KR metric usually looks like. The reason we can do this is because we’re just dealing with functions in [0,1], so the norm part of the modified KR metric doesn’t matter, and since our sets are finite, we can say that all points are distance 1 from each other, so all functions are 1-Lipschitz, and then the two metrics coincide. So, pullback along any function is continuous.
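To make that formula concrete, here's a sketch of finite-set pullback evaluation, under the hypothetical encoding of a contribution as a dict from points to masses and an ultradistribution as a finite nonempty list of contributions (evaluated by taking a max); only the formula above is from the post.

```python
# Sketch: evaluate the pullback g*(psi) on f: X -> [0,1], where psi is a
# finite list of contributions on Y and g: X -> Y. Implements
# g*(psi)(f) = psi(lambda y: max over x in g^{-1}(y) of f(x)),
# with the max over an empty preimage taken to be 0.
def pullback_eval(psi, g, f, X):
    def lifted(y):
        return max((f(x) for x in X if g(x) == y), default=0.0)
    return max(sum(m * lifted(y) for y, m in theta.items()) for theta in psi)
```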
For pushforward, it’s easy because, if ψ∈□X, then we’ve got
limn→∞d(g∗(ψn),g∗(ψ))=limn→∞suph:Y→[0,1]|g∗(ψn)(h)−g∗(ψ)(h)|=limn→∞suph:Y→[0,1]|ψn(λx.h(g(x)))−ψ(λx.h(g(x)))|≤limn→∞supf:X→[0,1]|ψn(f)−ψ(f)|=limn→∞d(ψn,ψ)=0
For showing restrictions continuous, for the set E⊆X that we’re updating on,
limn→∞d(χE(ψn),χE(ψ))=limn→∞supf:X→[0,1]|χE(ψn)(f)−χE(ψ)(f)|=limn→∞supf:X→[0,1]|ψn(χx∈Ef(x))−ψ(χx∈Ef(x))|≤limn→∞supf:X→[0,1]|ψn(f)−ψ(f)|=limn→∞d(ψn,ψ)=0
For intersections… that will take a bit more work. We’ll have to use the equivalent formulation of closeness, that ψn limits to ψ iff the Hausdorff distance between the corresponding sets (according to the generalized KR measure) limits to 0. So, our task is to assume that ψn limits to ψ, and ϕn limits to ϕ, and show that ψn∩ϕn limits to ψ∩ϕ. The bound we’ll manage to prove is that
d(ψn∩ϕn,ψ∩ϕ)≤|X|max(d(ψn,ψ),d(ϕn,ϕ))
Where |X| is the number of elements in the finite set X. Here’s the basic argument. For any particular point in the set ψn, there’s a nearby point in ψ (since the Hausdorff distance is low) with only ϵ measure moved around or deleted. So, in particular, if all the measure moved or deleted was just deleted from ψn instead, then that resulting contribution would be below the nearby contribution in ψ that we picked, and so it would lie in ψ as well due to downwards closure.
So, in particular, if ψn and ψ only have a Hausdorff distance of ϵ, then, taking any contribution in ψn and subtracting ϵ measure from all points (if possible; if not, just remove measure till you’re at 0) is guaranteed to make a point in ψ, and vice-versa.
And a corollary of that is that, given any contribution in ψn∩ϕn, the “subtract max(d(ψn,ψ),d(ϕn,ϕ)) measure from each point” contribution is in ψ, also in ϕ, and at a maximum distance of |X|max(d(ψn,ψ),d(ϕn,ϕ)) from the original contribution. And this argument can be reversed to show that the limit of the intersections is the intersection of the limits (because Hausdorff distance between the two goes to 0), so we do in fact have intersection being continuous.
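The one operation this argument keeps reusing is “subtract ϵ of measure from every point, flooring at 0”; here's a hypothetical two-line sketch of it, just to pin the idea down.

```python
# Sketch: remove eps mass from every atom of a contribution (dict point ->
# mass), flooring at zero. On a finite set X this changes the contribution
# by at most |X| * eps in total mass, which is where the |X| factor in the
# Hausdorff bound above comes from.
def shave(theta, eps):
    return {x: max(m - eps, 0.0) for x, m in theta.items()}
```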
And that just leaves un-restricting. Again, this will take a Hausdorff-distance argument. Fixing some contribution in χ−1E(ψn), it can be broken down into an on-E part θn,E, and an off-E part θn,¬E. When you restrict to E, then θn,E∈ψn. Since ψn is within ϵ of ψ, there’s some θE∈ψ that’s within ϵ of θn,E. Then, let your point in χ−1E(ψ) be θE+θn,¬E (if there’s slightly more than 1 measure there, delete ϵ measure from θn,¬E, or all the measure if there’s less than ϵ present). It’s close to θn,E+θn,¬E because θn,E is close to θE, the other component of it is unchanged, and maybe we deleted a little bit of excess measure which didn’t do much.
This line of argument shows that ψn being close to the limit ψ is sufficient to establish that the un-restriction of the two of them are comparably close together. So we have continuity for that, which is the last thing we needed.
Since we wrote the bridge transform as a composition of continuous functions, we know it’s continuous (as long as all the involved sets are finite). ■
Proposition 3.1: Let X be a finite poset, f:X→R
and Θ∈□cX downward closed. Define fmax:X→R
by fmax(x):=maxy≤xf(y). Observe that fmax is
always non-decreasing. Then, Θ(f)=Θ(fmax).
Proof: Pick a θ′∈Θ s.t. θ′(fmax)=Θ(fmax). Ie, a maximizing contribution. Let k:X→X be defined as k:=λx.argmaxy≤xf(y). Ie, it moves a point down to somewhere below it where it can attain the highest value according to f. Now, consider k∗(θ′). It’s present in Θ because Θ was, by assumption, downwards closed, and we just moved all the measure down.
Now, we have
Θ(f)=maxθ∈Θθ(f)≥k∗(θ′)(f)=θ′(λx.f(k(x)))=θ′(λx.f(argmaxy≤xf(y)))=θ′(λx.maxy≤xf(y))=θ′(fmax)=Θ(fmax)≥Θ(f)
And so, all inequalities must be equalities, proving that Θ(f)=Θ(fmax). In order, the connectives were: unpacking definitions, using downward closure to conclude that k∗(θ′)∈Θ, unpacking pushforwards, unpacking the definition of k, using that applying a function to the argmax of inputs to that function just makes the max of the function, folding the definition of fmax back up, using that θ′ was selected to maximize fmax, and applying monotonicity. Done! ■
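Here's a tiny numeric illustration of the proposition, with a made-up two-point chain poset and a coarse finite stand-in for the downward closure (extreme points only); all the specifics are hypothetical.

```python
# Sketch: X = {x0 <= x1}, f non-monotone, Theta = downward closure of the
# contribution delta_{x1}, represented by its extreme points.
down = {'x0': ['x0'], 'x1': ['x0', 'x1']}         # points below each point
f = {'x0': 0.9, 'x1': 0.2}
f_max = {x: max(f[y] for y in down[x]) for x in down}

Theta = [{'x1': 1.0}, {'x0': 1.0}, {}]            # mass moved down / deleted

def ev(theta, g):                                  # contribution expectation
    return sum(m * g[x] for x, m in theta.items())

# Theta(f) = Theta(f_max) = 0.9: moving mass down to x0 recovers the max.
assert max(ev(t, f) for t in Theta) == max(ev(t, f_max) for t in Theta) == 0.9
```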
Proposition 4.1: Consider some Γ, Φ,
a relation Q⊆Γ×Φ and a PUCK Ξ over Q.
Let Θ:=⊤Γ⋉Ξ. Then, Br(Θ)=[⊤Γ⋉(susΘ⋊Ξ)]↓=[⊤Γ⋉(Q−1⋊Ξ)]↓
First off, I’m not terribly picky about variable ordering, so I’ll just write our desired proof target as
Br(Θ)=[⊤Γ⋉Ξ⋉susΘ]↓=[⊤Γ⋉Ξ⋉Q−1]↓
The way we’ll do this is by establishing the following result. For all monotone f′:elΓ×Φ→[0,1], we have
Br(Θ)(f′)≤[⊤Γ⋉Ξ⋉susΘ](f′)≤[⊤Γ⋉Ξ⋉Q−1](f′)≤Br(Θ)(f′)
Why does that suffice? Well, assume hypothetically that the result held. Since the inequalities go in a circle, we have equality for all monotone functions. And then, for some non-monotone function f, we can go
Br(Θ)(f)=Br(Θ)(fmax)=[⊤Γ⋉Ξ⋉susΘ](fmax)=[⊤Γ⋉Ξ⋉susΘ]↓(fmax)=[⊤Γ⋉Ξ⋉susΘ]↓(f)
and swap out susΘ for Q−1 to show the other equality, and then we’d have equality of the three ultradistributions on all functions, so they’re equal.
For the equalities in the above equation, the first one arose because of Proposition 2.4 (bridge transforms are always downwards closed) and Proposition 3.1 (downwards-closed things let you swap out f for fmax and it doesn’t affect the value). The second equality arose because fmax is a monotone function and by assumption, we have equality for monotone functions. The third equality would arise because taking the downwards closure doesn’t affect the expectation value of monotone functions. If you add a bunch of contributions made by measure flowing down, that’s just strictly worse from the perspective of a monotone function and doesn’t change expectation value. And the fourth equality arises from Proposition 3.1 again.
So, we just need to prove the following three inequalities, for monotone functions f.
Br(Θ)(f)≤[⊤Γ⋉Ξ⋉susΘ](f)≤[⊤Γ⋉Ξ⋉Q−1](f)≤Br(Θ)(f)
The first one is easily addressable by Proposition 2.7. By proposition 2.7 and the definition of Θ, we have
Br(Θ)⊆(Θ⋉susΘ)↓=[⊤Γ⋉Ξ⋉susΘ]↓
And so, for monotone functions f, we have
Br(Θ)(f)≤[⊤Γ⋉Ξ⋉susΘ](f)
Done.
Now to show our second inequality.
(⊤Γ⋉Ξ⋉susΘ)(λyαx.f(y,x,α))=(⊤Γ⋉Ξ)(λyx.δsusΘ(x)(λα.f(y,x,α)))=(⊤Γ⋉Ξ)(λyx.f(y,x,susΘ(x)))
Unpack the definition of the set
=(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|(y′,x)∈supp Θ}))
Unpack the definition of Θ
=(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|(y′,x)∈supp ⊤Γ⋉Ξ}))
The condition (y′,x)∈supp ⊤Γ⋉Ξ is equivalent to x∈supp Ξ(y′). After all, if x∈supp Ξ(y′), the distribution δy′ lies in ⊤Γ, so δy′⋉Ξ would certify that (y′,x)∈supp ⊤Γ⋉Ξ. And if x∉supp Ξ(y′), then no matter the distribution in ⊤Γ or kernel selected from Ξ, if y′ gets picked, then the kernel selected from Ξ isn’t going to be making x along with it. Since we have that iff characterization, we have
=(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|x∈supp Ξ(y′)}))
Ξ(y′) is the union of a bunch of k(y′) for k∈Π (and convex hull), so its support is equal to the union of the supports of the k(y′).
=(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|x∈⋃k∈Πsupp k(y′)}))
Then, since each k is a PoCK over Q, k(y′) is the restriction of some measure ϖk to the set Q(y′), which will be written as χQ(y′)ϖk.
=(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|x∈⋃k∈Πsupp (χQ(y′)ϖk)}))
And now we’re about to get an inequality. f is monotone, so making the associated set bigger (easier to fulfill the defining condition) should always increase the value of f, and by monotonicity, increase the expectation value, so we get
≤(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|x∈Q(y′)}))
Then restate
=(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|(x,y′)∈Q}))=(⊤Γ⋉Ξ)(λyx.f(y,x,Q−1(x)))
And pack back up as a semidirect product.
=(⊤Γ⋉Ξ)(λyx.δQ−1(x)(λα.f(y,x,α)))=(⊤Γ⋉Ξ⋉Q−1)(λyαx.f(y,x,α))
And we have our second ≤ inequality established!
Now, onto the third inequality.
(⊤Γ⋉Ξ⋉Q−1)(λyαx.f(y,x,α))
Unpack the semidirect products
=⊤Γ(λy.Ξ(y)(λx.δQ−1(x)(λα.f(y,x,α))))
And what top means
=maxy∈ΓΞ(y)(λx.δQ−1(x)(λα.f(y,x,α)))
And as for Ξ… well, each Ξ(y) is the convex hull of the various k(y), for k∈Π. So, the expectation for Ξ(y) is the maximum expectation for the various k(y), so we can rewrite as
=maxy∈Γmaxk∈Πk(y)(λx.δQ−1(x)(λα.f(y,x,α)))
Pick a particular y∗ and k∗ that attain the maximal value
=k∗(y∗)(λx.δQ−1(x)(λα.f(y∗,x,α)))
Reexpress a little bit
=δy∗(λy.k∗(y)(λx.δQ−1(x)(λα.f(y,x,α))))
And pack this back up as a semidirect product
=(δy∗⋉k∗⋉Q−1)(λyαx.f(y,x,α))
And then we’ll be showing that this contribution lies in Br(Θ). Once we’ve done that, we can go
≤maxθ′∈Br(Θ)θ′(λyαx.f(y,x,α))=Br(Θ)(λyαx.f(y,x,α))
And we’d be done, having proven the third inequality and the last one to finish up the proof. So, now our proof target switches to showing that (δy∗⋉k∗⋉Q−1)∈Br(Θ). We can show this if we show the support condition and the endofunction condition.
For the support condition, we have
(δy∗⋉k∗⋉Q−1)(λyαx.χy∉α)=δy∗(λy.k∗(y)(λx.δQ−1(x)(λα.χy∉α)))=δy∗(λy.k∗(y)(λx.χy∉Q−1(x)))=k∗(y∗)(λx.χy∗∉Q−1(x))
And then we use that the k∗(y∗) are all of the form “take this measure, restrict it to Q(y∗)”, to get
=(χQ(y∗)ϖk∗)(λx.χy∗∉Q−1(x))=ϖk∗(λx.χx∈Q(y∗)χy∗∉Q−1(x))
Unpacking the definitions, we get
=ϖk∗(λx.χ(x,y∗)∈Qχ(x,y∗)∉Q)=0
And so, this contribution is indeed supported on (y,α) pairs s.t. y∈α.
Now for the endofunction condition. As usual, fix an s and a g.
(δy∗⋉k∗⋉Q−1)(λyαx.χs(y)∈αg(s(y),x))
Unpack the semidirect product
=δy∗(λy.k∗(y)(λx.δQ−1(x)(λα.χs(y)∈αg(s(y),x))))
Plug in the dirac-deltas
=k∗(y∗)(λx.χs(y∗)∈Q−1(x)g(s(y∗),x))
Reexpress the set membership criterion a bit
=k∗(y∗)(λx.χx∈Q(s(y∗))g(s(y∗),x))
And the contribution at the start
=(χQ(y∗)ϖk∗)(λx.χx∈Q(s(y∗))g(s(y∗),x))
Distribute it in as an indicator function.
=ϖk∗(λx.χx∈Q(y∗)χx∈Q(s(y∗))g(s(y∗),x))
Pull the other indicator function out.
=(χQ(s(y∗))ϖk∗)(λx.χx∈Q(y∗)g(s(y∗),x))
Rewrite with k∗
=k∗(s(y∗))(λx.χx∈Q(y∗)g(s(y∗),x))
Use an inequality to get rid of the indicator function
≤k∗(s(y∗))(λx.g(s(y∗),x))
Rewrite it a bit
=δs(y∗)(λy.k∗(y)(λx.g(y,x)))
Swap out k∗(y) for Ξ(y), the latter is larger
≤δs(y∗)(λy.Ξ(y)(λx.g(y,x)))
Swap out δs(y∗) for ⊤Γ, the latter is larger
≤⊤Γ(λy.Ξ(y)(λx.g(y,x)))=(⊤Γ⋉Ξ)(λyx.g(y,x))
Abbreviate
=Θ(λyx.g(y,x))
And bam, endofunction condition is shown, the entire proof goes through now. ■
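As a sanity check on the objects this proposition compares, here's a hypothetical finite encoding of the two kernels susΘ and Q−1; since supp Θ⊆Q when Θ=⊤Γ⋉Ξ for a PUCK Ξ over Q, the first kernel's output is always contained in the second's, which is what the monotonicity step in the proof exploited.

```python
# Sketch: both kernels send a point x to a set of y's.
def sus(supp_theta, x):       # supp_theta: set of (y, x) pairs in supp Theta
    return frozenset(y for (y, xx) in supp_theta if xx == x)

def q_inv(Q, x):              # Q: set of (y, x) pairs in the relation
    return frozenset(y for (y, xx) in Q if xx == x)

# With supp_theta a subset of Q, sus(supp_theta, x) <= q_inv(Q, x) for all
# x, matching the role of monotonicity in the proof above.
```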
Corollary 4.3: Suppose that for any d∈D and π:H→A
s.t. d∈supp W(π), it holds that dCπ. That is, the observations
W predicts to receive from the computer are consistent with the
chosen policy. Let L:D→R be a Cartesian loss function
and π:H→A a policy. Then, (prelΓBr(ΘW)∩Cπfair)(Lphys)=W(π;L)
I’m going to be proceeding very cautiously here. First off, make our π value visually distinct by swapping it out for π∗:
(prelΓBr(ΘW)∩Cπ∗fair)(Lphys)
Now, by the identifications we made earlier, we can identify Γ with AH, the space of policies. Using that to unpack the function a little bit, we have
=(prelΓBr(ΘW)∩Cπ∗fair)(λπα.Lphys(π,α))
Now, we note that intersecting with top of a particular set is equivalent to updating on the indicator function for that set. Using definition 1.5 to unpack Cπ∗fair, we get
=(prelΓBr(ΘW))(λπα.χ∀h∈Hπ,α:Gπ(h)=π∗(h)Lphys(π,α))
Apply that Gπ(h) is “what would the agent do on h if the agent is copying the behavior of π”, so we can rephrase as:
=(prelΓBr(ΘW))(λπα.χ∀h∈Hπ,α:π(h)=π∗(h)Lphys(π,α))
Pull off the projection, and use d for a destiny in D.
=Br(ΘW)(λπαd.χ∀h∈Hπ,α:π(h)=π∗(h)Lphys(π,α))
At this point, we use that ΘW:=⊤Γ⋉W, and that W is a PUCK over Q0 and Proposition 4.1 to go
=[⊤Γ⋉W⋉Q−10]↓(λπαd.χ∀h∈Hπ,α:π(h)=π∗(h)Lphys(π,α))
Before we can remove the downwards closure, we’ll want to verify the function is monotone. So, we’ll want to start unpacking the physicalist loss next. Applying definition 3.1, and using d′ instead of g to remember it’s a destiny, we have
=[⊤Γ⋉W⋉Q−10]↓(λπαd.χ∀h∈Hπ,α:π(h)=π∗(h)minha:ha∈Xπ,αmaxd′:ha⊑d′L(d′))
Next up is unpacking Xπ,α. Using definition 3.1, it’s
=[⊤Γ⋉W⋉Q−10]↓(λπαd.χ∀h∈Hπ,α:π(h)=π∗(h)minha:ha∈Hπ,α×A∧(∀π′∈α:Gπ′(h)=a)maxd′:ha⊑d′L(d′))
At this point, we can, again, treat Gπ′(h) the same as π′(h).
=[⊤Γ⋉W⋉Q−10]↓(λπαd.χ∀h∈Hπ,α:π(h)=π∗(h)minha:ha∈Hπ,α×A∧(∀π′∈α:π′(h)=a)maxd′:ha⊑d′L(d′))
And now we need to take a moment to show that Hπ,α gets smaller when α gets larger. Applying definition 1.5, the event h∈Hπ,α unpacks as
(∀h′a⊏h,π′∈α:Gπ′(h′)=a)∧(∃d′:h⊏d′∧d′Cπ)
Now, if α becomes a larger set, then it gets harder for the first condition to be fulfilled, so the set Hπ,α shrinks. Now, since this happens, it means that if α gets bigger, it gets more difficult for the prerequisite of the implication in the indicator function to be fulfilled, so the implication is more likely to hold. Further, the minimization is taking place over a smaller set, so the loss goes up. So our function is monotone in α, and we can remove the downwards closure.
=(⊤Γ⋉W⋉Q−10)(λπαd.χ∀h∈Hπ,α:π(h)=π∗(h)minha:ha∈Hπ,α×A∧(∀π′∈α:π′(h)=a)maxd′:ha⊑d′L(d′))
Unpacking the semidirect product, it is
=⊤Γ(λπ.W(π)(λd.δQ−10(d)(λα.χ∀h∈Hπ,α:π(h)=π∗(h)minha:ha∈Hπ,α×A∧(∀π′∈α:π′(h)=a)maxd′:ha⊑d′L(d′))))
Substituting in the dirac-delta everywhere that α is, we get
=⊤Γ(λπ.W(π)(λd.χ∀h∈Hπ,Q−10(d):π(h)=π∗(h)minha:ha∈Hπ,Q−10(d)×A∧(∀π′∈Q−10(d):π′(h)=a)maxd′:ha⊑d′L(d′)))
Now, Q−10(d) is the set of policies π′ s.t. π′Q0d. The “this policy is consistent with this destiny” relation. Also let’s swap out ⊤Γ for maximization
=maxπW(π)(λd.χ∀h∈Hπ,Q−10(d):π(h)=π∗(h)minha:ha∈Hπ,Q−10(d)×A∧(∀π′Q0d:π′(h)=a)maxd′:ha⊑d′L(d′))
Now, we’re going to try to address that minimum, and show that the only ha that fulfill the conditions are exactly those ha⊑d. This requires showing that ha⊑d is a sufficient condition to fulfill the relevant properties, and then to show that ha⋢d implies a failure of one of the properties.
So, first up. Assume ha⊑d. Then, for any π′, dQ0π′ and ha⊑d must imply that π′(h)=a; that’s what policy consistency means. Also, h∈Hπ,Q−10(d) unpacks as the two conditions
∀h′a′,π′:h′a′⊏h∧dQ0π′→π′(h′)=a′
∃d′:h⊏d′∧d′Cπ
As for the first condition, clearly, if π′ is consistent with d, it’s consistent with ha because ha⊑d, and so it must be consistent with any prefix of ha, so the first condition holds.
For the second condition, d is a valid choice, because we assumed ha⊑d, and dCπ occurs always, because W(π) always being supported on d s.t. dCπ was one of our problem assumptions.
So, we have one implication direction down. Now for the reverse implication direction. Assume ha⋢d. Then there are two possibilities. The first possibility is that ha first diverges from d on an observation. The second possibility is that ha first diverges from d on an action.
For the first possibility, it’s possible to make two policies which are consistent with d but also differ in their actions on history h, because h isn’t a prefix of d if ha first differs from d on an observation.
For the second possibility, it’s ruled out by either the condition for h∈Hπ,Q−10(d) that goes
∀h′a′,π′:h′a′⊏h∧π′Q0d→π′(h′)=a′
or the extra condition that
∀π′:π′Q0d→π′(h)=a
applied to the first a-history prefix that deviates from d, because π′Q0d implies that π′(h′) must be the action which d dictates, not the action a′ that deviates from d.
And that establishes the other direction of the iff statement.
Thus, we can swap out our fancy minimization with just minimizing over the ha⊑d.
=maxπW(π)(λd.χ∀h∈Hπ,Q−10(d):π(h)=π∗(h)minha:ha⊑dmaxd′:ha⊑d′L(d′))
This minimization is attained by selecting d itself. So then it turns into
=maxπW(π)(λd.χ∀h∈Hπ,Q−10(d):π(h)=π∗(h)L(d))
At this point, what we’ll do is show that an upper bound and a lower bound on the value of this term are the same. Going from the upper bound to the lower bound, we start out with
W(π∗)(λd.L(d))
At this point, we’ll use that W is a PUCK, so there’s a set E of environments e (PoCK’s) that W is generated from, so we can go:
=maxe∈Ee(π∗)(λd.L(d))=maxπmaxe∈Ee(π∗)(λd.χdQ0πL(d))=maxπmaxe∈E(χQ0(π∗)ϖe)(λd.χdQ0πL(d))=maxπmaxe∈Eϖe(λd.χdQ0π∗χdQ0πL(d))
Now pull the indicator function back out.
=maxπmaxe∈E(χQ0(π)ϖe)(λd.χdQ0π∗L(d))=maxπmaxe∈Ee(π)(λd.χdQ0π∗L(d))=maxπW(π)(λd.χdQ0π∗L(d))
Now we must show that this is a looser constraint than what was previously in our indicator function to proceed further. So our next order of business is showing that, certainly,
∀h∈Hπ,Q−10(d):π(h)=π∗(h)→dQ0π∗
Let h be one of the history prefixes of some d in the support of W(π). The two conditions for h∈Hπ,Q−10(d) are fulfilled, because they are
∀h′,a′,π′:h′a′⊏h∧dQ0π′→π′(h′)=a′
∃d′:h⊏d′∧d′Cπ
For the first condition, if h′a′⊏h, then h′a′⊏d, and so if π′ is consistent with d, it must take the same action in response to h′, the action that d commands, a′. So that’s fulfilled. For the second condition, let d′ be d. h⊏d holds, and so dCπ holds certainly, because W(π) is supported on d s.t. dCπ.
So, for all d in the support of W(π), h⊏d→h∈Hπ,Q−10(d). Since we assumed our forall statement as prerequisite, this means that for all h⊏d, π(h)=π∗(h). And dQ0π means ∀ha⊑d:π(h)=a. Since π∗(h) mimics π(h) for all history prefixes of d, this means ∀ha⊑d:π∗(h)=a, ie dQ0π∗.
So, since this is a looser constraint, when we were previously at
=maxπW(π)(λd.χdQ0π∗L(d))
we can proceed further to
≥maxπW(π)(λd.χ∀h∈Hπ,Q−10(d):π(h)=π∗(h)L(d))
Which is our value we’re trying to sandwich. Now, at this point, plug in π∗ and get
≥W(π∗)(λd.χ∀h∈Hπ∗,Q−10(d):π∗(h)=π∗(h)L(d))=W(π∗)(λd.L(d))
And bam, we’ve sandwiched our term between W(π∗)(L) on both sides, and so the result follows. ■
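For concreteness, here's one hypothetical encoding of the consistency relation Q0 that this corollary leans on: a destiny as a tuple alternating actions (even indices) and observations (odd indices), and a policy as a function from history prefixes to actions. It's a sketch, not notation from the post.

```python
# Sketch: d Q0 pi iff for every prefix "ha" of the destiny d, pi(h) = a,
# i.e. the policy reproduces every action the destiny records.
def consistent(d, pi):
    return all(pi(d[:i]) == d[i] for i in range(0, len(d), 2))

# Usage: a destiny taking action 'a' twice, regardless of observations.
d = ('a', 'obs1', 'a', 'obs2')
assert consistent(d, lambda h: 'a')
assert not consistent(d, lambda h: 'b')
```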
Proposition 4.2: Let X, Y and Z be finite
sets, Q⊆Y×X a relation, κ:Y→Δc(X×Z)
a Z-PoCK over Q and Θ∈□cY. Then, there exist μ∈ΔcZ
and ϕ:Z×Y→ΔcX s.t. for all z, λy.ϕ(z,y) is a PoCK over Q, and for all y∈Y,
κ(y)=(λz.ϕ(z,y))⋊μ. Moreover, suppose that (μ1,ϕ1)
and (μ2,ϕ2) are both as above.
Then, μ1⋉Θ⋉ϕ1=μ2⋉Θ⋉ϕ2
Our first order of business is establishing that there’s even a μ and ϕ that has those effects at all. Here’s a way to define them.
μ(z):=maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)
Where ϖκ is the measure on Z×X that κ is associated with, ie, κ(y)=χQ(y)ϖκ must be true for some ϖκ because κ is a Z-PoCK over Q. And, ϕ will be defined as:
ϕ(y,z)(x):=χx∈Q(y)ϖκ(z,x)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)
With those definitions in place, it’s easy to establish that μ⋉(λz.ϕ(y,z))=κ(y). We can just fix an arbitrary x,z pair and go
κ(y)(x,z)=χx∈Q(y)ϖκ(z,x)=(maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′))⋅(χx∈Q(y)ϖκ(z,x)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′))=μ(z)⋅ϕ(y,z)(x)=(μ⋉(λz′.ϕ(y,z′)))(x,z)
And we’re done with showing that such functions exist in the first place. Well, as long as we check that μ and ϕ behave accordingly. First off, μ being a contribution follows from ϖκ being a Z-polycontribution, and the definition of Z-polycontributions. Also, to show that (λy.ϕ(y,z)) is a PoCK over Q, we need to show that there’s a ϖϕ,z s.t. ϕ(y,z)=χQ(y)ϖϕ,z, and that it always has total measure at most 1.
In order to do this, define
ϖϕ,z:=(1/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′))⋅prX(χ{z}×Xϖκ)
Clearly, you get ϕ(y,z) from restricting this to Q(y), because we have
(χQ(y)ϖϕ,z)(x)=(1/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′))⋅χQ(y)(prX(χ{z}×Xϖκ))(x)
=χQ(y)(prX(χ{z}×Xϖκ))(x)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)
=χx∈Q(y)prX(χ{z}×Xϖκ)(x)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)
=χx∈Q(y)∑z′(χ{z}×Xϖκ)(x,z′)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)
=χx∈Q(y)∑z′χz′=zϖκ(x,z′)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)
=χx∈Q(y)ϖκ(x,z)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)
=ϕ(y,z)(x)
And we’re done. And also, the measure is ≤1, because
∑x∈Q(y)ϖϕ,z(x)=∑x∈Q(y)prX(χ{z}×Xϖκ)(x)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)
and, skipping over a few routine steps,
=∑x∈Q(y)ϖκ(x,z)/maxy′∈Y∑x′∈Q(y′)ϖκ(x′,z)≤∑x∈Q(y)ϖκ(x,z)/∑x′∈Q(y)ϖκ(x′,z)=1
And we’re done, we figured out how to decompose κ into μ and ϕ.
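Here's a sketch of the decomposition on finite sets, with hypothetical encodings (ϖκ as a dict from (x,z) pairs to masses, Q as a dict from y to its set of admissible x); it just transcribes the two defining formulas from above.

```python
# Sketch: mu(z) = max over y of the varpi-mass of Q(y) x {z}, and
# phi(y, z)(x) = chi_{x in Q(y)} varpi(z, x) / mu(z).
def mu(varpi, Q, Y, z):
    return max(sum(m for (x, zz), m in varpi.items()
                   if zz == z and x in Q[y]) for y in Y)

def phi(varpi, Q, Y, y, z, x):
    denom = mu(varpi, Q, Y, z)
    if x not in Q[y] or denom == 0.0:
        return 0.0
    return varpi.get((x, z), 0.0) / denom

# Then mu(z) * phi(y, z)(x) recovers kappa(y)(x, z) =
# chi_{x in Q(y)} varpi(x, z), the computation displayed above.
```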
Now for the second half of the proof. The first thing to establish is that, for all y,z, we have μ1(z)⋅ϕ1(y,z)=μ2(z)⋅ϕ2(y,z). This occurs because, for all x,
μ1(z)⋅ϕ1(y,z)(x)=(μ1⋉(λz.ϕ1(y,z)))(x,z)=κ(y)(x,z)
And then by symmetry, the exact same holds for μ2 and ϕ2, both were declared to be equal to κ. Now that this result is in place, we can begin.
(μ1⋉Θ⋉ϕ1)(λxyz.f(x,y,z))=μ1(λz.Θ(λy.ϕ1(y,z)(λx.f(x,y,z))))
Now, we do something odd. We can reexpress this as
=C(λz.μ1(z)⋅Θ(λy.ϕ1(y,z)(λx.f(x,y,z))))
Basically, what’s going on here is that we can swap out the contribution μ1 for the counting measure C (1 measure on each distinct point) and just scale down the expectation values accordingly. It’s pretty much the same way that you can think of ∑xμ(x)f(x) (expectation of f w.r.t μ) as ∑x1⋅μ(x)f(x) (expectation of μ⋅f w.r.t the counting measure). Now, since Θ is homogeneous, we can move constants in or out of it, to get
=C(λz.Θ(λy.μ1(z)⋅ϕ1(y,z)(λx.f(x,y,z))))
Now, at this point, we can use that μ1(z)⋅ϕ1(y,z)=μ2(z)⋅ϕ2(y,z), to get
=C(λz.Θ(λy.μ2(z)⋅ϕ2(y,z)(λx.f(x,y,z))))
And just back up and reverse everything.
=C(λz.μ2(z)⋅Θ(λy.ϕ2(y,z)(λx.f(x,y,z))))=μ2(λz.Θ(λy.ϕ2(y,z)(λx.f(x,y,z))))=(μ2⋉Θ⋉ϕ2)(λxyz.f(x,y,z))
And we’re done! ■
Lemma 4: Let X, Y and Z be finite
sets, Q⊆Y×X a relation, Ξ1,Ξ2:Y→□c(X×Z) Z-PUCKs over Q, Θ∈□cY and p∈[0,1]. Then, pΞ1+(1−p)Ξ2
is also a Z-PUCK over Q, and Θ∗(pΞ1+(1−p)Ξ2)⊆p(Θ∗Ξ1)+(1−p)(Θ∗Ξ2)
Our first order of business is establishing that the mix of Z-PUCK’s over Q is a Z-PUCK over Q. Here’s what we’ll do. We’ll define a family of kernels, show that they’re all Z-PoCK’s, and that said family makes a Z-PUCK that’s equal to the mix of Ξ1 and Ξ2.
So, let Π1 be the set of Z-PoCK’s associated with Ξ1, and Π2 be the set of Z-PoCK’s associated with Ξ2. Elements of these sets are κ1 and κ2. Define Π as {pκ1+(1−p)κ2|κ1∈Π1,κ2∈Π2}.
By Definition 4.5, in order to establish that these are Z-PoCK’s over Q, we need to make an appropriate choice of ϖ. In particular, the ϖ associated with κ=pκ1+(1−p)κ2 is ϖκ:=pϖκ1+(1−p)ϖκ2. It fulfills definition 4.5 because
κ(y)(x,z)=(pκ1+(1−p)κ2)(y)(x,z)=pκ1(y)(x,z)+(1−p)κ2(y)(x,z)=p(χQ(y)ϖκ1)(x,z)+(1−p)(χQ(y)ϖκ2)(x,z)=(pχQ(y)ϖκ1+(1−p)χQ(y)ϖκ2)(x,z)=χQ(y)(pϖκ1+(1−p)ϖκ2)(x,z)=χQ(y)ϖκ(x,z)
By unpacking our definition, using how mixes of kernels work, applying definition 4.5 for κ1 and κ2, and then just doing some simple regrouping and packing the definition back up, we get our result.
But wait, we still need to show that ϖκ is a Z-Polycontribution on Q. Again, this isn’t too hard to show, with Definition 4.4.
∑z∈Zmaxy∈Y∑x∈Q(y)ϖκ(x,z)=∑z∈Zmaxy∈Y∑x∈Q(y)(pϖκ1+(1−p)ϖκ2)(x,z)=∑z∈Zmaxy∈Y∑x∈Q(y)(pϖκ1(x,z)+(1−p)ϖκ2(x,z))=∑z∈Zmaxy∈Y(p∑x∈Q(y)ϖκ1(x,z)+(1−p)∑x∈Q(y)ϖκ2(x,z))≤∑z∈Z(pmaxy∈Y∑x∈Q(y)ϖκ1(x,z)+(1−p)maxy∈Y∑x∈Q(y)ϖκ2(x,z))=p∑z∈Zmaxy∈Y∑x∈Q(y)ϖκ1(x,z)+(1−p)∑z∈Zmaxy∈Y∑x∈Q(y)ϖκ2(x,z)≤p⋅1+(1−p)⋅1=1
And bam, we have our inequality demonstrated, everything works out. Now we just need to show that this family of Z-PoCK’s makes the Z-PUCK that’s the mixture of the two. We’ll establish equality by showing equality for all functions and all y.
Ξ(y)(f)=maxκ∈Πκ(y)(f)=maxκ∈{pκ1+(1−p)κ2|κ1∈Π1,κ2∈Π2}κ(y)(f)=maxκ1∈Π1,κ2∈Π2(pκ1+(1−p)κ2)(y)(f)=maxκ1∈Π1,κ2∈Π2pκ1(y)(f)+(1−p)κ2(y)(f)=pmaxκ1∈Π1κ1(y)(f)+(1−p)maxκ2∈Π2κ2(y)(f)=pΞ1(y)(f)+(1−p)Ξ2(y)(f)=(pΞ1+(1−p)Ξ2)(y)(f)
Done, we’ve shown equality of the Z-PUCK with the mixture of other Z-PUCKs, establishing that the mixture of Z-PUCKs is a Z-PUCK.
That leaves establishing our relevant inequality. But before we do that, we’ll be wanting a nice handy form for that asterisk operator to manipulate things with. Given some κ that’s a Z-PUCK over Q, remember from the previous proof that a valid choice for μ and ϕ to break κ down is
μ(z):=maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)
and, abbreviating things a little bit, we have
ϕ(y,z)(x):=χx∈Q(y)ϖκ(z,x)/μ(z)
So, we can get a pleasant-to-manipulate form for Θ∗κ as follows.
(Θ∗κ)(λxyz.f(x,y,z))=(μ⋉Θ⋉ϕ)(λxyz.f(x,y,z))=μ(λz.Θ(λy.ϕ(y,z)(λx.f(x,y,z))))
And proceed further
=∑z∈Zμ(z)⋅Θ(λy.ϕ(y,z)(λx.f(x,y,z)))=∑z∈Zμ(z)⋅Θ(λy.∑x∈Xϕ(y,z)(x)⋅f(x,y,z))=∑z∈Zμ(z)⋅Θ(λy.∑x∈X(χx∈Q(y)ϖκ(z,x)/μ(z))⋅f(x,y,z))
And then we move the constant into Θ since it’s homogenous, and then into the sum, and it cancels out with the fraction.
=∑z∈ZΘ(λy.∑x∈Xχx∈Q(y)⋅ϖκ(z,x)⋅f(x,y,z))=∑z∈ZΘ(λy.∑x∈Q(y)ϖκ(z,x)⋅f(x,y,z))=∑z∈ZΘ(λy.χQ(y)×{z}ϖκ(λx′z′.f(x′,y,z′)))
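Transcribing that general form into a finite-set sketch (hypothetical encodings again: Θ as a black-box evaluation function on functions of y, ϖκ as a dict from (x,z) pairs to masses, Q as a dict from y to sets of x):

```python
# Sketch: (Theta * kappa)(f) = sum over z of
#   Theta(lambda y: sum over x in Q(y) of varpi(z, x) * f(x, y, z)).
def star_eval(Theta_eval, varpi, Q, Z, f):
    return sum(
        Theta_eval(lambda y, z=z: sum(varpi.get((x, z), 0.0) * f(x, y, z)
                                      for x in Q[y]))
        for z in Z)  # the default argument z=z pins down z per summand
```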
This general form will be used whenever we need to unpack Θ∗κ. Now, let’s get started on the proof of our subset inclusion thingy. As usual, Π will be the set {pκ1+(1−p)κ2|κ1∈Π1,κ2∈Π2}, and as we’ve shown, that’s the set of Z-PoCK’s associated with pΞ1+(1−p)Ξ2. Also, as we’ve already shown, the associated Z-polycontribution ϖκ for κ=pκ1+(1−p)κ2 is pϖκ1+(1−p)ϖκ2. This will be implicitly used in the following.
(Θ∗(pΞ1+(1−p)Ξ2))(λxyz.f(x,y,z))=maxκ∈Π(Θ∗κ)(λxyz.f(x,y,z))
Now we use our preferred unpacking of how that asterisk operator works.
=maxκ∈Π∑z∈ZΘ(λy.χQ(y)×{z}ϖκ(λx′z′.f(x′,y,z′)))
And unpack κ and ϖκ appropriately.
=maxκ1∈Π1,κ2∈Π2∑z∈ZΘ(λy.χQ(y)×{z}(pϖκ1+(1−p)ϖκ2)(λx′z′.f(x′,y,z′)))=maxκ1∈Π1,κ2∈Π2∑z∈ZΘ(λy.pχQ(y)×{z}ϖκ1(λx′z′.f(x′,y,z′))+(1−p)χQ(y)×{z}ϖκ2(λx′z′.f(x′,y,z′)))
At this point, we use convexity of Θ, since it’s an ultradistribution.
≤maxκ1∈Π1,κ2∈Π2∑z∈Z(pΘ(λy.(χQ(y)×{z}ϖκ1)(λx′z′.f(x′,y,z′)))+(1−p)Θ(λy.(χQ(y)×{z}ϖκ2)(λx′z′.f(x′,y,z′))))=maxκ1∈Π1,κ2∈Π2(p∑z∈ZΘ(λy.(χQ(y)×{z}ϖκ1)(λx′z′.f(x′,y,z′)))+(1−p)∑z∈ZΘ(λy.(χQ(y)×{z}ϖκ2)(λx′z′.f(x′,y,z′))))
At this point, you can pack up things.
=maxκ1∈Π1,κ2∈Π2p(Θ∗κ1)(λxyz.f(x,y,z))+(1−p)(Θ∗κ2)(λxyz.f(x,y,z))=pmaxκ1∈Π1(Θ∗κ1)(λxyz.f(x,y,z))+(1−p)maxκ2∈Π2(Θ∗κ2)(λxyz.f(x,y,z))=p(Θ∗Ξ1)(λxyz.f(x,y,z))+(1−p)(Θ∗Ξ2)(λxyz.f(x,y,z))=(p(Θ∗Ξ1)+(1−p)(Θ∗Ξ2))(λxyz.f(x,y,z))
Done! ■
Proposition 4.3: Let X, Y and Z be finite
sets, Q⊆Y×X a relation, κ1,κ2:Y→Δc(X×Z) Z-PoCKs over Q, Θ∈□cY and p∈[0,1]. Then, pκ1+(1−p)κ2
is also a Z-PoCK over Q, and Θ∗(pκ1+(1−p)κ2)⊆p(Θ∗κ1)+(1−p)(Θ∗κ2)
Use Lemma 4, along with Z-PoCKs being a special case of Z-PUCKs.
Proposition 4.4: Let X, Y and Z be finite
sets, Q⊆Y×X a relation and Ξ:Y→□c(X×Z)
a Z-PUCK over Q. Denote Θ:=⊤Y∗Ξ. Define
β0,β1:Z×Y×X→2Z×Y by
β0(z,y,x):={z}×Q−1(x), β1(z,y,x):=Z×Q−1(x).
Then, (Θ⋉β0)↓⊆Br(Θ)⊆(Θ⋉β1)↓
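Before the proof, here's a hypothetical finite encoding of the two boundary kernels, which makes the pointwise containment β0⊆β1 plain; the encoding is a sketch, not the post's notation.

```python
# Sketch: beta0 pins the computation coordinate to the observed z, while
# beta1 lets it range over all of Z; both constrain y through Q^{-1}(x).
def beta0(z, y, x, q_inv):
    return frozenset((z, yp) for yp in q_inv[x])               # {z} x Q^{-1}(x)

def beta1(z, y, x, q_inv, Z):
    return frozenset((zz, yp) for zz in Z for yp in q_inv[x])  # Z x Q^{-1}(x)
```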
Proof: As usual, when establishing inequalities with downwards closures, we only have to verify the result for monotone functions. So, we may assume that f is monotone, and attempt to show that
(Θ⋉β0)(λxyzα.f(x,y,z,α))≤BrZ×Y(Θ)(λxyzα.f(x,y,z,α))≤(Θ⋉β1)(λxyzα.f(x,y,z,α))
Remember, bridge transforms cash out as a maximum over contributions, so to show the first inequality, we’ll need to build a contribution that matches or exceeds that first term, and that lands in the bridge transform of Θ. The second inequality is considerably easier: we just use our Lemma 2 to figure out what sort of sets the bridge transform is supported on, swap out the sets it’s supported on for a bigger upper-bound set, and bam, monotonicity of f takes over from there. Let’s unpack that first thing
(Θ⋉β0)(λxyzα.f(x,y,z,α))=Θ(λxyz.δβ0(x,z)(λα.f(x,y,z,α)))=Θ(λxyz.f(x,y,z,β0(x,z)))
And at this point we unpack what Θ is.
=(⊤Y∗Ξ)(λxyz.f(x,y,z,β0(x,z)))
And the Ξ.
=maxκ∈Π(⊤Y∗κ)(λxyz.f(x,y,z,β0(x,z)))
And then, κ can be broken down into some μκ and ϕκ, and that goes on both sides of ⊤Y as our previous proposition shows.
=maxκ∈Π(μκ⋉⊤Y⋉ϕκ)(λxyz.f(x,y,z,β0(x,z)))=maxκ∈Πμκ(λz.⊤Y(λy.ϕκ(y,z)(λx.f(x,y,z,β0(x,z)))))=maxκ∈Πμκ(λz.maxyϕκ(y,z)(λx.f(x,y,z,β0(x,z))))
Now we can start filling in some data. There’s a maximizing κ∗, so we can substitute that in. That gives us a canonical choice for what μκ∗ and ϕκ∗ are. Making that substitution,
=μκ∗(λz.maxyϕκ∗(y,z)(λx.f(x,y,z,β0(x,z))))
And then, let d:Z→Y be the function mapping each particular z to the y which maximizes ϕκ∗(y,z)(λx.f(x,y,z,β0(x,z))). This lets us reexpress things as
=μκ∗(λz.ϕκ∗(d(z),z)(λx.f(x,d(z),z,β0(x,z))))
And now, we can start unpacking things a bit.
=μκ∗(λz.δd(z)(λy.ϕκ∗(y,z)(λx.f(x,y,z,β0(x,z)))))=μκ∗(λz.δd(z)(λy.ϕκ∗(y,z)(λx.δβ0(x,z)(λα.f(x,y,z,α)))))
And now we can write things as just a giant semidirect product.
=(μκ∗⋉d⋉ϕκ∗⋉β0)(λxyzα.f(x,y,z,α))
Now we’ll show that this particular contribution lies in Br(Θ).
Checking the support condition, we want to check for sure that y,z∈α, ie, the event y,z∉α has measure 0. Let’s begin.
(μκ∗⋉d⋉ϕκ∗⋉β0)(λxyzα.χy,z∉α)=μκ∗(λz.δd(z)(λy.ϕκ∗(y,z)(λx.δβ0(x,z)(λα.χy,z∉α))))
Substitute in the dirac-deltas.
=μκ∗(λz.ϕκ∗(d(z),z)(λx.χd(z),z∉β0(x,z)))
Unpack what β0(x,z) is.
=μκ∗(λz.ϕκ∗(d(z),z)(λx.χd(z),z∉Q−1(x)×{z}))
Now, z∈{z} always occurs, so that indicator function is the same as just testing whether d(z)∈Q−1(x).
=μκ∗(λz.ϕκ∗(d(z),z)(λx.χd(z)∉Q−1(x)))
Rephrasing things a little bit,
=μκ∗(λz.ϕκ∗(d(z),z)(λx.χx∉Q(d(z))))
Then, from proposition 4.2, we remember that λy.ϕκ∗(y,z) is a PoCK over Q. Ie, for any particular y, ϕκ∗(y,z) looks like a particular measure (ϖκ∗,z) restricted to Q(y). So, in particular, ϕκ∗(d(z),z) must be supported over Q(d(z)). Put another way, with full measure, x∈Q(d(z)). So, this event failing has 0 measure.
=μκ∗(λz.0)=0
And we’re done with that support condition.
Now to show the endofunction condition. As usual, we’ll let s:Y×Z→Y×Z, and let g:X×Y×Z→[0,1]. Actually, for conceptual clarity, since s:Y×Z→Y×Z can be viewed as a pair of functions sY:Y×Z→Y and sZ:Y×Z→Z, we’ll be using that formulation in our equation.
(μκ∗⋉d⋉ϕκ∗⋉β0)(λxyzα.χsY(y,z),sZ(y,z)∈αg(sY(y,z),sZ(y,z),x))=μκ∗(λz.δd(z)(λy.ϕκ∗(y,z)(λx.δβ0(x,z)(λα.χsY(y,z),sZ(y,z)∈αg(sY(y,z),sZ(y,z),x)))))
As usual, we’ll substitute in our dirac-deltas to simplify things.
=μκ∗(λz.ϕκ∗(d(z),z)(λx.χsY(d(z),z),sZ(d(z),z)∈β0(x,z)g(sY(d(z),z),sZ(d(z),z),x)))
Substitute in what β0(x,z) is.
=μκ∗(λz.ϕκ∗(d(z),z)(λx.χsY(d(z),z),sZ(d(z),z)∈Q−1(x)×{z}g(sY(d(z),z),sZ(d(z),z),x)))
Now, if that “this pair of points lies in this set” indicator function goes off, then sZ(d(z),z)=z. So, we can substitute that into the g term afterwards. And then get a ≤ inequality by making the indicator function less strict.
=μκ∗(λz.ϕκ∗(d(z),z)(λx.χsY(d(z),z),sZ(d(z),z)∈Q−1(x)×{z}g(sY(d(z),z),z,x)))≤μκ∗(λz.ϕκ∗(d(z),z)(λx.χsY(d(z),z)∈Q−1(x)g(sY(d(z),z),z,x)))
And reexpress the indicator function a little bit
=μκ∗(λz.ϕκ∗(d(z),z)(λx.χx∈Q(sY(d(z),z))g(sY(d(z),z),z,x)))
At this point, we can use that ϕκ∗(y,z) is χQ(y)ϖϕκ∗,z (ie, fixing z and varying y it just looks like you’re taking one measure and conditioning on various Q(y)), so reexpress things as
=μκ∗(λz.(χQ(d(z))ϖϕκ∗,z)(λx.χx∈Q(sY(d(z),z))g(sY(d(z),z),z,x)))
And then, view the indicator function as just more conditioning.
=μκ∗(λz.(χQ(d(z))∩Q(sY(d(z),z))ϖϕκ∗,z)(λx.g(sY(d(z),z),z,x)))
And then, relax about what you’re conditioning on.
≤μκ∗(λz.χQ(sY(d(z),z))ϖϕκ∗,z(λx.g(sY(d(z),z),z,x)))
Rewrite it as a kernel again
=μκ∗(λz.ϕκ∗(sY(d(z),z),z)(λx.g(sY(d(z),z),z,x)))
Pull out the dirac-delta
=μκ∗(λz.δsY(d(z),z)(λy.ϕκ∗(y,z)(λx.g(y,z,x))))
Throw one more inequality at it
≤μκ∗(λz.maxyϕκ∗(y,z)(λx.g(y,z,x)))
Write it as top
=μκ∗(λz.⊤Y(λy.ϕκ∗(y,z)(λx.g(y,z,x))))
Write as a semidirect product
=(μκ∗⋉⊤Y⋉ϕκ∗)(λyzx.g(y,z,x))
Reexpress
=(⊤Y∗κ∗)(λyzx.g(y,z,x))≤maxκ∈Π(⊤Y∗κ)(λyzx.g(y,z,x))=(⊤Y∗Ξ)(λyzx.g(y,z,x))=Θ(λyzx.g(y,z,x))
And we’re done! endofunction condition shown. Our relevant contribution is in Br(Θ). Let’s see, where were we… ah right, we had shown that for all monotone f,
(Θ⋉β0)(λxyzα.f(x,y,z,α))=(μκ∗⋉d⋉ϕκ∗⋉β0)(λxyzα.f(x,y,z,α))
For some choice of d and κ∗. We know this is in Br(Θ), so we get
≤maxθ∈Br(Θ)θ(λyzαx.f(x,y,z,α))=Br(Θ)(λyzαx.f(x,y,z,α))
And we’re done! One inequality done. That just leaves showing the second inequality, where β1(x)=Z×Q−1(x). It’s actually not too bad to show. Start with
Br(Θ)(λxyαz.f(x,y,z,α))=maxθ∈Br(Θ)θ(λyzαx.f(x,y,z,α))
And then, we recall our Lemma 2, that if Θ had its support entirely on x,y,z tuples where y,z in h(x) (for some h:X→2Y×Z), then all the θ∈Br(Θ) would be supported on (x,α) pairs where α⊆h(x). And then, swapping out α for h(x), by monotonicity of f, produces a larger value.
To invoke this argument, our choice of h will be β1, where β1(x)=Q−1(x)×Z. We do need to show that Θ is supported on such tuples.
Θ(λxyz.χy,z∉Q−1(x)×Z)=Θ(λxyz.χy∉Q−1(x))=Θ(λxyz.χx∉Q(y))=(⊤Y∗Ξ)(λxyz.χx∉Q(y))=maxκ∈Π(⊤Y∗κ)(λxyz.χx∉Q(y))=maxκ∈Π(μκ⋉⊤Y⋉ϕκ)(λxyz.χx∉Q(y))=maxκ∈Πμκ(λz.⊤Y(λy.ϕκ(y,z)(λx.χx∉Q(y))))
And then use that ϕκ(y,z)=χQ(y)ϖϕκ,z since it’s a PoCK in Q, to get
=maxκ∈Πμκ(λz.⊤Y(λy.(χQ(y)ϖϕκ,z)(λx.χx∉Q(y))))
Hm, we updated on a set, and are evaluating the indicator function for not being in the set.
=maxκ∈Πμκ(λz.⊤Y(λy.0))=0
Ok, so this means we can invoke Lemma 2. We were previously at
=maxθ∈Br(Θ)θ(λyzαx.f(x,y,z,α))
So now we can invoke monotonicity and go
≤maxθ∈Br(Θ)θ(λyzαx.f(x,y,z,β1(x)))
And then invoke our endofunction property for the stuff in Br(Θ), letting s be the identity function (and also y,z∈α occurs always) to establish a uniform upper bound of
≤Θ(λxyzα.f(x,y,z,β1(x)))=Θ(λxyz.δβ1(x)(λα.f(x,y,z,α)))=(Θ⋉β1)(λxyzα.f(x,y,z,α))
And we’re done! Second inequality demonstrated. ■
Corollary 4.5: Suppose that for any d∈D, z∈Γ1
and π:H→A s.t. (d,z)∈supp W(π), it holds that
dCzπ. That is, the observations W predicts to receive from
the computer are consistent with the chosen policy and W’s beliefs
about computations. Let L:D→R be a Cartesian loss function
and π:H→A a policy. Define ~L:D×Γ1→R
by ~L(h,z):=L(h). Then, (prelΓBr(ΘW)∩Cπfair)(Lphys)=W(π;~L)
To a large extent, this will follow the proof of the previous corollary. We’ll use β for 2Γ and α for just the policy component.
I’m going to be proceeding very cautiously here. First off, make our π value special by swapping it out for π∗:
(prelΓBr(ΘW)∩Cπ∗fair)(Lphys)
Now, by the identifications we made earlier, we can identify Γ with AH×Γ1, the space of policies and computations. Using that to unpack the function a little bit, we have
=(prelΓBr(ΘW)∩Cπ∗fair)(λπzβ.Lphys(π,z,β))
Now, we note that intersecting with top of a particular set is equivalent to updating on the indicator function for that set. Using definition 1.5 to unpack Cπ∗fair, we get
=(prelΓBr(ΘW))(λπzβ.χ∀h∈Hπ,z,β:Gπ,z(h)=π∗(h)Lphys(π,z,β))
Throughout we’ll be applying that Gπ,z(h) is “what would the agent do on h if the agent is copying the behavior of π” (remember, part of the math is “what does the agent do in response to this history” and π is our term for that chunk of the math), so we’ll just always rephrase things like that and won’t bother to say Gπ,z(h)=π(h) every time we do it.
=(prelΓBr(ΘW))(λπzβ.χ∀h∈Hπ,z,β:π(h)=π∗(h)Lphys(π,z,β))
Pull off the projection, and use d for a destiny in D.
=Br(ΘW)(λπzβd.χ∀h∈Hπ,z,β:π(h)=π∗(h)Lphys(π,z,β))
Applying definition 3.1, and using d′ instead of g to remember it’s a destiny, we have
=Br(ΘW)(λπzβd.χ∀h∈Hπ,z,β:π(h)=π∗(h)minha:ha∈Xπ,z,βmaxd′:ha⊑d′L(d′))
Next up is unpacking Xπ,z,β. Using definition 3.1, it’s
=Br(ΘW)(λπzβd.χ∀h∈Hπ,z,β:π(h)=π∗(h)minha:ha∈Hπ,z,β×A∧(∀(π′,z′)∈β:π′(h)=a)maxd′:ha⊑d′L(d′))
Now we’ll show that Hπ,z,β only depends on pr(β), ie, the projection of β from 2AH×Γ1 to 2AH, and that it gets smaller as β gets larger, so the function above is monotone.
Let’s use definition 1.5 to unpack the event h∈Hπ,z,β.
(∀h′a′⊏h,(π′,z′)∈β:π′(h′)=a′)∧(∃d′:h⊏d′∧d′Czπ)
It shouldn’t be too hard to tell that β getting larger makes this set smaller, and that, since the z′ doesn’t matter, this condition is really the same as
(∀h′a′⊏h,π′∈pr(β):π′(h′)=a′)∧(∃d′:h⊏d′∧d′Czπ)
So, we’ll use pr(β) to remind us that our function only depends on the projection of β.
=Br(ΘW)(λπzβd.χ∀h∈Hπ,z,pr(β):π(h)=π∗(h)minha:ha∈Hπ,z,pr(β)×A∧(∀π′∈pr(β):π′(h)=a)maxd′:ha⊑d′L(d′))
And now we can pull that out as a projection! We’re now using α for our set of policies, instead of β for our set of policy/computation pairs.
=pr(Br(ΘW))(λπzαd.χ∀h∈Hπ,z,α:π(h)=π∗(h)minha:ha∈Hπ,z,α×A∧(∀π′∈α:π′(h)=a)maxd′:ha⊑d′L(d′))
To proceed further, we’re going to need to adapt our previous result which gets an upper and lower bound on the bridge transform of suitable Θ. Our first order of business is checking that α getting larger makes the function larger, which is easy to check since α getting larger cuts down on the options to minimize over (increasing value), and makes the antecedent of the implication harder to fulfill, which makes the implication as a whole easier to fulfill, so the indicator function is 1 more often. Now we can proceed further. Abstracting away a bit from our specific function, which will be swapped out for some f:AH×Γ1×D×2AH→[0,1] which is monotone in that last argument, we have
(ΘW⋉β0)↓(λzπdβ.f(z,π,d,pr(β)))=(ΘW⋉β0)(λzπdβ.f(z,π,d,pr(β)))=ΘW(λzπd.f(z,π,d,pr(β0(z,d))))=ΘW(λzπd.f(z,π,d,pr({z}×Q−10(d))))=ΘW(λzπd.f(z,π,d,Q−10(d)))=ΘW(λzπd.f(z,π,d,pr(Z×Q−10(d))))=ΘW(λzπd.f(z,π,d,pr(β1(d))))=(ΘW⋉β1)(λzπdβ.f(z,π,d,pr(β)))=(ΘW⋉β1)↓(λzπdβ.f(z,π,d,pr(β)))
Monotonicity was used to go back and forth between the downwards closure and the raw form. β0 and β1 are as they were in Proposition 4.4, and Q would be the relation on AH×D telling you whether a policy is consistent with a destiny. Now, by Proposition 4.4, since the bridge transform of ΘW is sandwiched between those two values, and they’re both equal, we have
pr(Br(ΘW))(λzπdα.f(z,π,d,α))=Br(ΘW)(λzπdβ.f(z,π,d,pr(β)))=ΘW(λzπd.f(z,π,d,Q−10(d)))=(ΘW⋉Q−10)(λzπdα.f(z,π,d,α))
The first equality was just relocating the projection. The second equality was from the bridge transform being sandwiched between two equal quantities, so it equals all the stuff on our previous big list of equalities (we went with the middle one). Then just express as a semidirect product, and you’re done. Applying this to our previous point of
=pr(Br(ΘW))(λπzαd.χ∀h∈Hπ,z,α:π(h)=π∗(h)minha:ha∈Hπ,z,α×A∧(∀π′∈α:π′(h)=a)maxd′:ha⊑d′L(d′))
We can reexpress it as
=(ΘW⋉Q−10)(λπzαd.χ∀h∈Hπ,z,α:π(h)=π∗(h)minha:ha∈Hπ,z,α×A∧(∀π′∈α:π′(h)=a)maxd′:ha⊑d′L(d′))
And start unpacking the semidirect product
=ΘW(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)minha:ha∈Hπ,z,Q−10(d)×A∧(∀π′:dQ0π′→π′(h)=a)maxd′:ha⊑d′L(d′))
Now, we’re going to try to address that minimum, and show that the only ha that fulfill the conditions are exactly those ha⊑d. This requires showing that ha⊑d is a sufficient condition to fulfill the relevant properties, and then to show that ha⋢d implies a failure of one of the properties.
The proof for this is almost entirely identical to the corresponding proof in the non-Turing-law case; there are no substantive differences besides one issue to clear up. We need to show that W(π) always being supported on the relation C, for all π (as one of our starting assumptions) implies that ΘW is supported on C as well. Here’s how we do it. We have ΘW=⊤AH∗W. And then this ultracontribution (by the definition of Γ1-PUCK’s) can be written as the convex hull of the union of a bunch of ultracontributions of the form ⊤AH∗w, where w is a Γ1-PoCK. So, if we can show all of these are supported on the relation C, then the same holds for the convex hull of their union, ie, ΘW. By Proposition 4.2, we can reexpress this ultracontribution as μw⋉⊤AH⋉ϕw, where w(π)=μw⋉(λz.ϕw(z,π)), for any policy π. Now, let’s check the expectation value of the indicator function for C being violated.
(μw⋉⊤AH⋉ϕw)(λzπd.χ¬dCzπ)=μw(λz.⊤AH(λπ.ϕw(z,π)(λd.χ¬dCzπ)))=μw(λz.maxπϕw(z,π)(λd.χ¬dCzπ))
Let’s assume that the expectation of that indicator function is not zero. Then there must be some particular z in the support of μw where it is nonzero, and some particular π∗ that attains that nonzero expectation value. So, there’s a z in the support of μw and π∗ s.t.
ϕw(z,π∗)(λd.χ¬dCzπ∗)>0
and so this means that we have
μw(λz.ϕw(z,π∗)(λd.χ¬dCzπ∗))>0
Because that z is assigned nonzero measure, and then this reshuffles to
(μw⋉(λz.ϕw(z,π∗)))(λd.χ¬dCzπ∗)>0
Which, via Proposition 4.2, is
w(π∗)(λzd.χ¬dCzπ∗)>0
But this contradicts that W(π∗) (and so all the w(π∗)) were supported on the event dCzπ∗, so we have a contradiction, and our result follows.
Now that that issue is taken care of, we can swap out our fancy minimization with just minimizing over the ha⊑d.
=ΘW(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)minha:ha⊑dmaxd′:ha⊑d′L(d′))
This minimization is attained by selecting d itself. So then it turns into
=ΘW(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d))
At this point, we’ll upper-and-lower-bound this quantity by W(π∗)(λzd.L(d)). Let’s begin.
W(π∗)(λzd.L(d))=maxw∈Πw(π∗)(λzd.L(d))=maxw∈Πμw(λz.ϕw(π∗,z)(λd.L(d)))
Here we’re using Proposition 4.2 on being able to split up Z-PoCK’s (which the w making up W are) a certain way.
=maxw∈Πμw(λz.maxπϕw(π∗,z)(λd.χdQ0πL(d)))
This equality happens because you can always just pick π∗ as your choice of π, and ϕw(π∗,z) can only produce destinies consistent with π∗. Now, we can swap out ϕw for the measure it’s built from, restricted to an event
=maxw∈Πμw(λz.maxπ(χQ0(π∗)ϖϕw,z)(λd.χdQ0πL(d)))=maxw∈Πμw(λz.maxπϖϕw,z(λd.χdQ0π∗χdQ0πL(d)))
Reshuffle the indicator function back in
=maxw∈Πμw(λz.maxπ(χQ0(π)ϖϕw,z)(λd.χdQ0π∗L(d)))
=maxw∈Πμw(λz.maxπϕw(π,z)(λd.χdQ0π∗L(d)))
=maxw∈Πμw(λz.⊤AH(λπ.ϕw(π,z)(λd.χdQ0π∗L(d))))
=maxw∈Π(μw⋉⊤AH⋉ϕw)(λπzd.χdQ0π∗L(d))
=maxw∈Π(⊤AH∗w)(λπzd.χdQ0π∗L(d))
=(⊤AH∗W)(λπzd.χdQ0π∗L(d))
=ΘW(λπzd.χdQ0π∗L(d))
Now we must show that this is a looser constraint than what was previously in our indicator function to proceed further. So our next order of business is showing that, certainly,
∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)→dQ0π∗
Let d be an arbitrary destiny in the support of ΘW, and h be one of the history prefixes of d. The two conditions for h∈Hπ,z,Q−10(d) are fulfilled, because they are
∀h′,a′,π′:h′a′⊏h∧dQ0π′→π′(h′)=a′
∃d′:h⊏d′∧d′Czπ
For the first condition, if h′a′⊏h, then h′a′⊏d, and so if π′ is consistent with d, it must take the same action in response to h′, the action that d commands. So it is fulfilled. For the second condition, let d′ be d. h⊏d holds, and dCzπ holds certainly, because ΘW is supported on C, as we’ve previously shown.
So, certainly, h⊏d→h∈Hπ,z,Q−10(d). Since we assumed our forall statement as prerequisite, this means that for all h⊏d, π(h)=π∗(h). And dQ0π means ∀ha⊑d:π(h)=a. Since π∗(h) mimics π(h) for all history prefixes of d, this means ∀ha⊑d:π∗(h)=a, ie dQ0π∗.
So, since this is a looser constraint, when we were previously at
=ΘW(λπzd.χdQ0π∗L(d))
we can proceed further to
≥ΘW(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d))
which is the value we wanted to sandwich. Proceed further with
=(⊤AH∗W)(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d))
=maxw∈Π(⊤AH∗w)(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d))
=maxw∈Π(μw⋉⊤AH⋉ϕw)(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d))
=maxw∈Πμw(λz.⊤AH(λπ.ϕw(π,z)(λd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d))))
=maxw∈Πμw(λz.maxπϕw(π,z)(λd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d)))
≥maxw∈Πμw(λz.ϕw(π∗,z)(λd.χ∀h∈Hπ∗,z,Q−10(d):π∗(h)=π∗(h)L(d)))
=maxw∈Πμw(λz.ϕw(π∗,z)(λd.L(d)))
=maxw∈Π(μw⋉(λz.ϕw(π∗,z)))(λzd.L(d))
=maxw∈Πw(π∗)(λzd.L(d))
=W(π∗)(λzd.L(d))
And we’ve got the same upper and lower bound, so our overall quantity is
W(π∗)(~L)
And we’re done. ■
Infra-Bayesian physicalism: proofs part II
This post is an appendix to “Infra-Bayesian physicalism: a formal theory of naturalized induction”.
Proposition 2.16: Consider some Γ0, Γ1, Φ and Θ∈□c(Γ0×Γ1×Φ). Define π:elΓ0×Γ1→elΓ0 by π(y,z,α):=(y,{y′∈Γ0∣(y′,z)∈α}). Then, (π×idΦ)∗BrΓ0×Γ1(Θ)⊆prelΓ0×ΦBrΓ0(Θ)⊆BrΓ0(prΓ0×ΦΘ)
We can prove the second subset inclusion directly as a corollary of Proposition 2.10, just let the t of Proposition 2.10 be the projection function Γ1×Φ→Φ, so that just leaves the first subset inclusion direction. If you’ve seen the proofs so far, you know we do a thing where we try to show subset inclusion with expectations of functions and inequalities instead. And that the proofs all proceed by transforming the expectations until we get a maximum over contribution expectation values, and that’s always where the hard part of proving the inequalities shows up. So, let’s just get that part over with, an interested reader can work it out with previous proofs as a guide. Unusually, we’ll be keeping track of identity functions here.
Plugging in some f, and doing our usual activities to get every term into the appropriate form, we can get this result if we manage to show that maxθ′∈BrΓ0×Γ1(Θ)(π×idΦ)∗θ′(λy0α0x.f(y0,x,α0)) ≤maxθ∈BrΓ0(Θ)prelΓ0×Φθ(λy0α0x.f(y0,x,α0)) So, to establish this, we’ll show that, given some θ′∈BrΓ0×Γ1(Θ), we have (π×idΓ1×Φ)∗(θ′)∈BrΓ0(Θ), and that prelΓ0×Φ((π×idΓ1×Φ)∗θ′)=(π×idΦ)∗θ′ because, if we show that, then it means that BrΓ0(Θ) is a rich enough set for the right-hand side of the equation to match anything the left-hand-side can put out.
First off, prelΓ0×Φ((π×idΓ1×Φ)∗θ′)=(π×idΦ)∗θ′ is pretty trivial to show. The only difference between the two processes is that the Γ1 coordinate of θ′ is discarded immediately on the right-hand-side, and it’s preserved for one step and then discarded on the second step for the left-hand-side.
Now for our inequality of interest. Let θ′∈BrΓ0×Γ1(Θ), and we’re trying to show that (π×idΓ1×Φ)∗(θ′)∈BrΓ0(Θ) First off, showing the support condition for (π×idΓ1×Φ)∗(θ′) which is somewhat nontrivial this time around. We start off with a guarantee that (y0,y1)∈α. This happens iff y0∈{y′0|(y′0,y1)∈α}=π(y0,y1,α)2Γ0 And so, we get that y0∈α0 is guaranteed for that pushforward, support condition established.
endofunction condition time. It’s important to remember that we want to treat elΓ0 as the computation side of things, and Γ1×Φ as the environment side of things, for our bridge transform we’re working with.s:Γ0→Γ0 and g:Γ0×Γ1×Φ→[0,1]. Begin. (π×idΓ1×Φ)∗θ′(λy0y1α0x.χs(y0)∈α0g(s(y0),y1,x)) =θ′(λy0y1αx.χs(y0)∈π(y0,α,y1)2Γ0g(s(y0),y1,x)) Let’s unpack precisely what that set is. =θ′(λy0y1αx.χs(y0)∈{y′0|(y′0,y1)∈α}g(s(y0),y1,x)) =θ′(λy0y1αx.χ(s(y0),y1)∈αg(s(y0),y1,x)) And we can rewrite the endofunction a little bit =θ′(λy0y1αx.χ(s×idΓ1)(y0,y1)∈αg((s×idΓ1)(y0,y1),x)) And finally apply our endofunction condition, since we’ve now got the function in a form that’s treating y0,y1 as part of the computational universe… ≤Θ(λy0y1x.g(y0,y1,x)) And we’re done, this establishes our desired result. ■
Proposition 2.17: Br(Θ) is a continuous function of Θ.
The way this proof will work is by describing a composition of functions that makes Br(Θ) from Θ, and then showing that each of these functions is continuous, if elΓ×Φ is a finite set.
Claim: The bridge transform of some Θ is equal to (using χelΓ to denote restricting an ultradistribution to the event y∈α and χ−1elΓ to denote the inverse of said function, mapping an ultradistribution on elΓ to the largest ultradistribution that could have produced it via restriction) χelΓ(⋂s:Γ→Γs∗(χ−1elΓ(ι∗(pr∗(Θ))))) Breaking down the unfamilar notation, the type of pr is elΓ×Φ→Γ×Φ, just the usual projection. That asterisk up top is pullback along that function. The type of ι is elΓ×Φ→Γ×2Γ×Φ. And s∗ is pullback along the function Γ×2Γ×Φ→Γ×2Γ×Φ given by (s,id2Γ,idΦ).
Let’s unpack the exact conditions that cause a θ to lie in the set χelΓ(⋂s:Γ→Γs∗(χ−1elΓ(ι∗(pr∗(Θ))))) First off, a θ is in this set iff it is supported over the event y∈α, and it lies in the set ⋂s:Γ→Γs∗(χ−1elΓ(ι∗(pr∗(Θ)))) Which occurs iff θ is supported over the event y∈α, and for all s:Γ→Γ, θ lies in the set s∗(χ−1elΓ(ι∗(pr∗(Θ)))) Which occurs iff θ is suported over the event y∈α, and for all s:Γ→Γ, s∗(θ) lies in the set χ−1elΓ(ι∗(pr∗(Θ))) Which occurs iff θ is supported over the event y∈α, and for all s:Γ→Γ, χelΓ(s∗(θ)) lies in the set ι∗(pr∗(Θ))
Now, ι is just doing a little bit of type conversion, so we’re justified in ignoring it. Anways, the previous thing occurs iff θ is supported over the event y∈α, and for all s:Γ→Γ, pr∗(χelΓ(s∗(θ)))∈Θ.
Which happens iff θ is supported over the event y∈α and for all s:Γ→Γ and g:Γ×Φ→[0,1], pr∗(χelΓ(s∗(θ)))(λyx.g(y,x))≤Θ(λyx.g(y,x)) However, unpacking the left-hand side, we get pr∗(χelΓ(s∗(θ)))(λyx.g(y,x)) =χelΓ(s∗(θ))(λyαx.g(y,x)) =s∗(θ)(λyαx.χy∈αg(y,x)) =θ(λyαx.χs(y)∈αg(s(y),x)) Which is the exact condition for θ to lie in the bridge transform. So, we have an equivalence.
Now, since we’ve phrased the bridge transform as χelΓ(⋂s:Γ→Γs∗(χ−1elΓ(ι∗(pr∗(Θ))))) We just need to establish that when all the sets are finite, then pullbacks are continuous, pushforwards are continuous, un-restrictions are continuous, intersections are continuous, and restrictions are continuous. Then, this would just be a particularly fancy continuous function, and accordingly, if Θn limited to Θ, then Br(Θn) would limit to Br(Θ).
Let’s establish that when the sets are finite, pullbacks are continuous. Let g:X→Y, and Y and X be finite sets, and ψ∈□Y. Then, we have g∗(ψ)(λx.f(x)):=ψ(λy.maxx∈g−1(y)f(x)) With the convention that maximizing over the empty set produces a value of 0. That is an alternate phrasing of pullback. We can then go limn→∞d(g∗(ψn),g∗(ψ))=limn→∞supf:X→[0,1]|g∗(ψn)(f)−g∗(ψ)(f)| =limn→∞supf:X→[0,1]|ψn(λy.maxx∈g−1(y)f(x))−ψ(λy.maxx∈g−1(y)f(x))| ≤limn→∞suph:Y→[0,1]|ψn(h)−ψ(h)|=limn→∞d(ψn,ψ)=0 Admittedly, this isn’t quite what our usual modified KR metric usually looks like. The reason we can do this is because we’re just dealing with functions in [0,1], so the norm part of the modified KR metric doesn’t matter, and since our sets are finite, we can say that all points are distance 1 from each other, so all functions are 1-Lipschitz, and then the two metrics coincide. So, pullback along any function is continuous.
For pushforward, it’s easy because, if ψ∈□X, then we’ve got limn→∞d(g∗(ψn),g∗(ψ))=limn→∞suph:Y→[0,1]|g∗(ψn)(h)−g∗(ψ)(h)| =limn→∞suph:Y→[0,1]|ψn(λx.h(g(x)))−ψ(λx.h(g(x)))| ≤limn→∞supf:X→[0,1]|ψn(f)−ψ(f)|=limn→∞d(ψn,ψ)=0 For showing restrictions continuous, for the set E⊆X that we’re updating on, limn→∞d(χE(ψn),χE(ψ))=limn→∞supf:X→[0,1]|χE(ψn)(f)−χE(ψ)(f)| =limn→∞supf:X→[0,1]|ψn(χx∈Ef(x))−ψn(χx∈Ef(x))| ≤limn→∞supf:X→[0,1]|ψn(f)−ψ(f)|=limn→∞d(ψn,ψ)=0 For intersections… that will take a bit more work. We’ll have to use the equivalent formulation of closeness, that ψn limits to ψ iff the Hausdorff distance between the corresponding sets (according to the generalized KR measure) limits to 0. So, our task is to assume that ψn limits to ψ, and ϕn limits to ϕ, and show that ψn∩ϕn limits to ψ∩ϕ. The bound we’ll manage to prove is that d(ψn∩ϕn,ψ∩ϕ)≤|X|max(d(ψn,ψ),d(ϕn,ϕ)) Where |X| is the number of elements in the finite set X. Here’s the basic argument. For any particular point in the set ψn, there’s a nearby point in ψ (since the Hausdorff distance is low) with only ϵ measure moved around or deleted. So, in particular, if all the measure moved or deleted was just deleted from ψn instead, then that resulting contribution would be below the nearby contribution in ψ that we picked, and so it would lie in ψ as well due to downwards closure.
So, in particular, if ψn and ψ only have a Hausdorff distance of ϵ, then, taking any contribution in ψn and subtracting ϵ measure from \emph{all points} (if possible, if not, just remove measure till you’re at 0) is \emph{guaranteed} to make a point in ψ, and vice-versa.
And a corollary of that is that, given any contribution in ψn∩ϕn, the “subtract max(d(ψn,ψ),d(ϕn,ϕ)) measure from each point” contribution is in ψ, also in ϕ, and at a maximum distance of |X|max(d(ψn,ψ),d(ϕn,ϕ)) from the original contribution. And this argument can be reversed to show that the limit of the intersections is the intersection of the limits (because hausdorff distance between the two goes to 0), so we do in fact have intersection being continuous.
And that just leaves un-restricting. Again, this will take a Hausdorff-distance argument. Fixing some contribution in χ−1E(ψn), it can be broken down into an on-E part θn,E, and an off-E part θn,¬E. When you restrict to E, then θn,E∈ψn. Since ψn is within ϵ of ψ, there’s some θE∈ψ that’s within ϵ of θn,E. Then, let your point in χ−1E(ψ) be θE+θn,¬E (if there’s slightly more than 1 measure there, delete ϵ measure from θn,¬E, or all the measure if there’s less than ϵ present). It’s close to θn,E+θn,¬E because θn,E is close to θE, the other component of it is unchanged, and maybe we deleted a little bit of excess measure which didn’t do much.
This line of argument shows that ψn being close to the limit ψ is sufficient to establish that the un-restrictions of the two are comparably close together. So we have continuity for that, which is the last thing we needed.
Since we wrote the bridge transform as a composition of continuous functions, we know it’s continuous (as long as all the involved sets are finite). ■
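Before moving on, a quick numeric sanity check (my own construction, same dict-based encoding as the earlier sketch) of the pushforward and restriction steps: every test function they hand to ψ is itself a [0,1]-valued function on X, so the sup-over-functions distance can only shrink. The sup is brute-forced over a grid of functions, which suffices for the comparison because h∘g and χE⋅f stay on the grid.

```python
import itertools, random

def ud_ev(psi, f):                        # psi(f) = max over generators
    return max(sum(m * f[x] for x, m in theta.items()) for theta in psi)

X, Y, E = [0, 1, 2], [0, 1], {0, 2}
g = {0: 0, 1: 1, 2: 0}                    # g : X -> Y

grid = [i / 4 for i in range(5)]
fs_X = [dict(zip(X, v)) for v in itertools.product(grid, repeat=len(X))]
hs_Y = [dict(zip(Y, v)) for v in itertools.product(grid, repeat=len(Y))]

def dist(p, q, tests):                    # sup of |p(f) - q(f)| over the tests
    return max(abs(ud_ev(p, f) - ud_ev(q, f)) for f in tests)

def pushed(tests):                        # test functions h . g, as functions on X
    return [{x: h[g[x]] for x in X} for h in tests]

def restricted(tests):                    # test functions chi_E * f on X
    return [{x: f[x] * (x in E) for x in X} for f in tests]

random.seed(1)
for _ in range(50):
    p = [{x: random.random() / 3 for x in X} for _ in range(2)]
    q = [{x: random.random() / 3 for x in X} for _ in range(2)]
    base = dist(p, q, fs_X)
    assert dist(p, q, pushed(hs_Y)) <= base + 1e-12      # pushforward
    assert dist(p, q, restricted(fs_X)) <= base + 1e-12  # restriction
print("pushforward and restriction were non-expansive on all samples")
```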
Proposition 3.1: Let X be a finite poset, f:X→R and Θ∈□cX downward closed. Define fmax:X→R by fmax(x):=maxy≤xf(y). Observe that fmax is always non-decreasing. Then, Θ(f)=Θ(fmax).
Proof: Pick a θ′∈Θ s.t. θ′(fmax)=Θ(fmax). Ie, a maximizing contribution. Let k:X→X be defined as k:=λx.argmaxy≤xf(y). Ie, it moves a point down to somewhere below it where it can attain the highest value according to f. Now, consider k∗(θ′). It’s present in Θ because Θ was, by assumption, downwards closed, and we just moved all the measure down.
Now, we have Θ(f)=maxθ∈Θθ(f)≥k∗(θ′)(f)=θ′(λx.f(k(x)))=θ′(λx.f(argmaxy≤xf(y))) =θ′(λx.maxy≤xf(y))=θ′(fmax)=Θ(fmax)≥Θ(f) And so, all inequalities must be equalities, proving that Θ(f)=Θ(fmax). In order, the steps were: unpacking definitions, using downward closure to conclude that k∗(θ′)∈Θ, unpacking pushforwards, unpacking the definition of k, using that evaluating f at an argmax of f over {y≤x} just yields the max of f over that set, folding the definition of fmax back up, using that θ′ was selected to maximize fmax, and applying monotonicity. Done! ■
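To make Proposition 3.1 concrete, here’s a tiny Python sketch (my own toy example, not from the text) on the chain poset 0≤1≤2: Θ is generated by a contribution together with enough of its pushdowns to behave downward-closed, f is deliberately non-monotone, and the two expectations come out equal.

```python
# Theta(f) = Theta(fmax) on the chain 0 <= 1 <= 2, for downward-closed Theta.
X = [0, 1, 2]
leq = lambda y, x: y <= x                 # the partial order

def ud_ev(Theta, f):                      # max of expectations over generators
    return max(sum(m * f(x) for x, m in theta.items()) for theta in Theta)

def fmax(f):                              # fmax(x) = max_{y <= x} f(y)
    return lambda x: max(f(y) for y in X if leq(y, x))

# One generator plus ways of pushing its mass downward.
Theta = [
    {0: 0.3, 1: 0.3, 2: 0.4},
    {0: 0.6, 2: 0.4},                     # mass of point 1 pushed down to 0
    {0: 0.3, 1: 0.7},                     # mass of point 2 pushed down to 1
    {0: 0.6, 1: 0.4},
    {0: 1.0},                             # everything pushed to the bottom
]

f = lambda x: [0.9, 0.1, 0.5][x]          # non-monotone test function
print(ud_ev(Theta, f), ud_ev(Theta, fmax(f)))   # both are 0.9
```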
Proposition 4.1: Consider some Γ, Φ, a relation Q⊆Γ×Φ and a PUCK Ξ over Q. Let Θ:=⊤Γ⋉Ξ. Then, Br(Θ)=[⊤Γ⋉(susΘ⋊Ξ)]↓=[⊤Γ⋉(Q−1⋊Ξ)]↓
First off, I’m not terribly picky about variable ordering, so I’ll just write our desired proof target as Br(Θ)=[⊤Γ⋉Ξ⋉susΘ]↓=[⊤Γ⋉Ξ⋉Q−1]↓ The way we’ll do this is by establishing the following result. For all monotone f′:elΓ×Φ→[0,1], we have Br(Θ)(f′)≤[⊤Γ⋉Ξ⋉susΘ](f′)≤[⊤Γ⋉Ξ⋉Q−1](f′)≤Br(Θ)(f′) Why does that suffice? Well, assume hypothetically that the result held. Since the inequalities go in a circle, we have equality for all monotone functions. And then, for some non-monotone function f, we can go Br(Θ)(f)=Br(Θ)(fmax)=[⊤Γ⋉Ξ⋉susΘ](fmax) =[⊤Γ⋉Ξ⋉susΘ]↓(fmax)=[⊤Γ⋉Ξ⋉susΘ]↓(f) and swap out susΘ for Q−1 to show the other equality, and then we’d have equality of the three ultradistributions on all functions, so they’re equal.
For the equalities in the above equation, the first one arose because of Proposition 2.4 (bridge transforms are always downwards closed) and Proposition 3.1 (downwards-closed things let you swap out f for fmax and it doesn’t affect the value). The second equality arose because fmax is a monotone function and by assumption, we have equality for monotone functions. The third equality would arise because taking the downwards closure doesn’t affect the expectation value of monotone functions. If you add a bunch of contributions made by measure flowing down, that’s just strictly worse from the perspective of a monotone function and doesn’t change expectation value. And the fourth equality arises from Proposition 3.1 again.
So, we just need to prove the following three inequalities, for monotone functions f. Br(Θ)(f)≤[⊤Γ⋉Ξ⋉susΘ](f)≤[⊤Γ⋉Ξ⋉Q−1](f)≤Br(Θ)(f) The first one is easily addressed. By Proposition 2.7 and the definition of Θ, we have Br(Θ)⊆(Θ⋉susΘ)↓=[⊤Γ⋉Ξ⋉susΘ]↓ And so, for monotone functions f, we have Br(Θ)(f)≤[⊤Γ⋉Ξ⋉susΘ](f) Done.
Now to show our second inequality. (⊤Γ⋉Ξ⋉susΘ)(λyαx.f(y,x,α)) =(⊤Γ⋉Ξ)(λyx.δsusΘ(x)(λα.f(y,x,α))) =(⊤Γ⋉Ξ)(λyx.f(y,x,susΘ(x))) Unpack the definition of the set =(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|(y′,x)∈supp Θ})) Unpack the definition of Θ =(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|(y′,x)∈supp ⊤Γ⋉Ξ})) The condition (y′,x)∈supp ⊤Γ⋉Ξ is equivalent to x∈supp Ξ(y′). After all, if x∈supp Ξ(y′), the distribution δy′ lies in ⊤Γ, so δy′⋉Ξ would certify that (y′,x)∈supp ⊤Γ⋉Ξ. And if x∉supp Ξ(y′), then no matter the distribution in ⊤Γ or kernel selected from Ξ, if y′ gets picked, then the kernel selected from Ξ isn’t going to be making x along with it. Since we have that iff characterization, we have =(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|x∈supp Ξ(y′)})) Ξ(y′) is the union of a bunch of k(y′) for k∈Π (and convex hull), so its support is equal to the union of the supports of the k(y′). =(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|x∈⋃k∈Πsupp k(y′)})) Then, since each k is a PoCK over Q, k(y′) is the restriction of some measure ϖk to the set Q(y′), which will be written as χQ(y′)ϖk. =(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|x∈⋃k∈Πsupp (χQ(y′)ϖk)})) And now we’re about to get an inequality. f is monotone, so making the associated set bigger (easier to fulfill the defining condition) should always increase the value of f, and by monotonicity, increase the expectation value, so we get ≤(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|x∈Q(y′)})) Then restate =(⊤Γ⋉Ξ)(λyx.f(y,x,{y′|(y′,x)∈Q})) =(⊤Γ⋉Ξ)(λyx.f(y,x,Q−1(x))) And pack back up as a semidirect product. =(⊤Γ⋉Ξ)(λyx.δQ−1(x)(λα.f(y,x,α))) =(⊤Γ⋉Ξ⋉Q−1)(λyαx.f(y,x,α)) And we have our second ≤ inequality established!
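The inclusion doing the work in that inequality, susΘ(x)⊆Q−1(x), is easy to see concretely. A small sketch (toy sets and base measures of my own choosing): each k(y) is a base measure restricted to Q(y), so anything in the support of Ξ(y) at x already witnesses (y,x)∈Q, and the inclusion can be strict when the base measures miss part of Q.

```python
# sus_Theta(x) = {y : x in supp Xi(y)} is contained in Q^{-1}(x), because
# each kernel k(y) is a base measure restricted to Q(y).

Y = ['y0', 'y1']
X = ['x0', 'x1', 'x2']
Q = {('y0', 'x0'), ('y0', 'x1'), ('y1', 'x1'), ('y1', 'x2')}

# Each PoCK is given by a base measure on X; note neither puts mass on 'x0'.
base_measures = [
    {'x0': 0.0, 'x1': 1.0, 'x2': 0.0},
    {'x0': 0.0, 'x1': 0.3, 'x2': 0.7},
]

def Q_of(y):   return {x for (yy, x) in Q if yy == y}
def Q_inv(x):  return {y for (y, xx) in Q if xx == x}
def supp_Xi(y):                       # union of supports of the k(y)
    return {x for m in base_measures for x in Q_of(y) if m[x] > 0}
def sus(x):    return {y for y in Y if x in supp_Xi(y)}

for x in X:
    assert sus(x) <= Q_inv(x)         # subset, strictly so for 'x0'
    print(x, sorted(sus(x)), sorted(Q_inv(x)))
```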
Now, onto the third inequality. (⊤Γ⋉Ξ⋉Q−1)(λyαx.f(y,x,α)) Unpack the semidirect products =⊤Γ(λy.Ξ(y)(λx.δQ−1(x)(λα.f(y,x,α)))) And what top means =maxy∈ΓΞ(y)(λx.δQ−1(x)(λα.f(y,x,α))) And as for Ξ… well, each Ξ(y) is the convex hull of the various k(y), for k∈Π. So, the expectation for Ξ(y) is the maximum expectation for the various k(y), so we can rewrite as =maxy∈Γmaxk∈Πk(y)(λx.δQ−1(x)(λα.f(y,x,α))) Pick a particular y∗ and k∗ that attain the maximal value =k∗(y∗)(λx.δQ−1(x)(λα.f(y∗,x,α))) Reexpress a little bit =δy∗(λy.k∗(y)(λx.δQ−1(x)(λα.f(y,x,α)))) And pack this back up as a semidirect product =(δy∗⋉k∗⋉Q−1)(λyαx.f(y,x,α)) And then we’ll be showing that this contribution lies in Br(Θ). Once we’ve done that, we can go ≤maxθ′∈Br(Θ)θ′(λyαx.f(y,x,α)) =Br(Θ)(λyαx.f(y,x,α)) And we’d be done, having proven the third and final inequality, finishing up the proof. So, now our proof target switches to showing that (δy∗⋉k∗⋉Q−1)∈Br(Θ). We can show this if we show the support condition and the endofunction condition.
For the support condition, we have (δy∗⋉k∗⋉Q−1)(λyαx.χy∉α) =δy∗(λy.k∗(y)(λx.δQ−1(x)(λα.χy∉α))) =δy∗(λy.k∗(y)(λx.χy∉Q−1(x))) =k∗(y∗)(λx.χy∗∉Q−1(x)) And then we use that the k∗(y∗) are all of the form “take this measure, restrict it to Q(y∗)”, to get =(χQ(y∗)ϖk∗)(λx.χy∗∉Q−1(x)) =ϖk∗(λx.χx∈Q(y∗)χy∗∉Q−1(x)) Unpacking the definitions, we get =ϖk∗(λx.χ(y∗,x)∈Qχ(y∗,x)∉Q)=0 And so, this contribution is indeed supported on (y,α) pairs s.t. y∈α.
Now for the endofunction condition. As usual, fix an s and a g. (δy∗⋉k∗⋉Q−1)(λyαx.χs(y)∈αg(s(y),x)) Unpack the semidirect product =δy∗(λy.k∗(y)(λx.δQ−1(x)(λα.χs(y)∈αg(s(y),x)))) Plug in the dirac-deltas =k∗(y∗)(λx.χs(y∗)∈Q−1(x)g(s(y∗),x)) Reexpress the set membership criterion a bit =k∗(y∗)(λx.χx∈Q(s(y∗))g(s(y∗),x)) And the contribution at the start =(χQ(y∗)ϖk∗)(λx.χx∈Q(s(y∗))g(s(y∗),x)) Distribute it in as an indicator function. =ϖk∗(λx.χx∈Q(y∗)χx∈Q(s(y∗))g(s(y∗),x)) Pull the other indicator function out. =(χQ(s(y∗))ϖk∗)(λx.χx∈Q(y∗)g(s(y∗),x)) Rewrite with k∗ =k∗(s(y∗))(λx.χx∈Q(y∗)g(s(y∗),x)) Use an inequality to get rid of the indicator function ≤k∗(s(y∗))(λx.g(s(y∗),x)) Rewrite it a bit =δs(y∗)(λy.k∗(y)(λx.g(y,x))) Swap out k∗(y) for Ξ(y), the latter is larger ≤δs(y∗)(λy.Ξ(y)(λx.g(y,x))) Swap out δs(y∗) for ⊤Γ, the latter is larger ≤⊤Γ(λy.Ξ(y)(λx.g(y,x))) =(⊤Γ⋉Ξ)(λyx.g(y,x)) Abbreviate =Θ(λyx.g(y,x)) And bam, endofunction condition is shown, the entire proof goes through now. ■
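Since the support and endofunction conditions keep recurring, here’s a brute-force membership checker (entirely my own sketch) for a toy Γ and Φ: it enumerates every s:Γ→Γ and a grid of test functions g, so passing it is a necessary condition for θ∈Br(Θ) rather than a proof. The second candidate below fails precisely because its α claims a program the environment-side Θ can’t corroborate.

```python
import itertools

Gamma, Phi = ['a', 'b'], [0, 1]

def supports_ok(theta):               # support condition: y in alpha
    return all(m == 0 or y in alpha for (y, alpha, x), m in theta.items())

def Theta_ev(Theta, g):               # Theta: list of contributions on Gamma x Phi
    return max(sum(m * g[(y, x)] for (y, x), m in th.items()) for th in Theta)

def endo_ok(theta, Theta, gs):        # endofunction condition, on a grid of g
    for s in itertools.product(Gamma, repeat=len(Gamma)):
        smap = dict(zip(Gamma, s))    # s : Gamma -> Gamma
        for g in gs:
            lhs = sum(m * g[(smap[y], x)]
                      for (y, alpha, x), m in theta.items() if smap[y] in alpha)
            if lhs > Theta_ev(Theta, g) + 1e-9:
                return False
    return True

keys = [(y, x) for y in Gamma for x in Phi]
gs = [dict(zip(keys, v)) for v in itertools.product([0, 0.5, 1], repeat=len(keys))]

Theta = [{('a', 0): 0.5, ('b', 1): 0.5}]
ok  = {('a', frozenset('a'), 0): 0.5, ('b', frozenset('b'), 1): 0.5}
bad = {('a', frozenset('a'), 0): 0.5, ('b', frozenset('ab'), 1): 0.5}
print(supports_ok(ok),  endo_ok(ok,  Theta, gs))    # True True
print(supports_ok(bad), endo_ok(bad, Theta, gs))    # True False
```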
Corollary 4.3: Suppose that for any d∈D and π:H→A s.t. d∈supp W(π), it holds that dCπ. That is, the observations W predicts to receive from the computer are consistent with the chosen policy. Let L:D→R be a Cartesian loss function and π:H→A a policy. Then, (prelΓBr(ΘW)∩Cπfair)(Lphys)=W(π;L)
I’m going to be proceeding very cautiously here. First off, make our π value visually distinct by swapping it out for π∗ (prelΓBr(ΘW)∩Cπ∗fair)(Lphys) Now, by the identifications we made earlier, we can identify Γ with AH, the space of policies. Using that to unpack the function a little bit, we have =(prelΓBr(ΘW)∩Cπ∗fair)(λπα.Lphys(π,α)) Now, we note that intersecting with the top of a particular set is equivalent to updating on the indicator function for that set. Using definition 1.5 to unpack Cπ∗fair, we get =(prelΓBr(ΘW))(λπα.χ∀h∈Hπ,α:Gπ(h)=π∗(h)Lphys(π,α)) Apply that Gπ(h) is “what would the agent do on h if the agent is copying the behavior of π”, so we can rephrase as: =(prelΓBr(ΘW))(λπα.χ∀h∈Hπ,α:π(h)=π∗(h)Lphys(π,α)) Pull off the projection, and use d for a destiny in D. =Br(ΘW)(λπαd.χ∀h∈Hπ,α:π(h)=π∗(h)Lphys(π,α)) At this point, we use that ΘW:=⊤Γ⋉W, that W is a PUCK over Q0, and Proposition 4.1 to go =[⊤Γ⋉W⋉Q−10]↓(λπαd.χ∀h∈Hπ,α:π(h)=π∗(h)Lphys(π,α)) Before we can remove the downwards closure, we’ll want to verify the function is monotone. So, we’ll want to start unpacking the physicalist loss next. Applying definition 3.1, and using d′ instead of g to remember it’s a destiny, we have =[⊤Γ⋉W⋉Q−10]↓(λπαd.χ∀h∈Hπ,α:π(h)=π∗(h)minha:ha∈Xπ,αmaxd′:ha⊑d′L(d′)) Next up is unpacking Xπ,α. Using definition 3.1, it’s =[⊤Γ⋉W⋉Q−10]↓(λπαd.χ∀h∈Hπ,α:π(h)=π∗(h) minha:ha∈Hπ,α×A∧(∀π′∈α:Gπ′(h)=a)maxd′:ha⊑d′L(d′)) At this point, we can, again, treat Gπ′(h) the same as π′(h). =[⊤Γ⋉W⋉Q−10]↓(λπαd.χ∀h∈Hπ,α:π(h)=π∗(h) minha:ha∈Hπ,α×A∧(∀π′∈α:π′(h)=a)maxd′:ha⊑d′L(d′)) And now we need to take a moment to show that Hπ,α gets smaller when α gets larger. Applying definition 1.5, the event h∈Hπ,α unpacks as (∀h′a′⊏h,π′∈α:Gπ′(h′)=a′)∧(∃d′:h⊏d′∧d′Cπ) Now, if α becomes a larger set, then it gets harder for the first condition to be fulfilled, so the set Hπ,α shrinks. Since this happens, it means that if α gets bigger, it gets more difficult for the prerequisite of the implication in the indicator function to be fulfilled, so the implication is more likely to hold. Further, the minimization is taking place over a smaller set, so the loss goes up. So our function is monotone in α, and we can remove the downwards closure. =(⊤Γ⋉W⋉Q−10)(λπαd.χ∀h∈Hπ,α:π(h)=π∗(h) minha:ha∈Hπ,α×A∧(∀π′∈α:π′(h)=a)maxd′:ha⊑d′L(d′)) Unpacking the semidirect product, it is =⊤Γ(λπ.W(π)(λd.δQ−10(d)(λα.χ∀h∈Hπ,α:π(h)=π∗(h) minha:ha∈Hπ,α×A∧(∀π′∈α:π′(h)=a)maxd′:ha⊑d′L(d′)))) Substituting in the dirac-delta everywhere that α is, we get =⊤Γ(λπ.W(π)(λd.χ∀h∈Hπ,Q−10(d):π(h)=π∗(h) minha:ha∈Hπ,Q−10(d)×A∧(∀π′∈Q−10(d):π′(h)=a)maxd′:ha⊑d′L(d′))) Now, Q−10(d) is the set of policies π′ s.t. π′Q0d, the “this policy is consistent with this destiny” relation. Also let’s swap out ⊤Γ for maximization =maxπW(π)(λd.χ∀h∈Hπ,Q−10(d):π(h)=π∗(h) minha:ha∈Hπ,Q−10(d)×A∧(∀π′Q0d:π′(h)=a)maxd′:ha⊑d′L(d′)) Now, we’re going to try to address that minimum, and show that the only ha that fulfill the conditions are exactly those ha⊑d. This requires showing that ha⊑d is a sufficient condition to fulfill the relevant properties, and then to show that ha⋢d implies a failure of one of the properties.
So, first up. Assume ha⊑d. Then, for any π′, dQ0π′ and ha⊑d must imply that π′(h)=a, that’s what policy consistency means. Also, h∈Hπ,Q−10(d) unpacks as the two conditions ∀h′a′,π′:h′a′⊏h∧dQ0π′→π′(h′)=a′ ∃d′:h⊏d′∧d′Cπ As for the first condition, clearly, if π′ is consistent with d, it’s consistent with ha because ha⊑d, and so it must be consistent with any prefix of ha, so the first condition holds.
For the second condition, d is a valid choice, because we assumed ha⊑d, and dCπ occurs always, because W(π) always being supported on d s.t. dCπ was one of our problem assumptions.
So, we have one implication direction down. Now for the reverse implication direction. Assume ha⋢d. Then there are two possibilities: either ha first diverges from d on an observation, or ha first diverges from d on an action.
For the first possibility, it’s possible to make two policies which are consistent with d but also differ in their actions on history h, because h isn’t a prefix of d if ha first differs from d on an observation.
For the second possibility, it’s ruled out by either the condition for h∈Hπ,Q−10(d) that goes ∀h′a′,π′:h′a′⊏h∧π′Q0d→π′(h′)=a′ or the extra condition that ∀π′:π′Q0d→π′(h)=a applied at the first action where ha deviates from d, because π′Q0d implies that π′(h′) must be the action which d dictates, not the action a′ that deviates from d.
And that establishes the other direction of the iff statement.
Thus, we can swap out our fancy minimization for just minimizing over the ha⊑d. =maxπW(π)(λd.χ∀h∈Hπ,Q−10(d):π(h)=π∗(h) minha:ha⊑dmaxd′:ha⊑d′L(d′)) This minimization is attained by selecting d itself. So then it turns into =maxπW(π)(λd.χ∀h∈Hπ,Q−10(d):π(h)=π∗(h)L(d)) At this point, what we’ll do is show that an upper bound and lower bound on the value of this term is the same. Going from upper bound to lower bound, we start with W(π∗)(λd.L(d)) At this point, we’ll use that W is a PUCK, so there’s a set E of environments e (PoCK’s) that W is generated from, and we can go: =maxe∈Ee(π∗)(λd.L(d)) =maxπmaxe∈Ee(π∗)(λd.χdQ0πL(d)) =maxπmaxe∈E(χQ0(π∗)ϖe)(λd.χdQ0πL(d)) =maxπmaxe∈Eϖe(λd.χdQ0π∗χdQ0πL(d)) Now pull the indicator function back out. =maxπmaxe∈E(χQ0(π)ϖe)(λd.χdQ0π∗L(d)) =maxπmaxe∈Ee(π)(λd.χdQ0π∗L(d)) =maxπW(π)(λd.χdQ0π∗L(d)) Now we must show that this is a looser constraint than what was previously in our indicator function to proceed further. So our next order of business is showing that, certainly, ∀h∈Hπ,Q−10(d):π(h)=π∗(h)→dQ0π∗ Let h be one of the history prefixes of some d in the support of W(π). The two conditions for h∈Hπ,Q−10(d) are fulfilled, because they are ∀h′,a′,π′:h′a′⊏h∧dQ0π′→π′(h′)=a′ ∃d′:h⊏d′∧d′Cπ For the first condition, if h′a′⊏h, then h′a′⊏d, and so if π′ is consistent with d, it must take the same action in response to h′, the action that d commands, a′. So that’s fulfilled. For the second condition, let d′ be d. h⊏d holds, and so dCπ holds certainly, because W(π) is supported on d s.t. dCπ.
So, for all d in the support of W(π), h⊏d→h∈Hπ,Q−10(d). Since we assumed our forall statement as prerequisite, this means that for all h⊏d, π(h)=π∗(h). And dQ0π means ∀ha⊑d:π(h)=a. Since π∗(h) mimics π(h) for all history prefixes of d, this means ∀ha⊑d:π∗(h)=a, ie dQ0π∗.
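The consistency-transfer step just argued is simple enough to state in code. A minimal sketch (my own encoding: a destiny is an alternating action/observation string, a policy a function from histories to actions): if dQ0π and π∗ agrees with π on every history prefix of d, then dQ0π∗.

```python
# d Q0 pi: pi takes the action d dictates after every history prefix of d.
def prefixes(d):                          # even positions of d are actions
    return [(d[:i], d[i]) for i in range(0, len(d), 2)]

def Q0(d, pi):                            # destiny-policy consistency
    return all(pi(h) == a for h, a in prefixes(d))

d = 'aXbY'                                # actions a, b; observations X, Y
pi      = lambda h: {'': 'a', 'aX': 'b'}.get(h, 'a')
pi_star = lambda h: {'': 'a', 'aX': 'b'}.get(h, 'b')   # differs only off d

# pi_star mimics pi on the prefixes of d, so consistency transfers:
assert Q0(d, pi) and all(pi(h) == pi_star(h) for h, _ in prefixes(d))
assert Q0(d, pi_star)
```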
So, since this is a looser constraint, when we were previously at =maxπW(π)(λd.χdQ0π∗L(d)) we can proceed further to ≥maxπW(π)(λd.χ∀h∈Hπ,Q−10(d):π(h)=π∗(h)L(d)) which is the value we’re trying to sandwich. Now, at this point, plug in π∗ and get ≥W(π∗)(λd.χ∀h∈Hπ∗,Q−10(d):π∗(h)=π∗(h)L(d)) =W(π∗)(λd.L(d)) And bam, we’ve sandwiched our term between W(π∗)(L) on both sides, and so the result follows. ■
Proposition 4.2: Let X, Y and Z be finite sets, Q⊆Y×X a relation, κ:Y→Δc(X×Z) a Z-PoCK over Q and Θ∈□cY. Then, there exist μ∈ΔcZ and ϕ:Z×Y→ΔcX s.t. for all z, λy.ϕ(z,y) is a PoCK over Q, and for all y∈Y, κ(y)=(λz.ϕ(z,y))⋊μ. Moreover, suppose that (μ1,ϕ1) and (μ2,ϕ2) are both as above. Then, μ1⋉Θ⋉ϕ1=μ2⋉Θ⋉ϕ2
Our first order of business is establishing that there’s even a μ and ϕ that has those effects at all. Here’s a way to define them. μ(z):=maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′) Where ϖκ is the measure on Z×X that κ is associated with, ie, κ(y)=χQ(y)ϖκ must be true for some ϖκ because κ is a Z-PoCK over Q. And, ϕ will be defined as: ϕ(y,z)(x):=χx∈Q(y)ϖκ(z,x)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′) With those definitions in place, it’s easy to establish that μ⋉(λz.ϕ(y,z))=κ(y). We can just fix an arbitrary x,z pair and go κ(y)(x,z)=χx∈Q(y)ϖκ(z,x)=maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)⋅(χx∈Q(y)ϖκ(z,x)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)) =μ(z)⋅ϕ(y,z)(x)=(μ⋉(λz′.ϕ(y,z′)))(x,z) And we’re done with showing that such functions exist in the first place. Well, as long as we check that μ and ϕ behave accordingly. First off, μ being a contribution follows from ϖκ being a Z-polycontribution, and the definition of Z-polycontributions. Also, to show that (λy.ϕ(y,z)) is a PoCK over Q, we need to show that there’s a ϖϕ,z s.t. ϕ(y,z)=χQ(y)ϖϕ,z, and that it always has 1 or less measure.
In order to do this, define ϖϕ,z:=(1/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′))⋅prX(χ{z}×Xϖκ) Clearly, you get ϕ(y,z) from restricting this to Q(y), because we have (χQ(y)ϖϕ,z)(x)=(1/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′))⋅χQ(y)(prX(χ{z}×Xϖκ))(x) =χQ(y)(prX(χ{z}×Xϖκ))(x)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)=χx∈Q(y)prX(χ{z}×Xϖκ)(x)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′) =χx∈Q(y)∑z′(χ{z}×Xϖκ)(x,z′)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)=χx∈Q(y)∑z′χz′=zϖκ(x,z′)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′) =χx∈Q(y)ϖκ(x,z)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′)=ϕ(y,z)(x) And we’re done. And also, the measure is ≤1, because ∑x∈Q(y)ϖϕ,z(x)=∑x∈Q(y)prX(χ{z}×Xϖκ)(x)/maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′) and, skipping over a few routine steps, =∑x∈Q(y)ϖκ(x,z)/maxy′∈Y∑x′∈Q(y′)ϖκ(x′,z) ≤∑x∈Q(y)ϖκ(x,z)/∑x′∈Q(y)ϖκ(x′,z)=1 And we’re done, we figured out how to decompose κ into μ and ϕ.
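Here’s a small numeric check (toy Q and ϖκ of my own) of the decomposition just built: μ(z) is the max-over-y mass that ϖκ(z,⋅) puts on Q(y), ϕ(y,z) is the normalized restriction, and the product recovers χx∈Q(y)ϖκ(z,x) exactly.

```python
Y, X, Z = ['y0', 'y1'], ['x0', 'x1'], ['z0', 'z1']
Q = {('y0', 'x0'), ('y0', 'x1'), ('y1', 'x1')}
varpi = {('z0', 'x0'): 0.2, ('z0', 'x1'): 0.3,     # base Z-polycontribution
         ('z1', 'x0'): 0.1, ('z1', 'x1'): 0.4}

def Q_of(y): return {x for (yy, x) in Q if yy == y}

def mu(z):                         # mu(z) = max_y sum_{x in Q(y)} varpi(z,x)
    return max(sum(varpi[(z, x)] for x in Q_of(y)) for y in Y)

def phi(y, z, x):                  # normalized restriction to Q(y)
    return varpi[(z, x)] / mu(z) if x in Q_of(y) else 0.0

for y in Y:
    for z in Z:
        for x in X:
            kappa = varpi[(z, x)] if x in Q_of(y) else 0.0   # chi_{Q(y)} varpi
            assert abs(kappa - mu(z) * phi(y, z, x)) < 1e-12
print([mu(z) for z in Z], sum(mu(z) for z in Z))   # mu is a contribution: total <= 1
```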
Now for the second half of the proof. The first thing to establish is that, for all y,z, we have μ1(z)⋅ϕ1(y,z)=μ2(z)⋅ϕ2(y,z). This occurs because, for all x, μ1(z)⋅ϕ1(y,z)(x)=(μ1⋉(λz.ϕ1(y,z)))(x,z)=κ(y)(x,z) And then by symmetry, the exact same holds for μ2 and ϕ2, both were declared to be equal to κ. Now that this result is in place, we can begin. (μ1⋉Θ⋉ϕ1)(λxyz.f(x,y,z)) =μ1(λz.Θ(λy.ϕ1(y,z)(λx.f(x,y,z)))) Now, we do something odd. We can reexpress this as =C(λz.μ1(z)⋅Θ(λy.ϕ1(y,z)(λx.f(x,y,z)))) Basically, what’s going on here is that we can swap out the contribution μ1 for the counting measure C (1 measure on each distinct point) and just scale down the expectation values accordingly. It’s pretty much the same way that you can think of ∑xμ(x)f(x) (expectation of f w.r.t μ) as ∑x1⋅μ(x)f(x) (expectation of μ⋅f w.r.t the counting measure). Now, since Θ is homogeneous, we can move constants in or out of it, to get =C(λz.Θ(λy.μ1(z)⋅ϕ1(y,z)(λx.f(x,y,z)))) Now, at this point, we can use that μ1(z)⋅ϕ1(y,z)=μ2(z)⋅ϕ2(y,z), to get =C(λz.Θ(λy.μ2(z)⋅ϕ2(y,z)(λx.f(x,y,z)))) And just back up and reverse everything. =C(λz.μ2(z)⋅Θ(λy.ϕ2(y,z)(λx.f(x,y,z)))) =μ2(λz.Θ(λy.ϕ2(y,z)(λx.f(x,y,z)))) =(μ2⋉Θ⋉ϕ2)(λxyz.f(x,y,z)) And we’re done! ■
Lemma 4: Let X, Y and Z be finite sets, Q⊆Y×X a relation, Ξ1,Ξ2:Y→□c(X×Z) Z-PUCKs over Q, Θ∈□cY and p∈[0,1]. Then, pΞ1+(1−p)Ξ2 is also a Z-PUCK over Q, and Θ∗(pΞ1+(1−p)Ξ2)⊆p(Θ∗Ξ1)+(1−p)(Θ∗Ξ2)
Our first order of business is establishing that the mix of Z-PUCK’s over Q is a Z-PUCK over Q. Here’s what we’ll do. We’ll define a family of kernels, show that they’re all Z-PoCK’s, and that said family makes a Z-PUCK that’s equal to the mix of Ξ1 and Ξ2.
So, let Π1 be the set of Z-PoCK’s associated with Ξ1, and Π2 be the set of Z-PoCK’s associated with Ξ2. Elements of these sets are κ1 and κ2. Define Π as {pκ1+(1−p)κ2|κ1∈Π1,κ2∈Π2}.
By Definition 4.5, in order to establish that these are Z-PoCK’s over Q, we need to make an appropriate choice of ϖ. In particular, the ϖ associated with κ=pκ1+(1−p)κ2 is ϖκ:=pϖκ1+(1−p)ϖκ2. It fulfills definition 4.5 because κ(y)(x,z)=(pκ1+(1−p)κ2)(y)(x,z)=pκ1(y)(x,z)+(1−p)κ2(y)(x,z) =p(χQ(y)ϖκ1)(x,z)+(1−p)(χQ(y)ϖκ2)(x,z)=(pχQ(y)ϖκ1+(1−p)χQ(y)ϖκ2)(x,z) =χQ(y)(pϖκ1+(1−p)ϖκ2)(x,z)=χQ(y)ϖκ(x,z) By unpacking our definition, using how mixes of kernels work, applying definition 4.5 for κ1 and κ2, and then just doing some simple regrouping and packing the definition back up, we get our result.
But wait, we still need to show that ϖκ is a Z-Polycontribution on Q. Again, this isn’t too hard to show, with Definition 4.4. ∑z∈Zmaxy∈Y∑x∈Q(y)ϖκ(x,z)=∑z∈Zmaxy∈Y∑x∈Q(y)(pϖκ1+(1−p)ϖκ2)(x,z) =∑z∈Zmaxy∈Y∑x∈Q(y)(pϖκ1(x,z)+(1−p)ϖκ2(x,z)) =∑z∈Zmaxy∈Y(p∑x∈Q(y)ϖκ1(x,z)+(1−p)∑x∈Q(y)ϖκ2(x,z)) ≤∑z∈Z(pmaxy∈Y∑x∈Q(y)ϖκ1(x,z)+(1−p)maxy∈Y∑x∈Q(y)ϖκ2(x,z)) =p∑z∈Zmaxy∈Y∑x∈Q(y)ϖκ1(x,z)+(1−p)∑z∈Zmaxy∈Y∑x∈Q(y)ϖκ2(x,z)≤p⋅1+(1−p)⋅1=1 And bam, we have our inequality demonstrated, everything works out. Now we just need to show that this family of Z-PoCK’s makes the Z-PUCK that’s the mixture of the two. We’ll establish equality by showing equality for all functions and all y. Ξ(y)(f)=maxκ∈Πκ(y)(f)=maxκ∈{pκ1+(1−p)κ2|κ1∈Π1,κ2∈Π2}κ(y)(f) =maxκ1∈Π1,κ2∈Π2(pκ1+(1−p)κ2)(y)(f)=maxκ1∈Π1,κ2∈Π2(pκ1(y)(f)+(1−p)κ2(y)(f)) =pmaxκ1∈Π1κ1(y)(f)+(1−p)maxκ2∈Π2κ2(y)(f)=pΞ1(y)(f)+(1−p)Ξ2(y)(f) =(pΞ1+(1−p)Ξ2)(y)(f) Done, we’ve shown equality of the Z-PUCK with the mixture of other Z-PUCKs, establishing that the mixture of Z-PUCKs is a Z-PUCK.
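The key inequality there, that the max of a sum is at most the sum of the maxes, is easy to check numerically. A quick sketch with made-up ϖκ1, ϖκ2 and Q (same toy sets as the previous sketch):

```python
Y, Z, p = ['y0', 'y1'], ['z0', 'z1'], 0.3
Q = {('y0', 'x0'), ('y0', 'x1'), ('y1', 'x1')}
varpi1 = {('z0','x0'): 0.2, ('z0','x1'): 0.3, ('z1','x0'): 0.1, ('z1','x1'): 0.4}
varpi2 = {('z0','x0'): 0.6, ('z0','x1'): 0.0, ('z1','x0'): 0.0, ('z1','x1'): 0.4}

def poly_norm(varpi):              # Sigma_z max_y Sigma_{x in Q(y)} varpi(x,z)
    return sum(max(sum(varpi[(z, x)] for (yy, x) in Q if yy == y) for y in Y)
               for z in Z)

mix = {k: p * varpi1[k] + (1 - p) * varpi2[k] for k in varpi1}
lhs = poly_norm(mix)
rhs = p * poly_norm(varpi1) + (1 - p) * poly_norm(varpi2)
assert lhs <= rhs + 1e-9 and rhs <= 1 + 1e-9
print(lhs, rhs)                    # the mixture stays a Z-polycontribution
```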
That leaves establishing our relevant inequality. But before we do that, we’ll be wanting a nice handy form for that asterisk operator to manipulate things with. Given some κ that’s a Z-PUCK over Q, remember from the previous proof that a valid choice for μ and ϕ to break κ down is μ(z):=maxy′∈Y∑x′∈Q(y′)ϖκ(z,x′) and, abbreviating things a little bit, we have ϕ(y,z)(x):=χx∈Q(y)ϖκ(z,x)/μ(z) So, we can get a pleasant-to-manipulate form for Θ∗κ as follows. (Θ∗κ)(λxyz.f(x,y,z))=(μ⋉Θ⋉ϕ)(λxyz.f(x,y,z)) =μ(λz.Θ(λy.ϕ(y,z)(λx.f(x,y,z)))) And proceed further =∑z∈Zμ(z)⋅Θ(λy.ϕ(y,z)(λx.f(x,y,z))) =∑z∈Zμ(z)⋅Θ(λy.∑x∈Xϕ(y,z)(x)⋅f(x,y,z)) =∑z∈Zμ(z)⋅Θ(λy.∑x∈X(χx∈Q(y)ϖκ(z,x)/μ(z))⋅f(x,y,z)) And then we move the constant into Θ since it’s homogeneous, and then into the sum, and it cancels out with the fraction. =∑z∈ZΘ(λy.∑x∈Xχx∈Q(y)⋅ϖκ(z,x)⋅f(x,y,z)) =∑z∈ZΘ(λy.∑x∈Q(y)ϖκ(z,x)⋅f(x,y,z)) =∑z∈ZΘ(λy.χQ(y)×{z}ϖκ(λx′z′.f(x′,y,z′))) This general form will be used whenever we need to unpack Θ∗κ. Now, let’s get started on the proof of our subset inclusion thingy. As usual, Π will be the set {pκ1+(1−p)κ2|κ1∈Π1,κ2∈Π2}, and as we’ve shown, that’s the set of Z-PoCK’s associated with pΞ1+(1−p)Ξ2. Also, as we’ve already shown, the associated Z-polycontribution ϖκ for κ=pκ1+(1−p)κ2 is pϖκ1+(1−p)ϖκ2. This will be implicitly used in the following. (Θ∗(pΞ1+(1−p)Ξ2))(λxyz.f(x,y,z))=maxκ∈Π(Θ∗κ)(λxyz.f(x,y,z)) Now we use our preferred unpacking of how that asterisk operator works. =maxκ∈Π∑z∈ZΘ(λy.χQ(y)×{z}ϖκ(λx′z′.f(x′,y,z′))) And unpack κ and ϖκ appropriately. =maxκ1∈Π1,κ2∈Π2∑z∈ZΘ(λy.χQ(y)×{z}(pϖκ1+(1−p)ϖκ2)(λx′z′.f(x′,y,z′))) =maxκ1∈Π1,κ2∈Π2∑z∈ZΘ(λy.pχQ(y)×{z}ϖκ1(λx′z′.f(x′,y,z′)) +(1−p)χQ(y)×{z}ϖκ2(λx′z′.f(x′,y,z′))) At this point, we use convexity of Θ, since it’s an ultradistribution. ≤maxκ1∈Π1,κ2∈Π2∑z∈Z(pΘ(λy.(χQ(y)×{z}ϖκ1)(λx′z′.f(x′,y,z′))) +(1−p)Θ(λy.(χQ(y)×{z}ϖκ2)(λx′z′.f(x′,y,z′)))) =maxκ1∈Π1,κ2∈Π2(p∑z∈ZΘ(λy.(χQ(y)×{z}ϖκ1)(λx′z′.f(x′,y,z′))) +(1−p)∑z∈ZΘ(λy.(χQ(y)×{z}ϖκ2)(λx′z′.f(x′,y,z′)))) At this point, you can pack up things. =maxκ1∈Π1,κ2∈Π2(p(Θ∗κ1)(λxyz.f(x,y,z))+(1−p)(Θ∗κ2)(λxyz.f(x,y,z))) =pmaxκ1∈Π1(Θ∗κ1)(λxyz.f(x,y,z))+(1−p)maxκ2∈Π2(Θ∗κ2)(λxyz.f(x,y,z)) =p(Θ∗Ξ1)(λxyz.f(x,y,z))+(1−p)(Θ∗Ξ2)(λxyz.f(x,y,z)) =(p(Θ∗Ξ1)+(1−p)(Θ∗Ξ2))(λxyz.f(x,y,z)) Done! ■
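Since that handy form gets reused below, here’s a quick numeric confirmation (my toy data again) that the two expressions for (Θ∗κ)(f) agree — the μ/ϕ semidirect form and the direct sum over z against the base measure — with Θ evaluated as a max over its generators, which is positively homogeneous as required.

```python
Y, X, Z = ['y0', 'y1'], ['x0', 'x1'], ['z0', 'z1']
Q = {('y0', 'x0'), ('y0', 'x1'), ('y1', 'x1')}
varpi = {('z0','x0'): 0.2, ('z0','x1'): 0.3, ('z1','x0'): 0.1, ('z1','x1'): 0.4}
Theta = [{'y0': 0.7, 'y1': 0.3}, {'y1': 1.0}]      # ultracontribution on Y

def Q_of(y): return {x for (yy, x) in Q if yy == y}
def Theta_ev(h): return max(sum(m * h(y) for y, m in th.items()) for th in Theta)
def mu(z): return max(sum(varpi[(z, x)] for x in Q_of(y)) for y in Y)
def phi(y, z, x): return varpi[(z, x)] / mu(z) if x in Q_of(y) else 0.0

f = lambda x, y, z: (X.index(x) + 2 * Y.index(y) + 3 * Z.index(z)) / 10

# mu/phi semidirect form: sum_z mu(z) * Theta(lambda y. sum_x phi * f)
form1 = sum(mu(z) * Theta_ev(lambda y, z=z: sum(phi(y, z, x) * f(x, y, z)
                                                for x in X)) for z in Z)
# direct form: sum_z Theta(lambda y. sum_{x in Q(y)} varpi(z,x) * f)
form2 = sum(Theta_ev(lambda y, z=z: sum(varpi[(z, x)] * f(x, y, z)
                                        for x in Q_of(y))) for z in Z)
assert abs(form1 - form2) < 1e-12
print(form1, form2)
```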
Proposition 4.3: Let X, Y and Z be finite sets, Q⊆Y×X a relation, κ1,κ2:Y→Δc(X×Z) Z-PoCKs over Q, Θ∈□cY and p∈[0,1]. Then, pκ1+(1−p)κ2 is also a Z-PoCK over Q, and Θ∗(pκ1+(1−p)κ2)⊆p(Θ∗κ1)+(1−p)(Θ∗κ2)
Use Lemma 4, along with Z-PoCKs being a special case of Z-PUCKs. ■
Proposition 4.4: Let X, Y and Z be finite sets, Q⊆Y×X a relation and Ξ:Y→□c(X×Z) a Z-PUCK over Q. Denote Θ:=⊤Y∗Ξ. Define β0,β1:Z×Y×X→2Z×Y by β0(z,y,x):={z}×Q−1(x), β1(z,y,x):=Z×Q−1(x). Then (Θ⋉β0)↓⊆Br(Θ)⊆(Θ⋉β1)↓
Proof: As usual, when establishing inequalities with downwards closures, we only have to verify the result for monotone functions. So, we may assume that f is monotone, and attempt to show that (Θ⋉β0)(λxyzα.f(x,y,z,α))≤BrZ×Y(Θ)(λxyzα.f(x,y,z,α)) ≤(Θ⋉β1)(λxyzα.f(x,y,z,α)) Remember, bridge transforms cash out as a maximum over contributions, so to show the first inequality, we’ll need to build a contribution that matches or exceeds that first term, and that lands in the bridge transform of Θ. For the second inequality, it’s considerably easier: we just use our Lemma 2 to figure out what sort of sets the bridge transform is supported on, swap out the sets it’s supported on for a bigger set upper bound, and bam, monotonicity of f takes over from there. Let’s unpack that first thing (Θ⋉β0)(λxyzα.f(x,y,z,α)) =Θ(λxyz.δβ0(x,z)(λα.f(x,y,z,α))) =Θ(λxyz.f(x,y,z,β0(x,z))) And at this point we unpack what Θ is. =(⊤Y∗Ξ)(λxyz.f(x,y,z,β0(x,z))) And the Ξ. =maxκ∈Π(⊤Y∗κ)(λxyz.f(x,y,z,β0(x,z))) And then, κ can be broken down into some μκ and ϕκ, and that goes on both sides of ⊤Y as our previous proposition shows. =maxκ∈Π(μκ⋉⊤Y⋉ϕκ)(λxyz.f(x,y,z,β0(x,z))) =maxκ∈Πμκ(λz.⊤Y(λy.ϕκ(y,z)(λx.f(x,y,z,β0(x,z))))) =maxκ∈Πμκ(λz.maxyϕκ(y,z)(λx.f(x,y,z,β0(x,z)))) Now we can start filling in some data. There’s a maximizing κ∗, so we can substitute that in. That gives us a canonical choice for what μκ∗ and ϕκ∗ are. Making that substitution, =μκ∗(λz.maxyϕκ∗(y,z)(λx.f(x,y,z,β0(x,z)))) And then, let d:Z→Y be the function mapping each particular z to the y which maximizes ϕκ∗(y,z)(λx.f(x,y,z,β0(x,z))). This lets us reexpress things as =μκ∗(λz.ϕκ∗(d(z),z)(λx.f(x,d(z),z,β0(x,z)))) And now, we can start unpacking things a bit. =μκ∗(λz.δd(z)(λy.ϕκ∗(y,z)(λx.f(x,y,z,β0(x,z))))) =μκ∗(λz.δd(z)(λy.ϕκ∗(y,z)(λx.δβ0(x,z)(λα.f(x,y,z,α))))) And now we can write things as just a giant semidirect product. =(μκ∗⋉d⋉ϕκ∗⋉β0)(λxyzα.f(x,y,z,α)) Now we’ll show that this particular contribution lies in Br(Θ).
Checking the support condition, we want to check for sure that y,z∈α, ie, the event y,z∉α has measure 0. Let’s begin. (μκ∗⋉d⋉ϕκ∗⋉β0)(λxyzα.χy,z∉α) =μκ∗(λz.δd(z)(λy.ϕκ∗(y,z)(λx.δβ0(x,z)(λα.χy,z∉α)))) Substitute in the dirac-deltas. =μκ∗(λz.ϕκ∗(d(z),z)(λx.χd(z),z∉β0(x,z))) Unpack what β0(x,z) is. =μκ∗(λz.ϕκ∗(d(z),z)(λx.χd(z),z∉Q−1(x)×{z})) Now, z∈{z} always occurs, so that indicator function is the same as just testing whether d(z)∈Q−1(x). =μκ∗(λz.ϕκ∗(d(z),z)(λx.χd(z)∉Q−1(x))) Rephrasing things a little bit, =μκ∗(λz.ϕκ∗(d(z),z)(λx.χx∉Q(d(z)))) Then, from proposition 4.2, we remember that λy.ϕκ∗(y,z) is a PoCK over Q. Ie, for any particular y, ϕκ∗(y,z) looks like a particular measure (ϖκ∗,z) restricted to Q(y). So, in particular, ϕκ∗(d(z),z) must be supported over Q(d(z)). Put another way, with full measure, x∈Q(d(z)). So, this event failing has 0 measure. =μκ∗(λz.0)=0 And we’re done with that support condition.
Now to show the endofunction condition. As usual, we’ll let s:Y×Z→Y×Z, and let g:X×Y×Z→[0,1]. Actually, for conceptual clarity, since s:Y×Z→Y×Z can be viewed as a pair of functions sY:Y×Z→Y and sZ:Y×Z→Z, we’ll be using that formulation in our equation. (μκ∗⋉d⋉ϕκ∗⋉β0)(λxyzα.χsY(y,z),sZ(y,z)∈αg(sY(y,z),sZ(y,z),x)) =μκ∗(λz.δd(z)(λy.ϕκ∗(y,z)(λx.δβ0(x,z)(λα.χsY(y,z),sZ(y,z)∈αg(sY(y,z),sZ(y,z),x))))) As usual, we’ll substitute in our dirac-deltas to simplify things. =μκ∗(λz.ϕκ∗(d(z),z)(λx.χsY(d(z),z),sZ(d(z),z)∈β0(x,z)g(sY(d(z),z),sZ(d(z),z),x))) Substitute in what β0(x,z) is. =μκ∗(λz.ϕκ∗(d(z),z)(λx.χsY(d(z),z),sZ(d(z),z)∈Q−1(x)×{z}g(sY(d(z),z),sZ(d(z),z),x))) Now, if that “this pair of points lies in this set” indicator function goes off, then sZ(d(z),z)=z. So, we can substitute that into the g term afterwards. And then get a ≤ inequality by making the indicator function less strict. =μκ∗(λz.ϕκ∗(d(z),z)(λx.χsY(d(z),z),sZ(d(z),z)∈Q−1(x)×{z}g(sY(d(z),z),z,x))) ≤μκ∗(λz.ϕκ∗(d(z),z)(λx.χsY(d(z),z)∈Q−1(x)g(sY(d(z),z),z,x))) And reexpress the indicator function a little bit =μκ∗(λz.ϕκ∗(d(z),z)(λx.χx∈Q(sY(d(z),z))g(sY(d(z),z),z,x))) At this point, we can use that ϕκ∗(y,z) is χQ(y)ϖϕκ∗,z (ie, fixing z and varying y it just looks like you’re taking one measure and conditioning on various Q(y)), so reexpress things as =μκ∗(λz.(χQ(d(z))ϖϕκ∗,z)(λx.χx∈Q(sY(d(z),z))g(sY(d(z),z),z,x))) And then, view the indicator function as just more conditioning. =μκ∗(λz.(χQ(d(z))∩Q(sY(d(z),z))ϖϕκ∗,z)(λx.g(sY(d(z),z),z,x))) And then, relax about what you’re conditioning on. ≤μκ∗(λz.(χQ(sY(d(z),z))ϖϕκ∗,z)(λx.g(sY(d(z),z),z,x))) Rewrite it as a kernel again =μκ∗(λz.ϕκ∗(sY(d(z),z),z)(λx.g(sY(d(z),z),z,x))) Pull out the dirac-delta =μκ∗(λz.δsY(d(z),z)(λy.ϕκ∗(y,z)(λx.g(y,z,x)))) Throw one more inequality at it ≤μκ∗(λz.maxyϕκ∗(y,z)(λx.g(y,z,x))) Write it as top =μκ∗(λz.⊤Y(λy.ϕκ∗(y,z)(λx.g(y,z,x)))) Write as a semidirect product =(μκ∗⋉⊤Y⋉ϕκ∗)(λyzx.g(y,z,x)) Reexpress =(⊤Y∗κ∗)(λyzx.g(y,z,x)) ≤maxκ∈Π(⊤Y∗κ)(λyzx.g(y,z,x)) =(⊤Y∗Ξ)(λyzx.g(y,z,x)) =Θ(λyzx.g(y,z,x)) And we’re done! Endofunction condition shown. Our relevant contribution is in Br(Θ). Let’s see, where were we… ah right, we had shown that for all monotone f, (Θ⋉β0)(λxyzα.f(x,y,z,α)) =(μκ∗⋉d⋉ϕκ∗⋉β0)(λxyzα.f(x,y,z,α)) for some choice of d and κ∗. We know this is in Br(Θ), so we get ≤maxθ∈Br(Θ)θ(λyzαx.f(x,y,z,α)) =Br(Θ)(λyzαx.f(x,y,z,α)) And we’re done! One inequality done. That just leaves showing the second inequality, where β1(x)=Z×Q−1(x). It’s actually not too bad to show. Start with Br(Θ)(λxyαz.f(x,y,z,α)) =maxθ∈Br(Θ)θ(λyzαx.f(x,y,z,α)) And then, we recall our Lemma 2, that if Θ had its support entirely on x,y,z tuples where y,z∈h(x) (for some h:X→2Y×Z), then all the θ∈Br(Θ) would be supported on (x,α) pairs where α⊆h(x). And then, swapping out α for h(x), by monotonicity of f, produces a larger value.
To invoke this argument, our choice of h will be β1, where β1(x)=Q−1(x)×Z. We do need to show that Θ is supported on such tuples. Θ(λxyz.χy,z∉Q−1(x)×Z)=Θ(λxyz.χy∉Q−1(x))=Θ(λxyz.χx∉Q(y)) =(⊤Y∗Ξ)(λxyz.χx∉Q(y))=maxκ∈Π(⊤Y∗κ)(λxyz.χx∉Q(y)) =maxκ∈Π(μκ⋉⊤Y⋉ϕκ)(λxyz.χx∉Q(y)) =maxκ∈Πμκ(λz.⊤Y(λy.ϕκ(y,z)(λx.χx∉Q(y)))) And then use that ϕκ(y,z)=χQ(y)ϖϕκ,z since it’s a PoCK in Q, to get =maxκ∈Πμκ(λz.⊤Y(λy.(χQ(y)ϖϕκ,z)(λx.χx∉Q(y)))) Hm, we updated on a set, and are evaluating the indicator function for not being in the set. =maxκ∈Πμκ(λz.⊤Y(λy.0))=0 Ok, so this means we can invoke Lemma 2. We were previously at =maxθ∈Br(Θ)θ(λyzαx.f(x,y,z,α)) So now we can invoke monotonicity and go ≤maxθ∈Br(Θ)θ(λyzαx.f(x,y,z,β1(x))) And then invoke our endofunction property for the stuff in Br(Θ), letting s be the identity function (and also y,z∈α occurs always) to establish a uniform upper bound of ≤Θ(λxyzα.f(x,y,z,β1(x))) =Θ(λxyz.δβ1(x)(λα.f(x,y,z,α))) =(Θ⋉β1)(λxyzα.f(x,y,z,α)) And we’re done! Second inequality demonstrated. ■
Corollary 4.5: Suppose that for any d∈D, z∈Γ1 and π:H→A s.t. (d,z)∈supp W(π), it holds that dCzπ. That is, the observations W predicts to receive from the computer are consistent with the chosen policy and W’s beliefs about computations. Let L:D→R be a Cartesian loss function and π:H→A a policy. Define ~L:D×Γ1→R by ~L(h,z):=L(h). Then, (prelΓBr(ΘW)∩Cπfair)(Lphys)=W(π;~L)
To a large extent, this will follow the proof of the previous corollary. We’ll use β for 2Γ and α for just the policy component.
I’m going to be proceeding very cautiously here. First off, make our π value visually distinct by swapping it out for π∗ (prelΓBr(ΘW)∩Cπ∗fair)(Lphys) Now, by the identifications we made earlier, we can identify Γ with AH×Γ1, the space of policies and computations. Using that to unpack the function a little bit, we have =(prelΓBr(ΘW)∩Cπ∗fair)(λπzβ.Lphys(π,z,β)) Now, we note that intersecting with the top of a particular set is equivalent to updating on the indicator function for that set. Using definition 1.5 to unpack Cπ∗fair, we get =(prelΓBr(ΘW))(λπzβ.χ∀h∈Hπ,z,β:Gπ,z(h)=π∗(h)Lphys(π,z,β)) Throughout we’ll be applying that Gπ,z(h) is “what would the agent do on h if the agent is copying the behavior of π” (remember, part of the math is “what does the agent do in response to this history” and π is our term for that chunk of the math), so we’ll just always rephrase things like that and won’t bother to say Gπ,z(h)=π(h) every time we do it. =(prelΓBr(ΘW))(λπzβ.χ∀h∈Hπ,z,β:π(h)=π∗(h)Lphys(π,z,β)) Pull off the projection, and use d for a destiny in D. =Br(ΘW)(λπzβd.χ∀h∈Hπ,z,β:π(h)=π∗(h)Lphys(π,z,β)) Applying definition 3.1, and using d′ instead of g to remember it’s a destiny, we have =Br(ΘW)(λπzβd.χ∀h∈Hπ,z,β:π(h)=π∗(h)minha:ha∈Xπ,z,βmaxd′:ha⊑d′L(d′)) Next up is unpacking Xπ,z,β. Using definition 3.1, it’s =Br(ΘW)(λπzβd.χ∀h∈Hπ,z,β:π(h)=π∗(h) minha:ha∈Hπ,z,β×A∧(∀(π′,z′)∈β:π′(h)=a)maxd′:ha⊑d′L(d′)) Now we’ll show that Hπ,z,β only depends on pr(β), ie, the projection of β from 2AH×Γ1 to 2AH, and that it gets smaller as β gets larger, so the function above is monotone. Let’s use definition 1.5 to unpack the event h∈Hπ,z,β. (∀h′a′⊏h,(π′,z′)∈β:π′(h′)=a′)∧(∃d′:h⊏d′∧d′Czπ) It shouldn’t be too hard to tell that β getting larger makes this set smaller, and that, since the z′ doesn’t matter, this condition is really the same as (∀h′a′⊏h,π′∈pr(β):π′(h′)=a′)∧(∃d′:h⊏d′∧d′Czπ) So, we’ll use pr(β) to remind us that our function only depends on the projection of β. =Br(ΘW)(λπzβd.χ∀h∈Hπ,z,pr(β):π(h)=π∗(h) minha:ha∈Hπ,z,pr(β)×A∧(∀π′∈pr(β):π′(h)=a)maxd′:ha⊑d′L(d′)) And now we can pull that out as a projection! We’re now using α for our set of policies, instead of β for our set of policy/computation pairs. =pr(Br(ΘW))(λπzαd.χ∀h∈Hπ,z,α:π(h)=π∗(h) minha:ha∈Hπ,z,α×A∧(∀π′∈α:π′(h)=a)maxd′:ha⊑d′L(d′)) To proceed further, we’re going to need to adapt our previous result which gets an upper and lower bound on the bridge transform of suitable Θ. Our first order of business is checking that α getting larger makes the function larger, which is easy to check since α getting larger cuts down on the options to minimize over (increasing value), and makes the antecedent of the implication harder to fulfill, which makes the implication as a whole easier to fulfill, so the indicator function is 1 more often. Now we can proceed further. Abstracting away a bit, our specific function will be swapped out for some f:AH×Γ1×D×2AH→[0,1] which is monotone in its last argument, and we have (ΘW⋉β0)↓(λzπdβ.f(z,π,d,pr(β))) =(ΘW⋉β0)(λzπdβ.f(z,π,d,pr(β))) =ΘW(λzπd.f(z,π,d,pr(β0(z,d)))) =ΘW(λzπd.f(z,π,d,pr({z}×Q−10(d)))) =ΘW(λzπd.f(z,π,d,Q−10(d))) =ΘW(λzπd.f(z,π,d,pr(Z×Q−10(d)))) =ΘW(λzπd.f(z,π,d,pr(β1(d)))) =(ΘW⋉β1)(λzπdβ.f(z,π,d,pr(β))) =(ΘW⋉β1)↓(λzπdβ.f(z,π,d,pr(β))) Monotonicity was used to go back and forth between the downwards closure and the raw form. β0 and β1 are as they were in Proposition 4.4, and the Q of Proposition 4.4 is Q0, the relation on AH×D telling you whether a policy is consistent with a destiny.
Now, by Proposition 4.4, since the bridge transform of ΘW is sandwiched between those two values, and they’re both equal, we have pr(Br(ΘW))(λzπdα.f(z,π,d,α))=Br(ΘW)(λzπdβ.f(z,π,d,pr(β))) =ΘW(λzπd.f(z,π,d,Q−10(d)))=(ΘW⋉Q−10)(λzπdα.f(z,π,d,α)) The first equality was just relocating the projection. The second equality was from the bridge transform being sandwiched between two equal quantities, so it equals all the stuff on our previous big list of equalities (we went with the middle one). Then just express as a semidirect product, and you’re done. Applying this to our previous point of =pr(Br(ΘW))(λπzαd.χ∀h∈Hπ,z,α:π(h)=π∗(h) minha:ha∈Hπ,z,α×A∧(∀π′∈α:π′(h)=a)maxd′:ha⊑d′L(d′)) we can reexpress it as =(ΘW⋉Q−10)(λπzαd.χ∀h∈Hπ,z,α:π(h)=π∗(h) minha:ha∈Hπ,z,α×A∧(∀π′∈α:π′(h)=a)maxd′:ha⊑d′L(d′)) And start unpacking the semidirect product =ΘW(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h) minha:ha∈Hπ,z,Q−10(d)×A∧(∀π′:dQ0π′→π′(h)=a)maxd′:ha⊑d′L(d′)) Now, we’re going to try to address that minimum, and show that the only ha that fulfill the conditions are exactly those ha⊑d. This requires showing that ha⊑d is a sufficient condition to fulfill the relevant properties, and then to show that ha⋢d implies a failure of one of the properties.
The proof for this is almost entirely identical to the corresponding proof in the non-Turing-law case; there are no substantive differences besides one issue to clear up. We need to show that W(π) always being supported on the relation C, for all π (as one of our starting assumptions) implies that ΘW is supported on C as well. Here’s how we do it. We have ΘW=⊤AH∗W. And then this ultracontribution (by the definition of Γ1-PUCK’s) can be written as the convex hull of the union of a bunch of ultracontributions of the form ⊤AH∗w, where w is a Γ1-PoCK. So, if we can show all of these are supported on the relation C, then the same holds for the convex hull of their union, ie, ΘW. By Proposition 4.2, we can reexpress this ultracontribution as μw⋉⊤AH⋉ϕw, where w(π)=μw⋉(λz.ϕw(z,π)), for any policy π. Now, let’s check the expectation value of the indicator function for C being violated. (μw⋉⊤AH⋉ϕw)(λzπd.χ¬dCzπ) =μw(λz.⊤AH(λπ.ϕw(z,π)(λd.χ¬dCzπ))) =μw(λz.maxπϕw(z,π)(λd.χ¬dCzπ)) Let’s assume that the expectation of that indicator function is not zero. Then there must be some particular z in the support of μw where it is nonzero, and some particular π∗ that attains that nonzero expectation value. So, there’s a z in the support of μw and π∗ s.t. ϕw(z,π∗)(λd.χ¬dCzπ∗)>0 and so this means that we have μw(λz.ϕw(z,π∗)(λd.χ¬dCzπ∗))>0 Because that z is assigned nonzero measure, and then this reshuffles to (μw⋉(λz.ϕw(z,π∗)))(λzd.χ¬dCzπ∗)>0 Which, via Proposition 4.2, is w(π∗)(λzd.χ¬dCzπ∗)>0 But this contradicts that W(π∗) (and so all the w(π∗)) were supported on the event dCzπ∗, so we have a contradiction, and our result follows.
Now that that issue is taken care of, we can swap out our fancy minimization for just minimizing over the ha⊑d. =ΘW(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)minha:ha⊑dmaxd′:ha⊑d′L(d′)) This minimization is attained by selecting d itself. So then it turns into =ΘW(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d)) At this point, we’ll upper-and-lower-bound this quantity by W(π∗)(λzd.L(d)). Let’s begin. W(π∗)(λzd.L(d)) =maxw∈Πw(π∗)(λzd.L(d)) =maxw∈Πμw(λz.ϕw(π∗,z)(λd.L(d))) Here we’re using Proposition 4.2 on being able to split up Z-PoCK’s (which the w’s are) a certain way. =maxw∈Πμw(λz.maxπϕw(π∗,z)(λd.χdQ0πL(d))) This equality happens because you can always just pick π∗ as your choice of π, and since ϕw(π∗,z) can only produce destinies consistent with π∗, that choice makes the indicator function trivial. Now, we can swap out ϕw(π∗,z) for its base measure restricted to an event =maxw∈Πμw(λz.maxπ(χQ0(π∗)ϖϕw,z)(λd.χdQ0πL(d))) =maxw∈Πμw(λz.maxπϖϕw,z(λd.χdQ0π∗χdQ0πL(d))) Reshuffle the indicator function back in =maxw∈Πμw(λz.maxπ(χQ0(π)ϖϕw,z)(λd.χdQ0π∗L(d))) =maxw∈Πμw(λz.maxπϕw(π,z)(λd.χdQ0π∗L(d))) =maxw∈Πμw(λz.⊤AH(λπ.ϕw(π,z)(λd.χdQ0π∗L(d)))) =maxw∈Π(μw⋉⊤AH⋉ϕw)(λπzd.χdQ0π∗L(d)) =maxw∈Π(⊤AH∗w)(λπzd.χdQ0π∗L(d)) =(⊤AH∗W)(λπzd.χdQ0π∗L(d)) =ΘW(λπzd.χdQ0π∗L(d)) Now we must show that this is a looser constraint than what was previously in our indicator function to proceed further. So our next order of business is showing that, certainly, ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)→dQ0π∗ Let d be an arbitrary destiny in the support of ΘW, and h be one of the history prefixes of d. The two conditions for h∈Hπ,z,Q−10(d) are fulfilled, because they are ∀h′,a′,π′:h′a′⊏h∧dQ0π′→π′(h′)=a′ ∃d′:h⊏d′∧d′Czπ For the first condition, if h′a′⊏h, then h′a′⊏d, and so if π′ is consistent with d, it must take the same action in response to h′, the action that d commands. So it is fulfilled. For the second condition, let d′ be d. h⊏d holds, and dCzπ holds certainly, because ΘW is supported on C, as we’ve previously shown.
So, certainly, h⊏d→h∈Hπ,z,Q−10(d). Since we assumed our forall statement as prerequisite, this means that for all h⊏d, π(h)=π∗(h). And dQ0π means ∀ha⊑d:π(h)=a. Since π∗(h) mimics π(h) for all history prefixes of d, this means ∀ha⊑d:π∗(h)=a, ie dQ0π∗.
So, since this is a looser constraint, when we were previously at =ΘW(λπzd.χdQ0π∗L(d)) we can proceed further to ≥ΘW(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d)) which is the value we wanted to sandwich. Proceed further with =(⊤AH∗W)(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d)) =maxw∈Π(⊤AH∗w)(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d)) =maxw∈Π(μw⋉⊤AH⋉ϕw)(λπzd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d)) =maxw∈Πμw(λz.⊤AH(λπ.ϕw(π,z)(λd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d)))) =maxw∈Πμw(λz.maxπϕw(π,z)(λd.χ∀h∈Hπ,z,Q−10(d):π(h)=π∗(h)L(d))) ≥maxw∈Πμw(λz.ϕw(π∗,z)(λd.χ∀h∈Hπ∗,z,Q−10(d):π∗(h)=π∗(h)L(d))) =maxw∈Πμw(λz.ϕw(π∗,z)(λd.L(d))) =maxw∈Π(μw⋉(λz.ϕw(π∗,z)))(λzd.L(d)) =maxw∈Πw(π∗)(λzd.L(d))=W(π∗)(λzd.L(d)) And we’ve got the same upper and lower bound, so our overall quantity is W(π∗)(~L) And we’re done. ■