Now, what about updates? We’ll use ugh (and suppress the π¬h that should be there) as shorthand for the function that maps (m,b) over Θ(π¬h∙πpa) to (c(m|h),b+m(0★hg)) in Ma(F(πpa)) (or the nirvana-free or sur variant of this), and also use ugh as a function from belief functions to belief functions (just map all the sets through)
Lemma 27:When updating, the closure adds no nirvana-free points that weren’t present originally if Nonemptiness, Nirvana-free upper-completion and closure holds originally (works in the sur-case too)
Proof sketch: We take a sequence Mn limiting to M, and then take a preimage point of Mn, go to a minimal below it, find a limit point in our original set by Compactness, and map it back through the update, getting a point below M. Then, we find what we need to add to that to get M, and find something above our limit point that maps to M, so we didn’t actually need closure anyways because we made M as an image of a nirvana-free point present in the original set.
Proof: Fix a sequence Mn in ugh(Θ)(πpa) (but without the defining closure part in the end) that limit to M which is nirvana-free.
Every Mn has a preimage point M′n∈Θ(π¬h∙πpa) with no nirvana off-h. For each M′n, find a minimal point M′lon below it, which have a λ⊙+b⊙ bound by bounded-minimals, so we can find a convergent subsequence limiting to M′lo (actually, might not be minimal, still a limit of minimal points, though). Shoving the M′lon (and the limit point) back through the update (which is a continuous function), we get a sequence Mlon limiting to Mlo (the thing you get from pushing M′lo through the update).
Since M′n lies above M′lon (upper-completion ordering), then updating preserves that property, because the update function is linear. Thus, all the Mlon lie below their corresponding Mn. Now, we can invoke Lemma 16 to conclude that Mlo lies below M. It lies below a nirvana-free point, so M′lo is nirvana-free as well. Now, we just need to show nirvana-free upper-completion because M=Mlo+M∗.
We can take M′lo and add on (m∗,b∗) (extend the measure back to the original domain by sticking an h prefix on everything, and saying that the measure is 0 everywhere else), making an a-measure that’s in Θ(π¬h∙πpa), by nirvana-free uppper-completion there. By linearity, and the update not affecting (m∗,b∗) (it’s 0 outside of h so the g outside of h doesn’t get to contribute anything to the b term when we update), updating M′lo+(m∗,b∗) makes Mlo+M∗=M. So, if a nirvana-free point appears post-update (with closure), then it’ll appear post-update (without closure).
First, (pr(π¬h∙πhipa),(π¬h∙πlopa)∗(m))(0★hg)=m(0★hg) This is because the projection down doesn’t change the measure at all outside of h, and we’re evaluating a function that’s 0 inside h. So, that takes care of the b term. Also, projection preserves the b term, so our desired equality is:
For the measure term, the first is “restrict to on-h histories, clip off the h prefixes, and project down”, and the second is “project down the on-h histories accordingly, then restrict to on-h histories and clip off the h prefix”, which are obviously equal.
Proposition 9:For causal, surcausal, pseudocausal and acausal hypotheses, updating them produces a causal, surcausal, pseudocausal or acausal hypothesis as long as renormalization doesn’t fail.
Proof sketch: What we can do is consider the “raw update”, show that it preserves all nice properties except for renormalization, and then show that the renormalization terms in the update are the proper renormalization terms to use. Thus, we’ll define our raw update ugh(Θ) via: ugh(Θ)(πpa) is: Θ(π¬h∙πpa)∩NF off h, mapped through the following function: (m,b)↦(c(m|h),b+m(0★hg)) And then you take the closure of the resulting set at the end.
So, we take our partial policy πpa and glue it to the off-h-partial policy π¬h, go to that part of the original belief function, strip off everything that has nirvana off-h (for Murphy shall not select those, and properly, that should make the b term infinite post-update), slice off the part of the measure off-h, strip off the h prefix from those histories, and go “ok, our utility function is g, let’s take the expected utility off-h and fold it into the b term”
If we can get all nice properties but renormalization, we’re good, just appeal to Lemma 24. As for showing the conditions: We’re in for something almost as bad as one of the directions of the Isomorphism theorem. Nonemptiness, closure, and convexity are trivial, upper completion, pseudocausality, and bounded-minimals are easy and the extreme point condition and causality are moderately tricky.
For the extreme point condition, Step 1 is establishing that ufh(Θ)(πpa)∩NF equals the closed convex hull of nirvana-free projections from above by an argument that makes sense when you sketch it out but may be difficult if you don’t have it sketched out, step 2 is using Hausdorff-continuity and Lemma 20 to turn it into an ordinary convex hull, and finally, arguing that a nirvana-free exteme point must have come from a nirvana-free point from above via step 2.
For causality, we can (for the most part) just go back to Θ, get an outcome function there, and map it back through the update to get an outcome function, the hard part is netting the limit points, which requires a limit of outcome functions. But because we want a countable product of sets to get sequential compactness from Tychonoff, we have to work with stubs, which adds some extra complexity.
Hausdorff-continuity is just hellishly hard, we need to show that the preimages of the sets post-update are the updates of the preimages of sets pre-update, and then combine that with some fancy work with minimal points and upper completions and using two different characterizations of uniform continuity at once via Lemma 15, and a touch of functional analysis. There’s way too many interacting points and sets in this one.
But easily the biggest grind is Consistency. We have 4 subset directions to show, each of which requires their own separate fancy argument, and two of them require splitting into a nirvana-containing/causal case and a nirvana-free case, so it’s a 6-part proof. A good chunk of complexity arises because we have to take closure in the nirvana-containing case, an issue which goes away if we just let Nirvana be 1 reward forever. Let’s begin.
Condition 1: Nirvana-free Nonemptiness:
This is trivial. Just pick a nirvana-free point in Θ(π¬h∙πpa) by nirvana-free nonemptiness, and update, to get one in ugh(Θ)(πpa).
Conditions 2,3: Closure, Convexity:
Closure is a tautology since we took the closure. For convexity, the closure of a convex set is convex, and ugh is a linear function, so it maps convex sets to convex sets.
Condition 4: Nirvana-free Upper-Completeness:
First, invoke Lemma 27 to see that all nirvana-free points must have been present in the raw ugh(Θ(π¬h∙πpa)) set to begin with, without the closure. What we want is that, if M′=M+M∗, and M lies in the raw updated set and is nirvana-free, and M′ is a nirvana-free a-measure, then M∗ lies in the updated set as well.
Find a M′′ that maps to M after updating. It must be nirvana-free, because the nirvana either occurs without h as a prefix (which is forbidden because all that stuff gets clipped off and doesn’t get pushed through the update), or the nirvana occurs with h as a prefix, but then it’d show up in the measure component of M post-update, contradicting its nirvana-freeness. Now, we can consider M′′+(m∗,b∗) (basically, M∗, but we take the measure back by sticking an h prefix on everything, and saying that it’s 0 off-h). This is present in Θ(π¬h∙πpa), by nirvana-free upper completion. By linearity of updating, and m∗ having no measure in any off-h area where it’d get picked up by g, this updates to M+M∗, witnessing that M′ lies in the image of the update, so we get nirvana-free upper completion.
Condition 5: Bounded-Minimals:
For bounded-minimals, we can pull the Lemma 16 trick of taking our M of interest that Mn limit to, taking a preimage M′n for each Mn, finding a minimal M′lon below each M′n (which obeys a λ⊙+b⊙ bound and also has no nirvana off-h), getting a limit point M′lo (still no nirvana off-h) by compactness, and pushing the sequence through the update, to get a sequence Mlon below Mn limiting to Mlo which is below M (Lemma 16) Now, we just have to check up on the λ+b values of our Mlon sequence, and show that they respect the λ⊙+b⊙ bound, to transfer this to Mlo. The raw update deletes measure from off-h, and assigns it the value that g does, which is 1 or less, so any increase in b correspond to an equal-or-greater decrease in λ, so the Mlon all obey the λ⊙+b⊙ bound as well. Thus, the limit point Mlo obeys the bound, and it’s below our original M, so any minimal must obey the λ⊙+b⊙ bound.
Condition 7: Consistency:
This is going to be extremely tedious and difficult to show, it’s a 6-part proof. The first 3 parts are devoted to showing that ugh(Θ)(πpa)=¯¯¯¯¯¯¯¯c.h(⋃π≥πpaprπ,πpa∗(ugh(Θ)(π)))
Part 1 is showing ugh(Θ)(πpa)⊆¯¯¯¯¯¯¯¯c.h(⋃π≥πpaprπ,πpa∗(ugh(Θ)(π))) In the nirvana-free pseudocausal/acausal case.
Let M be in ugh(Θ)(πpa). By Lemma 27, since we’re working in the nirvana-free case, we didn’t need to take the closure, it won’t add any points that aren’t there anyways. So, M has a preimage point M′∈Θ(π¬h∙πpa) that maps to it. By consistency for Θ, M′ lies in the closed convex hull of projections of policies down from above, so there are points in the convex hull of projections of policies that are arbitrarily close to M′, fix some sequence M′n of points in the convex hull of projections down from policies above that limit to M′. Mapping these through the raw update (which is continuous) we get a sequence Mn of points in ugh(Θ)(πpa) that limit to M.
All these policies above (π¬h∙πpa) have the form (π¬h∙π). So, M′n can be written as a mix of finitely many M′i,n, which are the projections of M′∞i,n from above, in policies. Update those, getting points M∞i,n in ugh(Θ)(π). These project down to Mi,n, which mix back together to make… Mn. This is because of Lemma 28, that update-then-project equals project-then-update. Also, mix-then-project equals project-then-mix. Remember, Mn is made by: “Project M′∞i,n down to make M′i,n, then mix to make M′n, then update.”
So, we can go project-mix-update equals mix-project-update equals mix-update-project equals update-mix-project equals update-project-mix, which is the process “update the M′∞i,n to M∞i,n, project down to Mi,n, mix to Mn”
The first equality is linearity of projection, the second equality is Lemma 28, the third equality is linearity of updating, the final equality is linearity of projection again.
Anyways, taking stock of what we’ve done, we have a sequence Mn limiting to our M of interest, and every Mn is crafted by taking points from finitely many ugh(Θ)(π), projecting them down, and mixing them. Therefore, our M∈ugh(Θ)(πpa) lies in the closed convex hull of projections down from above.
Part 2: we’ll show this again, but in the nirvana-containing case, where we’ll leverage causality.
Fix a M∈ugh(Θ)(πpa) (with closure). There’s a sequence Mn that limits to it, that lies in the same set, but without closure, so we can take preimage points M′n∈Θ(π¬h∙πpa) that update to make Mn. By causality, fix some arbitrary policy above (π¬h∙πpa), which can be expressed as (π¬h∙π), where π≥πpa. Anyways, we can take M′n, and use causality to get an outcome function of, to get a M′∞n∈Θ(π¬h∙π) that projects down to M′n. We don’t have to worry about nirvana off-h, because M′n already specifies everything that happens off-h and it says no nirvana occurs in that case. So, M′∞n can be updated to make a M∞n in ugh(Θ)(π). By Lemma 28, this must project down to Mn. So, all our Mn lie in the projection of ugh(Θ)(π), and since M is a limit point of that sequence, it must lie in the closed convex hull of projections.
And we’ve shown that ugh(Θ)(πpa)⊆¯¯¯¯¯¯¯¯c.h(⋃π≥πpaprπ,πpa∗(ugh(Θ)(π)))
And have taken care of 2 of our 6 parts. Now for the reverse direction, that
Thankfully, this can be done with a general argument that isn’t sensitive to the presence of Nirvana.
Part 3: Fix a M in the closed convex hull, which has a sequence Mn limiting to it that’s in the convex hull of projections down from above. the Mn shatter into finitely many Mi,n, which are projections of M∞i,n∈ugh(Θ)(πi). Now, these aren’t necessarily preimage points, they may have been added in the closure. Thus, we can perturb by 2−n or less if needed to make a M′∞i,n which does have a preimage point. Projecting these down to M′i,n and mixing, crafts a M′n point that is within 2−n of Mn (remember, projection doesn’t expand distance), so the sequence M′n still has M as a limit point (it gets arbitrarily close to a sequence that gets arbitrarily close to M). If we can show that all the M′n lie in ugh(Θ)(πpa), then by closure, we’ll get that M lies in the same set so we’re done.
Ok, so we have M′∞i,n∈ugh(Θ)(πi), that project down and mix to make M′n, and importantly, we crafted them so they’re produceable without closure. Thus, they have preimage points M′′∞i,n∈Θ(π¬h∙πi) (that lack nirvana off-h) Project them down to make M′′i,n∈Θ(π¬h∙πpa), and mix them to make a M′′n in the same set (which still lacks nirvana off-h), and this updates to make M′n via Lemma 28, as we’ll show shortly.
Starting with the M′′∞i,n, we know that update, project, mix equals M′n via going M′∞i,n, M′i,n, M′n. Then, update-project-mix equals project-update-mix equals project-mix-update, which is the path we took. Therefore, all the M′n lie in ugh(Θ)(πpa), which is closed, so M (arbitrary in the closed convex hull of projections) lies in the same set, establishing the reverse subset direction and thus equality,
Part 4: Now that we’re halfway done,let’s look at the “intersection of preimages of stubs from below” direction of consistency, ugh(Θ)(πpa)⊆⋂n(prπpa,πnpa∗)−1(ugh(Θ)(πnpa)). If we ignore the closure part and work with the raw update set sans closure, we can fix a M in ugh(Θ)(πhipa), take a preimage point in Θ(π¬h∙πhipa), project it down to Θ(π¬h∙πlopa) by consistency, then update it to get exactly the projection of M (again, Lemma 28) Then, when we take the closure, we can just take our M in the closure, fix a sequence in the raw update set sans closure Mn that limits to M, project down, getting M′n in the raw update set ugh(Θ)(πlopa) sans closure, and then the limit point M′ lies in ugh(Θ)(πlopa) by closure, and by continuity of projection, M′ is the projection of M.
Since the sets get bigger as you go down, we can invoke Lemma 6 to swap out the intersection of preimages of all stubs below you, for the intersection of preimages of stubs of the form πnpa, this will be important later.
Now, it’s trivial to show that ugh(Θ)(πpa)⊆⋂n(prπpa,πnpa∗)−1(ugh(Θ)(πnpa)) because we’ve established that projecting down makes a subset, and projection commutes, so any M∈ugh(Θ)(πpa) projects down into ugh(Θ)(πnpa) for all n.
All that’s left now is the reverse subset direction,
ugh(Θ)(πpa)⊇⋂n(prπpa,πnpa∗)−1(ugh(Θ)(πnpa))
Sadly, this one will require us splitting into the nirvana-containing (and thus causal) cases and the nirvana-free cases, and it’s a really difficult one to show.
Part 5: Let’s address the nirvana-free case, we’ll use a nifty trick to control the size of the preimage points we select.
Ok, let’s say you have a M with some λ1 and b1 value. And you take M′ that’s a preimage point, but its λ and b values are just… waaay too high. We want to have a preimage point with reasonable values, in order to apply bounding arguments. What you do, is find a minimal-point Mmin below M′, so M′=Mmin+M∗. Now, what you do, is swap out M∗ ie (m∗,b∗), for (m∗|h,b∗+m∗(0★hg)). This is an sa-measure, because
b∗+m∗(0★hg)+(m∗|h)−(1)=b∗+m∗(0★hg)+m∗−(1★h0)
≥m∗−(0★hg)+m∗−(1★h0)=b∗+m∗−(1★hg)≥b∗+m∗−(1)≥0
Now, consider updating Mmin+(m∗|h,b∗+m∗(0★hg)) instead (it’s an a-measure, it has less negative parts than M∗, and is present by nirvana-free upper-completion). This gets you the update of Mmin, plus… (c(m∗|h),b∗+m∗(0★hg)) (remember, 0 measure off-h). Which is the exact same thing you’d get by updating M∗, so when we updated our new sum, we hit M exactly.
However, this sum is special, because we can stick some decent bounds on its λ and b value! For starters, its b value is less than b1 (updating only adds on b-mass, and it updates to M). And as for the λ value… well, Mmin has its λ bounded above by λ⊙ (of the original Θ) due to being a minimal point. And in the worst-case, all of the measure in M came from the thing we added, so m∗|h has a measure of λ1 or less. So our bound on the λ value is λ⊙+λ1.
Armed with this knowledge, we can begin to prove the last bit of consistency in the nirvana-free case. Take a M in the intersection of preimages. It projects down to make Mn in ugh(Θ)(πnpa). Projection preserves λ and b, so they all have the same λ1,b1 bounds. Because we don’t need to close in the nirvana-free case, we get a preimage point of M′n in Θ(π¬h∙πnpa) From our earlier considerations, we can always pick M′n such that its λ is ≤λ⊙+λ1, and its b is ≤b1, although we’ll be using a bound of max(b1,b⊙).
Now, we’re going to have to be extremely careful here. Let the point M′n,j be defined as:
If j<n, then M′n,j is some arbitrary point in Θ(π¬h∙πnpa), with λ equal to or below λ⊙+λ1, and b equal to or below max(b1,b⊙), which always exists by all minimal points obeying the λ⊙+b⊙ bound.
If j=n, then M′n,j=M′n.
If j>n, then M′n,j=pr(π¬h,πjpa),(π¬h,πnpa)∗(M′j)
Then, the tuple of M′n,j for all n is a point in:
∏n(Θ(π¬h∙πnpa)∩{(λμ,b)|λ≤λ⊙+λ1,b≤max(b1,b⊙)})
Equipped with the product topology. In particular, this is a product of compact sets, so by Tychonoff’s theorem, it’s compact. Thus, we can get a convergent subsequence of the tuples. On this subsequence, all the M′n,j converge to a limit point M′n,∞, regardless of n.
Also, M′n,∞ projects down to M′m,∞ if n≥m, because for large enough j, the projection of M′n,j will always be M′m,j, and by continuity of projection, the projection of M′n,∞ must be M′m,∞
Ok, so we’ve got an infinite sequence of M′n,∞ for all n that all project down onto each other. Another nice feature is that updating M′n,∞ produces Mn. This is because, when j climbs high enough, M′j,j projects down to M′n,j, and M′j,j is just M′j which updates to Mj, which projects down to Mn. By Lemma 28, update-then-project equals project-then-update, so M′n,j must update to Mn, for all sufficiently high j. The preimage of a single point is closed, so past a certain point, the M′n,j are wandering around in the preimage of Mn, so M′n,∞ also updates to Mn.
Now, our next step is, does the M′n,∞ sequence in Θ(π¬h∙πnpa) pick out a single point M′ in Θ(π¬h∙πpa) that projects down accordingly? Yes it does. Just intersect all the preimages of single points, they’re nested in each other and compact so the finite intersection property holds, and if the intersection wasn’t composed of a single point, you’d have two distinct points with a difference at some finite time, but projecting down to any finite time the two distinct points are identical, so there can only be a single point in the intersection. Further, it must lie in Θ(π¬h∙πpa), because you can project it down to M′n,∞ in Θ(π¬h∙πnpa) for any n, which, by consistency for Θ, you can also project down to Θ((π¬h∙πpa)n) (projecting down further), so it’s present in the intersection of all the preimages, certifying that it’s in the appropriate set.
Now, finally… does M′, when updated, produce M, certifying that the point in the intersection of preimages is also in the raw update set? Well, let’s say it didn’t. Then we get a M′′ that’s not equal to M, so projecting down to some finite n should suffice to observe that. However, projecting M′′ and M down produces… Mn. This is because of Lemma 28, update-then-project equals project-then-update. Projecting M′ down makes M′n,∞, which updates to Mn.
So, no finite stage suffices to observe the difference between the updated form of M′ and M itself, so they must be identical, certifying ugh(Θ)(πpa)⊇⋂n(prπpa,πnpa∗)−1(ugh(Θ)(πnpa)) for the nirvana-free case.
Part 6: Let’s move to the nirvana-case where we can leverage causality. We’ll be showing this in a rather nonstandard way. We’re going to pick a π≥πpa, and show that our M of interest in the intersection of preimages can be written as a limit of points projected down from ugh(Θ)(π), establishing that M lies in the closed convex hull of points from above, which we’ve already shown equals ugh(Θ)(πpa).
Ok, so M is in the intersection of preimages. Project it down to all the ugh(Θ)(πnpa), getting a batch of points Mn from them. This is the raw update set, so within 2−n or less distance from Mn, there’s a M′n in the raw update sans closure, which has a preimage point M′′n that lies in Θ(π¬h∙πnpa).
Now, pick some arbitrary policy above π¬h∙πpa, which can be written as π¬h∙π. Moving on even further, by causality, we can get a point M′′∞n∈Θ(π¬h∙π) that projects down to M′′n. Update M′′∞n to get a M′∞n∈ugh(Θ)(π), which then (by our earlier thing about how a set equaled the closed convex hull of projections down from above), projects down to a M′hin∈ugh(Θ)(πpa).
Now, we can ask whether the sequence M′hin limits to M itself.ugh(Θ)(πpa) is closed, so this would certify that M lies in the appropriate set.
First, observe that the projection of M′hin down to πnpa is M′n. This is by Lemma 28, update-then-project equals project-then-update.M′′∞n projects down to M′′n, which updates to M′n, so M′n must be what you get by updating M′′∞n to M′∞n, and projecting down to πpa (making M′hin), and projecting that down to πnpa.
Now, because projection preserves the b term, and M′hin projects down to M′n which is within 2−n of Mn (not much of a difference in the b terms), and Mn has the same b term as M, we can certify convergence of the b term at least. Now for convergence of the measure components. Again, M′hin projects down to M′n which is within 2−n of Mn (not much difference before timestep n, shrinking increasingly low), and Mn perfectly mimics what M does before timestep n. So, M′hin behaves increasingly closely to M for everything before time n, which increases without bound. Increasingly close matches on increasingly large initial segments of what happens mean that M′hin must limit to M itself, certifying that M lies in ugh(Θ)(πpa) for the causal cases.
That’s the last bit we needed! We’re finally done with consistency now. This just leaves the hausdorff-condition and the extreme-point condition and pseudocausality and causality.
Condition 9: Hausdorff-continuity:
What we need to do for our setup to even approach this is to show that updating the preimage of the nirvana-free part of Θ(π¬h∙πpa), produces exactly the preimage of the nirvana-free part of ugh(Θ)(πpa).
One direction, we can get easily. If you fix a M′∞ in the preimage of the nirvana-free part of Θ(π¬h∙πpa), it projects down to a M′∈Θ(π¬h∙πpa)∩NF, that updates to a M∈ugh(Θ)(πpa), then by Lemma 28, project-then-update equals update-then-project, so M′∞ must update to a M∞ that projects down to M, certifying that updating the preimage of the nirvana-free part of Θ(π¬h∙πpa) produces a subset of the preimage of the nirvana-free part of ugh(Θ)(πpa).
In the other direction, fix a M∞ in the preimage of the nirvana-free part of ugh(Θ)(πpa). It projects down to a M in ugh(Θ)(πpa)∩NF, and by Lemma 27, M wasn’t introduced in the closure, so it has a preimage point M′∈Θ(π¬h∙πpa)∩NF.
Now, how do we extend M′ to craft a M′∞ that updates to M∞? Well, we can split into two parts. What happens on-h, and what happens off-h? For the off-h part, the post-update part has everything folded into the b term, while the pre-update part has an actual measure specified everywhere. Thus, our M′∞ should have the same off-h part as M′ to project down accordingly, so updating folds it into the same b term as M∞ has.
Now, for the on-h part, it’s a bit more complicated.M∞ specified what happens for all infinite histories with h as a prefix. However, M and M′ only specify part of that data, but fortunately agree on that part. Thus, for M′∞, you can just extend with the conditional probabilities of M∞, to perfectly mimic it on-h. This makes a M′∞ in the preimage that updates to M∞.
Ok, so the appropriate preimages for Hausdorff-continuity (post-update) are made exactly by updating the preimages for Hausdorff-continuity (pre-update). Now, updating is a continuous linear operator. We’re mapping from the Banach space M±((A×O)ω)⊕R to the Banach space M±(h(A×O)ω)⊕R.
Well, this isn’t quite right, your actions and observations may vary depending on where you are in history, but the general thing of “restrict to signed measures over infinite histories with h as a prefix” still checks out. Updating is still a continuous linear operator between Banach spaces, by Lemma 8 of Section 1.
Also, all continuous linear operators between Banach spaces are bounded, and thus Lipschitz-continuous at 0, and thus Lipschitz-continuous everywhere due to linearity. So, when we push two points that are only ϵ apart through the update, they’re now Cϵ apart at most, where C is a finite constant.
We’re going to have a lot of points. Unusually enough, we’ll be using the standard formulation of Hausdorff-continuity for our original Θ, that for all ϵ, there’s a δ where two partial policies πpa and π′pa that are δ or less apart have (pr∞,πpa∗)−1(Θ(πpa)∩NF∩{≤⊙}) (and the analogous set for π′pa) being only ϵ apart in Hausdorff-distance.
Fixing your ϵ, you’re gonna want δ to be low enough to force a ϵC difference between the clipped preimages, and δ<ϵC. It’s highly advised to sketch out how our points interact and what sets they’re in. A superscript of infinity will be used to denote points in the preimages of the ugh(Θ)(πpa) sets (or Θ(π¬h∙πpa)) (ie, at the infinite levels), and a superscript of “u” specifies post-update while its lack is pre-update.
Anyways, here’s our points.
Mu,∞ lies in the preimage of ugh(Θ)(πpa)∩NF, and it’s our point that we want to find a point nearby.λ will refer to the λ value of this thing.
Projecting Mu,∞ down to ugh(Θ)(πpa)∩NF makes Mu.
We can find a minimal point below Mu, Mu,min in ugh(Θ)(πpa)∩NF. Mu,min+Mu,∗=Mu.
A nirvana-free point wasn’t introduced by the closure, and it has a minimal point in its preimage, so there’s a Mmin in Θ(πpa) that updates to Mu,min, and respects the λ⊙+b⊙ bound of Θ.
Let Mlo be defined as Mmin+((mu,∗)−,−(mu,∗)−(1)). We’re extending the negative-measure part of Mu,∗ back to its original domain by sticking an h prefix on everything, and saying it’s 0 everywhere else. this is an a-measure that lies in Θ(π¬h∙πpa)∩NF∩{≤⊙} (because Mmin respects the λ⊙+b⊙ bound, and the thing that we added has a λ+b value of 0)
Let M be defined as Mlo+((mu,∗)+,bu,∗+(mu,∗)−(1)), it also lies in the same set Updating M makes Mu, because, unpacking M, it’s Mmin+Mu,∗, which updates to Mu,min+Mu,∗ which adds up to make Mu.
Our goal now is to explicitly construct a M∞ and Mlo,∞ in the preimage of Θ(π¬h∙πpa)∩NF s.t. they project down onto M and Mlo, Mlo,∞ lies below M∞, and M∞ updates to Mu,∞.
A sufficient way to do this is to make Mlo,∞ and M∞ by, after h, extending the measures further with the conditional probabilities of the measure component of Mu,∞. Extending ((mu,∗)+,bu,∗+(mu,∗)−(1)) with the conditional probabilities of Mu,∞ witnesses that Mlo,∞ lies below M∞. They obviously project down onto M and Mlo.
As for M∞ updating to Mu,∞, the b term and the fragment of the measure that doesn’t get ignored by projection down matches because M∞ projects to M which updates to Mu which is the projection of Mu,∞. And, for the fragment of the measure that isn’t defined in Θ(π¬h∙πpa), but that must be present on the infinite levels, we copied the conditional probabilities of the measure component Mu,∞, so we’ve got a match there.
Taking a break from setting up all our damn points for a brief recap, we have a Mlo,∞ that lies in the preimage of Θ(π¬h∙πpa)∩NF∩{≤⊙}, and a M∞ that lies above it (in the preimage of Θ(π¬h∙πpa)∩NF), and it updates to hit Mu,∞ (our original point in the preimage of ugh(Θ)(πpa)∩NF). Now, we can proceed.
So… Mlo,∞ lies in the preimage of Θ(π¬h∙πpa)∩NF∩{≤⊙}. By hausdorff-continuity for Θ and the distance between (π¬h∙πpa) and (π¬h∙π′pa) being below δ because the distance between πpa and π′pa is below δ, and using our earlier thing about how a δ distance means a ϵC difference between the clipped preimages, we can find a point (Mlo,∞)′ in the preimage of Θ(π¬h∙π′pa)∩NF∩{≤⊙} that’s that close to Mlo,∞. To go up from Mlo,∞ to M∞ requires adding ((mu,∗)+,bu,∗+(mu,∗)−(1)) (with the measure component extended with the conditional probabilities of the measure component of Mu,∞, obviously).
Also, because the λ value of Mu,∞ is the λ value of Mu, which was made by adding Mu,∗ to an a-measure, an upper bound on the λ value of that a-measure we added onto Mlo,∞ is… λ. Corresponding to the extreme case where all the measure of Mu came from Mu,∗.
Now, we can craft a point (M∞)′ which lies in the preimage of Θ(π¬h∙π′pa)∩NF that’s only ϵC+δλ away from M∞. Why? Well, we can start with (Mlo,∞)′, which is only ϵC away from Mlo,∞, and take that positive-measure-thingy we added, and reshuffle the measure on it. With earthmover distance, the δ distance between (π¬h∙π′pa) and (π¬h∙πpa) corresponds to a time-threshold where they start to differ at logγ(δ), and you’re moving dirt a γlogγ(δ)=δ difference to account for having to land in the right preimage, and you’ve got λ at most dirt to move. Then, you just add (Mlo,∞)′ and your reshuffled measure, to get your point (M∞)′. Which is the sum of two components that only differ by ϵC and δλ from the components which sum to make M∞.
Ok, so we have a point M∞ in the preimage of Θ(π¬h∙πpa)∩NF, which updates to Mu,∞ that lies in the preimage of ugh(Θ)(πpa). And a point (M∞)′ in the preimage of Θ(π¬h∙π′pa)∩NF which is (taking into account that δ<ϵC) only ϵC(1+λ) distance away from M∞.
And now we can finish up, because the preimage of ugh(Θ)(π′pa)∩NF is the update of the preimage of Θ(π¬h∙π′pa)∩NF. So, we just update (M∞)′ to get a point (Mu,∞)′ in the preimage of ugh(Θ)(π′pa). And further, the distance between M∞ and (M∞)′ is only ϵC(1+λ) at most.M∞ updates to Mu,∞, and (M∞)′ updates to (Mu,∞)′. And we know that ugh has a Lipschitz constant of C (by being a continuous linear operator between Banach spaces), so Mu,∞ only has a distance of ϵ(1+λ) from a point in the preimage of ugh(Θ)(π′pa).
So, we get Hausdorff-continuity (the Lemma 15 variant).
Condition 8: Extreme Point Condition:
We had to defer this because π¬h∙πstisn’t a stub, so we can’t use the extreme point condition we had, and instead must regenerate it completely from scratch.
Our first step in this is showing ugh(Θ)(πst)∩NF=¯¯¯¯¯¯¯¯c.h(⋃π≥πstprπ,πst(ugh(Θ)(π)∩NF))
One subset direction is easy, the closed convex hull of projections of nirvana-free stuff must all be in ugh(Θ)(πst) by consistency which we’ve shown, and all must be nirvana-free. Now for the reverse direction. Let M∈ugh(Θ)(πst)∩NF By Lemma 27, this point wasn’t added in the closure, so it has a preimage point M′∈Θ(π¬h∙πst)∩NF. Using all our nice conditions for Θ, we can invoke Lemma 21 to get that M′∈¯¯¯¯¯¯¯¯c.h(⋃π≥(π¬h∙πst)prπ,(π¬h∙πst)(Θ(π)∩NF)), so we can fix a sequence M′n limiting to M where each M′n shatters into M′i,n that came from some M′∞i,n that’s nirvana-free and lies in the associated set of a full policy above π¬h∙πst.
Updating the M′n produces a sequence Mn which is nirvana-free, in ugh(Θ)(πst), and limits to M by continuity.
Updating the M′∞i,n into M∞i,n which lie in ugh(Θ)(πi)∩NF, projecting down to get Mi,n, and mixing them, produces Mn, by our usual Lemma 28 argument.
This witnesses that all the Mn lie in c.h(⋃π>πstprπ,πst(ugh(Θ)(π)∩NF))
Thus, M lies in the closed convex hull of projections of nirvana-free stuff from above. What do we do with this? Well, now we can invoke Lemma 20, since we have Hausdorff-continuity proved, to conclude that c.h(⋃π≥πstprπ,πst(ugh(Θ)(π)∩NF)) is closed, so we didn’t really need the closed convex hull (which we’ve already shown is the same as ugh(Θ)(πst)∩NF)
And we now know that ugh(Θ)(πst)∩NF=c.h(⋃π≥πstprπ,πst(ugh(Θ)(π)∩NF))
Now, we can take a minimal extreme nirvana-free point Mex in ugh(Θ)(πst). It must be minimal and extreme and nirvana-free in the original set. If it wasn’t minimal in the original set, all minimals below it would be nirvana-free too, witnessing its nonminimiality in the restricted set. And if it wasn’t extreme in the original set, then the points that mix to make it must all be nirvana-free too, since it’s nirvana-free, so we have a witness of non-extremeness in ugh(Θ)(πst)∩NF.
Ok, so it’s extreme and nirvana-free. It must also be extreme in the convex hull set, but, since it can’t be produced by mixtures, there’s a M∞ in someugh(Θ)(π)∩NF that projects down to Mex, establishing the extreme point condition.
That just leaves causality and pseudocausality.
Condition C: Causality
Ok, we pick a πpa and a point in ugh(Θ)(πpa) Can we make an outcome function for everything that includes our point? By our proof of full causality in the first part of the Isomorphism theorem (finite-to-full direction), this can be done as long as all other conditions are met and we can make an outcome function for any point in any ugh(Θ)(πst). So, let’s just establish finitary causality. Fix some πst and some M∈ugh(Θ)(πst).
Since M is in the updated set, there’s a sequence Mn that limits to M that we don’t need closure to get. There’s a λ and b bound on this sequence because it converges, call those bounds λ◯ and b◯. Now, we can take a M′n∈Θ(π¬h∙πpa) that updates to Mn. We can use causality for Θ to get an outcome function for M′n.
We don’t have to worry about nirvana-off-h, because M′n has no nirvana off-h, and the projection of M′n down to Θ(π¬h) preserves the off-h part, and is nirvana-free off-h, and everything above that (which is the only thing that determines the update) must also match the off-h part and certify that it’s nirvana-free.
Updating an outcome function back in produces an outcome function for ugh(Θ) by Lemma 28 (update then project equals project then update). Said outcome function for ugh(Θ) maps πst to Mn. We can restrict it to just stubs, to get an outcome function over stubs.
So, proceeding in this way, we get a sequence ofn of outcome functions for the stubs of ugh(Θ). Remember, outcome functions must match λ and b values, so the points for ofn have a λ and b value matching that of Mn, ie, less than λ◯ and b◯ since that’s our bound on the Mn sequence. this sequence ofn of outcome functions (picking out one point for each πst) can be thought of as an element of
∏π′st(ugh(Θ)(π′st)∩{(λμ,b)|λ≤λ◯,b≤b◯})
This is a product of compact sets (intersection of closed and compact sets by the Compactness Lemma) so it’s compact by Tychonoff. Thus, our sequence ofn of outcome functions has a subsequence with limit point of, and for all π′st (restricting n to the subsequence), limn→∞(ofn(π′st))=of(π′st). We have closure so all these limit points lie in their appropriate sets. In particular, of(πst)=limn→∞(ofn(πst))=limn→∞Mn=M So that checks out. Continuity of projections certifies that of is indeed an outcome function for stubs, because
And running through the proof of causality in the first part of the Isomorphism theorem, we get causality in general.
Condition P: Pseudocausality:
In the nirvana-free setting, fix a M∈ugh(Θ)(πpa), whose support is a subset of FNF(π′pa). Get a M′∈Θ(π¬h∙πpa) that updates to M. Its support is either on infinite histories of the off-h portion, or stuff in FNF(π′pa) (with an h prefix stuck on front), so it’s supported on FNF(π¬h∙π′pa), so M′∈Θ(π¬h∙π′pa) by pseudocausality, so then we update and get M∈ugh(Θ)(π′pa), certifying pseudocausality.
Almost done. Just invoke Lemma 24 to show that, after renormalizing, every nice property is preserved. We still do have to check that the renormalization we use is the proper renormalization to use. Our scale term for renormalization for updating is (maxπ>π¬hEΘ(π)(1★hg)−EΘ(π¬h)(0★hg))−1 and our shift term is EΘ(π¬h)(0★hg)
The scale term and shift term we should have for proper renormalization is (maxπEugh(Θ)(π)(1)−minπEugh(Θ)(π)(0))−1 and minπEugh(Θ)(π)(0) respectively. So let’s show they’re equal! We’ll be using Lemma 27, to get that every nirvana-free thing in ugh(Θ)(πpa) wasn’t added in the closure and has a preimage point.
Ok, so one of our normalization factors is correct. Let’s look at the second one.
minπEugh(Θ)(π)(0)=minπmin(m,b)∈ugh(Θ)(π)∩NF(b) Now, said minimal nirvana-free point projects down to π∅, the empty policy, preserving its b. Further, by Lemma 21, any point in π∅ with a lower b value, being nirvana-free, must be a finite mix of nirvana-free points from above projected down, so we get some nirvana-free point in some ugh(Θ)(π) with a too-low b, which is impossible, so we can swap out ugh(Θ)(π) with ugh(Θ)(π∅), getting
First, note that fh is defined as fh(h′)=f(hh′). So it basically copies f, but it’s subtly different because it has to account for the h prefix being sliced off in the update. Let’s unpack E(Θ|g,π¬h,h)(πpa)(fh) first.
Then, we can invoke Lemma 27 to realize that all nirvana-free points in the update came from nirvana-free points originally, so we can rewrite this (taking the renormalization terms into account) as
Proposition 11: If hh′ is a valid o-history, then for causal, pseudocausal, acausal, and surcausal hypotheses, (Θ|g,π¬h,h)|gh,π¬h′,h′=Θ|g,(π¬h∙π¬h′),hh′
Proof sketch: We’ll work with updates assuming no closure is done, and then once we’ve established our main result, we’ll show it with the closure part of updating. This is very long but it’s mostly just a lot of algebra grinding to show that mapping a suitable point in Θ(π¬h∙(π¬h′∙πpa)) through the two individual updates and the single big update makes the same point.
We’ll take a detour first and show that Pgh(Θ|g,π¬h,h),π¬h′(h′)⋅PgΘ,π¬h(h)=PgΘ,(π¬h∙π¬h′)(hh′)
Ok, that’s good enough for now, we’ll do more rewrites later. Unpacking the second term, again, with Lemma 27, we get… pretty much the exact same thing by the same sequence of rewrites, culminating in
First, observe that 1★h′gh and 0★h′gh (since we’re taking the expectation over stuff that’s had the h clipped off), can be written as (1★hh′g)h and (0★hh′g)h respectively, because the term 1★hh′g (or 0) is “1 (or 0) on hh’, g on h but off hh’, g off h”, so (1★hh′g)h is “1 (or 0) on h’, gh off h’” (from stripping off the h prefix), which is the same as 1★h′gh (or the 0 analogue). So, we can rewrite as:
Now, we should probably figure out how to rewrite (1★hh′g)★hg. This rewrites as (1★hh′g). Similarly, with (0★hh′g)★hg, it rewrites as (0★hh′g). Making these substitutions, we get
And we’re done, having shown that Pgh(Θ|g,π¬h,h),π¬h′(h′)⋅PgΘ,π¬h(h)=PgΘ,(π¬h∙π¬h′)(hh′)
Back to the bulk of the proof. First, we have to consider what points in Θ((π¬h∙π¬h′)∙πpa) can survive the pair of updates/single update. They have to have no nirvana that lacks h as a prefix, and no nirvana that has h as a prefix but lacks hh’ as a prefix. So, all the nirvana is after hh’. Let (m,b) be an appropriate point (lacking the nirvana, and in the right set) that we can shove through both updates/the single big update.
The induced point in ((Θ|g,π¬h,h)|gh,π¬h′,h′)(πpa) (minus the closure on both steps!) can be written as… well, it’s kinda big, we’ll break down the measure component and b component
1Pgh(Θ|g,π¬h,h),π¬h′(h′)c((1PgΘ,π¬h(h)c(m|h))|h′)
And the b component is still too big, we’ll split it up into two parts.
It may be nonobvious that this is the update, but the big fraction in front is the scale term for the second update which applies to everything, the measure term is the update of the measure term for the first update, the first b part is the b term that’s produced after the first update, and the second b part is the stuff added to the b term from the first update. Now, let’s break this down a bit. For the first measure term, we can pull out the inner scale term and use our result on what happens when you multiply the scale terms to get
This is because the first term is the process: Take m, strip out all parts of the measure that don’t have h as a prefix, clip off the h prefix, then strip out all parts of that measure that don’t have h’ as a prefix, and clip off h’. Which is the same as taking m, stripping out all parts that don’t have h as a prefix, stripping out all parts that don’t have hh’ as a prefix, and clipping off hh’ (the second term). And this is the same as taking m, stripping out all parts that don’t have hh’ as a prefix, and clipping off hh’ (the third term)
So, our final rewrite of the measure term is: 1PgΘ,(π¬h∙π¬h′)(hh′)c(m|hh′)
Now, let’s address the first b part. We can easily just pull the scale term out to rewrite it as:
1PgΘ,(π¬h∙π¬h′)(hh′)(b+m(0★hg)−EΘ(π¬h)(0★hg))
Which should be good enough for now. Moving on to the second b term,
Which is exactly what you’d get from pushing (m,b) through the single big update function with g,(π¬h∙π¬h′),hh′.
Ok, so we’ve shown that ((Θ|g,π¬h,h)|gh,π¬h′,h′)(πpa)=(Θ|g,(π¬h∙πh¬h′),hh′)(πpa) for all πpa, but that’s just for the update with renormalization and without closure. How do we show “take the closure at the intermediate step and end” for the two updates and “take the closure at the end” for the big update are the same? Easy. Updates are continuous, so if we take the closure at the end, the preimage (of the final closed set for the second small update) is a closed superset of the image (for the first small update), so taking the closure there adds no new points. So, the closures don’t affect anything, and we have our theorem.
Theorem 5: Belief Function Bayes:For pseudocausal and acausal hypotheses, if there’s some i s.t.Θi|g,π¬h,h exists and is nontrivial, then
This says that updating a prior works exactly as you’d expect. You rescale every updated component according to its “probability” relative to the “probability” the prior assigns to the observation, and mix them together. Because mixes may not be renormalized, you then just throw in a single scale-and-shift (which doesn’t affect things) and you’re done.
Proof sketch: A mixture of hypotheses, when renormalized, is a hypothesis, and a hypothesis updated, is a hypothesis. Because of consistency, a sufficient test for the two belief functions being equal is if we can show that they’re equal for all π, because all lower levels are uniquely generated from that. A further observation we can make is that (Eζ(PgΘi,π¬h(h)))−1 is a scaling term, so all we really need to do is to show that
because the renormalization compensates for the absence of that scale term.
The actual proof itself relies on first showing that we have a sufficient condition for the two sets to be well-defined in the form of nontriviality post-update. And then, one of the most aggrivating parts of this proof is keeping track of all the scale-and-shift terms for the updates and renormalization and showing how they all interact, and we spend a while doing that and laying groundwork. The basic proof path is fixing a batch of points in the Θi and pushing them through one side, and making a batch of points that hit the same resulting point when we push them through the other side. We must show both directions. We can’t just say “pushing them through the second process makes the same point” because that’s not true, we’ll need to exploit upper completion to build a different batch of points to account for Θi with 0 “probability”, because updating crashes there and we “lose” those points. This is another long algebra-grinding proof, like Proposition 11.
Proof: First, we should show some details about nontriviality. our original definition was that, for a Θ to be nontrivial, there’s a π st. EΘ(π)(1)≠EΘ(π)(0). By Proposition 6 in Section 1, triviality of Θ is equivalent to there being a single minimal point of the form (0,b) for all Θ(π) (b may vary depending on π, though) Now, our starting condition was that there’s some Θi you can update and not have it crash, so that means the Eζ(PgΘi,π¬h(h)) term is nonzero, so that’s one possible source of failure eliminated. Said nontrivial Θi|g,π¬h,h, since it has nonzero “probability” implies that the mixture has some π with different expectation values, so we can safely renormalize the mixture of updated belief functions without running into issues.
Also, updating a trivial Θi makes a trivial Θi|g,π¬h,h, because for all π, Θi(π¬h∙π) has one minimal point of the form (0,b), and everything else is an a-measure added to that, so updating is equivalent to updating (0,b) and updating your a-measure, so the updated set (without renormalization) has a single minimal point of the form (0,b) (the measure component being 0 means it can’t influence the b term), and then a scale-and-shift means your new set has a single minimal point of the form (0,b′), ie, your updated Θi is trivial. So, if there’s an i where Θi|g,π¬h,h is nontrivial, then Θi must be nontrivial. And, by “there’s a single nontrivial hypothesis” being a sufficient condition for being able to mix sets and renormalize them, (EζΘi)R is well-defined.
Admittedly, we haven’t touched the update of the renormalized set yet, but we’ll address that part later. For now, just remember that the renormalization of EζΘi doesn’t crash, and neither does the renormalization of Eζ(PgΘi,π¬h(h)⋅(Θi|g,π¬h,h))
And that’s enough for now. Also, since we’re working in the nirvana-free setting, we can use Lemma 27 to show that we don’t need the closure part of updating.
Let Idef be the subset of the i where PgΘi,π¬h(h)>0 (ie, those where the update is defined) It’s nonempty. Now, reindex the probability distribution and hypotheses so 0∈Idef.
For an i∈Idef, let the set {i→} be the largest set of the form {i+1,i+2...} where i+1,i+2... are all not in Idef. Intuitively, {i→} is the largest contiguous string of numbers after i where PgΘi,π¬h(h) is zero if there is such a string.
One of the most aggrivating parts of this proof if we don’t do some housekeeping work first is keeping track of all the renormalization terms for all the updates and mixtures. Let’s introduce some notation for these and derive relationships between them.
α1:=maxπ>π¬hE(EζΘi)R(π)(1★hg) and β1:=minπ>π¬hE(EζΘi)R(π)(0★hg)
α2:=maxπE(EζΘi)(π)(1) and β2:=minπE(EζΘi)(π)(0)
α3:=maxπ>π¬hE(EζΘi)(π)(1★hg) and β3:=minπ>π¬hE(EζΘi)(π)(0★hg)
αi:=maxπ>π¬hEΘi(π)(1★hg) and βi:=minπ>π¬hEΘi(π)(0★hg)
And we need to take a moment to note that for the next two, π′ isn’t a normal policy, it’s one of those post-update policies that you have to glue to π¬h to get an actual policy.
α4:=maxπ′E(Eζ(PgΘi,π¬h(h)⋅(Θi|g,π¬h,h)))(π′)(1) and β4:=minπ′E(Eζ(PgΘi,π¬h(h)⋅(Θi|g,π¬h,h)))(π′)(0)
1α1−β1 and β1 are the scale-and-shift terms for updating the renormalized mixture of belief functions, 1α2−β2 and β2 are the scale-and-shift terms for renormalizing the mixture of belief functions, 1α3−β3 and β3 are the scale-and-shift terms for updating the raw mixture of belief functions, 1αi−βi and βi are the scale-and-shift terms for updating an individual belief function, and 1α4−β4 and β4 are the scale-and-shift terms for renormalizing our mixture of updated belief functions.
By our earlier considerations about nontriviality, α4≠β4, and α2≠β2, and there’s some i where αi≠βi (and in particular, 0 is one of those i) We’ll show after a bit more work, that α2≠β2 and α3≠β3, so none of the scaling terms have a divide-by-zero error except, for some i, αi=βi (maybe)
Ok, so we have α1=α3−β2α2−β2. The exact same argument, just with appropriate terms switched around, establishes that β1=β3−β2α2−β2 Remember, α2≠β2 so the scale term doesn’t blow up.
For the next one, we’ll need the crucial fact that if updating fails (ie, αi=βi), then after raw updating (but before renormalization), for all policies, ugh(Θi)(π) has a single minimal point that’s of the form (0,βi).
And then, we can use the fact that, if renormalization fails, regardless of the π′, ugh(Θi)(π′) is composed of a single minimal point of the form (0,βi) to get
So α4=α3−Eζβi and for β4, it’s the exact same thing and same argument, β4=β3−Eζβi
Let’s verify that α3−β3≠0. See that α3−β3=α4−Eζβi−β4+Eζβi=α4−β4≠0 (we already know that α4≠β4)
And we can verify that α1−β1≠0 by α1−β1=α3−β2α2−β2−β3−β2α2−β2=α3−β2−β3+β2α2−β2=α3−β3α2−β2≠0
(we already know that α3−β3≠0, we just showed it, and we know that α2−β2≠0 so that scale term doesn’t crash)
We’ll show two directions of the proof. The first direction is, we take a bunch of (mi,bi)∈Θi(π¬h∙π) and feed them through the second process (update, mix with corrected probabilities, then renormalize), then show we can craft a bunch of (m′i,b′i)∈Θi(π¬h∙π) that, when fed through the first process (mix, renormalize, then update as a whole), produce the same point.
Feeding our (mi,bi)∈Θi(π¬h∙π) through the updates produce the points:1αi−βi(c(mi|h),bi+mi(0★hg)−βi) Well, only for i∈Idef. Otherwise the update crashes.
Mixing them produces: ∑i∈Idefζi(αi−βi)1αi−βi(c(mi|h),bi+mi(0★hg)−βi)
Which cancels to make: ∑i∈Idefζi(c(mi|h),bi+mi(0★hg)−βi)
This can be reexpressed as: (∑i∈Idefζic(mi|h),∑i∈Idefζi(bi+mi(0★hg)−βi))
And now, post-renormalization for the mixture, we get:
Our task is now to hit that exact point by feeding some appropriate batch of points through the mix, renormalization, and then update. If i∈Idef, then let (m′i,b′i)=(mi,bi). Otherwise, let (m′i,b′i) be some suitable point that raw-updates to (0,βi). In particular this means that m′i|h=0 and b′i+m′i(0★hg)=βi.
Anyways, mixing the (m′i,b′i) and renormalizing produces 1α2−β2(Eζm′i,Eζb′i−β2)
So, pushing a collection of points through the updates individually and mixing with corrected probabilities and renormalizing can be replicated by mixing a different batch of points in the Θi(π¬h∙π) first, renormalizing, and updating, establishing that
That just leaves the other direction. This time we’ll be using (mi,bi) for stuff in the Θi(π¬h∙π) sets that are pushed through “mix, renormalize, update”, and (m′i,b′i), i∈Idef for points in Θi(π¬h∙π) that are pushed through “update, mix with corrected probabilities, renormalize”, to attempt to hit the same point.
Ok, let’s express our point of interest to hit in terms of the (mi,bi). First, mixing the (mi,bi) and renormalizing produces 1α2−β2(Eζmi,Eζbi−β2)
And now, since α1=α3−β2α2−β2 and β1=β3−β2α2−β2, we can simplify (α1−β1)(α2−β2) as α3−β3, and simplify (α2−β2)β1 as β3−β2, to rewrite as
=1α3−β3(Eζc(mi|h),Eζ(bi+mi(0★hg))−β3)
Now, let’s define our (m′i,b′i), when i∈Idef (to have them not get canceled out of existence by an undefined update/multiplication-by-zero)
m′i:=mi+∑j∈{i→}ζjζi(mj|h)
b′i:=bi+∑j∈{i→}ζjζi(bj+mj(0★hg)−βj)
What this essentially does is take (mi,bi) and adds a specially chosen a-measure to it, getting another point in the same set by upper-completion. Because ζi is nonzero, and the mixture of the (mi,bi) converges, the scale term doesn’t affect the fact that this partial sum converges. Adding in the expectation-of-function terms doesn’t affect convergence, and it’s a sum of scaled a-measures because bj+mj(0★hg)−βj is just the b term of the raw update of (mj,bj) but shifted down, ie, nonnegative.
This may appear a bit mysterious, but the rationale behind it is “dang, we’re only summing up over i∈Idef, we need to take our “missing” (mj,bj), and stash them in our “safe” i somehow (via the wiggle room we get from upper-completion) so they can manifest post-update”
Feeding our (m′i,b′i)∈Θi(π¬h∙π) through the updates produce the points:
1αi−βi(c(m′i|h),b′i+m′i(0★hg)−βi) But only for i∈Idef.
Mixing them produces the point: ∑i∈Idefζi(αi−βi)1αi−βi(c(m′i|h),b′i+m′i(0★hg)−βi)
Which cancels to make: ∑i∈Idefζi(c(m′i|h),b′i+m′i(0★hg)−βi)
This can be reexpressed as: (c(∑i∈Idef(ζim′i)|h),∑i∈Idefζi(bi+mi(0★hg)−βi))
And now, post-renormalization for the mixture, we get:
(this previous step is because 0★hg is 0 on histories with h as a prefix, and mj|h is only supported on histories with h as a prefix, so the expectation is 0, and this extends to the sum of expectations)
and uniting this with our rewritten measure term, we get:
1α4−β4(Eζc(mi|h),Eζ(bi+mi(0★hg))−Eζβi−β4)
Now, let’s compare against
=1α3−β3(Eζc(mi|h),Eζ(bi+mi(0★hg))−β3)
We already know from our earlier results on α and β that α4−β4=α3−β3, and that −β3=−β4−Eζβi, so our equivalence is complete.
So, mixing a batch of points in the Θi(π¬h∙π), renormalizing, and updating, can be replicated by pushing a different collection of points through the updates individually and mixing with corrected probabilities and renormalizing, establishing that for all π
Theorem 6: Dynamic Consistency: Given a hypothesis Θ (causal, pseudocausal, acausal, surcausal), and an arbitrary policy π and utility function U, then, with πh being the continuation of π post-update and π∗ being such that E(Θ|U,π¬h,h)(πh)(Uh)⪋E(Θ|U,π¬h,h)(π∗)(Uh) , then EΘ(π)(U)⪋EΘ(π¬h∙π∗)(U)
Proof sketch: We’ll need to shatter this into two parts. The first part is if the update is undefined. Then the agent gives up and cries, and all policies are equally good. So we have to show that regardless of what the agent does after h, then it matches the performance of the original policy. The second part is showing the result for a well-defined update. It’s mostly shuffling equations around.
For the first part, updates fail exactly when, for all policies, uUh(Θ)(π)∩NF has a single minimal point of the form (0,β) (same β for all policies). We’ll be using this. Also, πh will be used to denote what the policy π does after observation h, so we have π=π¬h∙πh.
So EΘ(π)(U)−EΘ(π¬h)(0★hU)⪋EΘ(π¬h∙π∗)(U)−EΘ(π¬h)(0★hU)
and we can finish up and get dynamic consistency. EΘ(π)(U)⪋EΘ(π¬h∙π∗)(U)
Theorem 7: Maximin UDT:Translating a set S of policy selection environments with a bounded modulus of continuity to an acausal hypothesis Θ always works. Also, for all utility functions U,argmaxπinfe∈SEπ⋅e(U)=argmaxπEΘ(π)(U)
Ok, so a policy selection environment e is a continuous (in the policy) function (A×O)<ω×Π→ΔO
If you really want, like, for the planting flowers problem, you can have some probability of nonexistence that’s policy-dependent, and a backup utility in case of nonexistence, though both still must be continuous in policy, by going “ok, there’s a primordial event that leads to either the distribution starting, or I get event ⊥ with b≤1 utility”, this can be crammed into the necessary framework.
A policy selection environment looks at what has happened thus far, and your policy, and picks some distribution over observations. For a single policy-selection environment, π→π⋅e is uniformly continuous. This is because, if you fix a time length t, there’s finitely many histories of length t or less. For each of these histories, there’s a δ where two policies identical up till time logγ(δ) produce only an ϵ change in ΔO (continuity means that different policies that are identical up till some sufficiently long time induce only a small change in what happens now). So, we can go “policies π,π′ that identical up till some stupidly long time mean that, for the first t steps, there’s very little change in what happens”. t can be made as long as we wish, and ϵ can be made as low as we wish, so for all ϵ, there some δ where, if d(π,π′)<δ, then π⋅e and π′⋅e are within ϵ of each other.
Bounded modulus of uniform continuity means that there’s a single δ/ϵ function that works for all your policy selection environments of interest. Ie, no matter which environment was selected, you know how long policies need to be identical for to make only an ϵ difference in the resulting distribution over histories.
Encode each π⋅e history distribution as having λ=1 and b=0 Considering the set of (π⋅e,0) points as points for Θ?ω(π), we have
We do need to show that Θ?ω fulfills the essential properties for being able to turn it into an acausal hypothesis via Proposition 2. There’s four. Nonemptiness, restricted-minimals, Hausdorff-continuity, and renormalization not failing. Nonemptiness is trivial. Restricted-minimals is easy because every point in Θ?ω(π), regardless of π, has λ=1 and b=0. Hausdorff-continuity can be shown by the set of environments having a bounded modulus of continuity, so given any (π⋅e,0)∈Θ?ω(π), we can reuse that e and there’s a δ where d((π⋅e,0),(π′⋅e,0))=d(π⋅e,π′⋅e)<ϵ, and (π′⋅e,0)∈Θ?ω(π′), so Hausdorff-continuity follows.
That just leaves being normalizable. This occurs if there’s a nontrivial Θ?ω(π), ie, EΘω(π)(1)≠EΘω(π)(0). This is obviously true, because the former is “minimal value of λ+b (always 1)”, and the latter is “minimal value of b” (always 0).
So, by proposition 2, we create an acausal Θω hypothesis from our Θ?ω. From proposition 5, we then get argmaxπinf(m,b)∈Θ?ω(π)(m(U)+b)=argmaxπmin(m,b)∈Θω(π)(m(U)+b)
And then, since we have an acausal infinitary hypothesis, we can use the Isomorphism theorem to get that Θ(π)=Θω(π), so argmaxπmin(m,b)∈Θω(π)(m(U)+b)=argmaxπmin(m,b)∈Θ(π)(m(U)+b)
And finally, we can wrap up with Proposition 5 that said argmax set of policies actually exists, showing
And we’re done with UDT-copying. And, if you want, you can translate it into a surcausal hypothesis, and into a set of a-survironments from there.
Proposition 12:If the collection of hypotheses Θi is learnable, then any Infrabayes-optimal policy family for a prior on them also learns the collection of hypotheses as well.
First, we’ll recap learnability. Learnability of a countable collection of belief functions by a γ-indexed family of policies πγ is the condition that for each Θi, regret limits to 0. (we’ll use Uγ for our utility function with time-discount parameter γ) Ie,
∀i:limγ→1(maxπ∗(EΘi(π∗)(Uγ))−EΘi(πγ)(Uγ))=0
So, in the low-time-discount limit, you get a score arbitrarily close to that of an optimal agent that knows exactly what environment it’s playing against.
An Infrabayes-optimal policy family for a prior/mixture of belief functions is one where
π∗γ∈argmaxπE(EζΘi)R(π)(Uγ)
Such an argmax set exists by Proposition 5. Further, any scale-and-shift just does a scale-and-shift on the values a policy will achieve and leaves the argmax set alone, so we could get an alternate representation as: π∗γ∈argmaxπE(EζΘi)(π)(Uγ)
So, assume that a countable family of belief functions paired with a utility function U is learnable by some family of policies πγ. We’ll show that it’s also learnable by any bayes-optimal π∗γ family for the prior (EζΘi)R.
First, πγ learns the family of hypotheses,∀i:limγ→1(maxπ∗(EΘi(π∗)(Uγ))−EΘi(πγ)(Uγ))=0
This implies limγ→1Eζ(maxπ∗(EΘi(π∗)(Uγ))−EΘi(πγ)(Uγ))=0
Because you only have to go finitely far out to nab all but ϵ of the probability mass of the expectation, and you can pick some γ extremely close to 1 that ensures that all those finitely many environments have ϵ or less regret.
We can now move the expectation inside to get:
limγ→1(Eζ(maxπ∗(EΘi(π∗)(Uγ)))−Eζ(EΘi(πγ)(Uγ)))=0
limγ→1(Eζ(maxπ∗(EΘi(π∗)(Uγ)))−E(EζΘi)(πγ)(Uγ))=0
Now, E(EζΘi)(πγ)(Uγ)≤E(EζΘi)(π∗γ)(Uγ) because π∗γ is optimal for the prior, so it’s optimal for any rescaled version. So,
limγ→1(Eζ(maxπ∗(EΘi(π∗)(Uγ)))−E(EζΘi)(π∗γ)(Uγ))=0
limγ→1(Eζ(maxπ∗(EΘi(π∗)(Uγ)))−Eζ(EΘi(π∗γ)(Uγ)))=0
limγ→1Eζ(maxπ∗(EΘi(π∗)(Uγ))−EΘi(π∗γ)(Uγ))=0
Now, symmetrically, if π∗γ doesn’t limit to 0 regret on all belief functions, then the expectation doesn’t limit to 0 either.
∀i:limγ→1(maxπ∗(EΘi(π∗)(Uγ))−EΘi(π∗γ)(Uγ))=0
and we have shown that our arbitrary Infrabayes-optimal family of policies learns the environments.
Complete Class Theorem Weak Version:Given any pareto-optimal policy π, then there is an infradistribution H over states, where ∀π′:fπ′≠fπ:EH(fπ)>EH(fπ′)
Proof sketch: Because we are able to translate from concave lipschitz monotone normalized functionals over [0,1]S to infradistributions over states, we just have to get a concave lipschitz monotone functional where our policy π is optimal, and then it can be normalized back up to 1, and then it can be turned into an infradistribution over states by LF-duality. Said concave lipschitz monotone functional is:
h(f):=mins∈S(f(s)−Eo∼ob(s)P(s,π(o)))
We just need to show that the function is indeed concave, Lipschitz, monotone, and assigns no policy a higher expectation value than the Pareto-optimal policy, because all these properties are preserved by a scale-and-shift.
Proof of Lipschitzness: If you perturb f by ϵ or less in all states, then this only affects the minimal value by ϵ or less, so we actually have a Lipschitz constant of 1.
Proof of monotonicity: If f matches or outperforms f′ in all states, than the possible values that min is picking amongst all went up, so f gets an equal or higher value, showing monotonicity.
Proof of π getting the optimal value: π is on the pareto-frontier, so there is no other policy π′ that gets equal-or-greater value in all states. Thus, given any other π′, there is a state in which it underperforms the reward of π in that state, so the quantity is negative. And for π itself, it obviously matches the behavior of π in all states, so it gets a value of 0. Thus, π gets the strictly optimal score amongst policies against this h, so the same holds after renormalization, and then the same holds for the expectation values w.r.t. the infradistribution.
Proofs Section 2.3 (Updates, Decision Theory)
Here are the previous two posts.
Now, what about updates? We’ll use ugh (and suppress the π¬h that should be there) as shorthand for the function that maps (m,b) over Θ(π¬h∙πpa) to (c(m|h),b+m(0★hg)) in Ma(F(πpa)) (or the nirvana-free or sur variant of this), and also use ugh as a function from belief functions to belief functions (just map all the sets through)
Lemma 27: When updating, the closure adds no nirvana-free points that weren’t present originally if Nonemptiness, Nirvana-free upper-completion and closure holds originally (works in the sur-case too)
Proof sketch: We take a sequence Mn limiting to M, and then take a preimage point of Mn, go to a minimal below it, find a limit point in our original set by Compactness, and map it back through the update, getting a point below M. Then, we find what we need to add to that to get M, and find something above our limit point that maps to M, so we didn’t actually need closure anyways because we made M as an image of a nirvana-free point present in the original set.
Proof: Fix a sequence Mn in ugh(Θ)(πpa) (but without the defining closure part in the end) that limit to M which is nirvana-free.
Every Mn has a preimage point M′n∈Θ(π¬h∙πpa) with no nirvana off-h. For each M′n, find a minimal point M′lon below it, which have a λ⊙+b⊙ bound by bounded-minimals, so we can find a convergent subsequence limiting to M′lo (actually, might not be minimal, still a limit of minimal points, though). Shoving the M′lon (and the limit point) back through the update (which is a continuous function), we get a sequence Mlon limiting to Mlo (the thing you get from pushing M′lo through the update).
Since M′n lies above M′lon (upper-completion ordering), then updating preserves that property, because the update function is linear. Thus, all the Mlon lie below their corresponding Mn. Now, we can invoke Lemma 16 to conclude that Mlo lies below M. It lies below a nirvana-free point, so M′lo is nirvana-free as well. Now, we just need to show nirvana-free upper-completion because M=Mlo+M∗.
We can take M′lo and add on (m∗,b∗) (extend the measure back to the original domain by sticking an h prefix on everything, and saying that the measure is 0 everywhere else), making an a-measure that’s in Θ(π¬h∙πpa), by nirvana-free uppper-completion there. By linearity, and the update not affecting (m∗,b∗) (it’s 0 outside of h so the g outside of h doesn’t get to contribute anything to the b term when we update), updating M′lo+(m∗,b∗) makes Mlo+M∗=M. So, if a nirvana-free point appears post-update (with closure), then it’ll appear post-update (without closure).
Lemma 28: raw-update-then-project equals project-then-raw-update.
Take some (m,b). We want to show that:
prπhipa,πlopa∗(c(m|h),b+m(0★hg))
=(c(pr(π¬h∙πhipa),(π¬h∙πlopa)∗(m)|h),b+(pr(π¬h∙πhipa),(π¬h∙πlopa)∗(m))(0★hg))
First, (pr(π¬h∙πhipa),(π¬h∙πlopa)∗(m))(0★hg)=m(0★hg) This is because the projection down doesn’t change the measure at all outside of h, and we’re evaluating a function that’s 0 inside h. So, that takes care of the b term. Also, projection preserves the b term, so our desired equality is:
(prπhipa,πlopa∗(c(m|h)),b+m(0★hg))=(c(pr(π¬h∙πhipa),(π¬h∙πlopa)∗(m)|h),b+m(0★hg))
For the measure term, the first is “restrict to on-h histories, clip off the h prefixes, and project down”, and the second is “project down the on-h histories accordingly, then restrict to on-h histories and clip off the h prefix”, which are obviously equal.
Proposition 9: For causal, surcausal, pseudocausal and acausal hypotheses, updating them produces a causal, surcausal, pseudocausal or acausal hypothesis as long as renormalization doesn’t fail.
Proof sketch: What we can do is consider the “raw update”, show that it preserves all nice properties except for renormalization, and then show that the renormalization terms in the update are the proper renormalization terms to use. Thus, we’ll define our raw update ugh(Θ) via: ugh(Θ)(πpa) is: Θ(π¬h∙πpa)∩NF off h, mapped through the following function: (m,b)↦(c(m|h),b+m(0★hg)) And then you take the closure of the resulting set at the end.
So, we take our partial policy πpa and glue it to the off-h-partial policy π¬h, go to that part of the original belief function, strip off everything that has nirvana off-h (for Murphy shall not select those, and properly, that should make the b term infinite post-update), slice off the part of the measure off-h, strip off the h prefix from those histories, and go “ok, our utility function is g, let’s take the expected utility off-h and fold it into the b term”
If we can get all nice properties but renormalization, we’re good, just appeal to Lemma 24. As for showing the conditions: We’re in for something almost as bad as one of the directions of the Isomorphism theorem. Nonemptiness, closure, and convexity are trivial, upper completion, pseudocausality, and bounded-minimals are easy and the extreme point condition and causality are moderately tricky.
For the extreme point condition, Step 1 is establishing that ufh(Θ)(πpa)∩NF equals the closed convex hull of nirvana-free projections from above by an argument that makes sense when you sketch it out but may be difficult if you don’t have it sketched out, step 2 is using Hausdorff-continuity and Lemma 20 to turn it into an ordinary convex hull, and finally, arguing that a nirvana-free exteme point must have come from a nirvana-free point from above via step 2.
For causality, we can (for the most part) just go back to Θ, get an outcome function there, and map it back through the update to get an outcome function, the hard part is netting the limit points, which requires a limit of outcome functions. But because we want a countable product of sets to get sequential compactness from Tychonoff, we have to work with stubs, which adds some extra complexity.
Hausdorff-continuity is just hellishly hard, we need to show that the preimages of the sets post-update are the updates of the preimages of sets pre-update, and then combine that with some fancy work with minimal points and upper completions and using two different characterizations of uniform continuity at once via Lemma 15, and a touch of functional analysis. There’s way too many interacting points and sets in this one.
But easily the biggest grind is Consistency. We have 4 subset directions to show, each of which requires their own separate fancy argument, and two of them require splitting into a nirvana-containing/causal case and a nirvana-free case, so it’s a 6-part proof. A good chunk of complexity arises because we have to take closure in the nirvana-containing case, an issue which goes away if we just let Nirvana be 1 reward forever. Let’s begin.
Condition 1: Nirvana-free Nonemptiness:
This is trivial. Just pick a nirvana-free point in Θ(π¬h∙πpa) by nirvana-free nonemptiness, and update, to get one in ugh(Θ)(πpa).
Conditions 2,3: Closure, Convexity:
Closure is a tautology since we took the closure. For convexity, the closure of a convex set is convex, and ugh is a linear function, so it maps convex sets to convex sets.
Condition 4: Nirvana-free Upper-Completeness:
First, invoke Lemma 27 to see that all nirvana-free points must have been present in the raw ugh(Θ(π¬h∙πpa)) set to begin with, without the closure. What we want is that, if M′=M+M∗, and M lies in the raw updated set and is nirvana-free, and M′ is a nirvana-free a-measure, then M∗ lies in the updated set as well.
Find a M′′ that maps to M after updating. It must be nirvana-free, because the nirvana either occurs without h as a prefix (which is forbidden because all that stuff gets clipped off and doesn’t get pushed through the update), or the nirvana occurs with h as a prefix, but then it’d show up in the measure component of M post-update, contradicting its nirvana-freeness. Now, we can consider M′′+(m∗,b∗) (basically, M∗, but we take the measure back by sticking an h prefix on everything, and saying that it’s 0 off-h). This is present in Θ(π¬h∙πpa), by nirvana-free upper completion. By linearity of updating, and m∗ having no measure in any off-h area where it’d get picked up by g, this updates to M+M∗, witnessing that M′ lies in the image of the update, so we get nirvana-free upper completion.
Condition 5: Bounded-Minimals:
For bounded-minimals, we can pull the Lemma 16 trick of taking our M of interest that Mn limit to, taking a preimage M′n for each Mn, finding a minimal M′lon below each M′n (which obeys a λ⊙+b⊙ bound and also has no nirvana off-h), getting a limit point M′lo (still no nirvana off-h) by compactness, and pushing the sequence through the update, to get a sequence Mlon below Mn limiting to Mlo which is below M (Lemma 16) Now, we just have to check up on the λ+b values of our Mlon sequence, and show that they respect the λ⊙+b⊙ bound, to transfer this to Mlo. The raw update deletes measure from off-h, and assigns it the value that g does, which is 1 or less, so any increase in b correspond to an equal-or-greater decrease in λ, so the Mlon all obey the λ⊙+b⊙ bound as well. Thus, the limit point Mlo obeys the bound, and it’s below our original M, so any minimal must obey the λ⊙+b⊙ bound.
Condition 7: Consistency:
This is going to be extremely tedious and difficult to show, it’s a 6-part proof. The first 3 parts are devoted to showing that ugh(Θ)(πpa)=¯¯¯¯¯¯¯¯c.h(⋃π≥πpaprπ,πpa∗(ugh(Θ)(π)))
Part 1 is showing ugh(Θ)(πpa)⊆¯¯¯¯¯¯¯¯c.h(⋃π≥πpaprπ,πpa∗(ugh(Θ)(π))) In the nirvana-free pseudocausal/acausal case.
Let M be in ugh(Θ)(πpa). By Lemma 27, since we’re working in the nirvana-free case, we didn’t need to take the closure, it won’t add any points that aren’t there anyways. So, M has a preimage point M′∈Θ(π¬h∙πpa) that maps to it. By consistency for Θ, M′ lies in the closed convex hull of projections of policies down from above, so there are points in the convex hull of projections of policies that are arbitrarily close to M′, fix some sequence M′n of points in the convex hull of projections down from policies above that limit to M′. Mapping these through the raw update (which is continuous) we get a sequence Mn of points in ugh(Θ)(πpa) that limit to M.
All these policies above (π¬h∙πpa) have the form (π¬h∙π). So, M′n can be written as a mix of finitely many M′i,n, which are the projections of M′∞i,n from above, in policies. Update those, getting points M∞i,n in ugh(Θ)(π). These project down to Mi,n, which mix back together to make… Mn. This is because of Lemma 28, that update-then-project equals project-then-update. Also, mix-then-project equals project-then-mix. Remember, Mn is made by: “Project M′∞i,n down to make M′i,n, then mix to make M′n, then update.”
So, we can go project-mix-update equals mix-project-update equals mix-update-project equals update-mix-project equals update-project-mix, which is the process “update the M′∞i,n to M∞i,n, project down to Mi,n, mix to Mn”
The first equality is linearity of projection, the second equality is Lemma 28, the third equality is linearity of updating, the final equality is linearity of projection again.
Anyways, taking stock of what we’ve done, we have a sequence Mn limiting to our M of interest, and every Mn is crafted by taking points from finitely many ugh(Θ)(π), projecting them down, and mixing them. Therefore, our M∈ugh(Θ)(πpa) lies in the closed convex hull of projections down from above.
Part 2: we’ll show this again, but in the nirvana-containing case, where we’ll leverage causality.
Fix a M∈ugh(Θ)(πpa) (with closure). There’s a sequence Mn that limits to it, that lies in the same set, but without closure, so we can take preimage points M′n∈Θ(π¬h∙πpa) that update to make Mn. By causality, fix some arbitrary policy above (π¬h∙πpa), which can be expressed as (π¬h∙π), where π≥πpa. Anyways, we can take M′n, and use causality to get an outcome function of, to get a M′∞n∈Θ(π¬h∙π) that projects down to M′n. We don’t have to worry about nirvana off-h, because M′n already specifies everything that happens off-h and it says no nirvana occurs in that case. So, M′∞n can be updated to make a M∞n in ugh(Θ)(π). By Lemma 28, this must project down to Mn. So, all our Mn lie in the projection of ugh(Θ)(π), and since M is a limit point of that sequence, it must lie in the closed convex hull of projections.
And we’ve shown that ugh(Θ)(πpa)⊆¯¯¯¯¯¯¯¯c.h(⋃π≥πpaprπ,πpa∗(ugh(Θ)(π)))
And have taken care of 2 of our 6 parts. Now for the reverse direction, that
ugh(Θ)(πpa)⊇¯¯¯¯¯¯¯¯c.h(⋃π≥πpaprπ,πpa∗(ugh(Θ)(π)))
Thankfully, this can be done with a general argument that isn’t sensitive to the presence of Nirvana.
Part 3: Fix a M in the closed convex hull, which has a sequence Mn limiting to it that’s in the convex hull of projections down from above. the Mn shatter into finitely many Mi,n, which are projections of M∞i,n∈ugh(Θ)(πi). Now, these aren’t necessarily preimage points, they may have been added in the closure. Thus, we can perturb by 2−n or less if needed to make a M′∞i,n which does have a preimage point. Projecting these down to M′i,n and mixing, crafts a M′n point that is within 2−n of Mn (remember, projection doesn’t expand distance), so the sequence M′n still has M as a limit point (it gets arbitrarily close to a sequence that gets arbitrarily close to M). If we can show that all the M′n lie in ugh(Θ)(πpa), then by closure, we’ll get that M lies in the same set so we’re done.
Ok, so we have M′∞i,n∈ugh(Θ)(πi), that project down and mix to make M′n, and importantly, we crafted them so they’re produceable without closure. Thus, they have preimage points M′′∞i,n∈Θ(π¬h∙πi) (that lack nirvana off-h) Project them down to make M′′i,n∈Θ(π¬h∙πpa), and mix them to make a M′′n in the same set (which still lacks nirvana off-h), and this updates to make M′n via Lemma 28, as we’ll show shortly.
Starting with the M′′∞i,n, we know that update, project, mix equals M′n via going M′∞i,n, M′i,n, M′n. Then, update-project-mix equals project-update-mix equals project-mix-update, which is the path we took. Therefore, all the M′n lie in ugh(Θ)(πpa), which is closed, so M (arbitrary in the closed convex hull of projections) lies in the same set, establishing the reverse subset direction and thus equality,
ugh(Θ)(πpa)=¯¯¯¯¯¯¯¯c.h(⋃π≥πpaprπ,πpa∗(ugh(Θ)(π)))
Part 4: Now that we’re halfway done,let’s look at the “intersection of preimages of stubs from below” direction of consistency, ugh(Θ)(πpa)⊆⋂n(prπpa,πnpa∗)−1(ugh(Θ)(πnpa)). If we ignore the closure part and work with the raw update set sans closure, we can fix a M in ugh(Θ)(πhipa), take a preimage point in Θ(π¬h∙πhipa), project it down to Θ(π¬h∙πlopa) by consistency, then update it to get exactly the projection of M (again, Lemma 28) Then, when we take the closure, we can just take our M in the closure, fix a sequence in the raw update set sans closure Mn that limits to M, project down, getting M′n in the raw update set ugh(Θ)(πlopa) sans closure, and then the limit point M′ lies in ugh(Θ)(πlopa) by closure, and by continuity of projection, M′ is the projection of M.
Since the sets get bigger as you go down, we can invoke Lemma 6 to swap out the intersection of preimages of all stubs below you, for the intersection of preimages of stubs of the form πnpa, this will be important later.
Now, it’s trivial to show that ugh(Θ)(πpa)⊆⋂n(prπpa,πnpa∗)−1(ugh(Θ)(πnpa)) because we’ve established that projecting down makes a subset, and projection commutes, so any M∈ugh(Θ)(πpa) projects down into ugh(Θ)(πnpa) for all n.
All that’s left now is the reverse subset direction,
ugh(Θ)(πpa)⊇⋂n(prπpa,πnpa∗)−1(ugh(Θ)(πnpa))
Sadly, this one will require us splitting into the nirvana-containing (and thus causal) cases and the nirvana-free cases, and it’s a really difficult one to show.
Part 5: Let’s address the nirvana-free case, we’ll use a nifty trick to control the size of the preimage points we select.
Ok, let’s say you have a M with some λ1 and b1 value. And you take M′ that’s a preimage point, but its λ and b values are just… waaay too high. We want to have a preimage point with reasonable values, in order to apply bounding arguments. What you do, is find a minimal-point Mmin below M′, so M′=Mmin+M∗. Now, what you do, is swap out M∗ ie (m∗,b∗), for (m∗|h,b∗+m∗(0★hg)). This is an sa-measure, because
b∗+m∗(0★hg)+(m∗|h)−(1)=b∗+m∗(0★hg)+m∗−(1★h0)
≥m∗−(0★hg)+m∗−(1★h0)=b∗+m∗−(1★hg)≥b∗+m∗−(1)≥0
Now, consider updating Mmin+(m∗|h,b∗+m∗(0★hg)) instead (it’s an a-measure, it has less negative parts than M∗, and is present by nirvana-free upper-completion). This gets you the update of Mmin, plus… (c(m∗|h),b∗+m∗(0★hg)) (remember, 0 measure off-h). Which is the exact same thing you’d get by updating M∗, so when we updated our new sum, we hit M exactly.
However, this sum is special, because we can stick some decent bounds on its λ and b value! For starters, its b value is less than b1 (updating only adds on b-mass, and it updates to M). And as for the λ value… well, Mmin has its λ bounded above by λ⊙ (of the original Θ) due to being a minimal point. And in the worst-case, all of the measure in M came from the thing we added, so m∗|h has a measure of λ1 or less. So our bound on the λ value is λ⊙+λ1.
Armed with this knowledge, we can begin to prove the last bit of consistency in the nirvana-free case. Take a M in the intersection of preimages. It projects down to make Mn in ugh(Θ)(πnpa). Projection preserves λ and b, so they all have the same λ1,b1 bounds. Because we don’t need to close in the nirvana-free case, we get a preimage point of M′n in Θ(π¬h∙πnpa) From our earlier considerations, we can always pick M′n such that its λ is ≤λ⊙+λ1, and its b is ≤b1, although we’ll be using a bound of max(b1,b⊙).
Now, we’re going to have to be extremely careful here. Let the point M′n,j be defined as:
If j<n, then M′n,j is some arbitrary point in Θ(π¬h∙πnpa), with λ equal to or below λ⊙+λ1, and b equal to or below max(b1,b⊙), which always exists by all minimal points obeying the λ⊙+b⊙ bound.
If j=n, then M′n,j=M′n.
If j>n, then M′n,j=pr(π¬h,πjpa),(π¬h,πnpa)∗(M′j)
Then, the tuple of M′n,j for all n is a point in:
∏n(Θ(π¬h∙πnpa)∩{(λμ,b)|λ≤λ⊙+λ1,b≤max(b1,b⊙)})
Equipped with the product topology. In particular, this is a product of compact sets, so by Tychonoff’s theorem, it’s compact. Thus, we can get a convergent subsequence of the tuples. On this subsequence, all the M′n,j converge to a limit point M′n,∞, regardless of n.
Also, M′n,∞ projects down to M′m,∞ if n≥m, because for large enough j, the projection of M′n,j will always be M′m,j, and by continuity of projection, the projection of M′n,∞ must be M′m,∞
Ok, so we’ve got an infinite sequence of M′n,∞ for all n that all project down onto each other. Another nice feature is that updating M′n,∞ produces Mn. This is because, when j climbs high enough, M′j,j projects down to M′n,j, and M′j,j is just M′j which updates to Mj, which projects down to Mn. By Lemma 28, update-then-project equals project-then-update, so M′n,j must update to Mn, for all sufficiently high j. The preimage of a single point is closed, so past a certain point, the M′n,j are wandering around in the preimage of Mn, so M′n,∞ also updates to Mn.
Now, our next step is, does the M′n,∞ sequence in Θ(π¬h∙πnpa) pick out a single point M′ in Θ(π¬h∙πpa) that projects down accordingly? Yes it does. Just intersect all the preimages of single points, they’re nested in each other and compact so the finite intersection property holds, and if the intersection wasn’t composed of a single point, you’d have two distinct points with a difference at some finite time, but projecting down to any finite time the two distinct points are identical, so there can only be a single point in the intersection. Further, it must lie in Θ(π¬h∙πpa), because you can project it down to M′n,∞ in Θ(π¬h∙πnpa) for any n, which, by consistency for Θ, you can also project down to Θ((π¬h∙πpa)n) (projecting down further), so it’s present in the intersection of all the preimages, certifying that it’s in the appropriate set.
Now, finally… does M′, when updated, produce M, certifying that the point in the intersection of preimages is also in the raw update set? Well, let’s say it didn’t. Then we get a M′′ that’s not equal to M, so projecting down to some finite n should suffice to observe that. However, projecting M′′ and M down produces… Mn. This is because of Lemma 28, update-then-project equals project-then-update. Projecting M′ down makes M′n,∞, which updates to Mn.
So, no finite stage suffices to observe the difference between the updated form of M′ and M itself, so they must be identical, certifying ugh(Θ)(πpa)⊇⋂n(prπpa,πnpa∗)−1(ugh(Θ)(πnpa)) for the nirvana-free case.
Part 6: Let’s move to the nirvana-case where we can leverage causality. We’ll be showing this in a rather nonstandard way. We’re going to pick a π≥πpa, and show that our M of interest in the intersection of preimages can be written as a limit of points projected down from ugh(Θ)(π), establishing that M lies in the closed convex hull of points from above, which we’ve already shown equals ugh(Θ)(πpa).
Ok, so M is in the intersection of preimages. Project it down to all the ugh(Θ)(πnpa), getting a batch of points Mn from them. This is the raw update set, so within 2−n or less distance from Mn, there’s a M′n in the raw update sans closure, which has a preimage point M′′n that lies in Θ(π¬h∙πnpa).
Now, pick some arbitrary policy above π¬h∙πpa, which can be written as π¬h∙π. Moving on even further, by causality, we can get a point M′′∞n∈Θ(π¬h∙π) that projects down to M′′n. Update M′′∞n to get a M′∞n∈ugh(Θ)(π), which then (by our earlier thing about how a set equaled the closed convex hull of projections down from above), projects down to a M′hin∈ugh(Θ)(πpa).
Now, we can ask whether the sequence M′hin limits to M itself.ugh(Θ)(πpa) is closed, so this would certify that M lies in the appropriate set.
First, observe that the projection of M′hin down to πnpa is M′n. This is by Lemma 28, update-then-project equals project-then-update.M′′∞n projects down to M′′n, which updates to M′n, so M′n must be what you get by updating M′′∞n to M′∞n, and projecting down to πpa (making M′hin), and projecting that down to πnpa.
Now, because projection preserves the b term, and M′hin projects down to M′n which is within 2−n of Mn (not much of a difference in the b terms), and Mn has the same b term as M, we can certify convergence of the b term at least. Now for convergence of the measure components. Again, M′hin projects down to M′n which is within 2−n of Mn (not much difference before timestep n, shrinking increasingly low), and Mn perfectly mimics what M does before timestep n. So, M′hin behaves increasingly closely to M for everything before time n, which increases without bound. Increasingly close matches on increasingly large initial segments of what happens mean that M′hin must limit to M itself, certifying that M lies in ugh(Θ)(πpa) for the causal cases.
That’s the last bit we needed! We’re finally done with consistency now. This just leaves the hausdorff-condition and the extreme-point condition and pseudocausality and causality.
Condition 9: Hausdorff-continuity:
What we need to do for our setup to even approach this is to show that updating the preimage of the nirvana-free part of Θ(π¬h∙πpa), produces exactly the preimage of the nirvana-free part of ugh(Θ)(πpa).
One direction, we can get easily. If you fix a M′∞ in the preimage of the nirvana-free part of Θ(π¬h∙πpa), it projects down to a M′∈Θ(π¬h∙πpa)∩NF, that updates to a M∈ugh(Θ)(πpa), then by Lemma 28, project-then-update equals update-then-project, so M′∞ must update to a M∞ that projects down to M, certifying that updating the preimage of the nirvana-free part of Θ(π¬h∙πpa) produces a subset of the preimage of the nirvana-free part of ugh(Θ)(πpa).
In the other direction, fix a M∞ in the preimage of the nirvana-free part of ugh(Θ)(πpa). It projects down to a M in ugh(Θ)(πpa)∩NF, and by Lemma 27, M wasn’t introduced in the closure, so it has a preimage point M′∈Θ(π¬h∙πpa)∩NF.
Now, how do we extend M′ to craft a M′∞ that updates to M∞? Well, we can split into two parts. What happens on-h, and what happens off-h? For the off-h part, the post-update part has everything folded into the b term, while the pre-update part has an actual measure specified everywhere. Thus, our M′∞ should have the same off-h part as M′ to project down accordingly, so updating folds it into the same b term as M∞ has.
Now, for the on-h part, it’s a bit more complicated.M∞ specified what happens for all infinite histories with h as a prefix. However, M and M′ only specify part of that data, but fortunately agree on that part. Thus, for M′∞, you can just extend with the conditional probabilities of M∞, to perfectly mimic it on-h. This makes a M′∞ in the preimage that updates to M∞.
Ok, so the appropriate preimages for Hausdorff-continuity (post-update) are made exactly by updating the preimages for Hausdorff-continuity (pre-update). Now, updating is a continuous linear operator. We’re mapping from the Banach space M±((A×O)ω)⊕R to the Banach space M±(h(A×O)ω)⊕R.
Well, this isn’t quite right, your actions and observations may vary depending on where you are in history, but the general thing of “restrict to signed measures over infinite histories with h as a prefix” still checks out. Updating is still a continuous linear operator between Banach spaces, by Lemma 8 of Section 1.
Also, all continuous linear operators between Banach spaces are bounded, and thus Lipschitz-continuous at 0, and thus Lipschitz-continuous everywhere due to linearity. So, when we push two points that are only ϵ apart through the update, they’re now Cϵ apart at most, where C is a finite constant.
We’re going to have a lot of points. Unusually enough, we’ll be using the standard formulation of Hausdorff-continuity for our original Θ, that for all ϵ, there’s a δ where two partial policies πpa and π′pa that are δ or less apart have (pr∞,πpa∗)−1(Θ(πpa)∩NF∩{≤⊙}) (and the analogous set for π′pa) being only ϵ apart in Hausdorff-distance.
Fixing your ϵ, you’re gonna want δ to be low enough to force a ϵC difference between the clipped preimages, and δ<ϵC. It’s highly advised to sketch out how our points interact and what sets they’re in. A superscript of infinity will be used to denote points in the preimages of the ugh(Θ)(πpa) sets (or Θ(π¬h∙πpa)) (ie, at the infinite levels), and a superscript of “u” specifies post-update while its lack is pre-update.
Anyways, here’s our points.
Mu,∞ lies in the preimage of ugh(Θ)(πpa)∩NF, and it’s our point that we want to find a point nearby.λ will refer to the λ value of this thing.
Projecting Mu,∞ down to ugh(Θ)(πpa)∩NF makes Mu.
We can find a minimal point below Mu, Mu,min in ugh(Θ)(πpa)∩NF. Mu,min+Mu,∗=Mu.
A nirvana-free point wasn’t introduced by the closure, and it has a minimal point in its preimage, so there’s a Mmin in Θ(πpa) that updates to Mu,min, and respects the λ⊙+b⊙ bound of Θ.
Let Mlo be defined as Mmin+((mu,∗)−,−(mu,∗)−(1)). We’re extending the negative-measure part of Mu,∗ back to its original domain by sticking an h prefix on everything, and saying it’s 0 everywhere else. this is an a-measure that lies in Θ(π¬h∙πpa)∩NF∩{≤⊙} (because Mmin respects the λ⊙+b⊙ bound, and the thing that we added has a λ+b value of 0)
Let M be defined as Mlo+((mu,∗)+,bu,∗+(mu,∗)−(1)), it also lies in the same set Updating M makes Mu, because, unpacking M, it’s Mmin+Mu,∗, which updates to Mu,min+Mu,∗ which adds up to make Mu.
Our goal now is to explicitly construct a M∞ and Mlo,∞ in the preimage of Θ(π¬h∙πpa)∩NF s.t. they project down onto M and Mlo, Mlo,∞ lies below M∞, and M∞ updates to Mu,∞.
A sufficient way to do this is to make Mlo,∞ and M∞ by, after h, extending the measures further with the conditional probabilities of the measure component of Mu,∞. Extending ((mu,∗)+,bu,∗+(mu,∗)−(1)) with the conditional probabilities of Mu,∞ witnesses that Mlo,∞ lies below M∞. They obviously project down onto M and Mlo.
As for M∞ updating to Mu,∞, the b term and the fragment of the measure that doesn’t get ignored by projection down matches because M∞ projects to M which updates to Mu which is the projection of Mu,∞. And, for the fragment of the measure that isn’t defined in Θ(π¬h∙πpa), but that must be present on the infinite levels, we copied the conditional probabilities of the measure component Mu,∞, so we’ve got a match there.
Taking a break from setting up all our damn points for a brief recap, we have a Mlo,∞ that lies in the preimage of Θ(π¬h∙πpa)∩NF∩{≤⊙}, and a M∞ that lies above it (in the preimage of Θ(π¬h∙πpa)∩NF), and it updates to hit Mu,∞ (our original point in the preimage of ugh(Θ)(πpa)∩NF). Now, we can proceed.
So… Mlo,∞ lies in the preimage of Θ(π¬h∙πpa)∩NF∩{≤⊙}. By hausdorff-continuity for Θ and the distance between (π¬h∙πpa) and (π¬h∙π′pa) being below δ because the distance between πpa and π′pa is below δ, and using our earlier thing about how a δ distance means a ϵC difference between the clipped preimages, we can find a point (Mlo,∞)′ in the preimage of Θ(π¬h∙π′pa)∩NF∩{≤⊙} that’s that close to Mlo,∞. To go up from Mlo,∞ to M∞ requires adding ((mu,∗)+,bu,∗+(mu,∗)−(1)) (with the measure component extended with the conditional probabilities of the measure component of Mu,∞, obviously).
Also, because the λ value of Mu,∞ is the λ value of Mu, which was made by adding Mu,∗ to an a-measure, an upper bound on the λ value of that a-measure we added onto Mlo,∞ is… λ. Corresponding to the extreme case where all the measure of Mu came from Mu,∗.
Now, we can craft a point (M∞)′ which lies in the preimage of Θ(π¬h∙π′pa)∩NF that’s only ϵC+δλ away from M∞. Why? Well, we can start with (Mlo,∞)′, which is only ϵC away from Mlo,∞, and take that positive-measure-thingy we added, and reshuffle the measure on it. With earthmover distance, the δ distance between (π¬h∙π′pa) and (π¬h∙πpa) corresponds to a time-threshold where they start to differ at logγ(δ), and you’re moving dirt a γlogγ(δ)=δ difference to account for having to land in the right preimage, and you’ve got λ at most dirt to move. Then, you just add (Mlo,∞)′ and your reshuffled measure, to get your point (M∞)′. Which is the sum of two components that only differ by ϵC and δλ from the components which sum to make M∞.
Ok, so we have a point M∞ in the preimage of Θ(π¬h∙πpa)∩NF, which updates to Mu,∞ that lies in the preimage of ugh(Θ)(πpa). And a point (M∞)′ in the preimage of Θ(π¬h∙π′pa)∩NF which is (taking into account that δ<ϵC) only ϵC(1+λ) distance away from M∞.
And now we can finish up, because the preimage of ugh(Θ)(π′pa)∩NF is the update of the preimage of Θ(π¬h∙π′pa)∩NF. So, we just update (M∞)′ to get a point (Mu,∞)′ in the preimage of ugh(Θ)(π′pa). And further, the distance between M∞ and (M∞)′ is only ϵC(1+λ) at most.M∞ updates to Mu,∞, and (M∞)′ updates to (Mu,∞)′. And we know that ugh has a Lipschitz constant of C (by being a continuous linear operator between Banach spaces), so Mu,∞ only has a distance of ϵ(1+λ) from a point in the preimage of ugh(Θ)(π′pa).
So, we get Hausdorff-continuity (the Lemma 15 variant).
Condition 8: Extreme Point Condition:
We had to defer this because π¬h∙πst isn’t a stub, so we can’t use the extreme point condition we had, and instead must regenerate it completely from scratch.
Our first step in this is showing ugh(Θ)(πst)∩NF=¯¯¯¯¯¯¯¯c.h(⋃π≥πstprπ,πst(ugh(Θ)(π)∩NF))
One subset direction is easy, the closed convex hull of projections of nirvana-free stuff must all be in ugh(Θ)(πst) by consistency which we’ve shown, and all must be nirvana-free. Now for the reverse direction. Let M∈ugh(Θ)(πst)∩NF By Lemma 27, this point wasn’t added in the closure, so it has a preimage point M′∈Θ(π¬h∙πst)∩NF. Using all our nice conditions for Θ, we can invoke Lemma 21 to get that M′∈¯¯¯¯¯¯¯¯c.h(⋃π≥(π¬h∙πst)prπ,(π¬h∙πst)(Θ(π)∩NF)), so we can fix a sequence M′n limiting to M where each M′n shatters into M′i,n that came from some M′∞i,n that’s nirvana-free and lies in the associated set of a full policy above π¬h∙πst.
Updating the M′n produces a sequence Mn which is nirvana-free, in ugh(Θ)(πst), and limits to M by continuity.
Updating the M′∞i,n into M∞i,n which lie in ugh(Θ)(πi)∩NF, projecting down to get Mi,n, and mixing them, produces Mn, by our usual Lemma 28 argument.
This witnesses that all the Mn lie in c.h(⋃π>πstprπ,πst(ugh(Θ)(π)∩NF))
Thus, M lies in the closed convex hull of projections of nirvana-free stuff from above. What do we do with this? Well, now we can invoke Lemma 20, since we have Hausdorff-continuity proved, to conclude that c.h(⋃π≥πstprπ,πst(ugh(Θ)(π)∩NF)) is closed, so we didn’t really need the closed convex hull (which we’ve already shown is the same as ugh(Θ)(πst)∩NF)
And we now know that ugh(Θ)(πst)∩NF=c.h(⋃π≥πstprπ,πst(ugh(Θ)(π)∩NF))
Now, we can take a minimal extreme nirvana-free point Mex in ugh(Θ)(πst). It must be minimal and extreme and nirvana-free in the original set. If it wasn’t minimal in the original set, all minimals below it would be nirvana-free too, witnessing its nonminimiality in the restricted set. And if it wasn’t extreme in the original set, then the points that mix to make it must all be nirvana-free too, since it’s nirvana-free, so we have a witness of non-extremeness in ugh(Θ)(πst)∩NF.
Ok, so it’s extreme and nirvana-free. It must also be extreme in the convex hull set, but, since it can’t be produced by mixtures, there’s a M∞ in some ugh(Θ)(π)∩NF that projects down to Mex, establishing the extreme point condition.
That just leaves causality and pseudocausality.
Condition C: Causality
Ok, we pick a πpa and a point in ugh(Θ)(πpa) Can we make an outcome function for everything that includes our point? By our proof of full causality in the first part of the Isomorphism theorem (finite-to-full direction), this can be done as long as all other conditions are met and we can make an outcome function for any point in any ugh(Θ)(πst). So, let’s just establish finitary causality. Fix some πst and some M∈ugh(Θ)(πst).
Since M is in the updated set, there’s a sequence Mn that limits to M that we don’t need closure to get. There’s a λ and b bound on this sequence because it converges, call those bounds λ◯ and b◯. Now, we can take a M′n∈Θ(π¬h∙πpa) that updates to Mn. We can use causality for Θ to get an outcome function for M′n.
We don’t have to worry about nirvana-off-h, because M′n has no nirvana off-h, and the projection of M′n down to Θ(π¬h) preserves the off-h part, and is nirvana-free off-h, and everything above that (which is the only thing that determines the update) must also match the off-h part and certify that it’s nirvana-free.
Updating an outcome function back in produces an outcome function for ugh(Θ) by Lemma 28 (update then project equals project then update). Said outcome function for ugh(Θ) maps πst to Mn. We can restrict it to just stubs, to get an outcome function over stubs.
So, proceeding in this way, we get a sequence ofn of outcome functions for the stubs of ugh(Θ). Remember, outcome functions must match λ and b values, so the points for ofn have a λ and b value matching that of Mn, ie, less than λ◯ and b◯ since that’s our bound on the Mn sequence. this sequence ofn of outcome functions (picking out one point for each πst) can be thought of as an element of
∏π′st(ugh(Θ)(π′st)∩{(λμ,b)|λ≤λ◯,b≤b◯})
This is a product of compact sets (intersection of closed and compact sets by the Compactness Lemma) so it’s compact by Tychonoff. Thus, our sequence ofn of outcome functions has a subsequence with limit point of, and for all π′st (restricting n to the subsequence), limn→∞(ofn(π′st))=of(π′st). We have closure so all these limit points lie in their appropriate sets. In particular, of(πst)=limn→∞(ofn(πst))=limn→∞Mn=M So that checks out. Continuity of projections certifies that of is indeed an outcome function for stubs, because
prπhist,πlost∗(of(πhist))=prπhist,πlost∗(limn→∞(ofn(πhist)))=limn→∞prπhist,πlost(ofn(πhist))
=limn→∞ofn(πlost)=of(πlost)
And running through the proof of causality in the first part of the Isomorphism theorem, we get causality in general.
Condition P: Pseudocausality:
In the nirvana-free setting, fix a M∈ugh(Θ)(πpa), whose support is a subset of FNF(π′pa). Get a M′∈Θ(π¬h∙πpa) that updates to M. Its support is either on infinite histories of the off-h portion, or stuff in FNF(π′pa) (with an h prefix stuck on front), so it’s supported on FNF(π¬h∙π′pa), so M′∈Θ(π¬h∙π′pa) by pseudocausality, so then we update and get M∈ugh(Θ)(π′pa), certifying pseudocausality.
Almost done. Just invoke Lemma 24 to show that, after renormalizing, every nice property is preserved. We still do have to check that the renormalization we use is the proper renormalization to use. Our scale term for renormalization for updating is (maxπ>π¬hEΘ(π)(1★hg)−EΘ(π¬h)(0★hg))−1 and our shift term is EΘ(π¬h)(0★hg)
The scale term and shift term we should have for proper renormalization is (maxπEugh(Θ)(π)(1)−minπEugh(Θ)(π)(0))−1 and minπEugh(Θ)(π)(0) respectively. So let’s show they’re equal! We’ll be using Lemma 27, to get that every nirvana-free thing in ugh(Θ)(πpa) wasn’t added in the closure and has a preimage point.
maxπEugh(Θ)(π)(1)=maxπmin(m,b)∈ugh(Θ)(π)∩NF(m(1)+b)
=maxπ>π¬hmin(m,b)∈Θ(π)∩NF((m|h)(1)+b+m(0★hg))
=maxπ>π¬hmin(m,b)∈Θ(π)∩NF(m(1★h0)+b+m(0★hg)
=maxπ>π¬hmin(m,b)∈Θ(π)∩NF(m(1★hg)+b)=maxπ>π¬hEΘ(π)(1★hg)
Ok, so one of our normalization factors is correct. Let’s look at the second one.
minπEugh(Θ)(π)(0)=minπmin(m,b)∈ugh(Θ)(π)∩NF(b) Now, said minimal nirvana-free point projects down to π∅, the empty policy, preserving its b. Further, by Lemma 21, any point in π∅ with a lower b value, being nirvana-free, must be a finite mix of nirvana-free points from above projected down, so we get some nirvana-free point in some ugh(Θ)(π) with a too-low b, which is impossible, so we can swap out ugh(Θ)(π) with ugh(Θ)(π∅), getting
minπmin(m,b)∈ugh(Θ)(π)∩NF(b)=min(m,b)∈ugh(Θ)(π∅)∩NF(b)
=min(m,b)∈Θ(π¬h)∩NF(b+m(0★hg))=EΘ(π¬h)(0★hg)
And our second renormalization term checks out. Done!
Proposition 10: For causal, pseudocausal, acausal, and surcausal hypotheses,
EΘ(π¬h∙πpa)(f★hg)=EΘ(π¬h)(0★hg)+PgΘ,π¬h(h)⋅E(Θ|g,π¬h,h)(πpa)(fh)
First, note that fh is defined as fh(h′)=f(hh′). So it basically copies f, but it’s subtly different because it has to account for the h prefix being sliced off in the update. Let’s unpack E(Θ|g,π¬h,h)(πpa)(fh) first.
E(Θ|g,π¬h,h)(πpa)(fh)=min(m,b)∈(Θ|g,π¬h,h)(πpa)∩NF(m(fh)+b)
Then, we can invoke Lemma 27 to realize that all nirvana-free points in the update came from nirvana-free points originally, so we can rewrite this (taking the renormalization terms into account) as
min(m,b)∈Θ(π¬h∙πpa)∩NF(1PgΘ,π¬h(h)c(m|h)(fh)+1PgΘ,π¬h(h)(b+m(0★hg)−EΘ(π¬h)(0★hg)))
=min(m,b)∈Θ(π¬h∙πpa)∩NF1PgΘ,π¬h(h)(c(m|h)(fh)+b+m(0★hg)−EΘ(π¬h)(0★hg))
=min(m,b)∈Θ(π¬h∙πpa)∩NF1PgΘ,π¬h(h)((m|h)(f)+b+m(0★hg)−EΘ(π¬h)(0★hg))
=min(m,b)∈Θ(π¬h∙πpa)∩NF1PgΘ,π¬h(h)(m(f★h0)+b+m(0★hg)−EΘ(π¬h)(0★hg))
=min(m,b)∈Θ(π¬h∙πpa)∩NF1PgΘ,π¬h(h)(m(f★hg)+b−EΘ(π¬h)(0★hg))
Now, armed with this, we can rewrite
EΘ(π¬h)(0★hg)+PgΘ,π¬h(h)⋅E(Θ|g,π¬h,h)(πpa)(fh)
as
EΘ(π¬h)(0★hg)+PgΘ,π¬h(h)(min(m,b)∈Θ(π¬h∙πpa)∩NF1PgΘ,π¬h(h)(m(f★hg)+b−EΘ(π¬h)(0★hg)))
=EΘ(π¬h)(0★hg)+(min(m,b)∈Θ(π¬h∙πpa)∩NF(m(f★hg)+b−EΘ(π¬h)(0★hg)))
=min(m,b)∈Θ(π¬h∙πpa)∩NF(m(f★hg)+b)=EΘ(π¬h∙πpa)(f★hg)
and we’re done.
Proposition 11: If hh′ is a valid o-history, then for causal, pseudocausal, acausal, and surcausal hypotheses, (Θ|g,π¬h,h)|gh,π¬h′,h′=Θ|g,(π¬h∙π¬h′),hh′
Proof sketch: We’ll work with updates assuming no closure is done, and then once we’ve established our main result, we’ll show it with the closure part of updating. This is very long but it’s mostly just a lot of algebra grinding to show that mapping a suitable point in Θ(π¬h∙(π¬h′∙πpa)) through the two individual updates and the single big update makes the same point.
We’ll take a detour first and show that Pgh(Θ|g,π¬h,h),π¬h′(h′)⋅PgΘ,π¬h(h)=PgΘ,(π¬h∙π¬h′)(hh′)
First, we can unpack Pgh(Θ|g,π¬h,h),π¬h′(h′) as:
maxπ>π¬h′E(Θ|g,π¬h,h)(π)(1★h′gh)−E(Θ|g,π¬h,h)(π¬h′)(0★h′gh)
Now, let’s unpack that first term, with the aid of our trusty Lemma 27 that when updating, no new nirvana-free points are added by closure.
maxπ>π¬h′E(Θ|g,π¬h,h)(π)(1★h′gh)=maxπ>π¬h′min(m,b)∈(Θ|g,π¬h,h)(π)∩NF(m(1★h′gh)+b)
=maxπ>π¬h′min(m,b)∈Θ(π¬h∙π)∩NF1PgΘ,π¬h(h)(c(m|h)(1★h′gh)+b+m(0★hg)−EΘ(π¬h)(0★hg))
=1PgΘ,π¬h(h)maxπ>π¬h′min(m,b)∈Θ(π¬h∙π)∩NF(c(m|h)(1★h′gh)+b+m(0★hg)−EΘ(π¬h)(0★hg))
=1PgΘ,π¬h(h)maxπ>(π¬h∙π¬h′)min(m,b)∈Θ(π)∩NF(c(m|h)(1★h′gh)+b+m(0★hg)−EΘ(π¬h)(0★hg))
Ok, that’s good enough for now, we’ll do more rewrites later. Unpacking the second term, again, with Lemma 27, we get… pretty much the exact same thing by the same sequence of rewrites, culminating in
=1PgΘ,π¬h(h)min(m,b)∈Θ(π¬h∙π¬h′)∩NF(c(m|h)(0★h′gh)+b+m(0★hg)−EΘ(π¬h)(0★hg))
Ok, we pulled a (PgΘ,π¬h(h))−1 term out of both pieces, which cancels out, so
Pgh(Θ|g,π¬h,h),π¬h′(h′)⋅PgΘ,π¬h(h)
=maxπ>(π¬h∙π¬h′)min(m,b)∈Θ(π)∩NF(c(m|h)(1★h′gh)+b+m(0★hg)−EΘ(π¬h)(0★hg))
−min(m,b)∈Θ(π¬h∙π¬h′)∩NF(c(m|h)(0★h′gh)+b+m(0★hg)−EΘ(π¬h)(0★hg))
=maxπ>(π¬h∙π¬h′)min(m,b)∈Θ(π)∩NF(c(m|h)(1★h′gh)+b+m(0★hg))
−min(m,b)∈Θ(π¬h∙π¬h′)∩NF(c(m|h)(0★h′gh)+b+m(0★hg))
First, observe that 1★h′gh and 0★h′gh (since we’re taking the expectation over stuff that’s had the h clipped off), can be written as (1★hh′g)h and (0★hh′g)h respectively, because the term 1★hh′g (or 0) is “1 (or 0) on hh’, g on h but off hh’, g off h”, so (1★hh′g)h is “1 (or 0) on h’, gh off h’” (from stripping off the h prefix), which is the same as 1★h′gh (or the 0 analogue). So, we can rewrite as:
=maxπ>(π¬h∙π¬h′)min(m,b)∈Θ(π)∩NF(c(m|h)((1★hh′g)h)+b+m(0★hg))
−min(m,b)∈Θ(π¬h∙π¬h′)∩NF(c(m|h)((0★hh′g)h)+b+m(0★hg))
=maxπ>(π¬h∙π¬h′)min(m,b)∈Θ(π)∩NF((m|h)(1★hh′g)+b+m(0★hg))
−min(m,b)∈Θ(π¬h∙π¬h′)∩NF((m|h)(0★hh′g)+b+m(0★hg))
=maxπ>(π¬h∙π¬h′)min(m,b)∈Θ(π)∩NF(m((1★hh′g)★h0)+b+m(0★hg))
−min(m,b)∈Θ(π¬h∙π¬h′)∩NF(m((0★hh′g)★h0)+b+m(0★hg))
=maxπ>(π¬h∙π¬h′)min(m,b)∈Θ(π)∩NF(m((1★hh′g)★hg)+b)
−min(m,b)∈Θ(π¬h∙π¬h′)∩NF(m((0★hh′g)★hg)+b)
Now, we should probably figure out how to rewrite (1★hh′g)★hg. This rewrites as (1★hh′g). Similarly, with (0★hh′g)★hg, it rewrites as (0★hh′g). Making these substitutions, we get
=maxπ>(π¬h∙π¬h′)min(m,b)∈Θ(π)∩NF(m(1★hh′g)+b)
−min(m,b)∈Θ(π¬h∙π¬h′)∩NF(m(0★hh′g)+b)
=maxπ>(π¬h∙π¬h′)EΘ(π)(1★hh′g)−EΘ(π¬h∙π¬h′)(0★hh′g)=PgΘ,(π¬h∙π¬h′)(hh′)
And we’re done, having shown that Pgh(Θ|g,π¬h,h),π¬h′(h′)⋅PgΘ,π¬h(h)=PgΘ,(π¬h∙π¬h′)(hh′)
Back to the bulk of the proof. First, we have to consider what points in Θ((π¬h∙π¬h′)∙πpa) can survive the pair of updates/single update. They have to have no nirvana that lacks h as a prefix, and no nirvana that has h as a prefix but lacks hh’ as a prefix. So, all the nirvana is after hh’. Let (m,b) be an appropriate point (lacking the nirvana, and in the right set) that we can shove through both updates/the single big update.
The induced point in ((Θ|g,π¬h,h)|gh,π¬h′,h′)(πpa) (minus the closure on both steps!) can be written as… well, it’s kinda big, we’ll break down the measure component and b component
1Pgh(Θ|g,π¬h,h),π¬h′(h′)c((1PgΘ,π¬h(h)c(m|h))|h′)
And the b component is still too big, we’ll split it up into two parts.
1Pgh(Θ|g,π¬h,h),π¬h′(h′)(1PgΘ,π¬h(h)(b+m(0★hg)−EΘ(π¬h)(0★hg)))
+1Pgh(Θ|g,π¬h,h),π¬h′(h′)(1PgΘ,π¬h(h)c(m|h)(0★h′gh)−E(Θ|g,π¬h,h)(π¬h′)(0★h′gh))
It may be nonobvious that this is the update, but the big fraction in front is the scale term for the second update which applies to everything, the measure term is the update of the measure term for the first update, the first b part is the b term that’s produced after the first update, and the second b part is the stuff added to the b term from the first update. Now, let’s break this down a bit. For the first measure term, we can pull out the inner scale term and use our result on what happens when you multiply the scale terms to get
1Pgh(Θ|g,π¬h,h),π¬h′(h′)c((1PgΘ,π¬h(h)c(m|h))|h′)=1PgΘ,(π¬h∙π¬h′)(hh′)c((c(m|h))|h′)
And c((c(m|h))|h′)=c((m|h)|hh′)=c(m|hh′)
This is because the first term is the process: Take m, strip out all parts of the measure that don’t have h as a prefix, clip off the h prefix, then strip out all parts of that measure that don’t have h’ as a prefix, and clip off h’. Which is the same as taking m, stripping out all parts that don’t have h as a prefix, stripping out all parts that don’t have hh’ as a prefix, and clipping off hh’ (the second term). And this is the same as taking m, stripping out all parts that don’t have hh’ as a prefix, and clipping off hh’ (the third term)
So, our final rewrite of the measure term is: 1PgΘ,(π¬h∙π¬h′)(hh′)c(m|hh′)
Now, let’s address the first b part. We can easily just pull the scale term out to rewrite it as:
1PgΘ,(π¬h∙π¬h′)(hh′)(b+m(0★hg)−EΘ(π¬h)(0★hg))
Which should be good enough for now. Moving on to the second b term,
1Pgh(Θ|g,π¬h,h),π¬h′(h′)(1PgΘ,π¬h(h)c(m|h)(0★h′gh)−E(Θ|g,π¬h,h)(π¬h′)(0★h′gh))
Again, we can pull out the scale term to rewrite as:
1PgΘ,(π¬h∙π¬h′)(hh′)(c(m|h)(0★h′gh)−PgΘ,π¬h(h)⋅E(Θ|g,π¬h,h)(π¬h′)(0★h′gh))
Now, from our earlier arguments, (0★hh′g)h=0★h′gh. So, we can rewrite as:
1PgΘ,(π¬h∙π¬h′)(hh′)(c(m|h)((0★hh′g)h)−PgΘ,π¬h(h)⋅E(Θ|g,π¬h,h)(π¬h′)((0★hh′g)h))
And now we can use Proposition 10 to swap out PgΘ,π¬h(h)⋅E(Θ|g,π¬h,h)(π¬h′)((0★hh′g)h) for
EΘ(π¬h∙π¬h′)((0★hh′g)★hg)−EΘ(π¬h)(0★hg)
And now, we can go: (0★hh′g)★hg=0★hh′g Making this substitution, we have a rewrite as
EΘ(π¬h∙π¬h′)(0★hh′g)−EΘ(π¬h)(0★hg)
And making this substitution back in, we have a rewrite of the second b term as:
1PgΘ,(π¬h∙π¬h′)(hh′)(c(m|h)((0★hh′g)h)−EΘ(π¬h∙π¬h′)(0★hh′g)+EΘ(π¬h)(0★hg))
Sticking our rewritten second and third b terms back together, we get 1PgΘ,(π¬h∙π¬h′)(hh′) times
b+m(0★hg)−EΘ(π¬h)(0★hg)+c(m|h)((0★hh′g)h)−EΘ(π¬h∙π¬h′)(0★hh′g)+EΘ(π¬h)(0★hg)
=b+m(0★hg)+c(m|h)((0★hh′g)h)−EΘ(π¬h∙π¬h′)(0★hh′g)
Let’s examine how to rewrite m(0★hg)+c(m|h)((0★hh′g)h) It rewrites as
m(0★hg)+c(m|h)((0★hh′g)h)=m(0★hg)+(m|h)(0★hh′g)
=m(0★hg)+m((0★hh′g)★h0)=m((0★hh′g)★hg)
Now, we can go (0★hh′g)★hg=0★hh′g anyways, our b term in total (both parts) finally rewrites as:
1PgΘ,(π¬h∙π¬h′)(hh′)(b+m(0★hh′g)−EΘ(π¬h∙π¬h′)(0★hh′g))
Putting our rewritten measure and rewritten b term back together, it’s
1PgΘ,(π¬h∙π¬h′)(hh′)(c(m|hh′),b+m(0★hh′g)−EΘ(π¬h∙π¬h′)(0★hh′g))
Which is exactly what you’d get from pushing (m,b) through the single big update function with g,(π¬h∙π¬h′),hh′.
Ok, so we’ve shown that ((Θ|g,π¬h,h)|gh,π¬h′,h′)(πpa)=(Θ|g,(π¬h∙πh¬h′),hh′)(πpa) for all πpa, but that’s just for the update with renormalization and without closure. How do we show “take the closure at the intermediate step and end” for the two updates and “take the closure at the end” for the big update are the same? Easy. Updates are continuous, so if we take the closure at the end, the preimage (of the final closed set for the second small update) is a closed superset of the image (for the first small update), so taking the closure there adds no new points. So, the closures don’t affect anything, and we have our theorem.
Theorem 5: Belief Function Bayes: For pseudocausal and acausal hypotheses, if there’s some i s.t.Θi|g,π¬h,h exists and is nontrivial, then
(Eζ(Θi))R|g,π¬h,h=⎛⎜⎝Eζ(PgΘi,π¬h(h)⋅(Θi|g,π¬h,h))Eζ(PgΘi,π¬h(h))⎞⎟⎠R
This says that updating a prior works exactly as you’d expect. You rescale every updated component according to its “probability” relative to the “probability” the prior assigns to the observation, and mix them together. Because mixes may not be renormalized, you then just throw in a single scale-and-shift (which doesn’t affect things) and you’re done.
Proof sketch: A mixture of hypotheses, when renormalized, is a hypothesis, and a hypothesis updated, is a hypothesis. Because of consistency, a sufficient test for the two belief functions being equal is if we can show that they’re equal for all π, because all lower levels are uniquely generated from that. A further observation we can make is that (Eζ(PgΘi,π¬h(h)))−1 is a scaling term, so all we really need to do is to show that
((Eζ(Θi))R)|g,π¬h,h)(π)=(Eζ(PgΘi,π¬h(h)⋅(Θi|g,π¬h,h)))R(π)
because the renormalization compensates for the absence of that scale term.
The actual proof itself relies on first showing that we have a sufficient condition for the two sets to be well-defined in the form of nontriviality post-update. And then, one of the most aggrivating parts of this proof is keeping track of all the scale-and-shift terms for the updates and renormalization and showing how they all interact, and we spend a while doing that and laying groundwork. The basic proof path is fixing a batch of points in the Θi and pushing them through one side, and making a batch of points that hit the same resulting point when we push them through the other side. We must show both directions. We can’t just say “pushing them through the second process makes the same point” because that’s not true, we’ll need to exploit upper completion to build a different batch of points to account for Θi with 0 “probability”, because updating crashes there and we “lose” those points. This is another long algebra-grinding proof, like Proposition 11.
Proof: First, we should show some details about nontriviality. our original definition was that, for a Θ to be nontrivial, there’s a π st. EΘ(π)(1)≠EΘ(π)(0). By Proposition 6 in Section 1, triviality of Θ is equivalent to there being a single minimal point of the form (0,b) for all Θ(π) (b may vary depending on π, though) Now, our starting condition was that there’s some Θi you can update and not have it crash, so that means the Eζ(PgΘi,π¬h(h)) term is nonzero, so that’s one possible source of failure eliminated. Said nontrivial Θi|g,π¬h,h, since it has nonzero “probability” implies that the mixture has some π with different expectation values, so we can safely renormalize the mixture of updated belief functions without running into issues.
Also, updating a trivial Θi makes a trivial Θi|g,π¬h,h, because for all π, Θi(π¬h∙π) has one minimal point of the form (0,b), and everything else is an a-measure added to that, so updating is equivalent to updating (0,b) and updating your a-measure, so the updated set (without renormalization) has a single minimal point of the form (0,b) (the measure component being 0 means it can’t influence the b term), and then a scale-and-shift means your new set has a single minimal point of the form (0,b′), ie, your updated Θi is trivial. So, if there’s an i where Θi|g,π¬h,h is nontrivial, then Θi must be nontrivial. And, by “there’s a single nontrivial hypothesis” being a sufficient condition for being able to mix sets and renormalize them, (EζΘi)R is well-defined.
Admittedly, we haven’t touched the update of the renormalized set yet, but we’ll address that part later. For now, just remember that the renormalization of EζΘi doesn’t crash, and neither does the renormalization of Eζ(PgΘi,π¬h(h)⋅(Θi|g,π¬h,h))
And that’s enough for now. Also, since we’re working in the nirvana-free setting, we can use Lemma 27 to show that we don’t need the closure part of updating.
Let Idef be the subset of the i where PgΘi,π¬h(h)>0 (ie, those where the update is defined) It’s nonempty. Now, reindex the probability distribution and hypotheses so 0∈Idef.
For an i∈Idef, let the set {i→} be the largest set of the form {i+1,i+2...} where i+1,i+2... are all not in Idef. Intuitively, {i→} is the largest contiguous string of numbers after i where PgΘi,π¬h(h) is zero if there is such a string.
One of the most aggrivating parts of this proof if we don’t do some housekeeping work first is keeping track of all the renormalization terms for all the updates and mixtures. Let’s introduce some notation for these and derive relationships between them.
α1:=maxπ>π¬hE(EζΘi)R(π)(1★hg) and β1:=minπ>π¬hE(EζΘi)R(π)(0★hg)
α2:=maxπE(EζΘi)(π)(1) and β2:=minπE(EζΘi)(π)(0)
α3:=maxπ>π¬hE(EζΘi)(π)(1★hg) and β3:=minπ>π¬hE(EζΘi)(π)(0★hg)
αi:=maxπ>π¬hEΘi(π)(1★hg) and βi:=minπ>π¬hEΘi(π)(0★hg)
And we need to take a moment to note that for the next two, π′ isn’t a normal policy, it’s one of those post-update policies that you have to glue to π¬h to get an actual policy.
α4:=maxπ′E(Eζ(PgΘi,π¬h(h)⋅(Θi|g,π¬h,h)))(π′)(1) and β4:=minπ′E(Eζ(PgΘi,π¬h(h)⋅(Θi|g,π¬h,h)))(π′)(0)
1α1−β1 and β1 are the scale-and-shift terms for updating the renormalized mixture of belief functions, 1α2−β2 and β2 are the scale-and-shift terms for renormalizing the mixture of belief functions, 1α3−β3 and β3 are the scale-and-shift terms for updating the raw mixture of belief functions, 1αi−βi and βi are the scale-and-shift terms for updating an individual belief function, and 1α4−β4 and β4 are the scale-and-shift terms for renormalizing our mixture of updated belief functions.
By our earlier considerations about nontriviality, α4≠β4, and α2≠β2, and there’s some i where αi≠βi (and in particular, 0 is one of those i) We’ll show after a bit more work, that α2≠β2 and α3≠β3, so none of the scaling terms have a divide-by-zero error except, for some i, αi=βi (maybe)
First, let’s unpack α1 and β1.
α1=maxπ>π¬hE(EζΘi)R(π)(1★hg)=maxπ>π¬hmin(m,b)∈(EζΘi)R(π)(m(1★hg)+b)
=maxπ>π¬hmin(m,b)∈(EζΘi)(π)1α2−β2(m(1★hg)+b−β2)
=1α2−β2((maxπ>π¬hmin(m,b)∈(EζΘi)(π)(m(1★hg)+b))−β2)
=1α2−β2(maxπ>π¬hE(EζΘi)(π)(1★hg)−β2)=1α2−β2(α3−β2)
Ok, so we have α1=α3−β2α2−β2. The exact same argument, just with appropriate terms switched around, establishes that β1=β3−β2α2−β2 Remember, α2≠β2 so the scale term doesn’t blow up.
For the next one, we’ll need the crucial fact that if updating fails (ie, αi=βi), then after raw updating (but before renormalization), for all policies, ugh(Θi)(π) has a single minimal point that’s of the form (0,βi).
α4=maxπ′E(Eζ(PgΘi,π¬h(h)⋅(Θi|g,π¬h,h)))(π′)(1)=maxπ′E(Eζ((αi−βi)(Θi|g,π¬h,h)))(π′)(1)
=maxπ′min(m,b)∈(Eζ((αi−βi)(Θi|g,π¬h,h)))(π′)(m(1)+b)
=maxπ′Eζ(αi−βi)(min(mi,bi)∈(Θi|g,π¬h,h)(π′)(mi(1)+bi))
=maxπ′(∑i∈Idefζi(αi−βi)(min(mi,bi)∈ugh(Θi)(π′)(1αi−βimi(1)+1αi−βi(bi−βi))))
=maxπ′(∑i∈Idefζi(minmi,bi∈ugh(Θi)(π′)(mi(1)+bi−βi)))
And then, we can use the fact that, if renormalization fails, regardless of the π′, ugh(Θi)(π′) is composed of a single minimal point of the form (0,βi) to get
∑i∉Idefζi(min(mi,bi)∈ugh(Θi)(π′)(mi(1)+bi−βi))=∑i∉Idefζi(0(1)+βi−βi)=0
And then, adding this in, we get
=maxπ′(∑iζi(minmi,bi∈ugh(Θi)(π′)(mi(1)+bi−βi)))
=maxπ>π¬hEζ(min(mi,bi)∈Θi(π)(c(mi|h)(1)+bi+mi(0★hg)−βi))
=maxπ>π¬hEζ(min(mi,bi)∈Θi(π)(mi(1★h0)+bi+mi(0★hg)))−Eζβi
=maxπ>π¬hEζ(min(mi,bi)∈Θi(π)(mi(1★hg)+bi))−Eζβi
=maxπ>π¬hEζ(EΘi(π)(1★hg))−Eζβi
=maxπ>π¬hE(EζΘi)(π)(1★hg)−Eζβi=α3−Eζβi
So α4=α3−Eζβi and for β4, it’s the exact same thing and same argument, β4=β3−Eζβi
Let’s verify that α3−β3≠0. See that α3−β3=α4−Eζβi−β4+Eζβi=α4−β4≠0 (we already know that α4≠β4)
And we can verify that α1−β1≠0 by α1−β1=α3−β2α2−β2−β3−β2α2−β2=α3−β2−β3+β2α2−β2=α3−β3α2−β2≠0
(we already know that α3−β3≠0, we just showed it, and we know that α2−β2≠0 so that scale term doesn’t crash)
We’ll show two directions of the proof. The first direction is, we take a bunch of (mi,bi)∈Θi(π¬h∙π) and feed them through the second process (update, mix with corrected probabilities, then renormalize), then show we can craft a bunch of (m′i,b′i)∈Θi(π¬h∙π) that, when fed through the first process (mix, renormalize, then update as a whole), produce the same point.
Feeding our (mi,bi)∈Θi(π¬h∙π) through the updates produce the points:1αi−βi(c(mi|h),bi+mi(0★hg)−βi) Well, only for i∈Idef. Otherwise the update crashes.
Mixing them produces: ∑i∈Idefζi(αi−βi)1αi−βi(c(mi|h),bi+mi(0★hg)−βi)
Which cancels to make: ∑i∈Idefζi(c(mi|h),bi+mi(0★hg)−βi)
This can be reexpressed as: (∑i∈Idefζic(mi|h),∑i∈Idefζi(bi+mi(0★hg)−βi))
And now, post-renormalization for the mixture, we get:
1α4−β4(∑i∈Idefζic(mi|h),∑i∈Idefζi(bi+mi(0★hg)−βi)−β4)
Our task is now to hit that exact point by feeding some appropriate batch of points through the mix, renormalization, and then update. If i∈Idef, then let (m′i,b′i)=(mi,bi). Otherwise, let (m′i,b′i) be some suitable point that raw-updates to (0,βi). In particular this means that m′i|h=0 and b′i+m′i(0★hg)=βi.
Anyways, mixing the (m′i,b′i) and renormalizing produces 1α2−β2(Eζm′i,Eζb′i−β2)
Then, updating produces
1α1−β1(c((1α2−β2Eζm′i)|h),1α2−β2(Eζ(b′i)−β2)+1α2−β2(Eζm′i)(0★hg)−β1)
Now we must show that this is equal to
1α4−β4(∑i∈Idefζic(mi|h),∑i∈Idefζi(bi+mi(0★hg)−βi)−β4)
Let’s begin. First, we can reexpress as:
1(α1−β1)(α2−β2)(c((Eζm′i)|h),Eζb′i−β2+(Eζm′i)(0★hg)−(α2−β2)β1)
=1(α1−β1)(α2−β2)(Eζc(m′i|h),Eζb′i−β2+Eζ(m′i(0★hg))−(α2−β2)β1)
And now, since α1=α3−β2α2−β2 and β1=β3−β2α2−β2, we can simplify (α1−β1)(α2−β2) as α3−β3, and simplify (α2−β2)β1 as β3−β2, to rewrite as
=1α3−β3(Eζc(m′i|h),Eζ(b′i+m′i(0★hg))−β3)
Then, exploit the fact that for i∉Idef, c(m′i|h)=0, and b′i+m′i(0★hg)=βi and otherwise m′i=mi,b′i=bi to get
=1α3−β3(∑i∈Idefζic(mi|h),∑i∈Idefζi(bi−mi(0★hg))+∑i∉Idefζiβi−β3)
One more step needed. We know that β3=Eζβi+β4, so rearranging this produces ∑i∉Idefζiβi−β3=−∑i∈Idefζiβi−β4, and making that substitution we get
=1α3−β3(∑i∈Idefζic(mi|h),∑i∈Idefζi(bi−mi(0★hg)−βi)−β4)
Almost there, one more step. We know that α3=Eζβi+α4 and β3=Eζβi+β4 so α3−β3=α4−β4, yielding
=1α4−β4(∑i∈Idefζic(mi|h),∑i∈Idefζi(bi−mi(0★hg)−βi)−β4)
So, pushing a collection of points through the updates individually and mixing with corrected probabilities and renormalizing can be replicated by mixing a different batch of points in the Θi(π¬h∙π) first, renormalizing, and updating, establishing that
((Eζ(Θi))R)|g,π¬h,h)(π)⊇(Eζ(PgΘi,π¬h(h)⋅(Θi|g,π¬h,h)))R(π)
That just leaves the other direction. This time we’ll be using (mi,bi) for stuff in the Θi(π¬h∙π) sets that are pushed through “mix, renormalize, update”, and (m′i,b′i), i∈Idef for points in Θi(π¬h∙π) that are pushed through “update, mix with corrected probabilities, renormalize”, to attempt to hit the same point.
Ok, let’s express our point of interest to hit in terms of the (mi,bi). First, mixing the (mi,bi) and renormalizing produces 1α2−β2(Eζmi,Eζbi−β2)
Then, updating produces
1α1−β1(c((1α2−β2Eζmi)|h),1α2−β2(Eζ(bi)−β2)+1α2−β2(Eζmi)(0★hg)−β1)
=1(α1−β1)(α2−β2)(c((Eζmi)|h),Eζbi−β2+(Eζmi)(0★hg)−(α2−β2)β1)
=1(α1−β1)(α2−β2)(Eζc(mi|h),Eζbi−β2+Eζ(mi(0★hg))−(α2−β2)β1)
And now, since α1=α3−β2α2−β2 and β1=β3−β2α2−β2, we can simplify (α1−β1)(α2−β2) as α3−β3, and simplify (α2−β2)β1 as β3−β2, to rewrite as
=1α3−β3(Eζc(mi|h),Eζ(bi+mi(0★hg))−β3)
Now, let’s define our (m′i,b′i), when i∈Idef (to have them not get canceled out of existence by an undefined update/multiplication-by-zero)
m′i:=mi+∑j∈{i→}ζjζi(mj|h)
b′i:=bi+∑j∈{i→}ζjζi(bj+mj(0★hg)−βj)
What this essentially does is take (mi,bi) and adds a specially chosen a-measure to it, getting another point in the same set by upper-completion. Because ζi is nonzero, and the mixture of the (mi,bi) converges, the scale term doesn’t affect the fact that this partial sum converges. Adding in the expectation-of-function terms doesn’t affect convergence, and it’s a sum of scaled a-measures because bj+mj(0★hg)−βj is just the b term of the raw update of (mj,bj) but shifted down, ie, nonnegative.
This may appear a bit mysterious, but the rationale behind it is “dang, we’re only summing up over i∈Idef, we need to take our “missing” (mj,bj), and stash them in our “safe” i somehow (via the wiggle room we get from upper-completion) so they can manifest post-update”
Feeding our (m′i,b′i)∈Θi(π¬h∙π) through the updates produce the points:
1αi−βi(c(m′i|h),b′i+m′i(0★hg)−βi) But only for i∈Idef.
Mixing them produces the point: ∑i∈Idefζi(αi−βi)1αi−βi(c(m′i|h),b′i+m′i(0★hg)−βi)
Which cancels to make: ∑i∈Idefζi(c(m′i|h),b′i+m′i(0★hg)−βi)
This can be reexpressed as: (c(∑i∈Idef(ζim′i)|h),∑i∈Idefζi(bi+mi(0★hg)−βi))
And now, post-renormalization for the mixture, we get:
1α4−β4(c(∑i∈Idef(ζim′i)|h),∑i∈Idefζi(b′i+m′i(0★hg)−βi)−β4)
Substituting in our definition of m′i and b′i we have (all this is one term, we had to break it up)
1α4−β4(c(∑i∈Idef(ζi(mi+∑j∈{i→}ζjζi(mj|h)))|h))
1α4−β4(∑i∈Idefζi(bi+∑j∈{i→}ζjζi(bj+mj(0★hg)−βj)))
+1α4−β4(∑i∈Idefζi((mi+∑j∈{i→}ζjζi(mj|h))(0★hg)−βi)−β4)
Simplifying the measure term, we get:
1α4−β4(c(∑i∈Idef(ζi(mi+∑j∈{i→}ζjζi(mj|h)))|h))
=1α4−β4(c(∑i∈Idef(ζimi+∑j∈{i→}ζj(mj|h))|h))
=1α4−β4(c(∑i∈Idef(ζi(mi|h)+∑j∈{i→}ζj((mj|h)|h))))
=1α4−β4(∑i∈Idef(ζic(mi|h)+∑j∈{i→}ζjc(mj|h)))
=1α4−β4(∑iζic(mi|h))=1α4−β4(Eζc(mi|h))
Ok, that’s the measure-term simplified. Now let’s look at the first b term.
1α4−β4(∑i∈Idefζi(bi+∑j∈{i→}ζjζi(bj+mj(0★hg)−βj)))
=1α4−β4(∑i∈Idef(ζibi+∑j∈{i→}ζj(bj+mj(0★hg)−βj)))
=1α4−β4(∑iζibi+∑i∉Idefζi(mi(0★hg)−βj))
=1α4−β4(Eζbi+∑i∉Idefζi(mi(0★hg))−∑i∉Idefζiβi)
And now let’s look at the second b term.
1α4−β4(∑i∈Idefζi((mi+∑j∈{i→}ζjζi(mj|h))(0★hg)−βi)−β4)
=1α4−β4(∑i∈Idefζi(mi(0★hg)+(∑j∈{i→}ζjζi((mj|h)(0★hg)))−βi)−β4)
=1α4−β4(∑i∈Idefζi(mi(0★hg)−βi)−β4)
(this previous step is because 0★hg is 0 on histories with h as a prefix, and mj|h is only supported on histories with h as a prefix, so the expectation is 0, and this extends to the sum of expectations)
=1α4−β4(∑i∈Idefζi(mi(0★hg))−∑i∈Idefζiβi−β4)
Uniting our two rewritten b terms, we get:
=1α4−β4(Eζbi+∑iζi(mi(0★hg))−∑iζiβi−β4)=1α4−β4(Eζ(bi+mi(0★hg))−Eζβi−β4)
and uniting this with our rewritten measure term, we get:
1α4−β4(Eζc(mi|h),Eζ(bi+mi(0★hg))−Eζβi−β4)
Now, let’s compare against
=1α3−β3(Eζc(mi|h),Eζ(bi+mi(0★hg))−β3)
We already know from our earlier results on α and β that α4−β4=α3−β3, and that −β3=−β4−Eζβi, so our equivalence is complete.
So, mixing a batch of points in the Θi(π¬h∙π), renormalizing, and updating, can be replicated by pushing a different collection of points through the updates individually and mixing with corrected probabilities and renormalizing, establishing that for all π
((Eζ(Θi))R)|g,π¬h,h)(π)⊆(Eζ(PgΘi,π¬h(h)⋅(Θi|g,π¬h,h)))R(π)
So, we have that, for all π
((Eζ(Θi))R)|g,π¬h,h)(π)=(Eζ(PgΘi,π¬h(h)⋅(Θi|g,π¬h,h)))R(π)
=⎛⎜⎝Eζ(PgΘi,π¬h(h)⋅(Θi|g,π¬h,h))EζPgΘi,π¬h(h)⎞⎟⎠R(π)
And we have our result now.
Theorem 6: Dynamic Consistency: Given a hypothesis Θ (causal, pseudocausal, acausal, surcausal), and an arbitrary policy π and utility function U, then, with πh being the continuation of π post-update and π∗ being such that E(Θ|U,π¬h,h)(πh)(Uh)⪋E(Θ|U,π¬h,h)(π∗)(Uh) , then EΘ(π)(U)⪋EΘ(π¬h∙π∗)(U)
Proof sketch: We’ll need to shatter this into two parts. The first part is if the update is undefined. Then the agent gives up and cries, and all policies are equally good. So we have to show that regardless of what the agent does after h, then it matches the performance of the original policy. The second part is showing the result for a well-defined update. It’s mostly shuffling equations around.
For the first part, updates fail exactly when, for all policies, uUh(Θ)(π)∩NF has a single minimal point of the form (0,β) (same β for all policies). We’ll be using this. Also, πh will be used to denote what the policy π does after observation h, so we have π=π¬h∙πh.
EΘ(π)(U)=min(m,b)∈Θ(π)∩NF(m(U)+b)=min(m,b)∈Θ(π¬h∙πh)∩NF(m(U)+b)
=min(m,b)∈Θ(π¬h∙πh)∩NF(m(U★h0)+b+m(0★hU))
=min(m,b)∈Θ(π¬h∙πh)∩NF((m|h)(U)+b+m(0★hU))
=min(m,b)∈Θ(π¬h∙πh)∩NF(c(m|h)(Uh)+b+m(0★hU))
=min(m,b)∈uUh(Θ)(πh)∩NF(m(Uh)+b)=β=min(m,b)∈uUh(Θ)(π∗)∩NF(m(Uh)+b)
=min(m,b)∈Θ(π¬h∙π∗)∩NF(c(m|h)(Uh)+b+m(0★hU))
=min(m,b)∈Θ(π¬h∙π∗)∩NF((m|h)(U)+b+m(0★hU))
=min(m,b)∈Θ(π¬h∙π∗)∩NF(m(U★h0)+b+m(0★hU))
=min(m,b)∈Θ(π¬h∙π∗)∩NF(m(U)+b)=EΘ(π¬h∙π∗)(U)
And we’re done, EΘ(π)(U)=EΘ(π¬h∙π∗)(U)
Now for the case where the update actually goes through.
1PUπ¬h,h(h)(EΘ(π)(U)−EΘ(π¬h)(0★hU))
=1PUπ¬h,h(h)(min(m,b)∈Θ(π)∩NF(m(U)+b)−EΘ(π¬h)(0★hU))
=1PUπ¬h,h(h)(min(m,b)∈Θ(π¬h∙πh)∩NF(m(U)+b)−EΘ(π¬h)(0★hU))
=1PUπ¬h,h(h)(min(m,b)∈Θ(π¬h∙πh)∩NF(m(U★h0)+b+m(0★hU))−EΘ(π¬h)(0★hU))
=1PUπ¬h,h(h)(min(m,b)∈Θ(π¬h∙πh)∩NF((m|h)(U)+b+m(0★hU))−EΘ(π¬h)(0★hU))
=1PUπ¬h,h(h)(min(m,b)∈Θ(π¬h∙πh)∩NF(c(m|h)(Uh)+b+m(0★hU))−EΘ(π¬h)(0★hU))
=min(m,b)∈Θ(π¬h∙πh)∩NF(1PUπ¬h,h(h)(c(m|h)(Uh)+b+m(0★hU)−EΘ(π¬h)(0★hU)))
=min(m,b)∈(Θ|U,π¬h,h)(πh)(m(Uh)+b)
=E(Θ|U,π¬h,h)(πh)(Uh)⪋E(Θ|U,π¬h,h)(π∗)(Uh)
=min(m,b)∈(Θ|U,π¬h,h)(π∗)(m(Uh)+b)
=min(m,b)∈Θ(π¬h∙π∗)∩NF(1PUπ¬h,h(h)(c(m|h)(Uh)+b+m(0★hU)−EΘ(π¬h)(0★hU)))
=1PUπ¬h,h(h)(min(m,b)∈Θ(π¬h∙π∗)∩NF(c(m|h)(Uh)+b+m(0★hU))−EΘ(π¬h)(0★hU))
=1PUπ¬h,h(h)(min(m,b)∈Θ(π¬h∙π∗)∩NF((m|h)(U)+b+m(0★hU))−EΘ(π¬h)(0★hU))
=1PUπ¬h,h(h)(min(m,b)∈Θ(π¬h∙π∗)∩NF(m(U★h0)+b+m(0★hU))−EΘ(π¬h)(0★hU))
=1PUπ¬h,h(h)(min(m,b)∈Θ(π¬h∙π∗)∩NF(m(U)+b)−EΘ(π¬h)(0★hU))
=1PUπ¬h,h(h)(EΘ(π¬h∙π∗)(U)−EΘ(π¬h)(0★hU))
so we know
1PUπ¬h,h(h)(EΘ(π)(U)−EΘ(π¬h)(0★hU))⪋1PUπ¬h,h(h)(EΘ(π¬h∙π∗)(U)−EΘ(π¬h)(0★hU))
So EΘ(π)(U)−EΘ(π¬h)(0★hU)⪋EΘ(π¬h∙π∗)(U)−EΘ(π¬h)(0★hU)
and we can finish up and get dynamic consistency. EΘ(π)(U)⪋EΘ(π¬h∙π∗)(U)
Theorem 7: Maximin UDT: Translating a set S of policy selection environments with a bounded modulus of continuity to an acausal hypothesis Θ always works. Also, for all utility functions U, argmaxπinfe∈SEπ⋅e(U)=argmaxπEΘ(π)(U)
Ok, so a policy selection environment e is a continuous (in the policy) function (A×O)<ω×Π→ΔO
If you really want, like, for the planting flowers problem, you can have some probability of nonexistence that’s policy-dependent, and a backup utility in case of nonexistence, though both still must be continuous in policy, by going “ok, there’s a primordial event that leads to either the distribution starting, or I get event ⊥ with b≤1 utility”, this can be crammed into the necessary framework.
A policy selection environment looks at what has happened thus far, and your policy, and picks some distribution over observations. For a single policy-selection environment, π→π⋅e is uniformly continuous. This is because, if you fix a time length t, there’s finitely many histories of length t or less. For each of these histories, there’s a δ where two policies identical up till time logγ(δ) produce only an ϵ change in ΔO (continuity means that different policies that are identical up till some sufficiently long time induce only a small change in what happens now). So, we can go “policies π,π′ that identical up till some stupidly long time mean that, for the first t steps, there’s very little change in what happens”. t can be made as long as we wish, and ϵ can be made as low as we wish, so for all ϵ, there some δ where, if d(π,π′)<δ, then π⋅e and π′⋅e are within ϵ of each other.
Bounded modulus of uniform continuity means that there’s a single δ/ϵ function that works for all your policy selection environments of interest. Ie, no matter which environment was selected, you know how long policies need to be identical for to make only an ϵ difference in the resulting distribution over histories.
Encode each π⋅e history distribution as having λ=1 and b=0 Considering the set of (π⋅e,0) points as points for Θ?ω(π), we have
argmaxπinfe∈SEπ⋅e(U)=argmaxπinf{π⋅e}:e∈S(Ee⋅π(U)+0)=argmaxπinf(m,b)∈Θ?ω(π)(m(U)+b)
We do need to show that Θ?ω fulfills the essential properties for being able to turn it into an acausal hypothesis via Proposition 2. There’s four. Nonemptiness, restricted-minimals, Hausdorff-continuity, and renormalization not failing. Nonemptiness is trivial. Restricted-minimals is easy because every point in Θ?ω(π), regardless of π, has λ=1 and b=0. Hausdorff-continuity can be shown by the set of environments having a bounded modulus of continuity, so given any (π⋅e,0)∈Θ?ω(π), we can reuse that e and there’s a δ where d((π⋅e,0),(π′⋅e,0))=d(π⋅e,π′⋅e)<ϵ, and (π′⋅e,0)∈Θ?ω(π′), so Hausdorff-continuity follows.
That just leaves being normalizable. This occurs if there’s a nontrivial Θ?ω(π), ie, EΘω(π)(1)≠EΘω(π)(0). This is obviously true, because the former is “minimal value of λ+b (always 1)”, and the latter is “minimal value of b” (always 0).
So, by proposition 2, we create an acausal Θω hypothesis from our Θ?ω. From proposition 5, we then get argmaxπinf(m,b)∈Θ?ω(π)(m(U)+b)=argmaxπmin(m,b)∈Θω(π)(m(U)+b)
And then, since we have an acausal infinitary hypothesis, we can use the Isomorphism theorem to get that Θ(π)=Θω(π), so argmaxπmin(m,b)∈Θω(π)(m(U)+b)=argmaxπmin(m,b)∈Θ(π)(m(U)+b)
And finally, we can wrap up with Proposition 5 that said argmax set of policies actually exists, showing
argmaxπinfe∈SEπ⋅e(U)=argmaxπmin(m,b)∈Θ(π)(m(U)+b)=argmaxπEΘ(π)(U)
And we’re done with UDT-copying. And, if you want, you can translate it into a surcausal hypothesis, and into a set of a-survironments from there.
Proposition 12: If the collection of hypotheses Θi is learnable, then any Infrabayes-optimal policy family for a prior on them also learns the collection of hypotheses as well.
First, we’ll recap learnability. Learnability of a countable collection of belief functions by a γ-indexed family of policies πγ is the condition that for each Θi, regret limits to 0. (we’ll use Uγ for our utility function with time-discount parameter γ) Ie,
∀i:limγ→1(maxπ∗(EΘi(π∗)(Uγ))−EΘi(πγ)(Uγ))=0
So, in the low-time-discount limit, you get a score arbitrarily close to that of an optimal agent that knows exactly what environment it’s playing against.
An Infrabayes-optimal policy family for a prior/mixture of belief functions is one where
π∗γ∈argmaxπE(EζΘi)R(π)(Uγ)
Such an argmax set exists by Proposition 5. Further, any scale-and-shift just does a scale-and-shift on the values a policy will achieve and leaves the argmax set alone, so we could get an alternate representation as: π∗γ∈argmaxπE(EζΘi)(π)(Uγ)
So, assume that a countable family of belief functions paired with a utility function U is learnable by some family of policies πγ. We’ll show that it’s also learnable by any bayes-optimal π∗γ family for the prior (EζΘi)R.
First, πγ learns the family of hypotheses,∀i:limγ→1(maxπ∗(EΘi(π∗)(Uγ))−EΘi(πγ)(Uγ))=0
This implies limγ→1Eζ(maxπ∗(EΘi(π∗)(Uγ))−EΘi(πγ)(Uγ))=0
Because you only have to go finitely far out to nab all but ϵ of the probability mass of the expectation, and you can pick some γ extremely close to 1 that ensures that all those finitely many environments have ϵ or less regret.
We can now move the expectation inside to get:
limγ→1(Eζ(maxπ∗(EΘi(π∗)(Uγ)))−Eζ(EΘi(πγ)(Uγ)))=0
limγ→1(Eζ(maxπ∗(EΘi(π∗)(Uγ)))−E(EζΘi)(πγ)(Uγ))=0
Now, E(EζΘi)(πγ)(Uγ)≤E(EζΘi)(π∗γ)(Uγ) because π∗γ is optimal for the prior, so it’s optimal for any rescaled version. So,
limγ→1(Eζ(maxπ∗(EΘi(π∗)(Uγ)))−E(EζΘi)(π∗γ)(Uγ))=0
limγ→1(Eζ(maxπ∗(EΘi(π∗)(Uγ)))−Eζ(EΘi(π∗γ)(Uγ)))=0
limγ→1Eζ(maxπ∗(EΘi(π∗)(Uγ))−EΘi(π∗γ)(Uγ))=0
Now, symmetrically, if π∗γ doesn’t limit to 0 regret on all belief functions, then the expectation doesn’t limit to 0 either.
∀i:limγ→1(maxπ∗(EΘi(π∗)(Uγ))−EΘi(π∗γ)(Uγ))=0
and we have shown that our arbitrary Infrabayes-optimal family of policies learns the environments.
Complete Class Theorem Weak Version: Given any pareto-optimal policy π, then there is an infradistribution H over states, where ∀π′:fπ′≠fπ:EH(fπ)>EH(fπ′)
Proof sketch: Because we are able to translate from concave lipschitz monotone normalized functionals over [0,1]S to infradistributions over states, we just have to get a concave lipschitz monotone functional where our policy π is optimal, and then it can be normalized back up to 1, and then it can be turned into an infradistribution over states by LF-duality. Said concave lipschitz monotone functional is:
h(f):=mins∈S(f(s)−Eo∼ob(s)P(s,π(o)))
We just need to show that the function is indeed concave, Lipschitz, monotone, and assigns no policy a higher expectation value than the Pareto-optimal policy, because all these properties are preserved by a scale-and-shift.
Proof of Lipschitzness: If you perturb f by ϵ or less in all states, then this only affects the minimal value by ϵ or less, so we actually have a Lipschitz constant of 1.
Proof of monotonicity: If f matches or outperforms f′ in all states, than the possible values that min is picking amongst all went up, so f gets an equal or higher value, showing monotonicity.
Proof of concavity:
h(pf+(1−p)f′)=mins∈S((pf+(1−p)f′)(s)−Eo∼ob(s)P(s,π(o)))
=mins∈S(pf(s)+(1−p)f′(s)−pEo∼ob(s)P(s,π(o))−(1−p)Eo∼ob(s)P(s,π(o)))
≥mins∈S(pf(s)−pEo∼ob(s)P(s,π(o)))+mins∈S((1−p)f′(s)−(1−p)Eo∼ob(s)P(s,π(o)))
=pmins∈S(f(s)−Eo∼ob(s)P(s,π(o)))+(1−p)mins∈S(f′(s)−Eo∼ob(s)P(s,π(o)))
=ph(f)+(1−p)h(f′)
Proof of π getting the optimal value: π is on the pareto-frontier, so there is no other policy π′ that gets equal-or-greater value in all states. Thus, given any other π′, there is a state in which it underperforms the reward of π in that state, so the quantity is negative. And for π itself, it obviously matches the behavior of π in all states, so it gets a value of 0. Thus, π gets the strictly optimal score amongst policies against this h, so the same holds after renormalization, and then the same holds for the expectation values w.r.t. the infradistribution.