# Catastrophe Mitigation Using DRL (Appendices)
These are Appendices B and C for the essay Catastrophe Mitigation Using DRL. They appear in a separate post because of a length limit on the website.
## Appendix B
Given $p=(a,o)\in A\times O$, we denote $p_A:=a$ and $p_O:=o$.
### Proposition B.1
Consider a universe $\upsilon=(\mu,r)$ which is an $O$-realization of an MDP $M$ with state function $S$, a stationary policy $\pi^*:\mathcal{S}_M\xrightarrow{\text{k}}A$, an arbitrary $I$-policy $\pi^0$ and some $\gamma\in(0,1)$. Then,
$$\mathrm{EU}^{\pi^*S}_{\upsilon}(\gamma)-\mathrm{EU}^{\pi^0}_{\upsilon}(\gamma)=\sum_{n=0}^{\infty}\gamma^n\operatorname*{E}_{x\sim\mu\bowtie\pi^0}\left[V_{M\pi^*}(S(x_{:n}),\gamma)-Q_{M\pi^*}(S(x_{:n}),x^A_n,\gamma)\right]$$
### Proof of Proposition B.1
For the sake of encumbering the notation less, we will omit the parameter $\gamma$ in functions that depend on it. We will use $S$ implicitly, i.e. given $F$ a function on $\mathcal{S}_M$ and $h\in\operatorname{hdom}\mu$, $F(h):=F(S(h))$. Finally, we will omit $M\pi^*$, using the shorthand notations $V:=V_{M\pi^*}$, $Q:=Q_{M\pi^*}$.
For any $x\in\operatorname{hdom}^{\omega}\mu$, it is easy to see that
$$\mathrm{EU}^{\pi^*S}_{\upsilon}=V(\lambda)=\sum_{n=0}^{\infty}\gamma^n\left(V(x_{:n})-\gamma V(x_{:n+1})\right)$$
$$\mathrm{U}_r(x)=(1-\gamma)\sum_{n=0}^{\infty}\gamma^nr(x_{:n})$$
$$\mathrm{EU}^{\pi^*S}_{\upsilon}-\mathrm{U}_r(x)=\sum_{n=0}^{\infty}\gamma^n\left(V(x_{:n})-(1-\gamma)r(x_{:n})-\gamma V(x_{:n+1})\right)$$
$$\mathrm{EU}^{\pi^*S}_{\upsilon}-\mathrm{U}_r(x)=\sum_{n=0}^{\infty}\gamma^n\left(V(x_{:n})-Q(x_{:n},x^A_n)+Q(x_{:n},x^A_n)-(1-\gamma)r(x_{:n})-\gamma V(x_{:n+1})\right)$$
Taking the expected value over $x$, we get
$$\mathrm{EU}^{\pi^*S}_{\upsilon}-\mathrm{EU}^{\pi^0}_{\upsilon}=\sum_{n=0}^{\infty}\gamma^n\left(\operatorname*{E}_{\mu\bowtie\pi^0}\left[V(x_{:n})-Q(x_{:n},x^A_n)\right]+\operatorname*{E}_{\mu\bowtie\pi^0}\left[Q(x_{:n},x^A_n)-(1-\gamma)r(x_{:n})-\gamma V(x_{:n+1})\right]\right)$$
It is easy to see that the second term vanishes, yielding the desired result.
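To spell the step out (under the Bellman convention used in the formulas above): since $\upsilon$ is an $O$-realization of $M$, the identity for $Q$ gives
$$Q(x_{:n},x^A_n)=(1-\gamma)r(x_{:n})+\gamma\operatorname{E}\left[V(x_{:n+1})\mid x_{:n},x^A_n\right]$$
so by the law of iterated expectations each summand of the second term has zero expectation.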
### Proposition B.2
Consider some $\tau\in(0,\infty)$, $T\in\mathbb{N}^+$, a universe $\upsilon=(\mu,r)$ that is an $O$-realization of $M$ with state function $S$, a stationary policy $\pi^*:\mathcal{S}_M\xrightarrow{\text{k}}A$ and an arbitrary $I$-policy $\pi^0:(A\times O)^*\xrightarrow{\text{k}}A$. For any $n\in\mathbb{N}$, let $\pi^{*n}$ be an $I$-policy s.t. for any $h\in\operatorname{hdom}\mu$
$$\pi^{*n}(h):=\begin{cases}\pi^0(h)&\text{if }|h|<nT\\\pi^*(S(h))&\text{otherwise}\end{cases}$$
Assume that
i. For any $h\in\operatorname{hdom}\mu$: $\operatorname{supp}\pi^0(h)\subseteq A^0_{M\pi^*}(S(h))$

ii. For any $s\in\mathcal{S}_M$ and $\gamma\in(0,1)$: $\left|\frac{dV_{M\pi^*}(s,\gamma)}{d\gamma}\right|\le\tau$
Then, for any $\gamma\in(0,1)$,
$$\mathrm{EU}^{\pi^*S}_{\upsilon}(\gamma)-\mathrm{EU}^{\pi^0}_{\upsilon}(\gamma)\le(1-\gamma)\sum_{n=0}^{\infty}\sum_{m=0}^{T-1}\gamma^{nT+m}\left(\operatorname*{E}_{x\sim\mu\bowtie\pi^{*n}}\left[r(x_{:nT+m})\right]-\operatorname*{E}_{x\sim\mu\bowtie\pi^0}\left[r(x_{:nT+m})\right]\right)+\frac{2\tau\gamma^T(1-\gamma)}{1-\gamma^T}$$
### Proof of Proposition B.2
For the sake of encumbering the notation less, we will use $S$ implicitly, i.e. given $F$ a function on $\mathcal{S}_M$ and $h\in\operatorname{hdom}\mu$, $F(h):=F(S(h))$. Also, we will omit $M\pi^*$, using the shorthand notations $V:=V_{M\pi^*}$, $Q:=Q_{M\pi^*}$.
By Proposition B.1, for any $l\in\mathbb{N}$
$$\mathrm{EU}^{\pi^*}_{\upsilon}(\gamma)-\mathrm{EU}^{\pi^{*l}}_{\upsilon}(\gamma)=\sum_{n=0}^{\infty}\gamma^n\operatorname*{E}_{x\sim\mu\bowtie\pi^{*l}}\left[V(x_{:n},\gamma)-Q(x_{:n},x^A_n,\gamma)\right]$$
$\pi^{*l}$ coincides with $\pi^*$ after $lT$, therefore the corresponding expected values vanish (and for $n<lT$, the law of $x_{:n+1}$ under $\mu\bowtie\pi^{*l}$ coincides with its law under $\mu\bowtie\pi^0$):
$$\mathrm{EU}^{\pi^*}_{\upsilon}(\gamma)-\mathrm{EU}^{\pi^{*l}}_{\upsilon}(\gamma)=\sum_{n=0}^{lT-1}\gamma^n\operatorname*{E}_{x\sim\mu\bowtie\pi^0}\left[V(x_{:n},\gamma)-Q(x_{:n},x^A_n,\gamma)\right]$$
Subtracting the equalities for $l+1$ and $l$, we get
$$\mathrm{EU}^{\pi^{*l}}_{\upsilon}(\gamma)-\mathrm{EU}^{\pi^{*(l+1)}}_{\upsilon}(\gamma)=\sum_{n=lT}^{(l+1)T-1}\gamma^n\operatorname*{E}_{x\sim\mu\bowtie\pi^0}\left[V(x_{:n},\gamma)-Q(x_{:n},x^A_n,\gamma)\right]$$
$$(1-\gamma)\sum_{n=0}^{\infty}\gamma^n\left(\operatorname*{E}_{x\sim\mu\bowtie\pi^{*l}}\left[r(x_{:n})\right]-\operatorname*{E}_{x\sim\mu\bowtie\pi^{*(l+1)}}\left[r(x_{:n})\right]\right)=\sum_{n=lT}^{(l+1)T-1}\gamma^n\operatorname*{E}_{x\sim\mu\bowtie\pi^0}\left[V(x_{:n},\gamma)-Q(x_{:n},x^A_n,\gamma)\right]$$
$\pi^{*l}$ and $\pi^{*(l+1)}$ coincide until $lT$, therefore
$$(1-\gamma)\sum_{n=lT}^{\infty}\gamma^n\left(\operatorname*{E}_{x\sim\mu\bowtie\pi^{*l}}\left[r(x_{:n})\right]-\operatorname*{E}_{x\sim\mu\bowtie\pi^{*(l+1)}}\left[r(x_{:n})\right]\right)=\sum_{n=lT}^{(l+1)T-1}\gamma^n\operatorname*{E}_{x\sim\mu\bowtie\pi^0}\left[V(x_{:n},\gamma)-Q(x_{:n},x^A_n,\gamma)\right]$$
Denote $\rho^{*l}:=\mu\bowtie\pi^{*l}$, $\rho^0:=\mu\bowtie\pi^0$. We also use the shorthand notations $r_n:=r(x_{:n})$, $V_n(\gamma):=V(x_{:n},\gamma)$, $Q_n(\gamma):=Q(x_{:n},x^A_n,\gamma)$. Both $\pi^{*l}$ and $\pi^{*(l+1)}$ coincide with $\pi^*$ after $(l+1)T$, therefore
$$(1-\gamma)\sum_{n=lT}^{(l+1)T-1}\gamma^n\left(\operatorname*{E}_{\rho^{*l}}[r_n]-\operatorname*{E}_{\rho^0}[r_n]\right)+\gamma^{(l+1)T}\left(\operatorname*{E}_{\rho^{*l}}\left[V_{(l+1)T}(\gamma)\right]-\operatorname*{E}_{\rho^0}\left[V_{(l+1)T}(\gamma)\right]\right)=\sum_{n=lT}^{(l+1)T-1}\gamma^n\operatorname*{E}_{\rho^0}\left[V_n(\gamma)-Q_n(\gamma)\right]$$
Denote $V'(s,\gamma):=\frac{dV(s,\gamma)}{d\gamma}$. By the mean value theorem, for each $s\in\mathcal{S}_M$ there is $\gamma^*\in(\gamma,1)$ s.t.
$$V(s,\gamma)=V^0(s)-V'(s,\gamma^*)\cdot(1-\gamma)$$
$$V^0(s)-\tau(1-\gamma)\le V(s,\gamma)\le V^0(s)+\tau(1-\gamma)$$
It follows that
$$(1-\gamma)\sum_{n=lT}^{(l+1)T-1}\gamma^n\operatorname*{E}_{\rho^{*l}-\rho^0}[r_n]+\gamma^{(l+1)T}\left(\operatorname*{E}_{\rho^{*l}-\rho^0}\left[V^0_{(l+1)T}\right]+2\tau(1-\gamma)\right)\ge\sum_{n=lT}^{(l+1)T-1}\gamma^n\operatorname*{E}_{\rho^0}\left[V_n(\gamma)-Q_n(\gamma)\right]$$
Here, an expected value w.r.t. the difference of two probability measures is understood to mean the corresponding difference of expected values.
It is easy to see that assumption i implies that $V^0_n$ is a submartingale for $\rho^0$ (whereas it is a martingale for $\mu\bowtie\pi^*$) and therefore
$$\operatorname*{E}_{\rho^{*l}-\rho^0}\left[V^0_{(l+1)T}\right]\le0$$
We get
$$(1-\gamma)\sum_{n=lT}^{(l+1)T-1}\gamma^n\operatorname*{E}_{\rho^{*l}-\rho^0}[r_n]+2\tau\gamma^{(l+1)T}(1-\gamma)\ge\sum_{n=lT}^{(l+1)T-1}\gamma^n\operatorname*{E}_{\rho^0}\left[V_n(\gamma)-Q_n(\gamma)\right]$$
Summing over $l$, we get
$$(1-\gamma)\sum_{l=0}^{\infty}\sum_{n=lT}^{(l+1)T-1}\gamma^n\operatorname*{E}_{\rho^{*l}-\rho^0}[r_n]+\frac{2\tau\gamma^T(1-\gamma)}{1-\gamma^T}\ge\sum_{n=0}^{\infty}\gamma^n\operatorname*{E}_{\rho^0}\left[V_n(\gamma)-Q_n(\gamma)\right]$$
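Here, the second term on the left-hand side arises from summing the geometric series
$$(1-\gamma)\sum_{l=0}^{\infty}2\tau\gamma^{(l+1)T}=2\tau\gamma^T(1-\gamma)\sum_{l=0}^{\infty}\gamma^{lT}=\frac{2\tau\gamma^T(1-\gamma)}{1-\gamma^T}$$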
Applying Proposition B.1 to the right-hand side:
$$(1-\gamma)\sum_{l=0}^{\infty}\sum_{n=lT}^{(l+1)T-1}\gamma^n\operatorname*{E}_{\rho^{*l}-\rho^0}[r_n]+\frac{2\tau\gamma^T(1-\gamma)}{1-\gamma^T}\ge\mathrm{EU}^{\pi^*}_{\upsilon}(\gamma)-\mathrm{EU}^{\pi^0}_{\upsilon}(\gamma)$$
Reindexing the left-hand side via $n=lT+m$ yields the bound in the statement.
### Proof of Lemma A.1
Fix $\gamma\in(0,1)$, $\eta\in(0,N^{-1})$ and $T\in\mathbb{N}^+$. Denote $\nu_k:=\bar{\mu}_k[\sigma_kS_k]$. To avoid cumbersome notation, whenever $M_k$ should appear as a subscript, we will replace it by $k$. Let $(\Omega,P\in\Delta\Omega)$ be a probability space. Let $K:\Omega\to[N]$ be a random variable and the following be stochastic processes:
$$Z_n,\tilde{Z}_n:\Omega\to\Delta[N]$$
$$J_n:\Omega\to[N]$$
$$\Psi_n:\Omega\to A$$
$$A_n:\Omega\to\bar{A}$$
$$\Theta_n:\Omega\to\bar{O}$$
We also define $A\Theta_{:n}:\Omega\to\overline{A\times O}^*$ by
$$A\Theta_{:n}:=A_0\Theta_0A_1\Theta_1\ldots A_{n-1}\Theta_{n-1}$$
(The following conditions on $A$ and $\Theta$ imply that the range of the above is indeed in $\overline{A\times O}^*$.) Let $D$ and $D^!_k$ be as in Proposition C.1 (we assume w.l.o.g. that $\epsilon<\frac{1}{|A|}$). We construct $\Omega$, $K$, $Z$, $\tilde{Z}$, $J$, $\Psi$, $A$ and $\Theta$ s.t. $K$ is uniformly distributed and for any $k\in[N]$, $l\in\mathbb{N}$, $m\in[T]$ and $o\in O$, denoting $n=lT+m$:
$$\tilde{Z}_0(k)\equiv\frac{1}{N}$$
$$Z_n(k)=\frac{\tilde{Z}_n(k)[[\tilde{Z}_n(k)\ge\eta]]}{\sum_{j=0}^{N-1}\tilde{Z}_n(j)[[\tilde{Z}_n(j)\ge\eta]]}$$
$$\Pr\left[J_l=k\mid Z_{lT}\right]=Z_{lT}(k)$$
$$\Psi_n=\pi_{J_l}(A\Theta_{:n})$$
$$\Pr\left[\Theta_n=o\mid A\Theta_{:n}\right]=\nu_K(o\mid A\Theta_{:n})$$
$$A_n=D(A\Theta_{:n},\Psi_n)$$
$$\tilde{Z}_{n+1}(k)\sum_{j=0}^{N-1}Z_n(j)[[A_n=D^!_j(A\Theta_{:n},\Psi_n)]]\nu_j(\Theta_n\mid A\Theta_{:n}A_n)=Z_n(k)[[A_n=D^!_k(A\Theta_{:n},\Psi_n)]]\nu_k(\Theta_n\mid A\Theta_{:n}A_n)$$
Note that the last equation has the form of a Bayesian update, which is allowed to be arbitrary when the update is on "impossible" information.
We now construct the $\bar{I}$-policy $\pi^*$ s.t. for any $n\in\mathbb{N}$, $h\in\overline{A\times O}^*$ s.t. $\Pr[A\Theta_{:n}=h]>0$ and $a\in\bar{A}$
$$\pi^*(a\mid h):=\Pr\left[A_n=a\mid A\Theta_{:n}=h\right]$$
That is, we perform Thompson sampling at time intervals of size $T$, moderated by the delegation routine $D$, and we discard from our belief state hypotheses whose probability drops below $\eta$ as well as hypotheses whose sampling resulted in recommending "unsafe" actions, i.e. actions that $D$ refused to perform.
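As an illustration, here is a minimal sketch of this loop in Python-style pseudocode (not part of the proof; `hypotheses`, `D`, `D_bang`, `policy`, `env_step` and the `.prob` method are hypothetical stand-ins for $\nu_k$, $D$, $D^!_k$, $\pi_k$ and the environment's conditional probabilities):

```python
import random

def moderated_thompson(hypotheses, D, D_bang, policy, eta, T, env_step, horizon):
    # Sketch: Thompson sampling in epochs of length T, moderated by the
    # delegation routine D, with belief-state pruning at threshold eta.
    N = len(hypotheses)
    belief = [1.0 / N] * N                   # ~Z_0 is uniform over [N]
    history = []                             # A_0 Theta_0 A_1 Theta_1 ...
    j = 0
    for n in range(horizon):
        if n % T == 0:
            # Z_n: discard hypotheses with mass below eta, renormalize,
            # then resample the active hypothesis J_l.
            pruned = [p if p >= eta else 0.0 for p in belief]
            total = sum(pruned)
            belief = [p / total for p in pruned]
            j = random.choices(range(N), weights=belief)[0]
        psi = policy(j, history)             # Psi_n: sampled hypothesis' advice
        a = D(history, psi)                  # A_n: advice moderated by D
        o = env_step(a)                      # Theta_n: observation
        # ~Z_{n+1}: Bayesian update; hypothesis k retains mass only if its
        # delegation routine D!_k agrees with the action actually taken,
        # weighted by its predicted probability of the observation.
        weights = [belief[k]
                   * (1.0 if D_bang[k](history, psi) == a else 0.0)
                   * hypotheses[k].prob(o, history, a)
                   for k in range(N)]
        total = sum(weights)
        if total > 0:                        # update on "possible" information
            belief = [w / total for w in weights]
        history.append((a, o))
    return history
```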
In order to prove $\pi^*$ has the desired property, we will define the stochastic processes $Z^!$, $\tilde{Z}^!$, $J^!$, $\Psi^!$, $A^!$ and $\Theta^!$, each process of the same type as its shriekless counterpart (thus $\Omega$ is constructed to accommodate them). These processes are required to satisfy the following:
$$\tilde{Z}^!_0(k)\equiv\frac{1}{N}$$
$$Z^!_n(k)=\frac{\tilde{Z}^!_n(k)[[\tilde{Z}^!_n(k)\ge\eta]]}{\sum_{j=0}^{N-1}\tilde{Z}^!_n(j)[[\tilde{Z}^!_n(j)\ge\eta]]}[[\tilde{Z}^!_n(K)\ge\eta]]+[[K=k]]\cdot[[\tilde{Z}^!_n(K)<\eta]]$$
$$\Pr\left[J^!_l=k\mid Z^!_{lT}\right]=Z^!_{lT}(k)$$
$$\Psi^!_n=\pi_{J^!_l}(A\Theta^!_{:n})$$
$$\Pr\left[\Theta^!_n=o\mid A\Theta^!_{:n}\right]=\nu_K(o\mid A\Theta^!_{:n})$$
$$A^!_n=D^!_K(A\Theta^!_{:n},\Psi^!_n)$$
$$\tilde{Z}^!_{n+1}(k)=\frac{Z^!_n(k)[[A^!_n=D^!_k(A\Theta^!_{:n},\Psi^!_n)]]\nu_k(\Theta^!_n\mid A\Theta^!_{:n}A^!_n)}{\sum_{j=0}^{N-1}Z^!_n(j)[[A^!_n=D^!_j(A\Theta^!_{:n},\Psi^!_n)]]\nu_j(\Theta^!_n\mid A\Theta^!_{:n}A^!_n)}$$
For any $k\in[N]$, we construct the $\bar{I}$-policy $\pi^?_k$ s.t. for any $n\in\mathbb{N}$, $h\in\overline{A\times O}^*$ s.t. $\Pr[A\Theta^!_{:n}=h,\ K=k]>0$ and $a\in\bar{A}$
$$\pi^?_k(a\mid h):=\Pr\left[A^!_n=a\mid A\Theta^!_{:n}=h,\ K=k\right]$$
Given any $\bar{I}$-policy $\pi$ and $I$-policy $\sigma$ we define $\alpha\sigma\pi:(A\times O)^*\xrightarrow{\text{k}}\overline{A\times O}^*$ by
$$\alpha\sigma\pi(g\mid h):=[[h=\underline{g}]]\,C_h\prod_{n=0}^{|h|-1}\sum_{a\in A}\left([[g_n\in\bot aO]]\pi(\bot\mid g_{:n})\sigma(a\mid h_{:n})+[[g_n\in a\bot O]]\pi(a\mid g_{:n})\right)$$
Here, $C_h\in\mathbb{R}$ is a constant defined s.t. the probabilities sum to 1. We define the $I$-policy $\underline{[\sigma]\pi}$ by
$$\underline{[\sigma]\pi}(a\mid h):=\Pr_{g\sim\alpha\sigma\pi(h)}\left[\pi(g)=a\vee(\pi(g)=\bot\wedge\sigma(h)=a)\right]$$
Condition iii of Proposition C.1 and condition i of Definition A.1 imply that for any $h\in\operatorname{hdom}\mu_k$
$$\operatorname{supp}\underline{[\sigma_k]\pi^?_k}(h)\subseteq A^0_{M_k\pi_k}(S_k(h))$$
This means we can apply Proposition B.2 and get
$$\mathrm{EU}^{\pi_kS_k}_{\upsilon_k}(\gamma)-\mathrm{EU}^{\pi^?_k}_{\bar{\upsilon}_k[\sigma_kS_k]}(\gamma)\le(1-\gamma)\sum_{n=0}^{\infty}\sum_{m=0}^{T-1}\gamma^{nT+m}\left(\operatorname*{E}_{x\sim\mu_k\bowtie\pi^{*kn}}\left[r(x_{:nT+m})\right]-\operatorname*{E}_{x\sim\nu_k\bowtie\pi^?_k}\left[r(\underline{x}_{:nT+m})\right]\right)+\frac{2\bar{\tau}\gamma^T(1-\gamma)}{1-\gamma^T}$$
Here, the $I$-policy $\pi^{*kn}$ is defined as $\pi^{*n}$ in Proposition B.2. We also define the $\bar{I}$-policies $\pi^{!kn}$ and $\pi^{!!kn}$ by
$$\pi^{!kn}(a\mid h):=\begin{cases}\pi^?_k(a\mid h)&\text{if }|h|<nT\\\Pr\left[A^!_{|h|}=a\mid A\Theta^!_{:|h|}=h,\ K=k,\ J^!_n=k\right]&\text{otherwise}\end{cases}$$
$$\pi^{!!kn}(a\mid h):=\begin{cases}\pi^?_k(a\mid h)&\text{if }|h|<nT\\\pi^{!kn}(a\mid h)+\pi^{!kn}(\bot\mid h)\cdot\pi^{*kn}(a\mid\underline{h})&\text{if }|h|\ge nT\text{ and }a\ne\bot\\0&\text{if }|h|\ge nT\text{ and }a=\bot\end{cases}$$
Denote
$$\rho^{*kn}:=\mu_k\bowtie\pi^{*kn}$$
$$\rho^{!!kn}:=\nu_k\bowtie\pi^{!!kn}$$
$$\rho^{!kn}:=\nu_k\bowtie\pi^{!kn}$$
$$\rho^?_k:=\nu_k\bowtie\pi^?_k$$
$$\mathrm{R}^?_k:=\mathrm{EU}^{\pi_kS_k}_{\upsilon_k}(\gamma)-\mathrm{EU}^{\pi^?_k}_{\bar{\upsilon}_k[\sigma_kS_k]}(\gamma)$$
For each $n\in\mathbb{N}$, denote
$$\mathrm{EU}^{*kn}(\gamma):=\frac{1-\gamma}{1-\gamma^T}\sum_{m=0}^{T-1}\gamma^m\operatorname*{E}_{x\sim\rho^{*kn}}\left[r(x_{:nT+m})\right]$$
$$\mathrm{EU}^{!!kn}(\gamma):=\frac{1-\gamma}{1-\gamma^T}\sum_{m=0}^{T-1}\gamma^m\operatorname*{E}_{x\sim\rho^{!!kn}}\left[r(\underline{x}_{:nT+m})\right]$$
$$\mathrm{EU}^{!kn}(\gamma):=\frac{1-\gamma}{1-\gamma^T}\sum_{m=0}^{T-1}\gamma^m\operatorname*{E}_{x\sim\rho^{!kn}}\left[r(\underline{x}_{:nT+m})\right]$$
$$\mathrm{EU}^{?kn}(\gamma):=\frac{1-\gamma}{1-\gamma^T}\sum_{m=0}^{T-1}\gamma^m\operatorname*{E}_{x\sim\rho^?_k}\left[r(\underline{x}_{:nT+m})\right]$$
We have
$$\mathrm{R}^?_k\le(1-\gamma^T)\sum_{n=0}^{\infty}\gamma^{nT}\left(\mathrm{EU}^{*kn}(\gamma)-\mathrm{EU}^{?kn}(\gamma)\right)+\frac{2\bar{\tau}\gamma^T(1-\gamma)}{1-\gamma^T}$$
$$\mathrm{R}^?_k\le(1-\gamma^T)\sum_{n=0}^{\infty}\gamma^{nT}\left(\mathrm{EU}^{*kn}(\gamma)-\mathrm{EU}^{!!kn}(\gamma)+\mathrm{EU}^{!!kn}(\gamma)-\mathrm{EU}^{!kn}(\gamma)+\mathrm{EU}^{!kn}(\gamma)-\mathrm{EU}^{?kn}(\gamma)\right)+\frac{2\bar{\tau}\gamma^T(1-\gamma)}{1-\gamma^T}$$
Condition iv of Proposition C.1 and condition ii of Definition A.1 imply that, given $h\in\operatorname{hdom}\nu_k$ s.t. $|h|\ge nT$
$$\operatorname{supp}\pi^{!kn}(h)\subseteq\{\pi_k(S_k(h)),\bot\}$$
$$\pi^{!!kn}(\pi_k(S_k(h))\mid h)=1$$
Therefore, $\pi^{!!kn}=\pi^{*kn}$, and we remain with
$$\mathrm{R}^?_k\le(1-\gamma^T)\sum_{n=0}^{\infty}\gamma^{nT}\left(\mathrm{EU}^{!!kn}(\gamma)-\mathrm{EU}^{!kn}(\gamma)+\mathrm{EU}^{!kn}(\gamma)-\mathrm{EU}^{?kn}(\gamma)\right)+\frac{2\bar{\tau}\gamma^T(1-\gamma)}{1-\gamma^T}$$
We have
$$\left|\mathrm{EU}^{!!kn}(\gamma)-\mathrm{EU}^{!kn}(\gamma)\right|\le\Pr_{x\sim\rho^{!kn}}\left[\exists m\in[T]:x_{nT+m}\in\bot\bar{O}\right]$$
Since $Z^!_{nT}(K)\ge\eta$, it follows that
$$\left|\mathrm{EU}^{!!kn}(\gamma)-\mathrm{EU}^{!kn}(\gamma)\right|\le\frac{1}{\eta}\Pr_{x\sim\rho^?_k}\left[\exists m\in[T]:x_{nT+m}\in\bot\bar{O}\right]\le\frac{1}{\eta}\operatorname*{E}_{x\sim\rho^?_k}\left[\left|\left\{m\in[T]\mid x_{nT+m}\in\bot\bar{O}\right\}\right|\right]$$
$$\sum_{n=0}^{\infty}\left|\mathrm{EU}^{!!kn}(\gamma)-\mathrm{EU}^{!kn}(\gamma)\right|\le\frac{1}{\eta}\operatorname*{E}_{x\sim\rho^?_k}\left[\left|\left\{n\in\mathbb{N}\mid x_n\in\bot\bar{O}\right\}\right|\right]$$
Using condition i of Proposition C.1, we conclude
$$\mathrm{R}^?_k\le(1-\gamma^T)\sum_{n=0}^{\infty}\gamma^{nT}\left(\mathrm{EU}^{!kn}(\gamma)-\mathrm{EU}^{?kn}(\gamma)\right)+O\left(\frac{1-\gamma^T}{\eta^2}+\frac{\bar{\tau}(1-\gamma)}{1-\gamma^T}\right)$$
Define the random variables $\{U^!_n:\Omega\to[0,1]\}_{n\in\mathbb{N}}$ by
$$U^!_n:=\frac{1-\gamma}{1-\gamma^T}\sum_{m=0}^{T-1}\gamma^mr(A\Theta^!_{:nT+m})$$
Averaging the previous inequality over $k$, we get
$$\frac{1}{N}\sum_{k=0}^{N-1}\mathrm{R}^?_k\le(1-\gamma^T)\sum_{n=0}^{\infty}\gamma^{nT}\operatorname{E}\left[\operatorname{E}\left[U^!_n\mid J^!_n=K,\ Z^!_{nT}\right]-\operatorname{E}\left[U^!_n\mid Z^!_{nT}\right]\right]+O\left(\frac{1-\gamma^T}{\eta^2}+\frac{\bar{\tau}(1-\gamma)}{1-\gamma^T}\right)$$
Since the weights $(1-\gamma^T)\gamma^{nT}$ sum to 1, the Cauchy–Schwarz inequality gives
$$\frac{1}{N}\sum_{k=0}^{N-1}\mathrm{R}^?_k\le\sqrt{(1-\gamma^T)\sum_{n=0}^{\infty}\gamma^{nT}\operatorname{E}\left[\left(\operatorname{E}\left[U^!_n\mid J^!_n=K,\ Z^!_{nT}\right]-\operatorname{E}\left[U^!_n\mid Z^!_{nT}\right]\right)^2\right]}+O\left(\frac{1-\gamma^T}{\eta^2}+\frac{\bar{\tau}(1-\gamma)}{1-\gamma^T}\right)$$
We apply Proposition C.2 to each term in the sum over $n$ (conditional on $Z^!_{nT}$, both $K$ and $J^!_n$ are distributed according to $Z^!_{nT}$, whose nonzero entries are at least $\eta$):
$$\frac{1}{N}\sum_{k=0}^{N-1}\mathrm{R}^?_k\le\sqrt{(1-\gamma^T)\sum_{n=0}^{\infty}\gamma^{nT}\operatorname{E}\left[\frac{1}{2\eta}\operatorname{I}\left[K;J^!_n,U^!_n\mid Z^!_{nT}\right]\right]}+O\left(\frac{1-\gamma^T}{\eta^2}+\frac{\bar{\tau}(1-\gamma)}{1-\gamma^T}\right)$$
$$\frac{1}{N}\sum_{k=0}^{N-1}\mathrm{R}^?_k\le\sqrt{\frac{1-\gamma^T}{2\eta}\sum_{n=0}^{\infty}\gamma^{nT}\operatorname{E}\left[\operatorname{H}\left(Z^!_{nT}\right)-\operatorname{H}\left(Z^!_{(n+1)T}\right)\right]}+O\left(\frac{1-\gamma^T}{\eta^2}+\frac{\bar{\tau}(1-\gamma)}{1-\gamma^T}\right)$$
Since entropy is nonnegative and $\gamma^{nT}\le1$, the sum is at most $\operatorname{H}(Z^!_0)=\ln N$, yielding
$$\frac{1}{N}\sum_{k=0}^{N-1}\mathrm{R}^?_k\le\sqrt{\frac{1-\gamma^T}{2\eta}\ln N}+O\left(\frac{1-\gamma^T}{\eta^2}+\frac{\bar{\tau}(1-\gamma)}{1-\gamma^T}\right)$$
$$\frac{1}{N}\sum_{k=0}^{N-1}\mathrm{R}^?_k=O\left(\sqrt{\frac{1-\gamma^T}{\eta}}+\frac{1-\gamma^T}{\eta^2}+\frac{\bar{\tau}(1-\gamma)}{1-\gamma^T}\right)$$
Condition ii of Proposition C.1 implies that
$$d_{\mathrm{tv}}\left(\frac{1}{N}\sum_{k=0}^{N-1}\bar{\nu}_k[\sigma_k]\bowtie\pi^*,\ \frac{1}{N}\sum_{k=0}^{N-1}\bar{\nu}_k[\sigma_k]\bowtie\pi^?_k\right)\le2(N-1)\eta$$
Here, the factor of 2 comes from the difference between the equations for $Z_n$ and $Z^!_n$ (we can construct an intermediate policy between $\pi^*$ and $\pi^?_k$ and use the triangle inequality for $d_{\mathrm{tv}}$). We conclude
$$\mathrm{EU}^{\pi_kS_k}_{\upsilon_k}(\gamma)-\mathrm{EU}^{\pi^*}_{\bar{\upsilon}_k[\sigma_kS_k]}(\gamma)=O\left(\eta+\sqrt{\frac{1-\gamma^T}{\eta}}+\frac{1-\gamma^T}{\eta^2}+\frac{\bar{\tau}(1-\gamma)}{1-\gamma^T}\right)$$
Now we set
$$\eta:=\bar{\tau}^{1/4}(1-\gamma)^{1/4}$$
$$T:=\left\lceil\bar{\tau}^{3/4}(1-\gamma)^{-1/4}\right\rceil$$
Without loss of generality, we can assume that $\bar{\tau}(1-\gamma)\ll1$ (because of the form of the bound we are proving), which implies that $T(1-\gamma)\ll1$ and $1-\gamma^T\approx T(1-\gamma)\approx\bar{\tau}^{3/4}(1-\gamma)^{3/4}$. We get
$$\mathrm{EU}^{\pi_kS_k}_{\upsilon_k}(\gamma)-\mathrm{EU}^{\pi^*}_{\bar{\upsilon}_k[\sigma_kS_k]}(\gamma)=O\left(\bar{\tau}^{1/4}(1-\gamma)^{1/4}\right)$$
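Indeed, with these choices $\eta$ itself is $\bar{\tau}^{1/4}(1-\gamma)^{1/4}$, and each of the remaining three terms in the previous bound is of the same order:
$$\sqrt{\frac{1-\gamma^T}{\eta}}\approx\sqrt{\frac{\bar{\tau}^{3/4}(1-\gamma)^{3/4}}{\bar{\tau}^{1/4}(1-\gamma)^{1/4}}}=\bar{\tau}^{1/4}(1-\gamma)^{1/4}$$
$$\frac{1-\gamma^T}{\eta^2}\approx\frac{\bar{\tau}^{3/4}(1-\gamma)^{3/4}}{\bar{\tau}^{1/2}(1-\gamma)^{1/2}}=\bar{\tau}^{1/4}(1-\gamma)^{1/4},\qquad\frac{\bar{\tau}(1-\gamma)}{1-\gamma^T}\approx\frac{\bar{\tau}(1-\gamma)}{\bar{\tau}^{3/4}(1-\gamma)^{3/4}}=\bar{\tau}^{1/4}(1-\gamma)^{1/4}$$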
## Appendix C
The following is a simple special case of what appeared as "Proposition A.2" in the previous essay, where we restrict $\pi$ to be single-valued (the more general case isn't needed).
### Proposition C.1
Fix an interface $I=(A,O)$, $N\in\mathbb{N}$, $\epsilon\in(0,\frac{1}{|A|})$, $\eta\in(0,\frac{1}{N})$. Consider some $\{\sigma_k:(A\times O)^*\xrightarrow{\text{k}}A\}_{k\in[N]}$. Then, there exist $D:\overline{A\times O}^*\times A\to\bar{A}$ and $\{D^!_k:\overline{A\times O}^*\times A\to\bar{A}\}_{k\in[N]}$ with the following properties. Given $x\in(A\times\overline{A\times O})^*$, we denote $\underline{x}$ its projection to $\overline{A\times O}^*$. Thus, $\underline{\underline{x}}\in(A\times O)^*$. Given $\mu$ an $I$-environment, $\pi:\operatorname{hdom}\mu\xrightarrow{\text{k}}A$, $D':\overline{A\times O}^*\times A\to\bar{A}$ and $k\in[N]$, we can define $\Xi[\mu,\sigma_k,D',\pi]\in\Delta(A\times\overline{A\times O})^{\omega}$ as follows
$$\Xi[\mu,\sigma_k,D',\pi](b,a,o\mid x):=\pi(b\mid\underline{\underline{x}})D'(a\mid\underline{x},b)\bar{\mu}[\sigma_k](o\mid\underline{x}a)$$
We require that for every $\pi$, $\mu$ and $k$ as above, the following conditions hold:
i. $\frac{1}{N}\sum_{j=0}^{N-1}\operatorname*{E}_{x\sim\Xi[\mu,\sigma_j,D^!_j,\pi]}\left[\left|\left\{n\in\mathbb{N}\mid x_n\in A\times\bot\times\bar{O}\right\}\right|\right]\le\frac{\ln\frac{N}{\eta}}{\ln\left(1+\epsilon(1-\epsilon)^{(1-\epsilon)/\epsilon}\right)}=O\left(\frac{\ln\frac{N}{\eta}}{\epsilon}\right)$

ii. $d_{\mathrm{tv}}\left(\frac{1}{N}\sum_{j=0}^{N-1}\Xi[\mu,\sigma_j,D,\pi],\ \frac{1}{N}\sum_{j=0}^{N-1}\Xi[\mu,\sigma_j,D^!_j,\pi]\right)\le(N-1)\eta$

iii. For all $x\in\operatorname{hdom}\bar{\mu}[\sigma_k]$: if $D^!_k(x,\pi(\underline{x}))\ne\bot$ then $\sigma_k(D^!_k(x,\pi(\underline{x}))\mid\underline{x})>0$

iv. For all $x\in\operatorname{hdom}\bar{\mu}[\sigma_k]$: if $D^!_k(x,\pi(\underline{x}))\notin\{\pi(\underline{x}),\bot\}$ then $\sigma_k(\pi(\underline{x})\mid\underline{x})\le\epsilon$
The following appeared in the previous essay as “Proposition A.1”.
### Proposition C.2
Consider a probability space $(\Omega,P\in\Delta\Omega)$, $N\in\mathbb{N}$, $R\subseteq[0,1]$ a finite set and random variables $U:\Omega\to R$, $K:\Omega\to[N]$ and $J:\Omega\to[N]$. Assume that $K_*P=J_*P=\zeta\in\Delta[N]$ and $\operatorname{I}[K;J]=0$. Then
$$\operatorname{I}[K;J,U]\ge2\left(\min_{i\in[N]}\zeta(i)\right)\left(\operatorname{E}[U\mid J=K]-\operatorname{E}[U]\right)^2$$
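As a quick numerical illustration (not part of the proof), the following script checks the inequality on a randomly generated joint distribution in which $K$ and $J$ are i.i.d. with law $\zeta$ (so $\operatorname{I}[K;J]=0$); all names are ad hoc:

```python
import math
import random

# Numerical sanity check of Proposition C.2 on a random example.
random.seed(0)
N = 3
R = [0.0, 0.5, 1.0]  # the finite range of U

def rand_dist(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

zeta = rand_dist(N)
# Conditional law of U given (K, J); K and J themselves are independent.
p_u = {(k, j): rand_dist(len(R)) for k in range(N) for j in range(N)}
# Joint distribution p(k, j, u) = zeta(k) zeta(j) p_u[k, j](u).
p = {(k, j, u): zeta[k] * zeta[j] * p_u[(k, j)][u]
     for k in range(N) for j in range(N) for u in range(len(R))}

# I[K; J,U] = sum p(k,j,u) ln( p(k,j,u) / (p(k) p(j,u)) ), in nats.
p_ju = {(j, u): sum(p[(k, j, u)] for k in range(N))
        for j in range(N) for u in range(len(R))}
mi = sum(q * math.log(q / (zeta[k] * p_ju[(j, u)]))
         for (k, j, u), q in p.items() if q > 0)

EU = sum(q * R[u] for (k, j, u), q in p.items())
pr_eq = sum(zeta[k] ** 2 for k in range(N))               # Pr[J = K]
EU_eq = sum(p[(k, k, u)] * R[u]
            for k in range(N) for u in range(len(R))) / pr_eq

bound = 2 * min(zeta) * (EU_eq - EU) ** 2
assert mi >= bound
print(f"I[K;J,U] = {mi:.4f} >= {bound:.4f}")
```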