Existence of distributions that are expectation-reflective and know it
We prove the existence of a probability distribution over a theory T with the property that for certain definable quantities φ, the expectation of the value E[┌φ┐] that the symbol E assigns to φ is accurate, i.e. it equals the actual expectation of φ; and with the property that it assigns probability 1 to E behaving this way. This may be useful for self-verification, by allowing an agent to satisfy a reflective consistency property and at the same time believe itself or similar agents to satisfy the same property. Thanks to Sam Eisenstat for listening to an earlier version of this proof, and pointing out a significant gap in the argument. The proof presented here has not been vetted yet.
Problem statement
Given a distribution P coherent over a theory A, and some real-valued function f on completions of A, we can define the expectation E[f] of f according to P. Then we can relax the probabilistic reflection principle by asking that for some class of functions f, we have that E[┌E[f]┐]=E[f], where E is a symbol in the language of A meant to represent E. Note that this notion of expectation-reflection is weaker than probabilistic reflection, since our distribution is now permitted to, for example, assign a bunch of probability mass to over- and under-estimates of E[f], as long as they balance out.
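(As a quick aside, not part of the formal development: the following toy computation, with made-up numbers, shows how a distribution can be wrong about E[f] on every completion while the over- and under-estimates cancel, so that expectation-reflection still holds.)

```python
# Toy illustration: expectation-reflection can hold even if no single
# completion reports E[f] correctly, as long as over- and under-estimates
# balance out. All numbers here are made up.
completions = [
    {"prob": 0.5, "f": 1.0, "claimed_E_f": 0.2},  # under-estimates E[f]
    {"prob": 0.5, "f": 0.0, "claimed_E_f": 0.8},  # over-estimates E[f]
]

actual_E_f = sum(c["prob"] * c["f"] for c in completions)              # 0.5
E_of_claimed = sum(c["prob"] * c["claimed_E_f"] for c in completions)  # 0.5

assert abs(actual_E_f - E_of_claimed) < 1e-12
print(actual_E_f, E_of_claimed)  # 0.5 0.5: reflection holds in expectation
```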
Christiano asked whether it is possible to have a distribution that satisfies this reflection principle, and also assigns probability 1 to the statement that E satisfies this reflection principle. This was not possible for strong probabilistic reflection, but it turns out to be possible for expectation reflection, for some choice of the functions f.
Sketch of the approach
(This is a high level description of what we are doing, so many concepts will be left vague until later.)
Christiano et al. applied Kakutani’s theorem to the space of coherent P. Instead we will work in the space of expectations over some theory T, where an expectation over a theory is, roughly speaking, a function from the set of variables provably defined by that theory, into the intervals proved to bound each variable. These are essentially interchangeable with coherent probability distributions over T. The point of doing this is to make the language simpler, for example reflection statements will mention a single symbol representing an expectation, rather than a complicated formula defining the expectation in terms of probability.
We will again apply Kakutani’s theorem, now requiring that some expectation G reflects F only when G expects E to behave like F, and when G assigns some significant probability to the statement that E is reflective. This confidence in reflection must increase the closer that F is to being reflective. Then a fixed point of this correspondence will be expectation-reflective, and will assign probability 1 to E being expectation-reflective.
The form of our correspondence will make most of the conditions of Kakutani’s theorem straightforward. The main challenge will be to show non-emptiness, i.e. that there is some expectation that reflects a given F and believes in reflection to some extent. In the case of probabilistic reflection, this does not go through at all, since if we reflect a non-reflective probability distribution exactly, we must assign probability 0 to reflection.
However, in the case of expectations, we can mix different expectations together while maintaining expectation-reflection, by carefully balancing the mixture. The main idea will be to take a distribution GH that believes in some reflective expectation H, take another distribution GJ that believes in some pseudo-expectation J, and mix them together into a distribution G. The mixture G will somewhat expect E to be reflective, since GH expects this, and by a good choice of J counterbalancing H, G will expect E to behave like F.
Before carrying out this approach, we need some formal notions and facts about expectations, given in Sections 3 and 4. Also, in order to be careful about what we mean by an expectation, a pseudo-expectation, and a variable, we will in Section 5 develop a base theory T over which our distributions will be defined. Then Section 6 will give the main theorem, following the above sketch. Section 7 discusses the meaning of these results and extensions to definable reflection.
Basic definitions and facts about expectations
We will work with probability distributions (or, in a moment, expectations) that are coherent over some base theory T in a language that can talk about rationals, functions, and has a symbol E.
Random variables for theories and their bounds
These notions are due to Fallenstein.
We are interested in taking expectations of quantities expressed in the language of T. This amounts to viewing a probability distribution P coherent over T as a measure on the Stone space ST, and then asking for the expectation
E[f]:=∫STfdP .
A natural choice for the kind of random variable f to look at is the values definable over T, i.e. formulas φ(x) such that T⊢∃!x∈R:φ(x). Then any completion of T will make statements of the form ∀r∈R:φ(r)→r>a for various a∈Q in a way consistent with φ holding on a unique real, and perhaps we can extract a value for the random variable φ.
However, we have to be a little careful. If this is all that T proves about φ, then there will be completions of T which, for every a∈Q, contain the statement ∀r∈R:φ(r)→r>a. Then there is no real number reasonably corresponding to φ. Even if this is not an issue, there are distributions which assign non-negligible probabilities to a sequence of completions of T that put quickly growing values on φ, such that the integral E[φ] does not exist.
Therefore we also require that T proves some concrete bounds on the real numbers that can satisfy φ(x). Then we will be able to extract values for φ from completions of T and define the expectation of φ according to P.
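(For concreteness, a toy instance of the divergence problem just described, with an arbitrary growth rate: probability 2^−n on a completion valuing φ at 4^n makes the partial sums of the expectation grow without bound.)

```python
# Toy check: if completion n has probability 2^-n and values phi at 4^n,
# each term contributes 2^n, so the expectation integral diverges.
partial_sum = 0.0
for n in range(1, 31):
    partial_sum += (2.0 ** -n) * (4.0 ** n)
    if n % 10 == 0:
        print(n, partial_sum)  # grows roughly like 2^(n+1)
```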
Definition
[Definition of bounded variables Var(A) for A.]
For any consistent theory A, the set Var(A) is the set of formulas φ(x) such that A proves φ(x) is well-defined and in some particular bounds, i.e.:
φ∈Var(A)⇔∃a,b∈Q:A⊢[∃!x∈R:φ(x)]∧[∀x∈R:φ(x)→x∈[a,b]] .
Elements of Var(A) are called A-variables.=:
Definition
[Definition of A-bounds on variables.]
For φ∈Var(A), let [a,b]A,φ be the complete bound put on φ by A, taking into account all bounds on φ proved by A, i.e.
[a,b]A,φ:=⋂{[s,t]∣s,t∈Q,A⊢φ∈[s,t]} .=:
Note that the A-bound [a,b]A,φ on a variable φ∈Var(A) is a well-defined non-empty closed interval; it is the intersection of non-disjoint closed intervals all contained in some rational interval, by the definition of Var(A) and the fact that A is consistent.
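(A small sketch of the intersection in this definition, using hypothetical proved bounds; the real intersection may range over infinitely many proved bounds, but consistency keeps it non-empty just as below.)

```python
# Intersect finitely many bounds [s, t] that A (hypothetically) proves on phi.
proved_bounds = [(-10.0, 10.0), (0.0, 3.0), (1.0, 5.0)]

lo = max(s for s, _ in proved_bounds)
hi = min(t for _, t in proved_bounds)
assert lo <= hi  # holds because a consistent A cannot prove disjoint bounds
print((lo, hi))  # (1.0, 3.0): the complete A-bound on phi
```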
Expectations and pseudo-expectations over a theory
The definition of Exp(A) and the theorem in Section 4 are based on a comment of Eisenstat.
Now we define expectations over a theory, analogously to probability distributions. Here linearity will play the role of coherence.
Definition
[Sum of two A-variables.] For φ,ψ∈Var(A), we write φ+ψ for the sum of the two variables, i.e.:
(φ+ψ)(x)⇔∃q,r∈R:x=q+r∧φ(q)∧ψ(r) .
Then φ+ψ∈Var(A) for reasonable A.=:
Definition
[Expectations Exp(A) over a theory A.] An expectation over a theory A is a function E:Var(A)→R such that for all φ,ψ∈Var(A):
(In A-bounds) E[φ]∈[a,b]A,φ, i.e. E takes values in the bounds proved by A, and
(Linear) E[φ+ψ]=E[φ]+E[ψ].
=:
In order to carry out the counterbalancing argument described above, we need some rather extreme affine combinations of expectations, so extreme that they will not even be proper expectations. We therefore define pseudo-expectations analogously to expectations, but with much looser bounds on their values.
Definition
[Pseudo-expectations PseudoExp(A) over a theory A.] A pseudo-expectation over a theory A is a function E:Var(A)→R such that for all φ,ψ∈Var(A):
(Loosely in A-bounds) If φ has A-bound [a,b]A,φ, we have that E[φ]∈[a−(b−a)·2^┌φ┐, b+(b−a)·2^┌φ┐], and
(Linear) E[φ+ψ]=E[φ]+E[ψ].
=:
For any theories A⊂B, we have that Exp(B)⊂Exp(A)⊂PseudoExp(A) and Exp(B)⊂PseudoExp(B)⊂PseudoExp(A). We are implicitly restricting elements of Exp(B) and PseudoExp(B) to Var(A) in these comparisons, and will do so freely in what follows. We take the product topology on both Exp(A) and PseudoExp(A).
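(A minimal sketch of the two Exp(A) conditions on a finite fragment of Var(A), with hypothetical variable names, bounds, and provable sums.)

```python
# Check the Exp(A) conditions on a finite toy fragment of Var(A).
bounds = {"phi": (0.0, 1.0), "psi": (0.0, 2.0), "phi+psi": (0.0, 3.0)}
sums = [("phi", "psi", "phi+psi")]  # A proves that phi+psi is the sum

def is_expectation(E):
    in_bounds = all(a <= E[v] <= b for v, (a, b) in bounds.items())
    linear = all(abs(E[s] - (E[p] + E[q])) < 1e-12 for p, q, s in sums)
    return in_bounds and linear

print(is_expectation({"phi": 0.25, "psi": 1.5, "phi+psi": 1.75}))  # True
print(is_expectation({"phi": 0.25, "psi": 1.5, "phi+psi": 2.0}))   # False
```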
Isomorphism of expectations and probability distributions
To actually construct elements of Exp(A), we will use a natural relationship between probability distributions P and expectations E over a theory, proved formally below to be an isomorphism. On the one hand we can get a probability distribution from E by taking the expectation of indicator variables for the truth of sentences; on the other hand we can get an expectation from a probability distribution by integrating a variable over the Stone space of our theory.
Definition
[The value of an A-variable.] For a complete theory A, the value A(φ) of some A-variable φ is sup{q∈Q∣A⊢∀x:φ(x)→x>q}. Since φ∈Var(A), this value is well-defined, and A(φ)∈[a,b]A,φ.=:
Theorem
For any theory A, there is a canonical isomorphism ι between Exp(A) and the space of coherent probability distributions over A, given by:
ι:Exp(A)→Δ(A)
ι(E)(θ):=E[Ind(θ)] ,
where Ind(θ) is the 0-1 valued indicator variable for the sentence θ, i.e. Ind(θ):=(x=0∧¬θ)∨(x=1∧θ). The alleged inverse ι−1 is given by:
ι−1(P)[φ(x)]:=∫A′∈SAA′(φ(x))dP .
Proof. By the previous discussion, ι and ι−1 are well-defined in the sense that they return functions of the correct type.
ι−1(P)∈Exp(A)
By definition of Var(A), the integrals in the definition of ι−1(P) are defined and within A-bounds. For any φ,ψ∈Var(A) and any a,b∈Q, we have that A⊢(∀x:φ(x)→x>a)∧(∀y:ψ(y)→y>b)→(∀z:(φ+ψ)(z)→z>a+b) and
A⊢(∀x:(φ+ψ)(x)→x>a)→∃b,c∈Q:b+c≥a∧(∀y:φ(y)→y>b)∧(∀z:ψ(z)→z>c) .
Together these imply that every completion A′∈SA has A′(φ+ψ)=A′(φ)+A′(ψ), and the integral of a sum is the sum of the integrals. Thus ι−1(P)[φ+ψ]=ι−1(P)[φ]+ι−1(P)[ψ], so ι−1(P) is linear and hence is an expectation.
ι(E)∈Δ(A)
For any θ∈A, we have that A⊢Ind(θ)=1, so since E is in A-bounds, ι(E)(θ)=E[Ind(θ)]=1. Similarly, for any partition of truth into three sentences, A proves the indicators of those sentences have values summing to 1; so E assigns values to their indicators summing to 1, using linearity a few times and the fact that E assigns the same value to variables with A⊢∀x:φ(x)↔ψ(x).
This last fact follows by considering the A-bound of [0,0] on the variable φ(x)+(−ψ(x)). Linearity gives that 0=E[φ(x)+(−ψ(x))]=E[φ(x)]+E[−ψ(x)], so E[φ(x)]=−E[−ψ(x)]. Taking φ to be ψ, this gives E[ψ(x)]=−E[−ψ(x)], so that in general E[φ(x)]=E[ψ(x)], as desired.
ι∘ι−1 is identity
For any P∈Δ(A) and any sentence θ, we have
ι∘ι−1(P)(θ)=ι−1(P)[Ind(θ)]=∫A′∈SAA′(Ind(θ))dP=P(θ) ,
since any completion A′ of A with A′⊢θ also has A′⊢Ind(θ)=1, and any completion A′ with A′⊢¬θ also has A′⊢Ind(θ)=0.
ι is continuous
Take a θ sub-basis open subset of Δ(A), the set of distributions assigning probability in (a,b) to θ. The preimage of this set is the set of expectations with E[Ind(θ)]∈(a,b), which is an open subset of Exp(A).
ι−1∘ι is identity
Take any E∈Exp(A). We want to show that
E[φ(x)]=∫A′∈SAA′(φ(x))d(ιE)
for all φ(x)∈Var(A). In the following we will repeatedly apply linearity and the fact shown above that E respects provable equivalence of variables. Take such a φ(x) and assume for clarity that the A-bound of φ(x) is [0,1]. Then for any n∈N, we have that
E[φ] = ∑k∈[n] E[φ·Ind(φ∈[k/n,(k+1)/n))]
E[φ] = ∑k∈[n] (k/n)·E[Ind(φ∈[k/n,(k+1)/n))] + E[(φ−k/n)·Ind(φ∈[k/n,(k+1)/n))] .
Note that the last interval in these sums is closed instead of half-open. Since A proves that (φ−k/n)·Ind(φ∈[k/n,(k+1)/n)) is non-negative,
E[φ] ≥ ∑k∈[n] (k/n)·E[Ind(φ∈[k/n,(k+1)/n))] .
By the arguments given earlier, E[Ind(φ∈[k/n,(k+1)/n))] = ∫A′∈SA A′(Ind(φ∈[k/n,(k+1)/n))) d(ιE) .
Hence E[φ] ≥ ∑k∈[n] (k/n)·∫A′∈SA A′(Ind(φ∈[k/n,(k+1)/n))) d(ιE) = ∑k∈[n] (k/n)·ιE(φ∈[k/n,(k+1)/n)) . As n→∞, the right-hand side converges to the Lebesgue integral of φ. Combining this with a similar argument giving an upper bound on E[φ], we have that
E[φ(x)]=∫A′∈SAA′(φ(x))d(ιE) as desired.
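(A numeric sketch of this slicing argument on a toy measure with sampled values; the lower sums increase to the integral as n grows.)

```python
import random

random.seed(0)
# Toy measure: 10000 equally weighted "completions", each valuing phi in [0,1].
values = [random.random() for _ in range(10_000)]
weight = 1.0 / len(values)

def lower_sum(n):
    # sum over k of (k/n) * (probability that phi lands in [k/n, (k+1)/n))
    total = 0.0
    for v in values:
        k = min(int(v * n), n - 1)  # clamp so the last cell acts as closed
        total += (k / n) * weight
    return total

integral = sum(v * weight for v in values)
for n in (2, 10, 100, 1000):
    print(n, lower_sum(n))  # increases toward the integral
print("integral:", integral)
```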
ι−1 is continuous
Take a φ(x) sub-basis open set in Exp(A), the set of expectations assigning a value in (a,b) to φ. Let P be a probability distribution with ι−1(P)[φ]∈(a,b). As in the previous section of the proof, we can cut up the bound [c,d]A,φ into finitely many very small intervals. Then any probability distribution that assigns probabilities sufficiently close to those assigned by P to the indicators for φ being in those small intervals, will have an expectation for φ that is also inside (a,b). This works out to an open set around P, so that the preimage of the φ(x) sub-basis open set is a union of open sets. ⊣
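(To see both directions of ι at work, here is a finite toy model: a "distribution" is a set of weighted completions, each assigning explicit values to variables, and indicator variables give back the probabilities. The names are hypothetical stand-ins.)

```python
# Finite toy model of the isomorphism between distributions and expectations.
completions = [
    {"weight": 0.2, "phi": 0.0, "Ind(theta)": 1.0},
    {"weight": 0.5, "phi": 0.5, "Ind(theta)": 0.0},
    {"weight": 0.3, "phi": 1.0, "Ind(theta)": 1.0},
]

def expectation(var):  # analogue of iota^{-1}(P): integrate over completions
    return sum(c["weight"] * c[var] for c in completions)

def probability_of_theta():  # analogue of iota(E): expect an indicator
    return expectation("Ind(theta)")

print(expectation("phi"))      # 0.55
print(probability_of_theta())  # 0.5
```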
A base theory that accommodates reflection variables
So, we have a dictionary between distributions and expectations. This will let us build expectations by completing theories and taking expectations according to the resulting 0-1 valued distribution.
Some preparatory work remains, because in order to have the reflection principle E[E[┌φ┐]]=E[φ], we at least want E[┌φ┐] to be a variable whenever φ is. Thus we will need a theory T that bounds E[┌φ┐] whenever it bounds φ. However, in order to make extreme mixes of elements of Exp(T) possible to reflect into an expectation over T, we will need that all elements of PseudoExp(T) are valid interpretations of E for T.
Stratified definition of the base theory T
We start with a theory such as ZFC that is strong enough to talk about rational numbers and so on. We add to the language a symbol E that will represent an expectation. We also add the sentence stating that E is a partial function from N to R, and that E is linear at φ+ψ if it happens to be defined on φ,ψ, and φ+ψ. This gives the theory T0.
Now define inductively the theories Tn+1⊃Tn: Tn+1 := Tn + ∀┌φ┐,k∈N:∀a,b∈Q:[k witnesses Tn⊢(∃!x∈R:φ(x))∧(∀x:φ(x)→x∈[a,b])]→(∃!x∈R:E[┌φ┐]=x)∧(∀x:E[┌φ┐]=x→x∈[a−(b−a)·2^┌φ┐, b+(b−a)·2^┌φ┐])
In English, this says that Tn+1 is Tn along with the statement that whenever Tn proves that some φ is well-defined and bounded in some interval [a,b], then it is the case that E is defined on φ and E[┌φ┐] is inside the much looser bound [a−(b−a)·2^┌φ┐, b+(b−a)·2^┌φ┐]. Intuitively we are adding into Var(Tn+1) the variable E[┌φ┐] whenever φ∈Var(Tn), but we are not restricting its value very much at all. The form of the loose bound on E[┌φ┐] is an artifact of the metric we will later put on Exp(T).
Finally, we define the base theory we will use in the main argument as the limit of the Tn, that is: T:=⋃n∈NTn. Note that T is at least (exactly?) as strong as (T0)ω, the theory T0 with ω-iterated consistency statements, since the loose bounds are the same as the true bounds when the true bound is [a,a]. Also note that it is important that T0 is arithmetically sound, or else T may believe in nonstandard proofs and hence put inconsistent bounds on E. I think this restriction could be avoided by making the statement in Tn+1−Tn into a schema over specific standard naturals that might be proofs.
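(The loose bound itself is a simple function of the proved bound and the Gödel number; a sketch, with the Gödel numbering taken as given:)

```python
def loose_bound(a, b, phi_godel_number):
    """Loose T_{n+1}-bound on E[<phi>], given the T_n-bound [a, b] on phi."""
    slack = (b - a) * 2 ** phi_godel_number
    return (a - slack, b + slack)

print(loose_bound(0.0, 1.0, 3))  # (-8.0, 9.0): very loose
print(loose_bound(2.0, 2.0, 7))  # (2.0, 2.0): exact bounds stay exact,
                                 # which is how T recovers consistency statements
```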
Soundness of T over PseudoExp(T)
We will be applying Kakutani’s theorem to the space Exp(T), and making forays into PseudoExp(T). So we want T to at least be consistent, so that Exp(T) is nonempty, and furthermore we want T to allow for E to be interpreted by anything in PseudoExp(T).
Recall that a (pseudo)expectation over a theory A is a function E:Var(A)→R that is linear, and such that given φ with A-bound [a,b]A,φ, we have that E[φ]∈[a,b] (or E[φ]∈[a−(b−a)·2^┌φ┐, b+(b−a)·2^┌φ┐]). As noted before, for any theories A⊂B, we have that Exp(B)⊂Exp(A)⊂PseudoExp(A) and Exp(B)⊂PseudoExp(B)⊂PseudoExp(A), where we are restricting elements of Exp(B) and PseudoExp(B) to Var(A).
Lemma
For any consistent theory A, Exp(A) is nonempty.
Proof. This follows from the isomorphism ι−1; we take a completion of A, which is a coherent probability distribution P over A, and then take expectations according to P. That is, ι−1(P)∈Exp(A). ⊣
We assume that we have some standard model for the theory over which T was constructed. For concreteness we take that theory to be ZFC, and we take the standard model to be the cumulative hierarchy V.
Theorem
Exp(T) is nonempty, and for all J∈PseudoExp(T), we have that (V,J)⊨T.
(To follow the proof, keep in mind the distinction between E being a (pseudo)expectation over a theory, versus E providing a model for a theory.)
Proof. The claim is true for T0 in place of T, since T0 is consistent and places no restrictions other than linearity on E.
Say the claim holds for Tn, so PseudoExp(Tn) is non-empty. For any J∈PseudoExp(Tn), by hypothesis (V,J)⊨Tn. Also, by definition of PseudoExp(Tn), J satisfies that whenever Tn bounds φ in [a,b], also J[φ]∈[a−(b−a)·2^┌φ┐, b+(b−a)·2^┌φ┐]. Hence (V,J)⊨Tn+1. Thus Tn+1 is consistent. Since PseudoExp(Tn+1)⊂PseudoExp(Tn), this also shows that for all J∈PseudoExp(Tn+1), we have (V,J)⊨Tn+1.
By induction the claim holds for all n, and hence T is consistent and Exp(T) is nonempty. Since PseudoExp(T)⊂PseudoExp(Tn) for all n, for any J∈PseudoExp(T) we have (V,J)⊨Tn, and hence (V,J)⊨T. ⊣
Main theorem: reflection and assigning probability 1 to reflection
We have a theory T that is consistent, so that Exp(T) is nonempty, and sound over all pseudo-expectations. We want an expectation that is reflective, and also believes that it is reflective. First we formalize this notion and show that there are reflective expectations.
Existence of reflective expectations
Define the sentence refl:=∀n∈N:(E[n] defined)→(E[┌E[n]┐] defined, and E[┌E[n]┐]=E[n]) .
This says that whenever E is defined on some variable, it expects E to take some value on that variable, and it expects the correct value. In short, its expectations about its expectations are correct. Define Refl(T)⊂Exp(T) to be the reflective expectations over T, i.e. those that satisfy refl.
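(On the finite toy representation used earlier, checking refl is immediate; the key "E[phi]" is a hypothetical encoding of the variable E[┌φ┐].)

```python
# Toy refl check: E's expectations of its own values must be correct.
E = {"phi": 0.4, "E[phi]": 0.4, "psi": 1.1, "E[psi]": 1.1}

def is_reflective(E):
    return all(abs(E["E[" + v + "]"] - E[v]) < 1e-12
               for v in E if not v.startswith("E["))

print(is_reflective(E))  # True
```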
Some observations: the spaces Refl(T) ⊂ Exp(T) ⊂ PseudoExp(T) ⊂ ∏φ∈Var(T) [a−(b−a)·2^┌φ┐, b+(b−a)·2^┌φ┐] (writing [a,b] for [a,b]T,φ) are all compact, as they are closed subsets of the product of the loose bounds on Var(T), that product being a compact space. Both Exp(T) and PseudoExp(T) are convex, as linearity and being in bounds are preserved by convex combinations. (For the same reason, Refl(T) is convex, and is in fact an affine subspace of Exp(T).)
Lemma
Refl(T) is nonempty.
Proof. We apply Kakutani’s theorem to Exp(T) where G corresponds to F when ∀φ∈Var(T):G[E[┌φ┐]]=F[φ]. The set of G corresponding to F is compact and convex, and the graph is closed. For any F there is a corresponding G: we take an expectation over the theory
TF:=T+{E[┌φ┐]∈(a,b)∣a,b∈Q,F[φ]∈(a,b)}
stating that E behaves according to F. This theory TF is consistent because F provides a model. Any completion T′F has T′F(E[┌φ┐])=F[φ], so the resulting expectation corresponds to F. Kakutani’s theorem gives a fixed point of this correspondence, which is in Refl. ⊣
The correspondence ⊲E: exact reflection and assigning high probability to reflection for distributions close to reflective
We can’t simply take a correspondence ⊲E that also requires G to assign probability 1 to refl; in general there would not be any expectation corresponding to any F∈Exp(T)−Refl(T). Instead we will soften this requirement, and only require that G[refl] approach 1 as F approaches being reflective, in order for F⊲EG.
Definition
Define a metric on Exp(T) by
d(F,G) := ∑φ∈Var(T) |F[φ]−G[φ]| / (2^┌φ┐·|[a,b]T,φ|) .
(If |[a,b]T,φ|=0 then the φ coordinate plays no role in the metric by fiat.)=:
The factor of 1/2^┌φ┐ ensures that the metric will converge, since the factor of 1/|[a,b]T,φ| corrects the projection of Exp(T) in each coordinate to be [0,1].
We abbreviate d⟨F⟩:=d(F,Refl)=minH∈Refld(F,H) to mean the distance from F to the nearest element of the set Refl. Since Refl is compact, this is well-defined and continuous on Exp(T).
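(A sketch of the metric on a finite fragment of Var(T), with hypothetical Gödel numbers and bound widths; the true sum ranges over all of Var(T) and converges thanks to the 2^┌φ┐ factor.)

```python
# d(F, G) on a finite fragment of Var(T).
variables = {
    "phi": (1, 1.0),  # (godel number, bound width |[a,b]_{T,phi}|)
    "psi": (2, 2.0),
}

def d(F, G):
    return sum(
        abs(F[v] - G[v]) / (2 ** g * width)
        for v, (g, width) in variables.items()
        if width > 0  # width-0 coordinates play no role, as stipulated
    )

print(d({"phi": 0.3, "psi": 1.0}, {"phi": 0.7, "psi": 0.0}))  # 0.2 + 0.125
```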
Definition
For F,G∈Exp(T), we say that G reflects F and we write F⊲EG precisely when:
G expects E to behave just like F, i.e. ∀φ∈Var(T):G[E[┌φ┐]]=F[φ], and
G is somewhat confident that E is reflective, specifically G[refl]≥1−d⟨F⟩.
=:
Fixed points of the correspondence are reflective and believe they are reflective
Say G⊲EG. Then G∈Refl(T), by definition of ⊲E. In particular, d⟨G⟩=0, so that G[refl]=1, and G is the desired distribution.
Compact and convex images; closed graph
For a fixed F, the conditions for F⊲EG are just closed subintervals in some coordinates, so {G∣F⊲EG} is compact and convex.
Consider a sequence F0⊲EG0, F1⊲EG1, …, converging to F and G. For φ∈Var(T), since Gn[E[┌φ┐]]=Fn[φ]→F[φ], we have Gn[E[┌φ┐]]→G[E[┌φ┐]]=F[φ]. Also, since d⟨Fn⟩→d⟨F⟩ and Gn[refl]→G[refl], the inequalities Gn[refl]≥1−d⟨Fn⟩ pass to the limit, giving G[refl]≥1−d⟨F⟩. Thus ⊲E⊂Exp(T)×Exp(T) is closed.
Images of the correspondence are nonempty: interpolating reflective and pseudo-expectations
Finally, we need to show that for any F∈Exp(T), there is some G∈Exp(T) such that F⊲EG. (The case distinction is just for explanatory purposes.)
Case 1. F∈Refl(T).
Recall the theory TF:=T+{E[┌φ┐]∈(a,b)∣a,b∈Q,F[φ]∈(a,b)} stating that E behaves according to F. By the theorem about T, (V,F)⊨T, so along with F∈Refl(T) we also have (V,F)⊨TF+refl. Thus that theory is consistent, so we can take some G∈Exp(TF+refl). This G expects E to behave like F, and G[refl]=1≥1−d⟨F⟩, since d⟨F⟩=0.
Case 2. F∉Refl(T).
Pick some H∈Refl(T) with d(F,H)=d⟨F⟩>0. As in the previous case, find some GH∈Exp(TH+refl), so GH expects E to behave like H, and GH[refl]=1. We will define G with F⊲EG by taking a convex combination of GH with another GJ∈Exp(T):
G:=(1−d⟨F⟩)GH+d⟨F⟩GJ .
By convexity, G∈Exp(T), and since GJ[refl]∈[0,1], we will have G[refl]≥(1−d⟨F⟩) as desired.
However, we also need G[E[┌φ┐]]=F[φ]. That is, we need
((1−d⟨F⟩)GH + d⟨F⟩GJ)[E[┌φ┐]] = F[φ]
GJ[E[┌φ┐]] = (F[φ] − (1−d⟨F⟩)GH[E[┌φ┐]]) / d⟨F⟩
J[φ] := (1/d⟨F⟩)F[φ] + (1 − 1/d⟨F⟩)H[φ] ,
where GJ believes that E behaves like J. We take the last line to be the definition of J.
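(A numeric sanity check of this counterbalancing, with made-up values; GH reports H's value for E[┌φ┐] and GJ reports J's, as arranged by the theories TH and TJ. Note that J lands outside the proper bounds, anticipating the next paragraph.)

```python
# Check that G = (1 - d)*G_H + d*G_J expects E to behave like F.
d = 0.25                 # stands in for d<F>, the distance from F to Refl
F_phi, H_phi = 0.9, 0.8  # hypothetical values F[phi] and H[phi]

J_phi = (1 / d) * F_phi + (1 - 1 / d) * H_phi  # the counterbalancing J
G_on_E_phi = (1 - d) * H_phi + d * J_phi       # G's expectation of E[<phi>]

print(J_phi)        # 1.2: outside [0, 1], so J is only a pseudo-expectation
assert abs(G_on_E_phi - F_phi) < 1e-12         # G expects E to behave like F
```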
In general, this function J is not in Exp(T). It may be that d(F,H) is very small, but for some large φ, F[φ] is large and H[φ] is small, so that J[φ] is very large and actually outside of [a,b]T,φ, and hence not an expectation. However, J is, in fact, a pseudo-expectation over T:
J[φ] = H[φ] + (1/d⟨F⟩)(F[φ]−H[φ]), so J[φ] ∈ [a−K, b+K], where H[φ]∈[a,b]T,φ and K := (1/d⟨F⟩)|F[φ]−H[φ]|. That is, the claim is that K ≤ (b−a)·2^┌φ┐. Indeed:
K = (1/d⟨F⟩)|F[φ]−H[φ]| = |F[φ]−H[φ]| / d(F,H) = |F[φ]−H[φ]| / ∑ψ∈Var(T)(|F[ψ]−H[ψ]| / (2^┌ψ┐·|[a,b]T,ψ|)) ≤ |F[φ]−H[φ]| / (|F[φ]−H[φ]| / (2^┌φ┐·|[a,b]T,φ|)) = 2^┌φ┐·|[a,b]T,φ| = (b−a)·2^┌φ┐ .
Therefore J∈PseudoExp(T). By the theorem on T, (V,J)⊨T, so that TJ is consistent and we obtain GJ∈Exp(T) that expects E to behave like J. Then G=(1−d⟨F⟩)GH+d⟨F⟩GJ is in Exp(T), expects E to behave like F, and has G[refl]≥(1−d⟨F⟩). That is, F⊲EG.
The conditions of Kakutani’s theorem are satisfied, so there is a fixed point E⊲EE, and therefore we have an expectation that believes E behaves like itself, and that assigns probability 1 to E having this property. ⊣
Extension to belief in any generic facts about Refl
The above argument goes through in exactly the same way for any statement θ that is satisfied by all reflective expectations; we just have GH also assign probability 1 to θ, and modify ⊲E by adding a condition for θ analogous to that for refl. For example, we can have our reflective E assign probability 1 to E∈Exp(T), which is analogous to an inner coherence principle.
Discussion
I think that if the base theory is strong enough to prove Exp(T)≅Δ(T), then this whole argument can be carried out with E defined in terms of P, a symbol for a probability distribution, and so we get a probability distribution over the original language with the desired beliefs about itself as a probability distribution.
I think it should be possible to have a distribution that is reflective in the sense of ⊲E be definable and reflective for its definition, using the methods from this post. But it doesn’t seem as straightforward here. One strategy might be to turn the sentence in the definition of Tn+1, stating that E is in the loose Tn-bounds on variables, into a schema, and diagonalizing at once against all the Tn refuting finite behaviors. But, the proof of soundness of T over pseudo-expectations, and diagonalizing also against refuting finite behaviors in conjunction with refl, seems to require a little more work (and may be false).
It would be nice to have a good theory of logical probability. The existence proof of an expectation-reflective distribution given here shows that expectation-reflection is a desideratum that might be achievable in a broader context (i.e. in conjunction with other desiderata).
I don’t know what class of variables a ⊲E-reflective E is reflective for. Universes that use E in a way that only looks at E’s opinions on variables in Var(Tn) for some n, and are defined and uniformly bounded whenever E is in PseudoExp(Tn), will be reflected accurately. If the universe looks at all of E, and for instance does something crazy if E is not in Exp(T), then T may not be able to prove that the universe is well-defined and bounded, in which case it will not be a variable in Var(T) and need not be reflected accurately.