I have discovered another minor point. You have written at the beginning of Direction 17 that any computable utility function U:(O×A)ω→[0,1] is automatically continuous. This seems to be not always true.
I fix some definitions to make sure that we talk about the same stuff. For reasons of simplicity, I assume that Oand A are finite. Let (O×A)ω be the space of all infinite sequences with values in O×A. The i-th projection pi:(O×A)ω→O×A is given by
pi(o1a1o2a2…)=oiai
The product topology is defined as the coarsest topology such that all projection maps are continuous. A base of this topology consists of all sets S such that there are finitely many indices i1,…,in∈N and subsets S1,…,Sn⊆O×A with
S={(onan)n∈N|oijaij⊆Sj∀j=1,…,n}
In particular, any open set contains such an S as a subset, which means that its image under pi is O×A for all but finitely many i. For my counterexample, let O=A={0,1,2}. Let s=(sn)n∈N be a sequence with values in {0,1,2}. If sn is never 2, we define
U(s)=∞∑k=1sk2−k−1
Otherwise, we define U(s)=34. U is computable in the sense that for any ϵ>0 we find a finite program Pwhose input is a sequence such that |P(s)−U(s)|<ϵ and P uses only finitely many values of s. The preimage of the open set (0,12) is
{(sn)n∈N∈{0,1,2}ω|sn∈{0,1}∀n∈N,sn≠0000…,sn≠1111…}
which is not open since its ith projection is always {0,1}2≠O×A. Therefore, U is not contiuous. However, we have the following lemma:
Let r:(O×A)∗→[0,1] be a reward function and let U:(O×A)ω→[0,1] be given by
U(o0a0o1a1…)=(1−γ)∞∑t=0γtr(o0a0o1a1…otat)
where 0<γ<1 is the time discount rate. Then U is continuous with respect to the product topology.
Proof: Since the open intervals are a base of the standard topology of R, it suffices to prove that the preimage of any interval (a,b), (a,1], [0,b) or [0,1] with 0<a<b<1 is open in (O×A)ω. For reasons of simplicity, we consider only (a,b). The other cases are analogous. Let o0a0o1a1…∈(O×A)ω such that u:=U(o0a0o1a1…)∈(a,b). Moreover, let ϵ>0 such that (u−ϵ,u+ϵ)⊆(a,b). Finally, we choose an n∈N such that γn+1<ϵ. We define the set
M={o′0a′0o′1a′1…∈(O×A)ω|o′0=o0,a′0=a0,…,a′n=an}
M is an open subset of (O×A)ω. Since the reward is non-negative, we have
0≤U(o0a0o1a1…)−r(o0a0o1a1…onan)≤γn+1<ϵ
and for any o′0a′0o′1a′1…∈M, we have
0≤U(o′0a′0o′1a′1…)−r(o0a0o1a1…onan)≤γn+1<ϵ,
too. Therefore,
∣∣U(o′0a′0o′1a′1…)−U(o0a0o1a1…)∣∣<ϵ
and furthermore M⊆U−1(u−ϵ,u+ϵ)⊆U−1(a,b). All in all, any o0a0o1a1…∈U−1(a,b) has an open neighborhood that is a subset of U−1(a,b). Therefore, U−1(a,b) is open.
That a utility function U:(O×A)ω→[0,1] is continuous roughly means that for any ϵ>0 there are only finitely many events that have an influence of more than ϵ on the utility. This could be a problem for studying longtermist agents with zero time discount rate. However, studying such agents is hard anyway since there is no guarantee that the sum of rewards converges and we have to deal with infinity ethics. As far as I know, it is standard in learning theory to avoid such situations by assuming a non-zero time discount rate or a finite time horizon. Therefore, it should not be a big deal to add the condition that U is continuous to all theorems.
Your alleged counterexample is wrong, because the U you constructed is not computable. First, “computable” means that there is a program P which receives s and ϵ∈Q as inputs s.t. for all s and ϵ, it halts and
|P(s,ϵ)−U(s)|<ϵ
Second, even your weaker definition fails here. Let ϵ=12. Then, there is no program that computes U within accuracy ϵ, because for every n, U(0n20ω)=34 while U(0ω)=0. Therefore, determining the value of U(s) within ϵ requires looking at infinitely many elements of the sequence. Any program would that outputs 0 on 0ω has to halt after reading some finite m symbols, in which case it would output 0 on 0m20ω as well.
You are right and now it is clear, why your original statement is correct, too. Let U be an arbitrary computable utility function. As above, let u=U(o0a0o1a1…) and (u−ϵ,u+ϵ)⊆(a,b) with ϵ>0 and ϵ∈Q. Choose P as in your definition of “computable”. Since P(s,ϵ) terminates, its output depends only on finitely many oi1,…,oik,aj1,…,ajl. Now
{s′=o′0a′0o′1a′1…|o′i1=oi1,…,a′jl=ajl}
is open and a subset of U−1(u−ϵ,u+ϵ), since |P(s′,ϵ)−u|<ϵ.
I have discovered another minor point. You have written at the beginning of Direction 17 that any computable utility function U:(O×A)ω→[0,1] is automatically continuous. This seems to be not always true.
I fix some definitions to make sure that we talk about the same stuff. For reasons of simplicity, I assume that Oand A are finite. Let (O×A)ω be the space of all infinite sequences with values in O×A. The i-th projection pi:(O×A)ω→O×A is given by
pi(o1a1o2a2…)=oiaiThe product topology is defined as the coarsest topology such that all projection maps are continuous. A base of this topology consists of all sets S such that there are finitely many indices i1,…,in∈N and subsets S1,…,Sn⊆O×A with
S={(onan)n∈N|oijaij⊆Sj∀j=1,…,n}In particular, any open set contains such an S as a subset, which means that its image under pi is O×A for all but finitely many i. For my counterexample, let O=A={0,1,2}. Let s=(sn)n∈N be a sequence with values in {0,1,2}. If sn is never 2, we define
U(s)=∞∑k=1sk2−k−1Otherwise, we define U(s)=34. U is computable in the sense that for any ϵ>0 we find a finite program Pwhose input is a sequence such that |P(s)−U(s)|<ϵ and P uses only finitely many values of s. The preimage of the open set (0,12) is
{(sn)n∈N∈{0,1,2}ω|sn∈{0,1}∀n∈N,sn≠0000…,sn≠1111…}which is not open since its ith projection is always {0,1}2≠O×A. Therefore, U is not contiuous. However, we have the following lemma:
Let r:(O×A)∗→[0,1] be a reward function and let U:(O×A)ω→[0,1] be given by
U(o0a0o1a1…)=(1−γ)∞∑t=0γtr(o0a0o1a1…otat)where 0<γ<1 is the time discount rate. Then U is continuous with respect to the product topology.
Proof: Since the open intervals are a base of the standard topology of R, it suffices to prove that the preimage of any interval (a,b), (a,1], [0,b) or [0,1] with 0<a<b<1 is open in (O×A)ω. For reasons of simplicity, we consider only (a,b). The other cases are analogous. Let o0a0o1a1…∈(O×A)ω such that u:=U(o0a0o1a1…)∈(a,b). Moreover, let ϵ>0 such that (u−ϵ,u+ϵ)⊆(a,b). Finally, we choose an n∈N such that γn+1<ϵ. We define the set
M={o′0a′0o′1a′1…∈(O×A)ω|o′0=o0,a′0=a0,…,a′n=an}M is an open subset of (O×A)ω. Since the reward is non-negative, we have
0≤U(o0a0o1a1…)−r(o0a0o1a1…onan)≤γn+1<ϵand for any o′0a′0o′1a′1…∈M, we have
0≤U(o′0a′0o′1a′1…)−r(o0a0o1a1…onan)≤γn+1<ϵ,too. Therefore,
∣∣U(o′0a′0o′1a′1…)−U(o0a0o1a1…)∣∣<ϵand furthermore M⊆U−1(u−ϵ,u+ϵ)⊆U−1(a,b). All in all, any o0a0o1a1…∈U−1(a,b) has an open neighborhood that is a subset of U−1(a,b). Therefore, U−1(a,b) is open.
That a utility function U:(O×A)ω→[0,1] is continuous roughly means that for any ϵ>0 there are only finitely many events that have an influence of more than ϵ on the utility. This could be a problem for studying longtermist agents with zero time discount rate. However, studying such agents is hard anyway since there is no guarantee that the sum of rewards converges and we have to deal with infinity ethics. As far as I know, it is standard in learning theory to avoid such situations by assuming a non-zero time discount rate or a finite time horizon. Therefore, it should not be a big deal to add the condition that U is continuous to all theorems.
Your alleged counterexample is wrong, because the U you constructed is not computable. First, “computable” means that there is a program P which receives s and ϵ∈Q as inputs s.t. for all s and ϵ, it halts and
|P(s,ϵ)−U(s)|<ϵSecond, even your weaker definition fails here. Let ϵ=12. Then, there is no program that computes U within accuracy ϵ, because for every n, U(0n20ω)=34 while U(0ω)=0. Therefore, determining the value of U(s) within ϵ requires looking at infinitely many elements of the sequence. Any program would that outputs 0 on 0ω has to halt after reading some finite m symbols, in which case it would output 0 on 0m20ω as well.
You are right and now it is clear, why your original statement is correct, too. Let U be an arbitrary computable utility function. As above, let u=U(o0a0o1a1…) and (u−ϵ,u+ϵ)⊆(a,b) with ϵ>0 and ϵ∈Q. Choose P as in your definition of “computable”. Since P(s,ϵ) terminates, its output depends only on finitely many oi1,…,oik,aj1,…,ajl. Now
{s′=o′0a′0o′1a′1…|o′i1=oi1,…,a′jl=ajl}is open and a subset of U−1(u−ϵ,u+ϵ), since |P(s′,ϵ)−u|<ϵ.