We construct a family or oracles relative to which all estimation problems which have approximate solutions in exponential time are generatable. Relatively to these oracles, we can use the optimal predictor scheme Λ for assigning expectation values to arbitrary deterministic computations within time that is superquasipolynomial in the logarithm of the computation time. In particular, this should allow reflection in the sense of Garrabrant once we succeed to derandomize Λ relative to the same oracle.
Results
New notation
Given O⊆{0,1}∗ and appropriate sets X and Y, A:XO→Y denotes a pair consisting of O and an oracle machine that, supplied with oracle O, halts on every input in X and produces an output in Y. In particular, it defines a function from X to Y in the obvious way.
For every n∈N, we fix a prefix free universal oracle machine UOn with one program tape, n input tapes and one output tape (we are only going to use a finite number of ns). Given O⊆{0,1}∗, evO,n:N×{0,1}∗n+1O→{0,1}∗ is the following. When evkO,n(Q,x1…xn) is computed, Q is interpreted as a program for UOn and Q(x1…xn) is executed with oracle O for time k. The resulting output is produced.
The notation evkO(Q,x1…xn) means evkO,n(Q,x1…xn).
Given A:XO→Y and X′⊆X, TA(X′) will denote the maximal runtime of A on elements of X′. QA(X′) will denote the set of queries made to the oracle when applying A to elements of X′.
All the theory of predictors with logarithmic advice can be relativized with respect to an arbitrary oracle. We denote the corresponding concepts using the oracle as a superscript. For example, the relativized version of a Δ(poly,log)-optimal predictor schemes is Δ(poly,log)O-optimal predictor schemes etc.
Given μ a word ensemble, suppμ denotes ⋃k∈Nsuppμk. We previously defined a distributional estimation problem to be a pair (f,μ) where μ is a word ensemble and f is a function from {0,1}∗ to [0,1]. From now on, we allow f to be defined only on suppμ. This doesn’t affect any of the previous results.
Omitting parentheses after an error space will mean zero advice. For example a Δ-sampler is a Δ(0)-sampler.
We start by constructing an exponential time estimation problem which is complete in the class of exponential time estimation problem with samplable word ensembles with respect to Δ-pseudo-invertible reductions (the concept of Δ-pseudo-invertible reductions was previously introduced for unidistributional problems but it is defined in the same day for distributional problems, see Appendix A). Everything is relative to an arbitrary fixed oracle.
Construction 1
Fix O⊆{0,1}∗.
Define μSamp(O) to be the following word ensemble. Sampling μkSamp(O) is equivalent to sampling S, F and y independently from Uk and producing (k,S,F,evkO(S,k,y)).
Define fEXP(O):suppμSamp(O)→[0,1] by
fEXP(O)(k,S,F,x):=β(ev2kO(F,x))
Theorem 1
Fix error spaces Δ1, Δ2 of ranks 1 and 2 respectively. Assume that for any δ∈Δ1 the function δ′(k,j):=δ(k) is in Δ2. Consider (f,μ) a distributional estimation problem. Suppose q:N→N is a polynomial, F:{0,1}∗O→[0,1] and ^S=(S,rS) is a (Δ1)O-sampler for μ s.t.
(i) Eμk[(F(x)−f(x))2]∈Δ1
(ii) ∀k∈N:TF(suppμk)≤2q(k)
(iii) ∀k∈N:|F|≤q(k)
(iv) ∀k∈N:TS({0,1}rS(k))≤q(k)
(v) ∀k∈N:|S|≤q(k)
It is possible to choose a polynomial p:N→N and ~S a (poly,0)O-scheme of signature 1→{0,1}∗ s.t.
(i) ∀k∈N:p(k)≥q(k)
(i) ∀k∈N,y∈{0,1}p(k):~Sp(k)(y)=Sk(y<rS(k))
(ii) ∀k∈N:T~S({0,1}p(k))≤p(k)
Denote ^ζ~S,F,p=(ζ~S,F,p,r~S,F,p) to the {0,1}∗-valued (poly,0)-scheme defined by
Then, ^ζ~S,F,p is a Δ2-pseudo-invertible reduction of (f,μ) to (fEXP(O),μSamp(O)).
The proofs of this and subsequent results are in Appendix B.
In particular, if (fEXP(O),μSamp(O)) has a uniform Δ2(poly,log)O-optimal predictor scheme then we can construct a Δ2(poly,log)O-optimal predictor scheme for any Δ1-samplable distributional estimation problem which has an exponential time approximation with error in Δ1. We don’t expect this to be true for reasonable error spaces and O=∅, but as the following construction (drawing heavily from Heller) shows it is true for some oracles.
Construction 2
Consider O⊆{0,1}∗, p:N→N a polynomial and σ:N×{0,1}∗O→{0,1}∗. We give a recursive definition of Oσp,k⊆{0,1}∗ for k∈N.
Define Δ1pow to be the set of functions δ:N→R≥0 s.t.
∃ϵ>0:limk→∞kϵδ(k)=0
It is easy to see Δ1pow is an error space.
Theorem 2
Consider O⊆{0,1}∗ and σ:N×{0,1}∗O→{0,1}∗ s.t.σk is a bijection on {0,1}3k+p(k). Then, for any sufficiently large polynomial p:N→N, (fEXP(Oσp),μSamp(Oσp)) is (Δ1pow)Oσp-generatable.
Note 1
For σ the identity function, (fEXP(Oσp),μSamp(Oσp)) admits a Monte Carlo algorithm. This is too strong for our purposes since it means there is no logical uncertainty in exponential time problems. Such oracles can be regarded as bounded analogues of reflective oracles.
Note 2
It is easy to see that Oσp∈EXPO.
Theorem 2 is insufficient by itself to enable reflection with optimal predictor schemes relative to Oσp since Λ is a random algorithm. Potentially this can be solved by derandomization but at present we don’t know whether under what conditions derandomization with respect to Oσp is possible.
Appendix A
Fix Δ an error space of rank 2.
Definition A
Consider (f,μ), (g,ν) distributional estimation problems, ^ζ=(ζ,rζ,aζ) a {0,1}∗-valued (poly,log)-bischeme.^ζ is called a Δ-pseudo-invertible reduction of (f,μ) to (g,ν) when there is a polynomial p:N→N s.t. the following conditions hold:
(i) Eμk×Urζ(k,j)[(g(^ζkj(x))−f(x))2]∈Δ
(ii) Prμk×Urζ(k,j)[νp(k)(^ζkj(x))=0]∈Δ
(iii) There is M>0 and ^R=(R,rR,aR) a Q∩[0,M]-valued (poly,log)-bischeme s.t.
^ζ=(ζ,rζ,aζ) a {0,1}∗-valued (poly,log)-scheme is called a Δ-pseudo-invertible reduction of (f,μ) to (g,ν) when it becomes such when adding trivial dependence on j.
Theorem A
Suppose there is a polynomial h:N2→N s.t. h−1∈Δ. Consider (f,μ), (g,ν) distributional estimation problems, ^ζ a Δ-pseudo-invertible reduction of (f,μ) to (g,ν) and ^Pg a Δ(poly,log)-optimal predictor scheme for (g,ν). Define ^Pf by ^Pkjf(x):=^Pkjg(^ζkj(x)). Then, ^Pf is a Δ(poly,log)-optimal predictor scheme for (f,μ).
(ii) If 0x∉O then 0x∉Oσp since all words in Oσ,+p,k begin with 1. If 0x∈O then 0x∈Oσp by (i).
(iii) If 1x∈Oσ,+p,k∖⋃i≤kOσ,−p,i then 1x∈Oσp,k+1 and by (i) 1x∈Oσp. If 1x∈Oσp then there is n∈N s.t.1x∈Oσ,+p,n∖⋃i≤nOσ,−p,i and n=k because any word in Oσ,+p,n has length 4n+p(n)+1.
(iv) All oracle queries made in evkOσp,k(S,y) return the same values when addressed to Oσp,i for i>k: queries that return 0 keep returning 0 since they are included in Oσ,−p,k and queries that return 1 keep returning 1 by (i).
(v) Note that (iv) implies suppμkSamp(Oσp)=suppμkSamp(Oσp,k) and apply the same argument as in (iv) to ev2kOσp,i(F,x).
Proof of Theorem 2
We describe ^G the (Δ1pow)Oσp-generator for (fEXP(Oσp),μSamp(Oσp)). ^G=(G,3k+p(k)) is the (poly,0)Oσp-scheme of signature 1→{0,1}∗×[0,1] defined as follows. Consider the computation of Gk(w).
zSFy:=σk(w) is computed, where |S|=|F|=|y|=k. x:=evkOσp(S,k,y) is computed. For each i∈{0,1…k}, we compute αi defined as ik rounded to k binary digits.α:=1k+1|{i∈{0,1…k}∣1wαi∈Oσp}| is computed within k binary digits. The output is ((k,S,F,x),α).
^G1 reproduces μSamp(Oσp) precisely since σk preserves U3k+p(k).
It is easy to see that there is a polynomial q s.t. for any p we have |⋃i≤kOσ,−p,i|≤2q(k). Take any polynomial p s.t. limk→∞(3k+p(k)−q(k))=+∞. By Lemma C, (iii) we have
The first term approximates ev2kOσp,k(F,x) within error 1k+1∈Δ1pow by the construction of Oσ,+p,k. By Lemma C, (v) it approximates fEXP(Oσp)(k,S,F,x)=ev2kOσp(F,x) within the same error. The second terms vanishes for all but a negligible fraction of the w-s by the properties of p and q.
Towards reflection with relative optimal predictor schemes
We construct a family or oracles relative to which all estimation problems which have approximate solutions in exponential time are generatable. Relatively to these oracles, we can use the optimal predictor scheme Λ for assigning expectation values to arbitrary deterministic computations within time that is superquasipolynomial in the logarithm of the computation time. In particular, this should allow reflection in the sense of Garrabrant once we succeed to derandomize Λ relative to the same oracle.
Results
New notation
Given O⊆{0,1}∗ and appropriate sets X and Y, A:XO→Y denotes a pair consisting of O and an oracle machine that, supplied with oracle O, halts on every input in X and produces an output in Y. In particular, it defines a function from X to Y in the obvious way.
For every n∈N, we fix a prefix free universal oracle machine UOn with one program tape, n input tapes and one output tape (we are only going to use a finite number of ns). Given O⊆{0,1}∗, evO,n:N×{0,1}∗n+1O→{0,1}∗ is the following. When evkO,n(Q,x1…xn) is computed, Q is interpreted as a program for UOn and Q(x1…xn) is executed with oracle O for time k. The resulting output is produced.
The notation evkO(Q,x1…xn) means evkO,n(Q,x1…xn).
Given A:XO→Y and X′⊆X, TA(X′) will denote the maximal runtime of A on elements of X′. QA(X′) will denote the set of queries made to the oracle when applying A to elements of X′.
All the theory of predictors with logarithmic advice can be relativized with respect to an arbitrary oracle. We denote the corresponding concepts using the oracle as a superscript. For example, the relativized version of a Δ(poly,log)-optimal predictor schemes is Δ(poly,log)O-optimal predictor schemes etc.
Given μ a word ensemble, suppμ denotes ⋃k∈Nsuppμk. We previously defined a distributional estimation problem to be a pair (f,μ) where μ is a word ensemble and f is a function from {0,1}∗ to [0,1]. From now on, we allow f to be defined only on suppμ. This doesn’t affect any of the previous results.
Omitting parentheses after an error space will mean zero advice. For example a Δ-sampler is a Δ(0)-sampler.
We start by constructing an exponential time estimation problem which is complete in the class of exponential time estimation problem with samplable word ensembles with respect to Δ-pseudo-invertible reductions (the concept of Δ-pseudo-invertible reductions was previously introduced for unidistributional problems but it is defined in the same day for distributional problems, see Appendix A). Everything is relative to an arbitrary fixed oracle.
Construction 1
Fix O⊆{0,1}∗.
Define μSamp(O) to be the following word ensemble. Sampling μkSamp(O) is equivalent to sampling S, F and y independently from Uk and producing (k,S,F,evkO(S,k,y)).
Define fEXP(O):suppμSamp(O)→[0,1] by
fEXP(O)(k,S,F,x):=β(ev2kO(F,x))
Theorem 1
Fix error spaces Δ1, Δ2 of ranks 1 and 2 respectively. Assume that for any δ∈Δ1 the function δ′(k,j):=δ(k) is in Δ2. Consider (f,μ) a distributional estimation problem. Suppose q:N→N is a polynomial, F:{0,1}∗O→[0,1] and ^S=(S,rS) is a (Δ1)O-sampler for μ s.t.
(i) Eμk[(F(x)−f(x))2]∈Δ1
(ii) ∀k∈N:TF(suppμk)≤2q(k)
(iii) ∀k∈N:|F|≤q(k)
(iv) ∀k∈N:TS({0,1}rS(k))≤q(k)
(v) ∀k∈N:|S|≤q(k)
It is possible to choose a polynomial p:N→N and ~S a (poly,0)O-scheme of signature 1→{0,1}∗ s.t.
(i) ∀k∈N:p(k)≥q(k)
(i) ∀k∈N,y∈{0,1}p(k):~Sp(k)(y)=Sk(y<rS(k))
(ii) ∀k∈N:T~S({0,1}p(k))≤p(k)
Denote ^ζ~S,F,p=(ζ~S,F,p,r~S,F,p) to the {0,1}∗-valued (poly,0)-scheme defined by
r~S,F,p(k)=2p(k)−|~S|−|F|
∀k∈N,x∈{0,1}∗,y1∈{0,1}p(k)−|~S|,y2∈{0,1}p(k)−|F|:ζk~S,F,p(x,y1y2)=(p(k),~Sy1,Fy2,x)
Then, ^ζ~S,F,p is a Δ2-pseudo-invertible reduction of (f,μ) to (fEXP(O),μSamp(O)).
The proofs of this and subsequent results are in Appendix B.
In particular, if (fEXP(O),μSamp(O)) has a uniform Δ2(poly,log)O-optimal predictor scheme then we can construct a Δ2(poly,log)O-optimal predictor scheme for any Δ1-samplable distributional estimation problem which has an exponential time approximation with error in Δ1. We don’t expect this to be true for reasonable error spaces and O=∅, but as the following construction (drawing heavily from Heller) shows it is true for some oracles.
Construction 2
Consider O⊆{0,1}∗, p:N→N a polynomial and σ:N×{0,1}∗O→{0,1}∗. We give a recursive definition of Oσp,k⊆{0,1}∗ for k∈N.
Recursion base:
Oσp,0:={0x∣x∈O}
Denote
Oσ,+p,k:={1wα∣|α|=k∧∃S,F,y∈{0,1}k,z∈{0,1}p(k):σk(w)=zSFy∧fkEXP(Oσp,k)(S,F,evkOσp,k(S,k,y))≥β(α)}
Oσ,−p,k:=({0,1}∗∖Oσp,k)∩(QfEXP(Oσp,k)(suppμkSamp(Oσp,k))∪QevOσp,k({0,1}k×{k}×{0,1}k))
Recursion step:
Oσp,k+1:=Oσp,k∪Oσ,+p,k∖⋃i≤kOσ,−p,i
Finally, we define
Oσp:=⋃k∈NOσp,k
Construction 3
Define Δ1pow to be the set of functions δ:N→R≥0 s.t.
∃ϵ>0:limk→∞kϵδ(k)=0
It is easy to see Δ1pow is an error space.
Theorem 2
Consider O⊆{0,1}∗ and σ:N×{0,1}∗O→{0,1}∗ s.t.σk is a bijection on {0,1}3k+p(k). Then, for any sufficiently large polynomial p:N→N, (fEXP(Oσp),μSamp(Oσp)) is (Δ1pow)Oσp-generatable.
Note 1
For σ the identity function, (fEXP(Oσp),μSamp(Oσp)) admits a Monte Carlo algorithm. This is too strong for our purposes since it means there is no logical uncertainty in exponential time problems. Such oracles can be regarded as bounded analogues of reflective oracles.
Note 2
It is easy to see that Oσp∈EXPO.
Theorem 2 is insufficient by itself to enable reflection with optimal predictor schemes relative to Oσp since Λ is a random algorithm. Potentially this can be solved by derandomization but at present we don’t know whether under what conditions derandomization with respect to Oσp is possible.
Appendix A
Fix Δ an error space of rank 2.
Definition A
Consider (f,μ), (g,ν) distributional estimation problems, ^ζ=(ζ,rζ,aζ) a {0,1}∗-valued (poly,log)-bischeme.^ζ is called a Δ-pseudo-invertible reduction of (f,μ) to (g,ν) when there is a polynomial p:N→N s.t. the following conditions hold:
(i) Eμk×Urζ(k,j)[(g(^ζkj(x))−f(x))2]∈Δ
(ii) Prμk×Urζ(k,j)[νp(k)(^ζkj(x))=0]∈Δ
(iii) There is M>0 and ^R=(R,rR,aR) a Q∩[0,M]-valued (poly,log)-bischeme s.t.
Eνp(k)×UrR(k,j)[(^Rkj(y)−Prμk×Urζ(k,j)[^ζkj(x)=y]νp(k)(y))2]∈Δ
(iv) There is a {0,1}∗-valued (poly,log)-scheme ^ξ=(ξ,rξ,aξ) s.t.
Eμk×Urζ(k,j)[∑x′∈{0,1}∗|PrUrξ(k,j)[^ξkj(^ζkj(x,z),w)=x′]−Prμk×Urζ(k,j)[x′′=x′∣^ζkj(x′′,z′)=^ζkj(x,z)]|]∈Δ
Such ^ξ is called a Δ-pseudo-inverse of ^ζ.
^ζ=(ζ,rζ,aζ) a {0,1}∗-valued (poly,log)-scheme is called a Δ-pseudo-invertible reduction of (f,μ) to (g,ν) when it becomes such when adding trivial dependence on j.
Theorem A
Suppose there is a polynomial h:N2→N s.t. h−1∈Δ. Consider (f,μ), (g,ν) distributional estimation problems, ^ζ a Δ-pseudo-invertible reduction of (f,μ) to (g,ν) and ^Pg a Δ(poly,log)-optimal predictor scheme for (g,ν). Define ^Pf by ^Pkjf(x):=^Pkjg(^ζkj(x)). Then, ^Pf is a Δ(poly,log)-optimal predictor scheme for (f,μ).
Theorem A.1 is proved exactly the same way as Theorem 8 for unidistributional problems and we leave out the details.
Appendix B
Proof of Theorem 1
We need to show the 4 defining conditions of Δ2-pseudo-invertible reductions.
(i) Eμk×Ur~S,F,p(k)[(fEXP(O)(^ζk~S,F,p(x))−f(x))2]=Eμk×Ur~S,F,p(k)[(fp(k)EXP(O)(~Sy1,Fy2,x)−f(x))2]
Eμk×Ur~S,F,p(k)[(fEXP(O)(^ζk~S,F,p(x))−f(x))2]=Eμk[(β(ev2p(k)O(F,x))−f(x))2]
Eμk×Ur~S,F,p(k)[(fEXP(O)(^ζk~S,F,p(x))−f(x))2]=Eμk[(F(x)−f(x))2]
Using assumption (i), we get the required result.
(ii) Prμk×Ur~S,F,p(k)[μp(k)Samp(O)(^ζk~S,F,p(x))=0]=Prμk×Ur~S,F,p(k)[μp(k)Samp(O)(p(k),~Sy1,Fy2,x)=0]
Prμk×Ur~S,F,p(k)[μp(k)Samp(O)(^ζk~S,F,p(x))=0]=Prμk[x∉Sk({0,1}rS(k))]
Prμk×Ur~S,F,p(k)[μp(k)Samp(O)(^ζk~S,F,p(x))=0]=∑x∈{0,1}∗∖Sk({0,1}rS(k))μk(x)
Prμk×Ur~S,F,p(k)[μp(k)Samp(O)(^ζk~S,F,p(x))=0]=∑x∈{0,1}∗∖Sk({0,1}rS(k))|μk(x)−Pr{0,1}rS(k)[Sk(y)=x]|
Prμk×Ur~S,F,p(k)[μp(k)Samp(O)(^ζk~S,F,p(x))=0]≤∑x∈{0,1}∗|μk(x)−Pr{0,1}rS(k)[Sk(y)=x]|
Since S is a (Δ1)O-sampler for μ, we get
Prμk×Ur~S,F,p(k)[μp(k)Samp(O)(^ζk~S,F,p(x))=0]∈Δ1
(iii) Define ^R by
^Rkj(i,S′,F′,x)=⎧⎪⎨⎪⎩0 if i≠p(k)0 if ~S is not a prefix of S′2|~S|+|F| in other cases
Consider k∈N, x′∈{0,1}∗, y1∈{0,1}p(k)−|~S|, y2∈{0,1}p(k)−|F|.
Prμk×Urζ(k,j)[^ζk~S,F,p(x)=(p(k),~Sy1,Fy2,x′)]μp(k)Samp(O)(p(k),~Sy1,Fy2,x′)=2−p(k)+|~S|2−p(k)+|F|PrUrS(k)[Sk(y)=x]2−p(k)2−p(k)PrUrS(k)[Sk(y)=x]=2|~S|+|F|
(iv) Define ^ξ by
^ξkj(i,S′,F′,x)=x
Proposition B
Consider O⊆{0,1}∗, p:N→N a polynomial and σ:N×{0,1}∗O→{0,1}∗. Then
(i) ∀k∈N,i>k,x∈{0,1}∗:x∈Oσp,k⟹x∈Oσp,i∧x∈Oσp
(ii) ∀x∈{0,1}∗:0x∈Oσp⟺x∈O
(iii) ∀k∈N,∀x∈{0,1}4k+p(k):1x∈Oσp⟺1x∈Oσ,+p,k∖⋃i≤kOσ,−p,i
(iv) ∀k∈N,S∈{0,1}k,y∈{0,1}k:evkOσp(S,y)=evkOσp,k(S,y)
(v) ∀k∈N,F∈{0,1}k,x∈suppμkSamp(Oσp):ev2kOσp(F,x)=ev2kOσp,k(F,x)
Proof of Proposition B
(i) By induction: x∈Oσ,−p,k+1 implies x∉Oσp,k.
(ii) If 0x∉O then 0x∉Oσp since all words in Oσ,+p,k begin with 1. If 0x∈O then 0x∈Oσp by (i).
(iii) If 1x∈Oσ,+p,k∖⋃i≤kOσ,−p,i then 1x∈Oσp,k+1 and by (i) 1x∈Oσp. If 1x∈Oσp then there is n∈N s.t.1x∈Oσ,+p,n∖⋃i≤nOσ,−p,i and n=k because any word in Oσ,+p,n has length 4n+p(n)+1.
(iv) All oracle queries made in evkOσp,k(S,y) return the same values when addressed to Oσp,i for i>k: queries that return 0 keep returning 0 since they are included in Oσ,−p,k and queries that return 1 keep returning 1 by (i).
(v) Note that (iv) implies suppμkSamp(Oσp)=suppμkSamp(Oσp,k) and apply the same argument as in (iv) to ev2kOσp,i(F,x).
Proof of Theorem 2
We describe ^G the (Δ1pow)Oσp-generator for (fEXP(Oσp),μSamp(Oσp)). ^G=(G,3k+p(k)) is the (poly,0)Oσp-scheme of signature 1→{0,1}∗×[0,1] defined as follows. Consider the computation of Gk(w).
zSFy:=σk(w) is computed, where |S|=|F|=|y|=k. x:=evkOσp(S,k,y) is computed. For each i∈{0,1…k}, we compute αi defined as ik rounded to k binary digits.α:=1k+1|{i∈{0,1…k}∣1wαi∈Oσp}| is computed within k binary digits. The output is ((k,S,F,x),α).
^G1 reproduces μSamp(Oσp) precisely since σk preserves U3k+p(k).
It is easy to see that there is a polynomial q s.t. for any p we have |⋃i≤kOσ,−p,i|≤2q(k). Take any polynomial p s.t. limk→∞(3k+p(k)−q(k))=+∞. By Lemma C, (iii) we have
α=1k+1(|{i∈{0,1…k}∣1wαi∈Oσ,+p,k}|−|{i∈{0,1…k}∣1wαi∈Oσ,+p,k∩⋃n≤kOσ,−p,n}|)
The first term approximates ev2kOσp,k(F,x) within error 1k+1∈Δ1pow by the construction of Oσ,+p,k. By Lemma C, (v) it approximates fEXP(Oσp)(k,S,F,x)=ev2kOσp(F,x) within the same error. The second terms vanishes for all but a negligible fraction of the w-s by the properties of p and q.