Optimal predictors for global probability measures
There are two commonly used formalisms in average case complexity theory: problems with a single global probability measure on parameter space or problems with a family of such probability measures. Previous posts about optimal predictors focused on the family approach. In this post we give the formalism for global measures.
Results
New notation
1 will denote the one-element set.
Given μ a probability measure on {0,1}∗, X a set, Q:{0,1}∗2alg−→X, TμQ(s) stands for the maximal runtime of Q(x,y) for x∈suppμ, y∈{0,1}s.
Definition 1
A unidistributional estimation problem is a pair (f,μ) where μ is a probability measure on {0,1}∗ and f:suppμ→[0,1] is an arbitrary function.
Definition 2
Given appropriate sets X, Y, consider P:N×X×{0,1}∗2alg−→Y, r:N→N polynomial and a:N→{0,1}∗. The triple ^P=(P,r,a) is called a (poly,log)-scheme of signature X→Y when
(i) The runtime of P(j,x,y,z) is bounded by p(j) with p polynomial.
(ii) |a(j)|≤c1+c2log(j+1) for some c1,c2∈N.
A (poly,log)-scheme of signature {0,1}∗→[0,1] will also be called a (poly,log)-predictor.
Given j∈N, x∈X, ^Pj(x) will denote the Y-valued random variable P(j,x,y,a(j)) where y is sampled from the probability measure Ur(j). We will also use the notation ^Pj(x,y):=P(j,x,y,a(j)).
Fix Δ an error space of rank 1.
Definition 3
Consider (f,μ) a unidistributional estimation problem and ^P=(P,r,a) a (poly,log)-predictor. ^P is called a Δ(poly,log)-optimal predictor for (f,μ) when for any (poly,log)-predictor ^Q=(Q,s,b), there is δ∈Δ s.t.
Eμ×Ur(j)[(^Pj(x)−f(x))2]≤Eμ×Us(j)[(^Qj(x)−f(x))2]+δ(j)
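Concretely, Definition 3 compares expected squared errors, which can be estimated by sampling. Below is a minimal Python sketch of such a comparison on a toy problem; the predictors, target function and sampler are hypothetical stand-ins, not genuine (poly,log)-predictors with runtime and advice bounds.

```python
import random

def expected_sq_error(predictor, f, mu_sampler, j, trials=100000):
    """Monte Carlo estimate of E_{mu x U_{r(j)}}[(P^j(x,y) - f(x))^2].

    `predictor(j, x, coins)` stands in for P(j, x, y, a(j)); the coins
    argument plays the role of the random input y ~ U_{r(j)}.
    """
    total = 0.0
    for _ in range(trials):
        x = mu_sampler()
        coins = random.getrandbits(64)
        total += (predictor(j, x, coins) - f(x)) ** 2
    return total / trials

# Toy instance: f is the parity of x, mu is uniform on 8-bit numbers.
f = lambda x: float(x % 2)
mu_sampler = lambda: random.randint(0, 255)
P = lambda j, x, coins: float(x % 2)   # a "perfect" predictor
Q = lambda j, x, coins: 0.5            # a constant competitor

# Definition 3 asks that for every Q some delta in Delta makes
# lhs <= rhs + delta(j) for all j; here lhs = 0 and rhs = 0.25.
assert expected_sq_error(P, f, mu_sampler, 10) <= expected_sq_error(Q, f, mu_sampler, 10)
```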
Δ(poly,log)-optimal predictors for unidistributional problems have properties and existence theorems analogous to those of Δ(poly,log)-optimal predictor schemes, where the role of the rank 2 error space Δ2avg is taken by the rank 1 error space Δ1ll (see Definition 8). The theorems are listed below. The proofs of Theorem 4, Theorem 8 (which is stated in a stronger form than its previous analogues) and Theorem 10 are given in the appendix. Adapting the other proofs is straightforward.
Theorem 1
Suppose (j+1)^−1∈Δ. Consider (f,μ) a unidistributional estimation problem and ^P=(P,r,a) a Δ(poly,log)-optimal predictor for (f,μ). Suppose {pj∈[0,1]}j∈N, {qj∈[0,1]}j∈N are s.t.
∃ϵ>0∀j:(μ×Ur(j)){(x,y)∈{0,1}∗2∣pj≤^Pj(x,y)≤qj}≥ϵ
Define
ϕj:=Eμ×Ur(j)[f(x)−^Pj(x,y)∣pj≤^Pj(x,y)≤qj]
Assume that either pj,qj have a number of digits logarithmically bounded in j or ^Pj produces outputs with a number of digits logarithmically bounded in j (by Theorem A.7, if any Δ(poly,log)-optimal predictor exists for (f,μ) then a Δ(poly,log)-optimal predictor with this property exists as well). Then, |ϕ|∈Δ.
Theorem 2
Consider μ a probability measure on {0,1}∗ and f1,f2:suppμ→[0,1] s.t. f1+f2≤1. Suppose ^P1 is a Δ(poly,log)-optimal predictor for (f1,μ) and ^P2 is a Δ(poly,log)-optimal predictor for (f2,μ). Then η(^P1+^P2) is a Δ(poly,log)-optimal predictor for (f1+f2,μ).
Theorem 3
Consider μ a probability measure on {0,1}∗ and f1,f2:suppμ→[0,1] s.t. f1+f2≤1. Suppose ^P1 is a Δ(poly,log)-optimal predictor for (f1,μ) and ^P is a Δ(poly,log)-optimal predictor for (f1+f2,μ). Then, η(^P−^P1) is a Δ(poly,log)-optimal predictor for (f2,μ).
Definition 4
Fix Δ an error space of rank 1. A probability measure μ on {0,1}∗ is called Δ(log)-sampleable when there is a (poly,log)-scheme ^S of signature 1→{0,1}∗ such that
∑x∈{0,1}∗|μ(x)−Pr[^Sk=x]|∈Δ
^S is called a Δ(log)-sampler for μ.
Definition 5
Consider Δ an error space of rank 1. A unidistributional estimation problem (f,μ) is called Δ(log)-generatable when there is a (poly,log)-scheme ^G of signature 1→{0,1}∗×[0,1] such that
(i) ^G1 is a Δ(log)-sampler for μ.
(ii) E[(^Gk2−f(^Gk1))2]∈Δ
^G is called a Δ(log)-generator for (f,μ).
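For intuition, here is a toy sketch (all names hypothetical) of a generator whose first output samples a toy measure exactly and whose second output is a label with squared error vanishing in k, so conditions (i) and (ii) hold; a genuine Δ(log)-generator must additionally satisfy the (poly,log)-scheme resource bounds.

```python
import random

def toy_generator(k):
    """Hypothetical G^k: returns (x, t) where x ~ mu (uniform 8-bit
    numbers, so condition (i) holds exactly) and t is a noisy
    estimate of f(x) = parity of x whose error shrinks with k."""
    x = random.randint(0, 255)
    t = float(x % 2) + random.gauss(0.0, 1.0 / (k + 1))
    return x, min(max(t, 0.0), 1.0)       # clamp the label to [0,1]

def label_error(gen, f, k, trials=100000):
    """Estimates E[(G^k_2 - f(G^k_1))^2], i.e. condition (ii)."""
    total = 0.0
    for _ in range(trials):
        x, t = gen(k)
        total += (t - f(x)) ** 2
    return total / trials

f = lambda x: float(x % 2)
print(label_error(toy_generator, f, k=3))  # roughly 1/(k+1)^2, shrinking in k
```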
Theorem 4
Consider (f1,μ1), (f2,μ2) unidistributional estimation problems with respective Δ(poly,log)-optimal predictors ^P1 and ^P2. Assume μ1 is Δ(log)-sampleable and (f2,μ2) is Δ(log)-generatable. Define ^P by ^Pj((x1,x2)):=^Pj1(x1)^Pj2(x2). Then, ^P is a Δ(poly,log)-optimal predictor for (f1×f2,μ1×μ2).
Theorem 5
Consider μ a probability measure on {0,1}∗ and f:suppμ→[0,1], D⊆suppμ. Assume ^PD is a Δ(poly,log)-optimal predictor for (χD,μ) and ^Pf∣D is a Δ(poly,log)-optimal predictor for (f,μ∣D). Define ^P by ^Pj(x):=^PjD(x)^Pjf∣D(x). Then, ^P is a Δ(poly,log)-optimal predictor for (χDf,μ).
Theorem 6
Fix h a polynomial s.t. 2^−h∈Δ. Consider μ a probability measure on {0,1}∗, f:suppμ→[0,1] and D⊆suppμ non-empty. Assume ^PD is a Δ(poly,log)-optimal predictor for (χD,μ) and ^PχDf is a Δ(poly,log)-optimal predictor for (χDf,μ). Define ^Pf∣D by
^Pjf∣D(x):= 1 if ^PjD(x)=0, and η(^PjχDf(x)/^PjD(x)) rounded to h(j) binary places if ^PjD(x)>0
Then, ^Pf∣D is a Δ(poly,log)-optimal predictor for (f,μ∣D).
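Operationally, Theorems 5 and 6 say optimal predictors compose like probabilities: multiply to intersect events, divide to condition. A minimal sketch of the Theorem 6 construction, assuming (as in previous posts) that η denotes truncation to [0,1] and treating the predictors as hypothetical callables:

```python
def eta(v):
    """Truncation to [0,1]; our reading of eta from the earlier posts."""
    return min(max(v, 0.0), 1.0)

def conditional_predictor(P_D, P_chiDf, h):
    """Builds P^j_{f|D} from predictors for (chi_D, mu) and
    (chi_D f, mu): a guarded division, truncated and rounded."""
    def P_f_given_D(j, x, coins):
        d = P_D(j, x, coins)
        if d == 0:
            return 1.0                    # the convention when P^j_D(x) = 0
        q = eta(P_chiDf(j, x, coins) / d)
        scale = 2 ** h(j)                 # round to h(j) binary places
        return round(q * scale) / scale
    return P_f_given_D
```

Rounding to h(j) binary places keeps the output short while, since 2^−h∈Δ, contributing only a negligible error.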
Definition 6
Consider μ a probability measure on {0,1}∗ and ^Q1=(Q1,s1,b1), ^Q2=(Q2,s2,b2) (poly,log)-predictors. We say ^Q1 is Δ-similar to ^Q2 relative to μ (denoted ^Q1μ≃Δ^Q2) when Eμ×Us1(j)×Us2(j)[(^Qj1(x,y1)−^Qj2(x,y2))2]∈Δ.
Theorem 7
Consider (f,μ) a unidistributional estimation problem, ^P a Δ(poly,log)-optimal predictor for (f,μ) and ^Q a (poly,log)-predictor. Then, ^Q is a Δ(poly,log)-optimal predictor for (f,μ) if and only if ^Pμ≃Δ^Q.
Definition 7
Consider (f,μ), (g,ν) unidistributional estimation problems, ^ζ=(ζ,rζ,aζ) a (poly,log)-scheme of signature {0,1}∗→{0,1}∗. ^ζ is called a Δ-pseudo-invertible reduction of (f,μ) to (g,ν) when the following conditions hold:
(i) Eμ×Urζ(j)[(g(^ζj(x))−f(x))2]∈Δ
(ii) Prμ×Urζ(j)[ν(^ζj(x))=0]∈Δ
(iii) There is M>0 and ^R=(R,rR,aR) a (poly,log)-scheme of signature {0,1}∗→Q∩[0,M] s.t.
Eν×UrR(j)[(^Rj(y)−Prμ×Urζ(j)[^ζj(x)=y]/ν(y))2]∈Δ
(iv) There is a (poly,log)-scheme ^ξ=(ξ,rξ,aξ) of signature {0,1}∗→{0,1}∗ s.t.
Eμ×Urζ(j)[∑x′∈{0,1}∗|PrUrξ(j)[^ξj(^ζj(x,z),w)=x′]−Prμ×Urζ(j)[x′′=x′∣^ζj(x′′,z′)=^ζj(x,z)]|]∈Δ
Such ^ξ is called a Δ-pseudo-inverse of ^ζ.
Note 1
The conditions of Definition 7 are weaker than corresponding definitions in previous posts in the sense that exact equalities were replaced by approximate equalities with error bounds related to Δ. However, analogous relaxations can be done in the “multidistributional” theory too.
Theorem 8
Suppose (j+1)^−1∈Δ. Consider (f,μ), (g,ν) unidistributional estimation problems, ^ζ a Δ-pseudo-invertible reduction of (f,μ) to (g,ν) and ^Pg a Δ(poly,log)-optimal predictor for (g,ν). Define ^Pf by ^Pjf(x):=^Pjg(^ζj(x)). Then, ^Pf is a Δ(poly,log)-optimal predictor for (f,μ).
Definition 8
Δ1ll is the set of bounded functions δ:N→R≥0 s.t.
∃ϵ>0:limj→∞(loglogj)^ϵδ(j)=0
It is easily seen that Δ1ll is a rank 1 error space.
Theorem 9
Consider (f,μ) a unidistributional estimation problem. Define Υ:N×{0,1}∗3alg−→[0,1] by
Υj(x,y,Q):=β(evj(Q,x,y))
Define υf,μ:N→{0,1}∗ by
υjf,μ:=argmin_{|Q|≤logj}Eμ×Uj[(Υj(x,y,Q)−f(x))2]
Then, (Υ,j,υf,μ) is a Δ1ll(poly,log)-optimal predictor for (f,μ).
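The content of Theorem 9 is that brute-force advice suffices: υjf,μ is simply the loss-minimizing program among those of length at most logj. A sketch of this selection rule (idealized: the exact expected loss is not computable, which is exactly why υf,μ enters as advice rather than as part of the algorithm):

```python
import math

def select_advice(j, programs, true_loss):
    """upsilon^j_{f,mu}: among programs of length <= log j, return one
    minimizing E_{mu x U_j}[(Upsilon^j(x,y,Q) - f(x))^2], abstracted
    here as an exact `true_loss(j, Q)` oracle."""
    short = [Q for Q in programs if len(Q) <= math.log2(j)]
    return min(short, key=lambda Q: true_loss(j, Q))
```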
Theorem 10
There is an oracle machine Λ that accepts an oracle of signature O:N×{0,1}∗→{0,1}∗×[0,1] and a polynomial r:N→N where the allowed oracle calls are Ok(x) for |x|=r(k) and computes a function of signature N×{0,1}∗2→[0,1] s.t. for any (f,μ) a unidistributional estimation problem and ^G a corresponding Δ1ll(log)-generator, Λ[^G] is a Δ1ll(poly,log)-optimal predictor for (f,μ).
The following is the description of Λ. Consider O:N×{0,1}∗→{0,1}∗×[0,1] and a polynomial r:N→N. We describe the computation of Λ[O,r]j(x) where the extra argument of Λ is regarded as internal coin tosses.
We loop over the first j words in lexicographic order. Each word is interpreted as a program Q:{0,1}∗2alg−→[0,1]. We loop over j^2 “test runs”. At test run i, we generate (xi∈{0,1}∗,ti∈[0,1]) by evaluating Oj(yi) for yi sampled from Ur(j). We then sample zi from Uj and compute si:=evj(Q,xi,zi). At the end of the test runs, we compute the average error ϵ(Q):=(1/j^2)∑i(si−ti)2. At the end of the loop over programs, the program Q∗ with the lowest error is selected and the output evj(Q∗,x) is produced.
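A transcription of this procedure into Python pseudocode; the helpers ev (for the time-bounded interpreter evj) and oracle (for the call Oj) are assumed to be supplied by the caller, and the whole thing is a sketch of the selection logic rather than a faithful implementation of the machine model:

```python
import random
from itertools import product

def random_word(n):
    """A word sampled from U_n: n uniformly random bits."""
    return ''.join(random.choice('01') for _ in range(n))

def first_words(j):
    """The first j binary words in lexicographic (shortlex) order."""
    words, n = [], 0
    while len(words) < j:
        words.extend(''.join(bits) for bits in product('01', repeat=n))
        n += 1
    return words[:j]

def Lambda(oracle, r, ev, j, x):
    """Sketch of Lambda[O,r]^j(x): empirical loss minimization over the
    first j programs, each scored on j^2 oracle-labeled test runs."""
    best_Q, best_eps = None, float('inf')
    for Q in first_words(j):              # word Q interpreted as a program
        total = 0.0
        for _ in range(j * j):            # j^2 test runs
            y = random_word(r(j))
            x_i, t_i = oracle(j, y)       # (x_i, t_i) generated via O^j
            z_i = random_word(j)          # coins z_i ~ U_j for the program
            s_i = ev(j, Q, x_i, z_i)      # s_i := ev^j(Q, x_i, z_i)
            total += (s_i - t_i) ** 2
        eps = total / (j * j)             # average error eps(Q)
        if eps < best_eps:
            best_Q, best_eps = Q, eps
    return ev(j, best_Q, x, random_word(j))   # run the winner on the input
```

The proof of Theorem 10 in the appendix shows that, when the oracle is a Δ1ll(log)-generator, the empirical scores ϵ(Q) concentrate well enough for this selection to be Δ1ll(poly,log)-optimal.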
Appendix
Definition 9
Given n∈N, a function δ:Nn+1→R≥0 is called Δ-moderate when
(i) δ is non-decreasing in arguments 2 to n+1.
(ii) For any collection of polynomials {pi:N→N}i<n, δ(j,p0(j),…,pn−1(j))∈Δ
Lemma 1
Fix (f,μ) a unidistributional estimation problem and ^P:=(P,r,a) a (poly,log)-predictor. Then, ^P is Δ(poly,log)-optimal iff there is a Δ-moderate function δ:N3→[0,1] s.t. for any j,s∈N, Q:{0,1}∗2alg−→[0,1]
Eμ×Ur(j)[(^Pj(x,y)−f(x))2]≤Eμ×Us[(Q(x,y)−f(x))2]+δ(j,TμQ(s),2^|Q|)
The proof is analogous to the case of Δ(poly,log)-optimal predictor schemes and we omit it.
Lemma 2
Suppose (j+1)^−1∈Δ. Fix (f,μ) a unidistributional estimation problem and ^P=(P,r,a) a corresponding Δ(poly,log)-optimal predictor. Consider ^Q=(Q,s,b) a (poly,log)-predictor, M>0 and ^w=(w,rw,aw) a (poly,log)-scheme of signature {0,1}∗→Q∩[0,M]. Assume rw(j)≥max(r(j),s(j)). Then there is δ∈Δ s.t.
Eμ×Urw(j)[^wj(x,y)(^Pj(x,y_{≤r(j)})−f(x))2]≤Eμ×Urw(j)[^wj(x,y)(^Qj(x,y_{≤s(j)})−f(x))2]+δ(j)
The proof is analogous to the case of Δ(poly,log)-optimal predictor schemes and we omit it.
Lemma 3
Consider (f,μ) a unidistributional estimation problem and ^P=(P,r,a) a Δ(poly,log)-optimal predictor for (f,μ). Then there are c1,c2∈R and a Δ-moderate function δ:N3→[0,1] s.t. for any j,s∈N, Q:{0,1}∗2alg−→[0,1]
|Eμ×Us×Ur(j)[Q(^Pj−f)]|≤(c1+c2Eμ×Us[Q2])δ(j,TμQ(s),2^|Q|)
Conversely, consider M∈Q and ^P a (poly,log)-scheme of signature {0,1}∗→Q∩[−M,+M]. Suppose that for any (poly,log)-scheme ^Q=(Q,s,b) of signature {0,1}∗→Q∩[−M−1,+M] we have
|Eμ×Us(j)×Ur(j)[^Qj(^Pj−f)]|∈Δ
Define ~P to be a (poly,log)-predictor s.t. computing ~Pj is equivalent to computing η(^Pj) rounded to h(j) digits after the binary point, where 2^−h∈Δ. Then, ~P is a Δ(poly,log)-optimal predictor for (f,μ).
The proof is analogous to the case of Δ(poly,log)-optimal predictor schemes and we omit it.
Lemma 4
Consider μ a probability measure on {0,1}∗. Suppose ^S is a Δ(log)-sampler for μ. Then, ∃δ∈Δ s.t. for any bounded function f:suppμ→R
|Eμ[f]−E[f(^Sk)]|≤(sup|f|)δ(k)
Proof of Lemma 4
Eμ[f]−E[f(^Sk)]=∑x∈{0,1}∗μ(x)f(x)−∑x∈{0,1}∗Pr[^Sk=x]f(x)
Eμ[f]−E[f(^Sk)]=∑x∈{0,1}∗(μ(x)−Pr[^Sk=x])f(x)
|Eμ[f]−E[f(^Sk)]|≤∑x∈{0,1}∗|(μ(x)−Pr[^Sk=x])f(x)|
|Eμ[f]−E[f(^Sk)]|≤(sup|f|)∑x∈{0,1}∗|μ(x)−Pr[^Sk=x]|
Since ^S is a Δ(log)-sampler, we get the desired result.
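A quick numeric sanity check of the bound (toy finite measures, hypothetical values):

```python
mu = {'00': 0.5, '01': 0.3, '10': 0.2}      # target measure
nu = {'00': 0.45, '01': 0.35, '10': 0.2}    # sampler's output law
f  = {'00': 1.0, '01': 0.0, '10': 0.5}      # bounded test function

exp_mu = sum(mu[x] * f[x] for x in mu)
exp_nu = sum(nu[x] * f[x] for x in nu)
tv = sum(abs(mu[x] - nu[x]) for x in mu)    # the Definition 4 quantity
sup_f = max(abs(v) for v in f.values())

assert abs(exp_mu - exp_nu) <= sup_f * tv   # the Lemma 4 bound
```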
Lemma 5
Consider a family of sets {Xk}k∈N and family of probability measures {μk}k∈N on Xk. Denote Y:=⊔ksuppμk. Consider f:Y→R a function and g:Y→R a bounded function. Suppose that
Eμk[f(x)2]∈Δ
Then it follows that
|Eμk[g(x)f(x)]|∈Δ
Proof of Lemma 5
|Eμk[g(x)f(x)]|≤Eμk[|g(x)f(x)|]
|Eμk[g(x)f(x)]|≤(sup|g|)Eμk[|f(x)|]
|Eμk[g(x)f(x)]|≤(sup|g|)√Eμk[f(x)2]
The last step is Jensen's inequality; the right hand side is in Δ by the closure properties of error spaces.
Proposition 1
Consider (f,μ) a unidistributional estimation problem, ^G a Δ(log)-generator for (f,μ) and g:suppμ→R a bounded function. Then
|Eμ[gf]−E[g(^Gk1)^Gk2]|∈Δ
Proof of Proposition 1
By Lemma 4 we have
|Eμ[gf]−E[g(^Gk1)f(^Gk1)]|∈Δ
By property (ii) of generators and Lemma 5 we have
|E[g(^Gk1)^Gk2]−E[g(^Gk1)f(^Gk1)]|∈Δ
Combining the two we get the desired result.
Proof of Theorem 4
We have
^P((x1,x2))−(f1×f2)(x1,x2)=(^P1(x1)−f1(x1))f2(x2)+^P1(x1)(^P2(x2)−f2(x2))
Therefore, for any (poly,log)-scheme ^Q=(Q,s,b) of signature {0,1}∗→Q∩[−1,+1]
|E[^Q(^P−f1×f2)]|≤|E[^Q((x1,x2))(^P1(x1)−f1(x1))f2(x2)]|+|E[^Q((x1,x2))^P1(x1)(^P2(x2)−f2(x2))]|
By Lemma 3, it is sufficient to show an appropriate bound for each of the terms on the right hand side. Suppose ^G=(G,rG,aG) is a Δ(log)-generator for (f2,μ2). Applying Proposition 1 to the first term, we get
|Eμ1×μ2×Us(j)+r1(j)[^Qj((x1,x2))(^Pj1(x1)−f1(x1))f2(x2)]|≤|Eμ1×UrG(j)×Us(j)+r1(j)[^Qj((x1,^Gj1))(^Pj1(x1)−f1(x1))^Gj2]|+δ12(j)
where δ12∈Δ.
|Eμ1×μ2×Us(j)+r1(j)[^Qj((x1,x2))(^Pj1(x1)−f1(x1))f2(x2)]|≤Eμ1[|EUrG(j)+s(j)+r1(j)[^Qj((x1,^Gj1))^Gj2(^Pj1(x1)−f1(x1))]|]+δ12(j)
Applying Lemma 3 to ^P1, we get
|Eμ1×μ2×Us(j)+r1(j)[^Qj((x1,x2))(^Pj1(x1)−f1(x1))f2(x2)]|≤Eμ1[δ1(j)]+δ12(j)≤δ1(j)+δ12(j)
where δ1∈Δ.
Suppose ^S=(S,rS,aS) is a Δ(log)-sampler for μ1. Applying Lemma 4 to the second term, we get
|Eμ1×μ2×Us(j)+r1(j)[^Qj((x1,x2))^Pj1(x1)(^Pj2(x2)−f2(x2))]|≤|EUrS(j)×μ2×Us(j)+r1(j)[^Qj((^Sj,x2))^Pj1(^Sj)(^Pj2(x2)−f2(x2))]|+δ11(j)
where δ11∈Δ.
|Eμ1×μ2×Us(j)+r1(j)[^Qj((x1,x2))^Pj1(x1)(^Pj2(x2)−f2(x2))]|≤Eμ2[|EUrS(j)+s(j)+r1(j)[^Qj((^Sj,x2))^Pj1(^Sj)(^Pj2(x2)−f2(x2))]|]+δ11(j)
Applying Lemma 3 to ^P2, we get
|Eμ1×μ2×Us(j)+r1(j)[^Qj((x1,x2))^P1(x1)(^Pj2(x2)−f2(x2))]|≤Eμ2[δ2(j)]+δ11(j)≤δ2(j)+δ11(j)
where δ2∈Δ. Again, this is the required bound.
Lemma 6
Consider a family of sets {Xk}k∈N and family of probability measures {μk}k∈N on Xk. Denote Y:=⊔ksuppμk. Consider f1,f2:Y→R bounded functions and {gα:Y→R}α∈I a uniformly bounded family of functions indexed by some set I. Suppose that
Eμk[(f1(x)−f2(x))2]∈Δ
Then there is δ∈Δ s.t.
∀α∈I:|Eμk[(gα(x)−f1(x))2−(gα(x)−f2(x))2]|≤δ(k)
Proof of Lemma 6
(gα(x)−f1(x))2−(gα(x)−f2(x))2=(2gα(x)−f1(x)−f2(x))(f2(x)−f1(x))
|Eμk[(2gα(x)−f1(x)−f2(x))(f2(x)−f1(x))]|≤sup(|2gα−f1−f2|)√Eμk[(f1(x)−f2(x))2]
Since {gα} is uniformly bounded, the right hand side is bounded uniformly in α by a single function in Δ, as required.
Proposition 2
Consider (f,μ), (g,ν) unidistributional estimation problems. Suppose ^ζ=(ζ,rζ,aζ) is a Δ-pseudo-invertible reduction of (f,μ) to (g,ν) and ^ξ=(ξ,rξ,aξ) is its Δ-pseudo-inverse. Then there is δ∈Δ s.t. for any bounded function h:{0,1}∗2→R
|Eμ×Urζ(j)×Urξ(j)[h(^ξj(^ζj(x,z),w),^ζj(x,z))]−Eμ×Urζ(j)[h(x,^ζj(x,z))]|≤(sup|h|)δ(j)
Proof of Proposition 2
Denote μjζ:=μ×Urζ(j). According to the defining property of ^ξ
Eμjζ[∑x′∈{0,1}∗|PrUrξ(j)[^ξj(^ζj(x,z),w)=x′]−Prμjζ[x′′=x′∣^ζj(x′′,z′)=^ζj(x,z)]|]=δ(j)
where δ∈Δ. Therefore
Eμjζ[∑x′∈{0,1}∗|(PrUrξ(j)[^ξj(^ζj(x,z),w)=x′]−Prμjζ[x′′=x′∣^ζj(x′′,z′)=^ζj(x,z)])h(x′,^ζj(x,z))|]≤(sup|h|)δ(j)
|Eμjζ[∑x′∈{0,1}∗(PrUrξ(j)[^ξj(^ζj(x,z),w)=x′]−Prμjζ[x′′=x′∣^ζj(x′′,z′)=^ζj(x,z)])h(x′,^ζj(x,z))]|≤(sup|h|)δ(j)
|Eμjζ[∑x′∈{0,1}∗Pr[^ξj(^ζj(x,z),w)=x′]h(x′,^ζj(x,z))]−Eμjζ[∑x′∈{0,1}∗Prμjζ[x′′=x′∣^ζj(x′′,z′)=^ζj(x,z)]h(x′,^ζj(x,z))]|≤(sup|h|)δ(j)
|Eμjζ×Urξ(j)[h(^ξj(^ζj(x,z),w),^ζj(x,z))]−Eμjζ[∑x′∈{0,1}∗Prμjζ[x′′=x′∣^ζj(x′′,z′)=^ζj(x,z)]h(x′,^ζj(x,z))]|≤(sup|h|)δ(j)
|Eμjζ×Urξ(j)[h(^ξj(^ζj(x,z),w),^ζj(x,z))]−Eμjζ[Eμjζ[h(x′,^ζj(x,z))∣^ζj(x′,z′)=^ζj(x,z)]]|≤(sup|h|)δ(j)
|Eμjζ×Urξ(j)[h(^ξj(^ζj(x,z),w),^ζj(x,z))]−Eμjζ[Eμjζ[h(x′,^ζj(x′,z′))∣^ζj(x′,z′)=^ζj(x,z)]]|≤(sup|h|)δ(j)
|Eμjζ×Urξ(j)[h(^ξj(^ζj(x,z),w),^ζj(x,z))]−Eμjζ[h(x,^ζj(x,z))]|≤(sup|h|)δ(j)
Proof of Theorem 8
Consider ^Qf=(Qf,sf,bf) a (poly,log)-predictor. Let ^Qg=(Qg,sg,bg) be the (poly,log)-predictor defined by
^Qjg(x):=^Qjf(^ξj(x))
Applying Lemma 2 we get
Eν×UrR(j)×Urg(j)[^Rj(y)(^Pjg(y)−g(y))2]≤Eν×UrR(j)×Usg(j)[^Rj(y)(^Qjg(y)−g(y))2]+δ(j)
where δ∈Δ.
Using the defining property of ^R we can apply Lemma 5 to the left hand side and get
Eν×UrR(j)×Urg(j)[^Rj(y)(^Pjg(y)−g(y))2]=Eν×Urg(j)[(Prμ×Urζ(j)[^ζj(x)=y]/ν(y))(^Pjg(y)−g(y))2]+γR(j)
where |γR|∈Δ. Using property (ii) of pseudo-invertible reductions, we get
Eν×UrR(j)×Urg(j)[^Rj(y)(^Pjg(y)−g(y))2]=Eμ×Urg(j)×Urζ(j)[(^Pjg(^ζj(x))−g(^ζj(x)))2]+γR(j)
Using the definition of ^Pf and Lemma 6 applied via property (i) of pseudo-invertible reductions, we get
Eν×UrR(j)×Urg(j)[^Rj(y)(^Pjg(y)−g(y))2]=Eμ×Urg(j)×Urζ(j)[(^Pjf(x)−f(x))2]+γζ(j)+γR(j)
where |γζ|∈Δ.
Using the defining property of ^R, Lemma 5 and property (ii) of pseudo-invertible reductions on the right-hand side, we get
Eν×UrR(j)×Usg(j)[^Rj(y)(^Qjg(y)−g(y))2]=Eμ×Usf(j)×Urζ(j)×Urξ(j)[(^Qjf(^ξj(^ζj(x)))−g(^ζj(x)))2]+γ′R(j)
where |γ′R|∈Δ. Applying Proposition 2
Eν×UrR(j)×Usg(j)[^Rj(y)(^Qjg(y)−g(y))2]=Eμ×Usf(j)×Urζ(j)[(^Qjf(x)−g(^ζj(x)))2]+γξ(j)+γ′R(j)
where |γξ|∈Δ. Applying Lemma 6 via property (i) of pseudo-invertible reductions
Eν×UrR(j)×Usg(j)[^Rj(y)(^Qjg(y)−g(y))2]=Eμ×Usf(j)[(^Qjf(x)−f(x))2]+γ′ζ(j)+γξ(j)+γ′R(j)
Putting everything together
Eμ×Urg(j)×Urζ(j)[(^Pjf(x)−f(x))2]≤Eμ×Usf(j)[(^Qjf(x)−f(x))2]+δ′(j)
for δ′∈Δ.
Proposition 3
Consider a>1 and δ:[a,∞)→R≥0 a non-increasing function. Suppose that
∫_a^∞ δ(x)/(xlogx)dx<∞
Then
limx→∞(loglogx)δ(x)=0
Proof of Proposition 3
Assume to the contrary that there is ϵ>0 and an unbounded sequence {xi∈[a,∞)}i∈N s.t.
(loglogxi)δ(xi)≥ϵ
Define yi:=2^√(logxi). For any x∈[yi,xi] we have
δ(x)≥δ(xi)≥ϵ/loglogxi=ϵ/(2loglogyi)≥ϵ/(2loglogx)
Therefore
∫_{yi}^{xi} δ(x)/(xlogx)dx≥(ϵ/2)∫_{yi}^{xi} dx/(xlogx·loglogx)=(ϵ/2)(logloglogxi−logloglogyi)=ϵ/2
(the last equality uses loglogyi=(1/2)loglogxi, so with base 2 logarithms the iterated logarithms differ by exactly 1)
Since we can choose an infinite number of non-overlapping intervals of the form [yi,xi], we reach the contradiction
∫_a^∞ δ(x)/(xlogx)dx=∞
Proposition 4
Consider a polynomial q:N→N. There is a function λq:N2→[0,1] s.t.
(i) ∀j∈N:∑i∈Nλq(j,i)=1
(ii) For any non-increasing function ϵ:N→[0,1] we have
ϵ(j)−∑i∈Nλq(j,i)ϵ(q(j)+i)∈Δ1ll
Proof of Proposition 4
Given polynomials q1,q2:N→N s.t. q1(j)≥q2(j) for j≫0, the proposition for q1 implies the proposition for q2 by setting
λq2(j,i):=λq1(j,i−q1(j)+q2(j)) if i−q1(j)+q2(j)≥0, and λq2(j,i):=0 otherwise.
Therefore, it is enough to prove the proposition for polynomials of the form q(j)=j^m for m>0.
We have
∫_{x=3}^{3^m} ϵ(⌊x⌋)d(loglogx)≤∫_{x=3}^{3^m} d(loglogx)=logm
For any M>0
∫_{x=3}^{3^m} ϵ(⌊x⌋)d(loglogx)−∫_{x=M}^{M^m} ϵ(⌊x⌋)d(loglogx)≤logm
∫_{x=3}^{M} ϵ(⌊x⌋)d(loglogx)−∫_{x=3^m}^{M^m} ϵ(⌊x⌋)d(loglogx)≤logm
∫_{x=3}^{M} ϵ(⌊x⌋)d(loglogx)−∫_{x=3}^{M} ϵ(⌊x^m⌋)d(loglogx)≤logm
∫_{x=3}^{M} (ϵ(⌊x⌋)−ϵ(⌊x^m⌋))d(loglogx)≤logm
∫_{x=3}^{∞} (ϵ(⌊x⌋)−ϵ(⌊x^m⌋))d(loglogx)≤logm
Since ⌊x^m⌋≥⌊x⌋^m we can choose λq satisfying condition (i) so that
∫_{x=j}^{j+1} ϵ(⌊x^m⌋)d(loglogx)=(loglog(j+1)−loglogj)∑iλq(j,i)ϵ(j^m+i)
It follows that
∫_{x=3}^{∞} (ϵ(⌊x⌋)−∑iλq(⌊x⌋,i)ϵ(⌊x⌋^m+i))d(loglogx)≤logm
∫_3^∞ (ϵ(⌊x⌋)−∑iλq(⌊x⌋,i)ϵ(⌊x⌋^m+i))/(xlogx)dx≤logm
Applying Proposition 3, we get the desired result.
Lemma 7
Consider (f,μ) a unidistributional estimation problem, ^P=(P,r,a), ^Q=(Q,s,b) (poly,log)-predictors. Suppose p:N→N a polynomial and δ∈Δ1ll are s.t.
∀i,j∈N:E[(^Pp(j)+i−f)2]≤E[(^Qj−f)2]+δ(j)
Then ∃δ′∈Δ1ll s.t.
E[(^Pj−f)2]≤E[(^Qj−f)2]+δ′(j)
Proof of Lemma 7
Define ϵ(j):=sup_{k≥j}E[(^Pk−f)2].
By Proposition 4 we have
~δ(j):=ϵ(j)−∑iλp(j,i)ϵ(p(j)+i)∈Δ1ll
ϵ(j)=∑iλp(j,i)ϵ(p(j)+i)+~δ(j)
ϵ(j)≤∑iλp(j,i)(E[(^Qj−f)2]+δ(j))+~δ(j)
ϵ(j)≤E[(^Qj−f)2]+δ(j)+~δ(j)
E[(^Pj−f)2]≤E[(^Qj−f)2]+δ(j)+~δ(j)
Proposition 5
Consider δ∈Δ1ll. Define δ′(j):=sup_{k≥j}δ(k). Then, δ′∈Δ1ll.
Proof of Proposition 5
Suppose α is s.t. limj→∞(loglogj)^αδ(j)=0. We claim that limj→∞(loglogj)^αδ′(j)=0. Assume to the contrary that for some ϵ>0 we have an unbounded sequence {ni}i∈N s.t. (loglogni)^αδ′(ni)≥ϵ. We can then choose {mi}i∈N s.t. mi≥ni and (loglogni)^αδ(mi)≥ϵ/2. But this implies (loglogmi)^αδ(mi)≥ϵ/2, which is a contradiction.
Proof of Theorem 10
Consider ^P=(P,r,a) a (poly,log)-predictor. Choose p:N→N a non-constant polynomial s.t. evaluating Λ[G]p(j) involves running ^Pj until it halts “naturally” (such p exists because ^P runs in at most polynomial time and has at most logarithmic advice). Given i,j∈N, consider the execution of Λ[G]p(j)+i. The standard deviation of ϵ(^Pj) with respect to the internal coin tosses of Λ is at most (p(j)+i)^−1. Applying Lemma 6 followed by Lemma 4, the expectation value is E[(^Pj−f)2]+γP where |γP|≤δ(p(j)+i) for δ∈Δ1ll. By Chebyshev’s inequality,
Pr[ϵ(^Pj)≥E[(^Pj−f)2]+δ(p(j)+i)+(p(j)+i)^−1/2]≤(p(j)+i)^−1
Hence
Pr[ϵ(Q∗)≥E[(^Pj−f)2]+δ(p(j)+i)+(p(j)+i)^−1/2]≤(p(j)+i)^−1
The standard deviation of ϵ(Q) for any Q is also at most (p(j)+i)^−1. The expectation value is E[(evp(j)+i(Q)−f)2]+γQ where |γQ|≤δ(p(j)+i). Therefore
Pr[∃Q<p(j)+i:ϵ(Q)≤E[(evp(j)+i(Q)−f)2]−δ(p(j)+i)−(p(j)+i)^−1/4]≤(p(j)+i)·(p(j)+i)^−3/2=(p(j)+i)^−1/2
The extra p(j)+i factor comes from summing probabilities over p(j)+i programs. Combining we get
Pr[E[(evp(j)+i(Q∗)−f)2]≥E[(^Pj−f)2]+2δ(p(j)+i)+(p(j)+i)^−1/2+(p(j)+i)^−1/4]≤(p(j)+i)^−1+(p(j)+i)^−1/2
E[(Λ[G]p(j)+i−f)2]≤E[(^Pj−f)2]+2δ(p(j)+i)+(p(j)+i)^−1+2(p(j)+i)^−1/2+(p(j)+i)^−1/4
By Proposition 5, we can assume δ is non-increasing without loss of generality. Therefore
E[(Λ[G]p(j)+i−f)2]≤E[(^Pj−f)2]+2δ(p(j))+p(j)^−1+2p(j)^−1/2+p(j)^−1/4
Applying Lemma 7 we get the desired result.