I think the conjecture is also false in the case where utility functions map from $O^\omega$ to $[0,1]$.
Let us consider the case of $A = \{a_1, a_2\}$ and $O = \{o_1, o_2\}$.
We use $U_1(o) = 1 - 2^{-k}$, where $k$ is the largest integer such that $o$ starts with $o_1^k$ (and $U_1(o_1^\omega) = 1$).
As for $U_2$, we use $U_2(o) = 1 - 3^{-k}$, where $k$ is again the largest integer such that $o$ starts with $o_1^k$ (and $U_2(o_1^\omega) = 1$).
Both $U_1$ and $U_2$ are computable, but they are not locally equivalent.
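To make the computability claim concrete, here is a minimal Python sketch (the encoding of observations as the strings "o1"/"o2" is my own choice; the values are exact as soon as the prefix contains an $o_2$, and for unbroken runs of $o_1$ the partial values only ever approach $1$ from below):

```python
from itertools import takewhile

def leading_o1_run(prefix):
    """Length k of the initial run of o1's in a finite observation prefix."""
    return sum(1 for _ in takewhile(lambda o: o == "o1", prefix))

def U1(prefix):
    """U1 = 1 - 2^-k; exact once the prefix contains an o2."""
    return 1 - 2.0 ** (-leading_o1_run(prefix))

def U2(prefix):
    """U2 = 1 - 3^-k; exact under the same condition."""
    return 1 - 3.0 ** (-leading_o1_run(prefix))

print(U1(["o1", "o1", "o2"]))  # 1 - 2**-2 = 0.75
print(U2(["o1", "o1", "o2"]))  # 1 - 3**-2 = 0.888...
```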
Under reasonable assumptions on the Solomonoff prior,
the policy $\pi$ that always picks action $a_1$ is the optimal policy for both $U_1$ and $U_2$ (see the proof below).
Note that since the policy is computable and very simple, $g_0(\pi) = \infty$ does not hold;
instead we have $g_0(\pi) = O(1)$.
I suspect that the issues would persist even with an additional $g_0(\pi) = \infty$ condition,
but finding a concrete example with an uncomputable policy is challenging.
Proof:
Suppose that $U_1$ and $U_2$ are locally equivalent. Let $V$ be an open neighborhood of the point $x = o_1^\omega$, and let $\alpha > 0$ and $\beta \in \mathbb{R}$ be such that $U_1(y) = \alpha U_2(y) + \beta$ for all $y \in V$.
Since $x \in V$, we have $1 = U_1(x) = \alpha U_2(x) + \beta = \alpha + \beta$.
Because $V$ is an open neighborhood of $o_1^\omega$, there is an integer $N$ such that $o_1^n o_2^\omega \in V$ for all $n \geq N$. For such $n \geq N$, we have
$$1 - 2^{-n} = U_1(o_1^n o_2^\omega) = \alpha U_2(o_1^n o_2^\omega) + \beta = \alpha (1 - 3^{-n}) + \beta = \alpha + \beta - \alpha 3^{-n} = 1 - \alpha 3^{-n}.$$
This implies $\alpha = (2/3)^{-n} = (3/2)^n$.
However, $\alpha$ is a single constant, so this cannot hold for all $n \geq N$ simultaneously.
Thus, our assumption that $U_1$ and $U_2$ are locally equivalent was wrong.
Assumptions about the Solomonoff prior:
For all $n$, the sequence of actions that produces the observation sequence $o_1^n$ with the highest probability is $a_1^{n-1}$ (recall that we start with observations in this setting).
With this assumption, it can be seen that the policy that always picks action $a_1$ is among the best policies for both $U_1$ and $U_2$.
I think this is actually natural behaviour for a reasonable Solomonoff prior:
it is natural to expect that $o_1 a_1 o_1$ is more likely than $o_1 a_2 o_1$, and that the sequence of actions leading to $o_1$ rather than $o_2$ has low complexity. Always picking $a_1$ has low complexity.
It is possible to construct an artificial UTM that ensures that
"always take $a_1$" is the best policy for $U_1$ and $U_2$:
a UTM can be constructed such that the corresponding Solomonoff prior assigns probability $3/4$ to the program/environment "start with $o_1$; after action $a_i$, output $o_i$". The rest of the probability mass gets distributed according to some other, more natural UTM.
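A sketch of the resulting mixture of environments (the echo environment is the one from the construction above; the noise environment is a purely illustrative stand-in for the remaining $1/4$ of the prior mass, since the "other, more natural UTM" is left unspecified):

```python
import random

def echo_environment():
    """The weight-3/4 environment: start with o1; after action a_i, output o_i."""
    action = yield "o1"                  # initial observation
    while True:
        action = yield ("o1" if action == "a1" else "o2")

def noise_environment():
    """Illustrative stand-in for the remaining 1/4 of the prior mass."""
    _ = yield "o1"
    while True:
        _ = yield random.choice(["o1", "o2"])

def sample_environment():
    """Draw an environment from the constructed prior: 3/4 echo, 1/4 other."""
    return echo_environment() if random.random() < 0.75 else noise_environment()

env = echo_environment()
print(next(env))        # "o1" -- we start with an observation
print(env.send("a1"))   # "o1"
print(env.send("a2"))   # "o2"
```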
Then, for $U_1$, in each situation with history $o_1^n$ the optimal policy has to pick $a_1$
(the actions outside of this history have no impact on the utility):
with probability $3/4$ it will get utility at least $1 - 2^{-(n+1)}$, and with probability $1/4$ at least $1 - 2^{-n}$. Whereas for the choice of $a_2$, with probability $3/4$ it will have utility $1 - 2^{-n}$, and with probability $1/4$ it can get at most $1$.
We calculate
$$\left(1 - 2^{-(n+1)}\right) \cdot \tfrac{3}{4} + \left(1 - 2^{-n}\right) \cdot \tfrac{1}{4} = 1 - 5 \cdot 2^{-(n+3)} > 1 - 3 \cdot 2^{-(n+2)} = \left(1 - 2^{-n}\right) \cdot \tfrac{3}{4} + \tfrac{1}{4},$$
i.e., taking action $a_1$ is the better choice.
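As a quick sanity check (mine, not part of the argument), the inequality can be verified exactly with rational arithmetic; the helper takes the base as a parameter so it can be reused for $U_2$ below:

```python
from fractions import Fraction

def value_a1(base, n):
    """Lower bound on expected utility when picking a1 at history o1^n."""
    return (1 - Fraction(1, base ** (n + 1))) * Fraction(3, 4) \
         + (1 - Fraction(1, base ** n)) * Fraction(1, 4)

def value_a2(base, n):
    """Upper bound on expected utility when picking a2 at history o1^n."""
    return (1 - Fraction(1, base ** n)) * Fraction(3, 4) + Fraction(1, 4)

# U1 corresponds to base 2: the a1 bound beats the a2 bound for every n checked.
assert all(value_a1(2, n) > value_a2(2, n) for n in range(1, 50))
assert value_a1(2, 3) == 1 - Fraction(5, 2 ** 6)  # 1 - 5*2^-(n+3) at n = 3
```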
Similarly, for $U_2$, the optimal policy has to pick $a_1$ too in each situation with history $o_1^n$.
Here, the calculation looks like
$$\left(1 - 3^{-(n+1)}\right) \cdot \tfrac{3}{4} + \left(1 - 3^{-n}\right) \cdot \tfrac{1}{4} = 1 - \tfrac{1}{2} \cdot 3^{-n} > 1 - \tfrac{3}{4} \cdot 3^{-n} = \left(1 - 3^{-n}\right) \cdot \tfrac{3}{4} + \tfrac{1}{4}.$$
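The same helper from the sketch above confirms this inequality as well:

```python
# U2 corresponds to base 3: again the a1 bound beats the a2 bound.
assert all(value_a1(3, n) > value_a2(3, n) for n in range(1, 50))
assert value_a1(3, 2) == 1 - Fraction(1, 2 * 3 ** 2)  # 1 - 3^-n/2 at n = 2
```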