I agree, this version works.
To walk through it in a bit more detail:
Now we are only considering two sentence schemas, $\phi_{sA(n)}$ and $t_n := \phi_{sn} \leftrightarrow \phi_{nA(n)}$. (Also, ignore the (rare) case where $n$ is an Ackermann number.)
I’ll call the Benford probability $B := \frac{1}{\log 10}$, and (as before) the $t_n$ probability $P := B^2 + (1-B)^2$.
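To put rough numbers on this (reading $\log$ as the base-2 logarithm, so that $B$ is the usual Benford probability of a leading digit of 1, $\log_{10} 2$ — that reading is mine, not spelled out above):

$$B = \frac{1}{\log_2 10} = \log_{10} 2 \approx 0.301, \qquad P = B^2 + (1-B)^2 \approx 0.091 + 0.489 \approx 0.579.$$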
At the time when SIA considers $t_n$, we assume it does not give its sampled programs enough time to solve either $\phi_{sn}$ or $\phi_{nA(n)}$. (This assumption is part of the problem setup; it seems likely that cases like this cannot be ruled out by a simple rule, though.) The best thing the programs can do is treat the $t_n$ like coin flips with probability $P$.
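That value of $P$ is just what you get, on my reading, by treating the two sides of the biconditional as independent coins that each come up true with probability $B$: $t_n$ holds exactly when both sides are true or both are false, so

$$\Pr(t_n) = B \cdot B + (1-B)(1-B) = B^2 + (1-B)^2 = P.$$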
At the time when $\phi_{sA(n)}$ is considered, the program has enough time to compute $\phi_{sn}$ (again as part of the problem setup). It can also remember what guess it made on $t_n$. The best thing it can do now is to logically combine those to determine $\phi_{sA(n)}$. This causes it to not treat $\phi_{sA(n)}$ like a random coin. For cases where $\phi_{sn}$ is true, the population of sampled programs will guess that $\phi_{sA(n)}$ is true with frequency approaching $P$; for cases where $\phi_{sn}$ is false, that frequency will be $1-P$.
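Here is a small simulation of that mechanism, just to illustrate the frequencies. The names, the population size, and the assumption that a program's remembered guess on $t_n$ together with the computed value of $\phi_{sn}$ pins down its guess for $\phi_{sA(n)}$ are mine (a sketch of the mechanism as I read it, not SIA itself):

```python
import random

random.seed(0)

B = 0.30103              # Benford probability (base-2 reading of the log in B's definition)
P = B**2 + (1 - B)**2    # probability of the biconditional t_n under independence

N_CASES = 5_000          # hypothetical number of values of n
N_PROGRAMS = 500         # hypothetical size of SIA's population of sampled programs

freq_when_true = []      # fraction of programs guessing phi_{sA(n)} true, given phi_{sn} true
freq_when_false = []     # same, given phi_{sn} false

for _ in range(N_CASES):
    phi_sn = random.random() < B   # the sentence the programs can compute at prediction time
    # Earlier, each program could only treat t_n as a coin of weight P:
    t_guesses = [random.random() < P for _ in range(N_PROGRAMS)]
    # Later, knowing phi_{sn} and remembering its t_n guess, each program solves the
    # biconditional for its guess about phi_{sA(n)}:
    sA_guesses = [phi_sn == g for g in t_guesses]
    frac_true = sum(sA_guesses) / N_PROGRAMS
    (freq_when_true if phi_sn else freq_when_false).append(frac_true)

print("guess frequency given phi_{sn} true: ", sum(freq_when_true) / len(freq_when_true))    # ~ P   ~= 0.58
print("guess frequency given phi_{sn} false:", sum(freq_when_false) / len(freq_when_false))  # ~ 1-P ~= 0.42
```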
This behavior is the optimal response to the problem as given to SIA, but it is suboptimal for what we actually wanted: the Bayes score of SIA on the sub-sequence consisting of only the $\phi_{sA(n)}$ is worse than it needs to be. It will average out to probability $B$, but will keep being higher and lower for individual cases without actually predicting those cases any more effectively; SIA is acting as if it thinks there is a correlation between $\phi_{sn}$ and $\phi_{nA(n)}$ when there is none. (This is especially odd considering that SIA isn’t even being asked to predict $\phi_{sn}$ in general, in this case!)
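To quantify "suboptimal" (still under my reading that the $\phi_{sA(n)}$ behave like independent weight-$B$ coins, uncorrelated with the $\phi_{sn}$, and with $B \approx 0.301$ as above): SIA ends up assigning probability $P$ to the actual outcome exactly when $\phi_{sn}$ and $\phi_{sA(n)}$ happen to agree, which under independence itself happens with probability $P$, and assigns $1-P$ otherwise. The expected log score per prediction is therefore

$$\mathbb{E}[\text{log score}_{\mathrm{SIA}}] = P\log P + (1-P)\log(1-P) \approx -0.68 \text{ nats},$$

whereas the constant prediction $B$ that we actually wanted gets

$$\mathbb{E}[\text{log score}_{B}] = B\log B + (1-B)\log(1-B) \approx -0.61 \text{ nats},$$

so the constant prediction is strictly better in expectation, even though SIA's assignments look more "informed."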
This is still not a proof, but it looks like it could be turned into one.
I’m hoping writing it out like this unpacks some of the mutual assumptions Scott and I share as a result of talking about this.