I believe your concern comes from the fact that at the time the program has to assign a probability to $\phi_{s_{A(n)}}$, it has not only deduced the truth of $\phi_{s_n}$, but has also, earlier, guessed at the truth value of $\phi_{s_n}$. When it makes that earlier guess, it loses some probability mass, but it can lose that mass in a way that is correlated with the answer it gave to $t_n$. This way, it can still give the correct probability on $\phi_{s_{A(n)}}$.
Here is my fix: instead of $L$, consider the case where we are trying to guess only sentences of the form $\phi_{s_{A(n)}}$ and $t_n$ for some $n$; that is, we modify $L$ to reject any sentence not of that form. Both of these subsequences are indistinguishable from coin flips with fixed probabilities. In this case, SIA will not get the correct probabilities on both subsequences, because it has an incentive to make its answers to $\phi_{s_{A(n)}}$ match its answers to $t_n$ (or not match, when $\phi_{s_n}$ is false), and any program that does not make them match will be trumped by one that does.
This does not mean that we have this property when we consider all of $L$, but the code in no way depends on $E$, and I see no reason to think that it would work for $L$ but not for the modified $L$.
I agree, this version works.
To walk through it in a bit more detail:
Now we are only considering two sentence schemas, $\phi_{s_{A(n)}}$ and $t_n := \phi_{s_n} \leftrightarrow \phi_{s_{A(n)}}$. (Also, ignore the (rare) case where $n$ is itself an Ackermann number.)
I'll call the Benford probability $B := \frac{1}{\log 10}$, and (as before) the $t_n$ probability $P := B^2 + (1-B)^2$.
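For concreteness (and reading the $\log$ here as the natural logarithm, which is my assumption about the intended base): $B \approx 0.434$, and $P$ is just the probability that two independent coins of bias $B$ agree, which is why it is the right fixed probability for the biconditional $t_n$:
$$P = B^2 + (1-B)^2 \approx 0.19 + 0.32 \approx 0.51.$$
The exact numbers do not matter below; what matters is that each subsequence, taken on its own, looks like a coin with a fixed bias.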
At the time when SIA considers $t_n$, we assume it does not give its sampled programs enough time to solve either $\phi_{s_n}$ or $\phi_{s_{A(n)}}$. (This assumption is part of the problem setup; it seems likely that cases like this cannot be ruled out by a simple rule, though.) The best the programs can do is treat the $t_n$ like coin flips with probability $P$.
At the time when $\phi_{s_{A(n)}}$ is considered, the program has enough time to compute $\phi_{s_n}$ (again, as part of the problem setup). It can also remember what guess it made on $t_n$. The best thing it can do now is to logically combine those two to determine $\phi_{s_{A(n)}}$, which means it does not treat $\phi_{s_{A(n)}}$ like a random coin. For cases where $\phi_{s_n}$ is true, the population of sampled programs will guess $\phi_{s_{A(n)}} = \text{true}$ with frequency approaching $P$; for cases where $\phi_{s_n}$ is false, the frequency will be $1 - P$.
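To spell the combination step out (this just restates the logic above, writing $g_n$ for the program's earlier guess on $t_n$): such a program outputs $\phi_{s_{A(n)}} = (\phi_{s_n} \leftrightarrow g_n)$, so the fraction of sampled programs answering true is
$$\Pr\left[\text{guess } \phi_{s_{A(n)}} = \text{true}\right] =
\begin{cases}
\Pr[g_n = \text{true}] = P & \text{if } \phi_{s_n} \text{ is true,} \\
\Pr[g_n = \text{false}] = 1 - P & \text{if } \phi_{s_n} \text{ is false.}
\end{cases}$$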
This behavior is the optimal response to the problem as given to SIA, but it is suboptimal for what we actually wanted: the Bayes score of SIA on the subsequence consisting of only the $\phi_{s_{A(n)}}$ is suboptimal. It will average out to probability $B$, but it will continue to be higher and lower for individual cases without actually predicting those cases more effectively; SIA is acting as if there were a correlation between $\phi_{s_n}$ and $\phi_{s_{A(n)}}$ when there is none. (This is especially odd considering that SIA isn't even being asked to predict $\phi_{s_n}$ in general, in this case!)
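As a sanity check on that claim, here is a minimal Monte Carlo sketch (my own illustration, not part of the original setup): it models $\phi_{s_n}$ and $\phi_{s_{A(n)}}$ as independent coins of bias $B$ and compares the Bayes (log) score, on the $\phi_{s_{A(n)}}$ subsequence only, of a predictor that always outputs $B$ against the SIA-style predictor that outputs $P$ or $1-P$ depending on the computed $\phi_{s_n}$.

```python
import math
import random

# Illustration only (not the actual SIA algorithm): model phi_{s_n} and
# phi_{s_{A(n)}} as independent coins that come up true with probability B.
# B below assumes "log" means the natural logarithm; the qualitative point
# does not depend on its exact value.
B = 1 / math.log(10)           # Benford probability under that reading
P = B ** 2 + (1 - B) ** 2      # probability that the biconditional t_n is true

def log_score(prob_true: float, outcome: bool) -> float:
    """Bayes (log) score for assigning probability prob_true to an outcome."""
    return math.log(prob_true if outcome else 1.0 - prob_true)

random.seed(0)
trials = 200_000
score_flat = 0.0      # predictor that always says B for phi_{s_{A(n)}}
score_combined = 0.0  # SIA-style predictor: P if phi_{s_n} is true, else 1 - P

for _ in range(trials):
    phi_sn = random.random() < B    # truth value of phi_{s_n}
    phi_sAn = random.random() < B   # truth value of phi_{s_{A(n)}}, independent
    score_flat += log_score(B, phi_sAn)
    score_combined += log_score(P if phi_sn else 1.0 - P, phi_sAn)

print(f"B = {B:.3f}, P = {P:.3f}")
print(f"average log score, always-B predictor: {score_flat / trials:.4f}")
print(f"average log score, combined predictor: {score_combined / trials:.4f}")
# The combined predictor swings above and below B without tracking the actual
# phi_{s_{A(n)}}, so it gets a (slightly) worse Bayes score on this subsequence,
# even though combining is the right move for the problem as posed to SIA.
```

Under this model the combined predictor's expected score per sentence works out to $P \log P + (1-P)\log(1-P)$, versus $B \log B + (1-B)\log(1-B)$ for the flat predictor, and the former is strictly worse whenever $B \notin \{0, \tfrac{1}{2}, 1\}$, so the gap is small here but systematic.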
This is still not a proof, but it looks like it could be turned into one.
I’m hoping writing it out like this unpacks some of the mutual assumptions Scott and I share as a result of talking about this.