gwern comments on Personal Evidence—Superstitions as Rational Beliefs

gwern 26 Mar 2013 14:02 UTC
−2 points

Likewise, the winner of the lottery observes the same number twice, which is some evidence for various crazy hypotheses where the selection of “I” necessarily coincides with the winner.

In my example of two worlds, the odds of observing the observed evidence is the same in both worlds and so there is no update.

What set of worlds are you postulating for your “two numbers” example? Because your example, as far as I understand it, doesn’t seem at all analogous.
- private_messaging 26 Mar 2013 17:12 UTC
  2 points
  I’m talking specifically about supernatural explanations for you winning the lottery. I don’t see why people opt for supernatural explanations for haunting, either.
  
  Suppose we do something like Solomonoff induction, dealing with codes that match observations verbatim. There’s a theory that reads bits off the tape to produce the ticket number, then more bits to produce the lottery draw, and there’s a theory that reads bits off the tape and produces both numbers as equal. Suppose the lottery has a size of 2^20, about 1 million. Then the former theory will need 40 lucky bits to match the observation, whereas the latter theory will need only 20 lucky bits. For almost everyone the latter theory will be eliminated; for the lottery winner, however, it will linger, and the difference in length between the theories, counting the required lucky bits, will decrease by 20 bits. An S.I.-using learning agent (AIXI and variations of it) that won the lottery will literally assign a higher probability to winning the next lottery, because it didn’t eliminate the various “I always win” hypotheses. edit: and indeed, given sufficiently big N, the extra code required for the “I always win” hack will be smaller than log2(N), so it may well become the dominant hypothesis after a single victory. Things like S.I. are only guaranteed to be eventually correct for almost everyone; if there are enough instances, the wrongmost ones can be arbitrarily wrong.
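
The bit counting above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each theory is weighted by 2^-(code length + lucky bits), only the difference in code length matters, and the function name and the `extra_code_bits` parameter (the extra code the "both numbers are equal" theory needs) are hypothetical, not from the comment.

```python
from fractions import Fraction


def posterior_odds_equal_vs_independent(
    lottery_bits: int, extra_code_bits: int, wins: int = 1
) -> Fraction:
    """Odds of the 'ticket and draw share the same bits' theory against the
    'ticket and draw use separate bits' theory, after `wins` matching draws.

    Assumption: a theory's weight is 2^-(code length + lucky input bits).
    """
    # Independent theory: ticket and draw each consume `lottery_bits`
    # lucky bits per lottery.
    independent = Fraction(1, 2 ** (2 * lottery_bits * wins))
    # Equal theory: pays for its extra code once, then only needs the
    # ticket's `lottery_bits` lucky bits per lottery.
    equal = Fraction(1, 2 ** (extra_code_bits + lottery_bits * wins))
    return equal / independent


if __name__ == "__main__":
    # 2^20 lottery, hypothetical 10 bits of extra code, one win.
    print(posterior_odds_equal_vs_independent(20, 10))
```

With a 2^20 lottery and 10 extra bits of code, the odds after one win come out at 2^10 to 1 in favour of the "equal" theory; with 30 extra bits it would still be disfavoured at 1 to 2^10. The crossover sits where the extra code equals log2(N) bits.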
  
  At the end of the day it’s just how the agents learn—if you were constantly winning lotteries, at some point you would start believing you got supernatural powers, or MWI is true plus the consciousness preferentially transfers specifically to the happy winner, or the like. Any learning agent is subject to risk of learning wrong things.
  
  edit: more concise explanation: if you choose a person by some unknown method, and then they win the lottery, that’s distinct from you not choosing some person, then someone winning the lottery. Namely, in the former case you got evidence in favour of the hypothesis that “unknown method” picks lottery winners. For a lottery winner, their place in the world was chosen by some unknown method.
  - gwern 29 Mar 2013 20:54 UTC
    −2 points
    So let’s see if I’m understanding you here.
    
    You treat a lottery output as a bitstring and ask about SI on it. We can imagine a completely naive agent with no previous observations; what will this ignorant agent predict? Well, it seems reasonable that one of the top predictions will be for the initial bitstring to be repeated; this seems OK by Occam’s razor (events often repeating is necessary for induction), and I understand that empirical investigations of simple Turing machines show that many (most? all?) terminating programs will repeat output. It will definitely rank the ‘sequence repeats’ hypotheses above that of possible PRNGs, or very complex physical theories encompassing atmospheric noise and balls dropping into baskets etc.
    
    So far, so good.
    
    I think I lose you when you go on to talk about inferring that you will always win and stuff like that. The repeating hypotheses aren’t contingent on who they happen to. If the particular bitstring emitted by the lottery had also included ‘...and this number was picked by Jain Farstrider’, then SI would seem to then also predict that this Jain will win the next one as well, by the same repeating logic. It certainly will not predict that the agent will win, and the hypothesis ‘the agent (usually) wins’ will drop.
    
    Remember that my trichotomy was that you need to either 1) invoke anthropics; 2) break Aumann via something like dishonesty/incompetence; or 3) you actually do have communicable knowledge.
    
    These SI musings don’t seem to invoke anthropics or break Aumannian requirements, and looking at them, they seem communicable. ‘AIXI-MC-MML*, why do you think Jain will win the lottery a second time?’ ‘[translated from minimum-message-length model+message] Well, he won it last time and since I am ignorant of everything in the world, it seems reasonable that he will win it again’. ‘Hmm, that’s a good point.’ And ditto if AIXI-MC-MML happened to be the beneficiary.
    
    * I bring up minimum-message length because Patrick Robotham is supposed to be working on a version of AIXI-MC using MML so one would be able to examine the model of the world(s) a program has devised so far and so one could potentially ask ‘why’ it is making the predictions it is. Having a comprehensible approximation of SI would be pretty convenient for discussing what SI would or would not do.
    - private_messaging 30 Mar 2013 17:55 UTC
      2 points
      
      It will definitely rank the ‘sequence repeats’ hypotheses above that of possible PRNGs,
      
      It doesn’t need PRNGs. The least confusing description of S.I. is as follows. Take a universal prefix Turing machine with 3 tapes: an input tape which can only be read from, with the head advancing in one direction only; a work tape which can be read from and written to, and is initialized with zeroes; and an output tape that can only be written to. The probability of a sequence S is the probability that this machine will output S when fed a never-ending string of random bits on the input tape.
      
      The machine’s rule set is such that a program can be loaded via the input tape, and then the program can use the rest of the input tape as a source of data. This is important because a program can then set up an interpreter emulating another Turing machine (which ensures a constant bound on the difference between the lengths of code for different machines).
      
      (We predict using conditional probability: if the machine outputs a sequence matching the previous observations, what is the probability that it will produce specific future observations?)
      
      So if we are predicting, for example, perfect coin flips, an input string which begins with code that sets up the working tape so that it will subsequently relay random bits from input to the output, does the trick. This code requires the bits on the input tape to match the observation, meaning that for each observed bit, the length of the input string which has to be correct grows by 1 bit.
      
      Meanwhile, a code that sets up the machine to output repeating zeroes does not require any more bits on the input tape to be correct. So when you are getting repeated zeroes, the code relaying random bits is being lowered in weight by a factor of 2 with each observed bit, whereas the weight of the theory outputting zeroes stays the same (until, of course, you encounter a non-zero and it is eliminated).
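
As a toy illustration of that weighting (not real Solomonoff induction, which sums over all programs), here is a minimal sketch with just the two programs discussed above. The code lengths passed in are made-up numbers, and `two_program_posterior` is a hypothetical helper name.

```python
from fractions import Fraction


def two_program_posterior(observations, relay_code_len, zeros_code_len):
    """Return (relay_share, zeros_share) of posterior weight after seeing
    `observations`, for a two-program toy model.

    Assumption: prior weight of a program is 2^-(code length in bits).
    """
    relay = Fraction(1, 2 ** relay_code_len)
    zeros = Fraction(1, 2 ** zeros_code_len)
    for bit in observations:
        # The relay program needs one more input bit to match each
        # observed bit, so its weight halves every step.
        relay /= 2
        # The all-zeroes program needs no further input bits, but a
        # single non-zero observation eliminates it.
        if bit != 0:
            zeros = Fraction(0)
    total = relay + zeros
    return relay / total, zeros / total


if __name__ == "__main__":
    # Ten observed zeroes; hypothetical code lengths of 5 and 8 bits.
    print(two_program_posterior([0] * 10, relay_code_len=5, zeros_code_len=8))
```

Even though the zeroes program starts out 3 bits longer in this example, ten observed zeroes leave it with 128/129 of the weight; a single 1 in the observations hands everything back to the relay program.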
      
      For more information, see referenced papers in
      
      http://www.scholarpedia.org/article/Algorithmic_probability
      
      I think I lose you when you go on to talk about inferring that you will always win and stuff like that. The repeating hypotheses aren’t contingent on who they happen to. If the particular bitstring emitted by the lottery had also included ‘...and this number was picked by Jain Farstrider’, then SI would seem to then also predict that this Jain will win the next one as well, by the same repeating logic.
      
      You scratched your ticket and you saw a number. Correct codes have to match the number on the ticket and the number winning the lottery. Some use the same string of input bits to match both; some use different pieces of the input string.
      
      (I am assuming that S.I. can not precisely predict the lottery. Even assuming a completely deterministic universe, light from the distant stars, incoming cosmic rays, all of that incoming information ends up mixed in the grand hash of thermal noise and thermal fluctuations)
      
      edit: to make it clearer. Suppose that the lottery has 1000 decimal digits; you scratch one ticket; then later, the winning number is announced, and it matches your ticket. You will conclude that the lottery was rigged, with very good confidence, won’t you? In absence of some rather curious anthropic reasoning, existence or non existence of 10^1000 −1 other tickets, or other conscious players, is entirely irrelevant (and in presence of anthropics you have to figure out which ancestors of h. sapiens will change your answer and which won’t). With regards to Aumann’s agreement theorem, other people would agree that if they were in your shoes (shared the data and the priors) they’d arrive at same conclusions, so it is not at all violated.
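
The "rigged lottery" conclusion is an ordinary Bayesian update, which can be sketched as follows. The prior on rigging and the assumption that a rigged lottery matches your ticket with probability 1 are illustrative choices, not claims from the comment, and `posterior_rigged` is a hypothetical name.

```python
from fractions import Fraction


def posterior_rigged(prior_rigged: Fraction, digits: int) -> Fraction:
    """Posterior probability that the lottery is rigged in your favour,
    given that the announced number matched your single ticket.

    Assumptions: a fair lottery matches with probability 10^-digits;
    a rigged one matches with probability 1.
    """
    p_match_fair = Fraction(1, 10 ** digits)
    p_match_rigged = Fraction(1)
    evidence = (prior_rigged * p_match_rigged
                + (1 - prior_rigged) * p_match_fair)
    return prior_rigged * p_match_rigged / evidence


if __name__ == "__main__":
    # Even a one-in-a-billion prior on rigging is overwhelmed by a
    # 1000-digit match.
    post = posterior_rigged(Fraction(1, 10 ** 9), digits=1000)
    print(post > Fraction(999_999, 1_000_000))
```

Note that nothing in this calculation refers to the other tickets or the other players; only your own ticket and the announced number enter the update.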
- Eugine_Nier 28 Mar 2013 4:33 UTC
  0 points
  The point is that if the lottery is biased it’s more likely to be biased in such a way that the same number repeats.