Interesting paper, but I’m not sure this example is a good way to illustrate the result, because if someone actually built AIXI using the prior described in the OP, it will quickly learn that it’s not in Hell since it won’t actually receive ε reward for outputting “0”.
Here’s my attempt to construct a better example. Suppose you want to create an agent that qualifies as an AIXI but keeps just outputting “I am stupid” for a very long time. What you do is give it a prior which assigns ε weight to a “standard” universal prior, and the rest of the weight to a Hell environment which returns exactly the same (distribution of) rewards and inputs as the “standard” prior for outputting “I am stupid”, and 0 reward forever if the AIXI ever does anything else. This prior still qualifies as “universal”.
This AIXI can’t update away from its initial belief in the Hell environment, because as long as it keeps outputting “I am stupid”, the Hell environment is indistinguishable from the real environment. If in the real world you keep punishing it (giving it 0 reward), I think eventually this AIXI will do something else, because its expected reward for outputting “I am stupid” falls below ε, at which point risking the near-certainty of Hell’s 0 reward forever for the ε chance of getting a better outcome becomes worthwhile. But if ε is small enough, it may be impossible to punish AIXI consistently enough to make this happen (e.g., it could occasionally get a non-zero reward due to cosmic rays or quantum tunneling).
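To make the “can’t update away” point concrete, here is a minimal sketch of the Bayesian update over just two hypotheses, a Hell environment and the real one. This is my own toy model, not the paper’s construction; the hypothesis names, the 0.5 likelihoods, and the reward model are all illustrative assumptions.

```python
# Toy sketch (not the paper's construction): two hypotheses, "hell" and "real",
# that predict the *same* reward distribution whenever the agent outputs the
# safe string, so no amount of such observations can shift the posterior.

EPSILON = 1e-6  # weight the mixture prior puts on the "standard" universal prior

posterior = {"hell": 1.0 - EPSILON, "real": EPSILON}

def likelihood(hypothesis, action, reward):
    """P(reward | hypothesis, action) in a crude toy model."""
    if action == "I am stupid":
        # Hell mimics the real environment exactly for this action,
        # so both hypotheses assign the observation the same probability.
        return 0.5
    # For any other action, Hell predicts reward 0 with certainty.
    if hypothesis == "hell":
        return 1.0 if reward == 0 else 0.0
    return 0.5  # "real" environment: reward is uninformative in this toy model

def update(action, reward):
    for h in posterior:
        posterior[h] *= likelihood(h, action, reward)
    total = sum(posterior.values())
    for h in posterior:
        posterior[h] /= total

# As long as the agent keeps saying "I am stupid", the posterior never moves:
for _ in range(1000):
    update("I am stupid", reward=0)
print(posterior)  # still approximately {"hell": 1 - 1e-6, "real": 1e-6}
```

Because both hypotheses assign the same likelihood to every observation the agent actually generates, the posterior stays frozen at the prior; only a deviation from “I am stupid” would produce evidence the two hypotheses disagree about.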
I think one could construct similar examples for UDT, so the problem isn’t with AIXI’s design, but rather that a prior being “universal” isn’t “good enough” for decision making. We actually need to figure out what the “actual”, or “right”, or “correct” prior is. This seems to resolve one of my open problems.
it will quickly learn that it’s not in Hell since it won’t actually receive ε reward for outputting “0”.
The example was meant to show that if it were in Heaven, it would behave as if it were in Hell (now that’s a theological point there ^_^ ). Your example is more general.
The result of the paper is that as long as the AIXI gets a minimum non-zero average reward (essentially), you can make it follow that policy forever.
As I discussed before, IMO the correct approach is not looking for the one “correct” prior, since there is no such thing, but specifying a “pure learning” phase in AI development. In the case of your example, we can imagine the operator overriding the agent’s controls and forcing it to produce various outputs in order to update away from Hell. Given a sufficiently long learning phase, all universal priors should converge to the same result (of course if we start from a ridiculous universal prior it will take ridiculously long, so I still grant that there is a fuzzy domain of “good” universal priors).
As I discussed before, IMO the correct approach is not looking for the one “correct” prior, since there is no such thing, but specifying a “pure learning” phase in AI development.
I’m not sure about “no correct prior”, and even if there is no “correct prior”, maybe there is still “the right prior for me”, or “my actual prior”, which we can somehow determine or extract and build into an FAI?
In the case of your example, we can imagine the operator overriding the agent’s controls and forcing it to produce various outputs in order to update away from Hell.
How do you know when you’ve forced the agent to explore enough? What if the agent has a prior which assigns a large weight to an environment that’s indistinguishable from our universe, except that lots of good things happen if the sun gets blown up? It seems like the agent can’t update away from this during the training phase.
(of course if we start from a ridiculous universal prior it will take ridiculously long, so I still grant that there is a fuzzy domain of “good” universal priors)
So you think “universal” isn’t “good enough”, but something more specific (but perhaps not unique as in “the correct prior” or “the right prior for me”) is? Can you try to define it?
I’m not sure about “no correct prior”, and even if there is no “correct prior”, maybe there is still “the right prior for me”, or “my actual prior”, which we can somehow determine or extract and build into an FAI?
This sounds much closer to home. Note, however, that there is a certain ambiguity between the prior and the utility function. UDT agents maximize Sum_x Prior(x) U(x), so certain simultaneous redefinitions of Prior and U will lead to the same thing.
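A minimal numerical sketch of that ambiguity (the worlds and numbers are my own toy choices): rescale the prior by any positive function of the world and divide the utility by the same function, and every expected-utility computation comes out identical, so the agent’s behavior can’t tell you which split into Prior and U it is running.

```python
# Minimal sketch of the prior/utility ambiguity: rescaling the prior by any
# positive function c(x) while dividing the utility by the same c(x) leaves
# every expected-utility comparison unchanged.  Worlds and numbers are made up.

worlds = ["w1", "w2", "w3"]
prior  = {"w1": 0.5, "w2": 0.3, "w3": 0.2}
U      = {"w1": 1.0, "w2": 4.0, "w3": 9.0}

c = {"w1": 2.0, "w2": 0.5, "w3": 1.0}        # arbitrary positive rescaling
Z = sum(prior[w] * c[w] for w in worlds)      # renormalization constant

prior2 = {w: prior[w] * c[w] / Z for w in worlds}
U2     = {w: U[w] * Z / c[w] for w in worlds}

def expected_utility(p, u):
    return sum(p[w] * u[w] for w in worlds)

print(expected_utility(prior, U))    # 3.5
print(expected_utility(prior2, U2))  # same number: the two descriptions are equivalent
```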
But in that case, why do we need a special “pure learning” period where you force the agent to explore? Wouldn’t any prior that qualifies as “the right prior for me” or “my actual prior” avoid favoring any particular universe to such an extent that it prevents the agent from exploring in a reasonable way?
To recap, if we give the agent a “good” prior, then the agent will naturally explore/exploit in an optimal way without being forced to. If we give it a “bad” prior, then forcing it to explore during a pure learning period won’t help (enough) because there could be environments in the bad prior that can’t be updated away during the pure learning period and cause disaster later. Maybe if we don’t know how to define a “good” prior but there are “semi-good” priors which we know will reliably converge to a “good” prior after a certain amount of forced exploration, then a pure learning phase would be useful, but nobody has proposed such a prior, AFAIK.
If we find a mathematical formula describing the “subjectively correct” prior P and give it to the AI, the AI will still effectively use a different prior initially, namely the convolution of P with some kind of “logical uncertainty kernel”. IMO this means we still need a learning phase.
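One very rough way to picture this (my own toy reading of “convolution with a logical uncertainty kernel”, not a construction taken from the comment): before the AI has resolved its logical uncertainty about what P actually says, its effective prior behaves like a kernel-weighted mixture of candidate guesses about P.

```python
# Rough sketch: the AI cannot yet compute the intended prior P exactly, so its
# effective prior is a kernel-weighted mixture of candidate guesses about what
# P says.  The candidates and kernel weights below are illustrative only.

worlds = ["w1", "w2", "w3"]

# Candidate guesses at P, reflecting unresolved logical uncertainty about P.
candidates = [
    {"w1": 0.7, "w2": 0.2, "w3": 0.1},
    {"w1": 0.3, "w2": 0.4, "w3": 0.3},
]
kernel = [0.6, 0.4]   # how much credence the AI gives each guess

effective_prior = {
    w: sum(k * q[w] for k, q in zip(kernel, candidates)) for w in worlds
}
print(effective_prior)  # differs from any single candidate, hence from P itself
```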
In the post you linked to, at the end you mention a proposed “fetus” stage where the agent receives no external inputs. Did you ever write the posts describing it in more detail? I have to say my initial reaction to that idea is also skeptical, though. Humans don’t have a fetus stage where we think/learn about math with external inputs deliberately blocked off. Why do artificial agents need it? If an agent couldn’t simultaneously learn about math and process external inputs, it seems like something must be wrong with the basic design, which we should fix instead of work around.
I didn’t develop the idea, and I’m still not sure whether it’s correct. I’m planning to get back to these questions once I’m ready to use the theory of optimal predictors to put everything on a rigorous footing. So I’m not sure we really need to block the external inputs. However, note that the AI is in a sense more fragile than a human, since the AI is capable of self-modifying in irreversibly damaging ways.
There is no such thing as an “actual” or “right” or “correct” prior. A lot of the arguments for frequentist statistical methods were that bayesians require a subjective prior, and there is no way to make priors not subjective.
What would it even mean for there to be a universal prior? You only exist in this one universe. How good a prior is, is simply how much probability it assigns to this universe. You could try to find a prior empirically, by testing different priors and seeing how well they fit the data. But then you still need a prior over those priors.
But we can still pick a reasonable prior. Like a distribution over all possible LISP programs, biased towards simplicity. If you use this as your prior over priors, then any crazy prior you can think of should have some probability. Enough that a little evidence should cause it to become favored.
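A minimal sketch of that kind of length-biased weighting (the candidate “programs” and the use of raw string length as the complexity measure are my own simplifications):

```python
# Hedged sketch of a simplicity-biased prior over priors: weight each candidate
# program (here, a stand-in string) by 2**(-length), then normalize.
# The candidate list itself is made up for illustration.

candidate_programs = [
    "(lambda (x) x)",              # short, gets most of the weight
    "(lambda (x) (if (= x 0) 1 0))",
    "(lambda (x) (a much longer and more contrived hypothesis goes here))",
]

raw = {p: 2.0 ** (-len(p)) for p in candidate_programs}
Z = sum(raw.values())
prior_over_priors = {p: w / Z for p, w in raw.items()}

for p, w in prior_over_priors.items():
    print(f"{w:.3e}  {p}")
# Short programs dominate, but every program keeps nonzero weight,
# so even a "crazy" prior can still be promoted by enough evidence.
```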
What would it even mean for there to be a universal prior?
I have a post that may better explain what I am looking for.
You only exist in this one universe. How good a prior is, is simply how much probability it assigns to this universe.
This seems to fall under position 1 or 2 in my post. Currently my credence is mostly distributed between positions 3 and 4 in that post. Reading it may give you a better idea of where I’m coming from.
Position 1 or 2 is correct. Position 3 isn’t coherent: what is “reality fluid”, and how can things be more “real” than other things? Where do subjective beliefs come from in this model? Position 4 has nothing to do with probability theory. Values and utility functions don’t enter into it. Probability theory is about making predictions and doing statistics, not about how much you care about different worlds which may or may not actually exist.
I interpret probability as expectation. I want to make predictions about things. I want to maximize the probability I assign to the correct outcomes. If I multiply together all the predictions I ever made, I want that number to be as high as possible (the predictions of the correct outcome, that is). That would be the probability I gave to the world. Or at least my observations of it.
So then it doesn’t really matter what the numbers represent. Just that I want them to be as high as possible. When I make decisions based on the numbers using some decision theory/algorithm and utility function, the higher the numbers are, the better my results will be.
I’m reminded of someone’s attempt to explain probability without using words like “likely”, “certain” or “frequency”, etc. It was basically an impossible task. If I was going to attempt that, I would say something like the previous two paragraphs. Saying things like “weights”, “reality fluid”, “measure”, “possible world”, etc, just pushes the meaning elsewhere.
In any case, all of your definitions should be mathematically equivalent. They might have philosophical implications, but they should all produce the same results on any real-world problem. Or at least I think they should. You aren’t disputing Bayes’ theorem or standard probability theory or anything?
In that case the choice of prior should have the same consequences. And you still want to choose the prior that you think will assign the actual outcome the highest probability.
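A toy illustration of that scoring rule (the observation sequence, and the “priors” reduced here to a single probability that the next bit is 1, are made up): compare candidate priors by the total probability, equivalently the sum of log-probabilities, that each assigned to what actually happened.

```python
# Toy illustration of "how good a prior is = how much probability it assigns to
# what actually happens": score each candidate prior by the sum of the log
# probabilities it gave to the observed outcomes.

import math

observations = [1, 1, 0, 1, 1, 1, 0, 1]   # what actually happened

# Each "prior" here is just P(next observation = 1).
candidate_priors = {"uniform": 0.5, "optimistic": 0.8, "pessimistic": 0.2}

def log_score(p_one, data):
    """Sum of log-probabilities the prior assigned to the actual outcomes."""
    return sum(math.log(p_one if x == 1 else 1.0 - p_one) for x in data)

for name, p in candidate_priors.items():
    print(name, log_score(p, observations))
# "optimistic" scores best because it put the most probability on the sequence
# that actually occurred, which is the sense of "better prior" described above.
```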