Why would you want to make any predictions at all? Predictions are not directly about value. It doesn’t seem that there is a place for the human concept of prediction in a foundational decision theory.
I think that’s right. I was making the point about prediction because Eliezer still seems to believe that predictions of sensory experience are somehow fundamental, and I wanted to convince him that the universal prior is wrong even given that belief.
Still, the universal prior does seem to be a universal way of eliciting what the human concept of prediction (expectation, probability) is, to the limit of our ability to train such a device, for exactly the reasons Eliezer gives: whatever concept we use, it’s in there, among the programs the universal prior weights.
ETA: On the other hand, the concept thus reconstructed would be limited to talking about observations, and so it won’t be a general concept, while human expectation is probably more general than that, and you’d need a general logical language to capture it (and a language of unknown expressive power to capture it faithfully).
ETA2: Predictions might still be a necessary concept to express the decisions that the agent makes, to connect formal statements with what the agent actually does, and so express what the agent actually does as formal statements. We might have to deal with reality because the initial implementation of FAI has to be constructed specifically in reality.
Umm… what about my argument that a human can represent their predictions symbolically like “P(next bit is 1)=i-th bit of BB(100)” instead of using numerals, and thereby do better than a Solomonoff predictor because the Solomonoff predictor can’t incorporate this? Or in other words, the only reason the standard proofs of Solomonoff prediction’s optimality go through is that they assume predictions are represented using numerals?
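The contrast being drawn, written out as a paraphrase (not a quotation) of the claim above:

```latex
% Claim (paraphrased): a Solomonoff predictor must report its forecast as a numeral,
\[
  P(x_t = 1 \mid x_{<t}) \;=\; 0.d_1 d_2 d_3 \ldots,
\]
% whereas a human can leave the forecast in unevaluated symbolic form,
\[
  P(x_t = 1 \mid x_{<t}) \;=\; \text{the } i\text{-th bit of } \mathrm{BB}(100),
\]
% a quantity one can reason about and bet on without ever computing it.
```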
Re: “what about my argument that a human can [adapt its razor a little] and thereby do better than a Solomonoff predictor because the Solomonoff predictor can’t incorporate this?”
There are at least two things “Solomonoff predictor” could refer to:
An intelligent agent with Solomonoff-based priors;
An agent who is wired to use a Solomonoff-based razor on their sense inputs.
A human is more like the first agent. The second agent is not really properly intelligent—and adapts poorly to new environments.
Humans are (can be represented by) Turing machines. All halting Turing machines are incorporated in AIXI. Therefore, anything that humans can do to more effectively predict something than a “mere machine” is already incorporated into AIXI.
More generally, anything you represent symbolically can be represented using binary strings. That’s how that string you wrote got to me in the first place. You converted the Turing operations in your head into a string of symbols, a computer turned that into a string of digits, my computer turned it back into symbols, and my brain used computable algorithms to make sense of them. What makes you think that any of this is impossible for AIXI?
Am I going crazy, or did you just basically repeat what Eliezer, Cyan, and Nesov said without addressing my point?
Do you guys think that you understand my argument and that it’s wrong, or that it’s too confusing and I need to formulate it better, or what? Everyone just seems to be ignoring it and repeating the standard party line....
ETA: Now reading the second part of your comment, which was added after my response.
ETA2: Clearly I underestimated the inferential distance here, but I thought at least Eliezer and Nesov would get it, since they appear to understand the other part of my argument about the universal prior being wrong for decision making, and this seems to be a short step. I’ll try to figure out how to explain it better.
If 4 people all think you’re wrong for the same reason, either you’re wrong or you’re not explaining yourself. You seem to disbelieve the first, so try harder with the explaining.
Didn’t stop 23+ people from voting up his article … (21 now; I and someone else voted it down)
Well, people expect him to be making good points, even when they don’t understand him (i.e., I don’t understand UDT fully, but it seems to be important). Also, he’s advocating further thinking, which is popular around here.
And I really, really wish people would stop doing that, whether it’s for Wei_Dai or anyone else you deem to be smart.
Folks, you may think you’re doing us all a favor by voting someone up because they’re smart, but that policy has the effect of creating an information cascade, because it makes an inference bounce back, accumulating arbitrarily high support irrespective of its relationship to reality.
The content of a post or comment should screen off any other information about its value [1], including who made it.
[1] except in obvious cases like when someone is confirming that something is true about that person specifically
Seconded. Please only vote up posts you both understand and approve of.
I agree, but would like to point out that I don’t see any evidence that people aren’t already doing this. As far as I can tell, Lucas was only speculating that people voted up my post based on the author. Several other of my recent posts have fairly low scores, for example. (All of them advocated further thinking as well, so I don’t think that’s it either.)
The fact that AIXI can predict that a human would predict certain things, does not mean that AIXI can agree with those predictions.
In the limit, even if that one human is the only thing in all of the hypotheses that AIXI has under consideration, AIXI will be predicting precisely as that human does.
BB(100) is computable. Am I missing something?
Maybe… by BB I mean the Busy Beaver function Σ as defined in this Wikipedia entry.
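For readers following along, here is a minimal statement of the function being referred to, assuming the standard 2-symbol convention from that entry (a sketch, not a quotation):

```latex
% Busy Beaver function Sigma, standard 2-symbol convention (assumed here):
% sigma(M) = number of 1s machine M leaves on an initially blank tape when it halts.
\[
  \Sigma(n) \;=\; \max\bigl\{\, \sigma(M) \;:\; M \text{ is an } n\text{-state, 2-symbol Turing machine that halts on the blank tape} \,\bigr\}
\]
```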
Right, and...
“A trivial but noteworthy fact is that every finite sequence of Σ values, such as Σ(0), Σ(1), Σ(2), …, Σ(n) for any given n, is computable, even though the infinite sequence Σ is not computable (see computable function examples).”
So why can’t the universal prior use it?
Sorry, I should have used BB(2^100) as the example. The universal prior assigns the number BB(2^100) a very small weight, because the only way to represent it computably is by giving a 2^100-state Turing machine. A human would assign it a much larger weight, referencing it by its short symbolic representation.
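To spell out the weights being compared (a rough sketch in the usual Solomonoff-mixture notation; the constant c is illustrative):

```latex
% Solomonoff's universal prior: every program p for a universal prefix machine U
% contributes 2^{-|p|} to each sequence it outputs, so
\[
  M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\lvert p \rvert}.
\]
% A hypothesis therefore enters the mixture with weight roughly 2^{-K(hypothesis)}.
% If the shortest program computing BB(2^100) really needs on the order of 2^100 bits
% (say, an explicit 2^100-state machine), its weight is about
\[
  2^{-K(\mathrm{BB}(2^{100}))} \;\approx\; 2^{-c \cdot 2^{100}},
\]
% while a human reasoning symbolically can refer to the same number with a description
% only a few hundred symbols long.
```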
Until I write up a better argument, you might want to (assuming you haven’t already) read this post where I gave a decision problem that a human does (infinitely) better than AIXI.
I don’t think I understood that fully, but there seems to be a problem with your theory. The human gets to start in the epistemically advantaged position of knowing that the game is based on a sequence of busy beavers and knowing that they are a very fast-growing function. AIXI is prevented from knowing this information and has to start as if from a blank canvas. The reason we use an Occamian prior for AIXI is that we refuse to tailor it to a specific environment. If your logic is sound, then yes, it does do worse when it is dropped into an environment where it is paired with a human with an epistemic advantage, but it would beat the human across the space of possible worlds.
Another problem is that you seem to assume that the only hypothesis in the entire set giving useful predictions is the hypothesis which is, in fact, correct. There are plenty of other functions that correctly predict arbitrarily large numbers of 1′s, with much less complexity, which can give enough overall probability weight that AIXI ends up using a usefully correct model of its universe, if not a fully correct one.
How a human might come to believe, without being epistemically privileged, that a sequence is probably a sequence of busy beavers, is a deep problem, similar to the problem of distinguishing halting oracles from impostors. (At least one mathematical logician who has thought deeply about the latter problem thinks that it’s doable.)
But in any case, the usual justification for AIXI (or adopting the universal prior) is that (asymptotically) it does as well as or better than any computable agent, even one that is epistemically privileged, as long as the environment is computable. Eliezer and others were claiming that it does as well as or better than any computable agent, even if the environment is not computable, and this is what my counter-example disproves.
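For reference, the optimality result being appealed to is roughly of the following form (stated from memory, so treat the constant as approximate); note that it only applies when the true environment μ is a computable measure:

```latex
% Solomonoff prediction bound: for any computable measure mu generating the sequence,
% the universal mixture M's total expected squared prediction error is finite,
% bounded by a constant depending only on K(mu):
\[
  \sum_{t=1}^{\infty} \mathbb{E}_{\mu}\!\left[ \bigl( M(x_t = 1 \mid x_{<t}) - \mu(x_t = 1 \mid x_{<t}) \bigr)^{2} \right]
  \;\le\; \frac{\ln 2}{2}\, K(\mu).
\]
% The bound is silent when the environment is not computable, which is exactly the case at issue.
```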
So you think that we need to rethink our theory of what perfect optimization is, in order to take into account the possibility we live in an uncomputable universe? Even if you are correct in your example, there is no reason to suppose that your human does better in the space of possible uncomputable universes than AIXI, as opposed to better in that one possible (impossible) universe.
Yes.
This seems pretty easy, given the same level of raw computing power available to AIXI (otherwise the human gets screwed in the majority of cases simply because he doesn’t have enough computing power).
For example, I can simply modify AIXI with a rule that says “if you’ve seen a sequence of increasingly large numbers that can’t be explained by any short computable rule, put some weight into it being BB(1)...BB(2^n)...” (and also modify it to reason symbolically about expected utilities instead of comparing numbers), and that will surely be an improvement over all possible uncomputable universes. (ETA: Strike that “surely”. I have to think this over more carefully.)
How to make an optimal decision algorithm (as opposed to just improving upon AIXI) is still an open problem.
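As a very rough illustration of the kind of modification described above, here is a toy Bayesian mixture over a handful of hand-picked hypotheses. Nothing here is AIXI; the hypotheses, weights, and data are made up for the example, and the point is only the general shape of “put some prior weight on an extra, hand-specified hypothesis”:

```python
from fractions import Fraction

# Toy Bayesian mixture predictor over binary sequences.
# Each "hypothesis" maps a history (list of 0/1 bits) to P(next bit is 1).
# The hypotheses below are placeholders chosen purely for illustration.

def all_ones(history):
    return Fraction(1)                      # "the sequence is all 1s"

def alternating(history):
    return Fraction(len(history) % 2)       # "the sequence goes 0,1,0,1,..."

def mostly_ones(history):
    return Fraction(9, 10)                  # "each bit is 1 with probability 0.9"

hypotheses = {"all ones": all_ones, "alternating": alternating, "mostly ones": mostly_ones}
weights = {name: Fraction(1, len(hypotheses)) for name in hypotheses}

def update(history, bit):
    """Bayes update: reweight each hypothesis by the probability it gave the observed bit."""
    for name, h in hypotheses.items():
        p1 = h(history)
        weights[name] *= p1 if bit == 1 else 1 - p1
    total = sum(weights.values())
    for name in weights:
        weights[name] /= total

def predict(history):
    """Mixture prediction: posterior-weighted average over the surviving hypotheses."""
    return sum(weights[name] * h(history) for name, h in hypotheses.items())

history = []
for bit in [1, 1, 0, 1, 1, 1]:
    update(history, bit)
    history.append(bit)

print({name: float(w) for name, w in weights.items()})
print("P(next bit is 1) =", float(predict(history)))
```

The analogue of the proposal above would be adding, alongside the computable entries, a hypothesis whose predictions are only described symbolically (e.g. “a 0 exactly at BB(1), BB(2), BB(4), ...”), which is the step the thread argues a plain Solomonoff mixture cannot take with non-negligible weight.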
This is what I dislike about your logic. You create a situation where (you think) AIXI fails, but you fail to take into account the likelihood of being in that situation versus being in a similar-looking one. I can easily see a human seeing a long series of ones, with some zeros at the beginning, saying “aha, this must be the result of a sequence of busy beavers”, when all he’s actually seeing is 3^^^3 minus his telephone number or something. AIXI can lose in really improbable universes, because it’s designed to work over the space of universes, not some particular one. By modifying the rules, you can make it better in specific universes, but only by reducing its performance in similar-seeming universes.
What about the agent using Solomonoff’s distribution? After seeing BB(1),...,BB(2^n), the algorithmic complexity of BB(1),...,BB(2^n) is sunk, so to speak. It will predict a higher expected payoff for playing 0 in any round i where the conditional complexity K(i | BB(1),...,BB(2^n)) < 100. This includes for example 2BB(2^n), 2BB(2^n)+1, BB(2^n)^2 * 3 + 4, BB(2^n)^^^3, etc. It will bet on 0 in these rounds (erroneously, since K(BB(2^(n+1)) | BB(2^n)) > 100 for large n), and therefore lose relative to a human.
I don’t understand how the bolded part follows. The best explanation by round BB(2^n) would be “All 1′s except for the Busy Beaver numbers up to 2^n”, right?
Yes, that’s the most probable explanation according to the Solomonoff prior, but AIXI doesn’t just use the most probable explanation to make decisions, it uses all computable explanations that haven’t been contradicted by its input yet. For example, “All 1′s except for the Busy Beaver numbers up to 2^n and 2BB(2^n)” is only slightly less likely than “All 1′s except for the Busy Beaver numbers up to 2^n” and is compatible with its input so far. The conditional probability of that explanation, given what it has seen, is high enough that it would bet on 0 at round 2BB(2^n), whereas the human wouldn’t.
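A back-of-the-envelope version of why that extra explanation keeps enough weight to sway the bet (a sketch only; the constant c comes from the usual coding-theorem approximation, and the 2^100 payoff ratio from the quoted setup):

```latex
% H1 = "all 1s except at BB(1),...,BB(2^n)"
% H2 = "all 1s except at BB(1),...,BB(2^n) and at 2*BB(2^n)"
% Both agree with everything observed through round BB(2^n), so their posterior odds
% equal their prior odds, which under a Solomonoff-style prior are roughly
\[
  \frac{P(H_2 \mid \text{data})}{P(H_1 \mid \text{data})}
  \;=\; \frac{P(H_2)}{P(H_1)}
  \;\approx\; \frac{2^{-K(H_2)}}{2^{-K(H_1)}}
  \;\approx\; 2^{-c}
\]
% for a small constant c, since H2 needs only the short extra description
% "and also at twice that number". If a correct 0 pays about 2^100 times more than a
% correct 1, any cluster of surviving hypotheses with combined weight above roughly
% 2^{-100} is enough to tip the mixture into betting 0 at round 2*BB(2^n), where the
% human's symbolic model does not.
```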
Oh.
I feel stupid now.
EDIT: Wouldn’t it also break even by predicting the next Busy Beaver number? “All 1′s except for BB(1...2^n+1)” is also only slightly less likely. EDIT: I feel more stupid.
The next number in the sequence is BB(2^(n+1)), not BB(2^n+1).
ETA: In case more explanation is needed, it takes O(2^n) more bits to computably describe BB(2^(n+1)), even if you already have BB(2^n). (It might take O(2^n) more bits to describe BB(2^n+1) as well, but I wasn’t sure so I used BB(2^(n+1)) in my example instead.)
Since K(BB(2^(n+1)) | BB(2^n)) > 100 for large n, AIXI actually will not bet on 0 when BB(2^(n+1)) comes around, and all those 0s that it does bet on are simply “wasted”.
You can find it by emulating the Busy Beaver.
BB(100) is computable—and BB(2^100) is computable too :-(
Surely predictions of sensory experience are pretty fundamental. To understand the consequences of your actions, you have to be able to make “what-if” predictions.
Re: “It doesn’t seem that there is a place for the human concept of prediction in a foundational decision theory.”
You can hardly steer yourself effectively into the future if you don’t have an understanding of the consequences of your actions.
Yes, it might be necessary exactly for that purpose (though consequences don’t reside just in the “future”), but I don’t understand this well enough to decide either way.
I checked with the dictionary. It had:
the effect, result, or outcome of something occurring earlier: The accident was the consequence of reckless driving.
an act or instance of following something as an effect, result, or outcome.
http://dictionary.reference.com/browse/consequence
Consequences not being in the future seems to be a curious concept to me—though I understand that Feynman dabbled with the idea on sub-microscopic scales.
I think we’ve got it covered with Newcomb’s Problem (consequences in the past) and Counterfactual Mugging (consequences in another possible world). And there is still greater generality with logical consequences.
FWIW, I wouldn’t classify Newcomb’s Problem as having to do with “consequences in the past” or Counterfactual Mugging as having to do with “consequences in another possible world”.
For me, “consequences” refers to the basic cause-and-effect relationship—and consequences always take place downstream.
Anticipating something doesn’t really mean that the future is causally affecting the past. If you deconstruct anticipation, it is all actually based on current and previous knowledge.
You are arguing definitions (with the use of a dictionary, no less!). The notion of consequences useful for decision theory is a separate idea from causality of physics.
Is “consequences” really a good term for what you are talking about?
It seems to me that it is likely to cause confusion.
Does anyone else use the term in this way?