Cole Wyeth comments on Would AIXI protect itself?

Cole Wyeth 1 Sep 2024 18:15 UTC
1 point
0
I am confused that this has been heavily downvoted; it seems to be straightforwardly true as far as it goes. While it doesn’t address the fundamental problems of embeddedness for AIXI, and the methods described in the comment would not suffice to teach AIXI to protect its brain in the limit of unlimited capabilities, it seems quite plausible that an AIXI approximation developing in a relatively safe environment with pain sensors, and repaired if it causes harm to its actuators, would have a better chance at learning to protect itself in practice. In fact, I have argued that with a careful definition of AIXI’s off-policy behavior, this kind of care may actually be sufficient to teach it to avoid damaging its brain as well.
- habryka 1 Sep 2024 21:26 UTC
  2 points
  0
  I think in the original formulation, this indeed would not do anything (because AIXI is deeply Cartesian about information about itself).
  I haven’t looked into the off-policy behavior definition that you suggested in your post.
  - Cole Wyeth 1 Sep 2024 21:58 UTC
    1 point
    0
    Since both objections have been pointers to the definition, I think it’s worth noting that I am quite familiar with the definition(s) of AIXI; I’ve read both of Hutter’s books, the second one several times as it was drafted.
    
    Perhaps there is some confusion here about the boundaries of an AIXI implementation. This is a little hard to talk about because we are interested in “what AIXI would do if...” but in fact the embeddedness questions only make sense for AIXI implemented in our world, which would require it to be running on physical hardware, which means in some sense it must be an approximation (though perhaps we can assume that it is a close enough approximation that it behaves almost exactly like AIXI). I am visualizing AIXI running inside a robot body. Then it is perfectly possible for AIXI to form accurate beliefs about its body, though in some harder-to-understand sense it can’t represent the possibility that it is running on the robot’s hardware. AIXI’s cameras would show its robot body doing things when it took internal actions. If the results damaged the actuators, AIXI would have more trouble getting reward, and so would avoid similar actions in the future (this is why repairs and some hand-holding before it understood the environment might be helpful). Similarly, pain signals could be communicated to AIXI as negative (or lowered positive) rewards, and it would rapidly learn to avoid them. It’s possible that an excellent AIXI approximation (with a reasonable choice of UTM for its prior) would rapidly figure out what was going on and wouldn’t need any of these precautions to learn to protect its body, but it seems clear to me that they would at least improve AIXI’s chances of success early in life.
    
    With that said, the prevailing wisdom that AIXI would not protect its brain may well be correct, which is why I suggested the off-policy version. This flaw would probably lead to AIXI destroying itself eventually, if it became powerful enough to plan around its pain signals. What I object to is only the dismissal of, and disagreement with, @moridinamael’s comment, which seems to me to be directionally correct and not to make overly strong claims.
    - habryka 1 Sep 2024 22:19 UTC
      2 points
      0
      Yeah, I think that’s a reasonable complaint about the voting.
      My best guess is you are probably steelmanning moridinamael’s comment too much. I think there is a common cognitive attractor where people conflate the development of AIs in general, and especially of idealized reasoners, with human child-rearing, and LessWrong has a lot of (justified) antibodies against that cognitive attractor.
      I genuinely don’t know whether that was also the generator of moridinamael’s comment. It’s plausible he was a false positive on the site’s “this person isn’t sufficiently distinguishing between AI cognition and human cognition” detector, but I am broadly in favor of having that detector and having it lead to content being downvoted (we really get a lot of comments where people talk about “raising an AI” in ways that show they don’t understand the reasons why humans need to be raised or the different dynamics of knowledge transfer between AI systems, and where people confusedly think that “raising AI systems like children” will somehow teach them to be moral).