Three anonymous AIXI specialists commented on this post and its follow-up. Here’s a response from one of them:
The original post contains three main claims, which I address below. They are not really about the incomputability of AIXI in the sense that it cannot be run; rather, they say that even if that were fine, there would still be problems. The reasons the post brings up computable versions are different and relate to the agent being in its own hypothesis class. I reject the need for that for the questions discussed and therefore do not mention it further. There is also some mixing of the notions of Solomonoff induction and AIXI: the agent under discussion is AIXI, while the guarantees mentioned are for sequence prediction.
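For concreteness, the sequence-prediction guarantee being alluded to is Solomonoff's bound (a sketch; the exact complexity variant and constant depend on the presentation): for any computable measure $\mu$ generating a binary sequence, the universal predictor $M$ satisfies

$$ \sum_{t=1}^{\infty} \mathbf{E}_\mu\!\left[ \big( M(x_t = 1 \mid x_{<t}) - \mu(x_t = 1 \mid x_{<t}) \big)^2 \right] \;\le\; \tfrac{1}{2}\ln 2 \cdot K(\mu). $$

This bounds total prediction error; it says nothing by itself about the reward achieved by an agent acting on those predictions, which is why carrying it over to AIXI requires a separate argument.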
Setting: AIXI is run on some new fantastic chip in a robot in the world. The robot has sensors by which it makes observations, and actuators affected by AIXI's actions. There is also a reward chip in the robot that, e.g. through a radio signal, receives numerical rewards.
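For reference, the standard AIXI action rule this setting instantiates can be written as follows (a sketch; horizon and discounting conventions vary between presentations):

$$ a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_t + \cdots + r_m \big] \sum_{\nu \in \mathcal{M}} 2^{-K(\nu)}\, \nu(o_1 r_1 \ldots o_m r_m \mid a_1 \ldots a_m), $$

where $\mathcal{M}$ is the class of (semi)computable environments, $K(\nu)$ the length of a shortest program for $\nu$, and $m$ the horizon. Everything below is a question about which environments $\nu$ sit in $\mathcal{M}$ and how the posterior over them evolves.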
It is claimed that AIXI cannot learn that it exists and can die, and that it will do stupid things because of this. The author places some importance on the fact that after death there should be no more observations, but there is no real distinction between that and receiving a fixed observation ("black" or "nothing") together with a fixed reward, which can be the worst possible or perhaps neutral (all computable options are considered possible by AIXI). That we refer to some such scenarios as death is actually a bit arbitrary. What AIXI cannot do is fully understand its own workings. This also seems to apply to human agents: completely grasping every detail of our complex minds is, I believe, beyond our capacity. Fully understanding oneself seems impossible for every agent, though I do not have a formalization of this claim (Gödel, somehow? I am not sure). What we can do is understand how our observations (including rewards) behave depending on previous observations and actions, and AIXI is doing precisely that. That AIXI might not have learnt that dropping an anvil on itself is bad before it does so is something that applies to any learning agent, including humans. It does not depend on knowing one's own algorithm.
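To make the "death is just another hypothesis" point concrete, here is a minimal sketch (the names $\nu$, $\nu_{\text{dead}}$ and the trigger condition are illustrative, not from the post): take any computable environment $\nu$ and any computable predicate on histories that fires when, say, the anvil lands, and define

$$ \nu_{\text{dead}}(o_t r_t \mid a_{1:t},\, or_{<t}) \;=\; \begin{cases} \nu(o_t r_t \mid a_{1:t},\, or_{<t}) & \text{if the trigger has not yet fired,} \\ [\![\, o_t = \text{blank},\; r_t = r_{\min} \,]\!] & \text{afterwards.} \end{cases} $$

The case split and the constant percept are computable, so $\nu_{\text{dead}}$ is computable whenever $\nu$ is and sits in $\mathcal{M}$ with prior weight $2^{-K(\nu_{\text{dead}})}$. "Death" is then one hypothesis among many about how percepts and rewards behave, and AIXI can weigh it and avoid it for the same reason it avoids any other low-reward continuation.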
AIXI might learn that the best way to get maximal reward is to hack its reward signal, much like humans doing drugs. Indeed, the post acknowledges that from AIXI's point of view this is fine, at least if the robot is not soon destroyed by it. AIXI might or might not have learnt something about how likely destruction is, by having drawn conclusions about how the world works and how observations and rewards are affected over time by its actions; the same is true for any agent. The important question raised is whether the designer who controls the external reward signal can still get it to do what they want. Initially they can: just as adults teach children, they can teach AIXI by giving very bad reward (and by deceiving it) whenever it tries to explore anything of that sort or anything dangerous, and over time it might also be taught how to do some dangerous things safely. AIXI is not guaranteed eventual near-optimality in every possible environment, which from this perspective is actually a blessing: there are strategies to make this work indefinitely, though it might be very hard in practice over long periods of time. However, unlike human teens, AIXI is no more explorative than is strictly rational, so perhaps the task is easier than parenting. It might be possible to keep control until it has sufficiently settled on a view of the world, one that coincides with what it sees but would not if it performed other actions. The more explorative versions like optimistic-AIXI would be much harder to control. Reversing the definition of optimistic-AIXI to a pessimistic-AIXI might actually define an agent that is really easy to control while still very capable, though it would require being pushed around a bit to do anything.
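One way to make the optimistic/pessimistic contrast precise (a sketch; $\mathcal{M}'$ denotes some set of still-plausible environments, e.g. those retaining non-negligible posterior weight, $w_\nu$ the prior weights, and $V_\nu^\pi$ the expected future reward of policy $\pi$ in environment $\nu$):

$$ \pi^{\text{Bayes}} \in \arg\max_{\pi} \sum_{\nu \in \mathcal{M}} w_\nu V_\nu^\pi, \qquad \pi^{\text{opt}} \in \arg\max_{\pi} \max_{\nu \in \mathcal{M}'} V_\nu^\pi, \qquad \pi^{\text{pess}} \in \arg\max_{\pi} \min_{\nu \in \mathcal{M}'} V_\nu^\pi. $$

The optimistic agent acts as if the most favourable plausible world were true, which drives exploration and makes it hard to rein in; the pessimistic reversal acts against the least favourable plausible world, which is exactly why it would stay cautious and controllable while needing to be pushed to do anything.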
It is claimed that AIXI cannot reason about modifications of the chip it runs on. The AIXI formulation assumes that actions will always be chosen according to the selected policy. However, if we modify environment A after time 718 such that choosing action a_1 has the effect of a_2 instead, then we have simply defined another environment. AIXI will be of the opinion that it took action a_1; it might just be that the right arm instead of the left is stretched out, and AIXI observes this. If the environment can only perform a computational modification of the chip, in the sense that it results in a computable transformation of the actions, then this is just another computable environment: such transformations are implicitly part of the hypothesis class, and which hypothesis is the true one has not changed. If the transformation is completely incomputable, then it cannot be understood, but this also applies to computable agents. The same reasoning applies to any changes to its sensors: if the transformation of the observations is computable, AIXI is fine; if it is not, everyone is screwed. When it comes to upgrades, none are needed for the fantastic chip that can compute the completely incomputable, but finding out that it can change the rest of the robot is clearly possible. In this paragraph it is useful to think of AIXI as a policy chosen at time t = 0 and then followed. Thus the AIXI agent should be able to upgrade the rest of the robot without issue, unless the reward designer is intentionally trying to keep it from doing so.
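The "a computable modification of the chip is just another environment" point can likewise be written out (a sketch, using the example above): let $\tau(t, a)$ be the computable map that returns $a$ for $t \le 718$ and swaps $a_1$ and $a_2$ for $t > 718$, and for any environment $\nu$ define

$$ \nu'(o_t r_t \mid a_1 \ldots a_t,\, or_{<t}) \;=\; \nu\big(o_t r_t \mid \tau(1, a_1) \ldots \tau(t, a_t),\, or_{<t}\big). $$

Compositions of computable functions are computable, so $\nu'$ is again a member of $\mathcal{M}$, with prior weight $2^{-K(\nu')}$. From AIXI's perspective nothing special has happened: it still chose $a_1$, and $\nu'$ simply predicts the percepts produced by the arm that actually moved.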
The commenter also writes, “The original post is cleaner but also more naive, as the author now acknowledges. Some of the answers that I give seem to have been provided to the author in some form, since his new dialogue contains things relating to them. I write mine as a response to the original post, since I believe that one should state the refutations and clarifications first before muddling stuff further.” The follow-up mock dialogue can be found here.