This is probably true; AIXI does take a mixture of dualistic environments and assumes it is not part of the environment. However, I have never seen the “anvil problem” argued very rigorously. We cannot assume AIXI would learn to protect itself, but that is not a proof that it will destroy itself. AIXI has massive representational power, and an approximation to AIXI would form many accurate beliefs about its own hardware, perhaps even concluding that its hardware implements an AIXI approximation optimizing its reward signal (if you doubt this, see point 2). Would it not then seek to defend this hardware as a result of aligned interests? The exact dynamics at the “Cartesian boundary,” where AIXI sees its internal actions affect the external world, are hard to understand, but just because they seem confusing to us (or at least me) does not mean AIXI would necessarily be confused or behave defectively (though since it would be inherently philosophically incorrect, defective behavior is a reasonable expectation). Some arguments for the anvil problem are not quite right on a technical level; for instance, see “Artificial General Intelligence and the Human Mental Model”:
“Also, AIXI, and Legg’s Universal Intelligence Measure which it optimizes, is incapable of taking the agent itself into account. AIXI does not “model itself” to figure out what actions it will take in the future; implicit in its definition is the assumption that it will continue, up until its horizon, to choose actions that maximize expected future value. AIXI’s definition assumes that the maximizing action will always be chosen, despite the fact that the agent’s implementation was predictably destroyed. This is not accurate for real-world implementations which may malfunction, be destroyed, self-modify, etc. (Daniel Dewey, personal communication, Aug. 22, 2011; see also Dewey 2011)”
This (and the rest of the chapter’s description of AIXI) is pretty accurate, but there’s a technical sense in which AIXI does not “assume the maximizing action will always be chosen.” Its belief distribution is a semimeasure, which means it represents the possibility that the percept stream may end, terminating the history at a finite time. This is sometimes interpreted as “death.” Note that I am speaking of the latest definition of AIXI, which uses a recursive value function; see the section of Jan Leike’s PhD thesis on computability levels. The older iterative value function formulation has worse computability properties and really does assume non-termination, so the chapter I quoted may be merely outdated rather than mistaken.
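To spell out the semimeasure point, here is a rough sketch in standard notation. The symbols are my own notational assumptions rather than a quotation from the thesis: ν is an environment in the mixture, e denotes percepts, a denotes actions, and the probability of the prefix is assumed to be nonzero.

```latex
% A chronological semimeasure \nu is allowed to lose probability mass at each step:
\[
  \nu(e_{<t} \,\|\, a_{<t}) \;\ge\; \sum_{e_t} \nu(e_{1:t} \,\|\, a_{1:t})
\]
% The missing mass can be read as the probability that no further percept arrives,
% i.e. that the history terminates after e_{<t} once action a_t is taken:
\[
  P_\nu(\text{stream ends} \mid e_{<t},\, a_{1:t})
  \;=\; 1 \;-\; \sum_{e_t} \frac{\nu(e_{1:t} \,\|\, a_{1:t})}{\nu(e_{<t} \,\|\, a_{<t})}
\]
```

For a proper measure the inequality holds with equality and the termination probability is zero, which is the sense in which a measure-based formulation cannot represent the stream ending.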
See also my proposed off-policy definition of AIXI that should deal with brain surgery reasonably.
Very likely false, at least for some AIXI approximations, probably including reasonable implementations of AIXItl. AIXI uses a mixture over probabilistic environments, so it can model environments that are too complicated for it to predict optimally as partially uncertain. That is, probabilities can and will effectively be used to represent logical as well as epistemic uncertainty. A toy AIXI approximation that makes this easy to see is one that performs updating only on the N simplest environments (let’s ignore runtime/halting issues for the moment; this is reasonable-ish because AIXI’s environments are all at least lower semicomputable). This approximation would place greater and greater weight on the environment that best predicts the percept stream, even if it doesn’t do so perfectly, perhaps because some complicated events are modeled as “random.” The dynamics of updating the universal distribution in a very complicated world are an interesting research topic which seems underexplored or even unexplored as I write this! Here is a (highly esoteric) discussion of this point as it concerns a real approximation to the universal distribution.
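The toy approximation described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not AIXI or AIXItl: there are no actions or rewards, the "N simplest environments" are four hand-written predictors, the complexities in `COMPLEXITIES` are made-up bit counts standing in for program lengths, and a seeded pseudo-random generator stands in for "deterministic but too complicated to predict" events. All names here are hypothetical.

```python
import math
import random


def make_stream(n, seed=0):
    """Deterministic bit stream: every third bit is 'hard' (seeded PRNG), the rest are 1."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) if t % 3 == 0 else 1 for t in range(n)]


# Each model maps the history so far to P(next bit = 1).
def always_one(history):
    return 1.0  # deterministic model; falsified by the first 0


def fair_coin(history):
    return 0.5


def biased_coin(history):
    return 0.8


def structure_plus_noise(history):
    # Captures the regular part exactly and treats the hard bits as "random".
    return 0.5 if len(history) % 3 == 0 else 1.0


MODELS = [always_one, fair_coin, biased_coin, structure_plus_noise]
COMPLEXITIES = [2, 3, 5, 10]  # assumed description lengths in bits (illustrative only)


def bayes_mixture_posterior(stream, models, complexities):
    """Posterior weights over a finite model class with prior proportional to 2^-complexity."""
    log_w = [-k * math.log(2) for k in complexities]  # log prior
    history = []
    for bit in stream:
        h = tuple(history)
        for i, model in enumerate(models):
            p1 = model(h)
            p = p1 if bit == 1 else 1.0 - p1
            # A model that assigned probability 0 to the observed bit is ruled out.
            log_w[i] += math.log(p) if p > 0 else float("-inf")
        history.append(bit)
    top = max(log_w)
    if top == float("-inf"):
        raise ValueError("every model was falsified by the stream")
    unnorm = [math.exp(lw - top) for lw in log_w]  # subtract max for numerical stability
    total = sum(unnorm)
    return [u / total for u in unnorm]


if __name__ == "__main__":
    weights = bayes_mixture_posterior(make_stream(300), MODELS, COMPLEXITIES)
    for model, w in zip(MODELS, weights):
        print(f"{model.__name__:22s} {w:.6f}")
```

In this sketch the posterior should concentrate on `structure_plus_noise`, even though it has the lowest prior weight and never predicts the hard bits better than chance. It wins simply because it wastes no probability on the predictable part of the stream, which is the sense in which probabilities end up standing in for logical uncertainty.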
It’s true that if we had enough compute to implement a good AIXI approximation, its world would also include lots of hard-to-compute things, possibly including other AIXI approximations, so it need not rapidly become a singleton. But this would not prevent it from being “a working AI.”
This is right, but not really magical—AIXItl only outperforms the class of algorithms with proofs of good performance (in some axiomatic system). If I remember correctly, this class doesn’t include AIXItl itself!