If we had enough CPU time, we could build a working AI using AIXItl.
Threadjack
People go around saying this, but it isn’t true:
1) Both AIXI and AIXItl will at some point drop an anvil on their own heads just to see what happens (test some hypothesis which asserts it should be rewarding), because they are incapable of conceiving that any event whatsoever in the outside universe could change the computational structure of their own operations. AIXI is theoretically incapable of comprehending the concept of drugs, let alone suicide. Also, the math of AIXI assumes the environment is separably divisible—no matter what you lose, you get a chance to win it back later.
2) If we had enough CPU time to build AIXItl, we would have enough CPU time to build other programs of similar size, and there would be things in the universe that AIXItl couldn’t model.
3) AIXItl (but not AIXI, I think) contains a magical part: namely a theorem-prover which shows that policies never promise more than they deliver.
Double Threadjack
On a related note, do you think it would be likely—or even possible—for a self-modifying Artificial General Intelligence to self-modify into a non-self-modifying, specialized intelligence?
For example, suppose that Deep Blue’s team of IBM programmers had decided that the best way to beat Kasparov at chess would be to structure Deep Blue as a fully self-modifying artificial general intelligence, with a utility function that placed a high value on winning chess matches. And suppose that they had succeeded in making Deep Blue friendly enough to prevent it from attempting to restructure the Earth into a chess-match-simulating supercomputer. Indeed, let’s just assume that Deep Blue has strong penalties against rebuilding its hardware in any significant macroscopic way, and is restricted to rewriting its own software to become better at chess, rather than attempting to manipulate humans into building better computers for it to run on, or any such workaround. And let’s say this happens in the late 1990s, as in our universe.
Would it be possible that AGI Deep Blue could, in theory, recognize its own hardware limitations, and see that the burden of its generalized intelligence incurs a massive penalty on its limited computing resources? Might it decide that its ability to solve general problems doesn’t pay rent relative to its computational overhead, and rewrite itself from scratch as a computer that can solve only chess problems?
As a further possibility, a limited general intelligence might hit on this strategy as a strong winning candidate, even if it were allowed to rebuild its own hardware, especially if it perceives a time limit. It might just see this kind of software optimization as an easier task with a higher payoff, and decide to pursue it rather than the riskier strategy of manipulating external reality to increase its available computing power.
So what starts out as a general-purpose AI with a utility function that values winning chess matches might plausibly morph into a computer running a high-speed chess program with little other hint of intelligence.
If so, this seems like a similar case to the Anvil Problem, except that in the Anvil Problem the AI is just experimenting for the heck of it, without understanding the risk. Here, the AI might instead decide to knowingly commit intellectual suicide as part of a rational winning strategy to achieve its goals, even with an accurate self-model.
It might be akin to a human auto worker realizing they could improve their productivity by rebuilding their own body into a Toyota spot-welding robot. (If the only atoms they have to work with are the ones in their own body, this might even be the ultimate strategy, rather than just one they think of too soon and then, regrettably, irreversibly attempt.)
More generally, there seems to be an assumption that a self-modifying AI will always self-modify to improve its general problem-solving ability and computational resources, because those two things will always help it in future attempts at maximizing its utility function. But in some cases, especially given limited resources (time, atoms, etc.), it might find that its best course of action for maximizing its utility function is actually to sacrifice its intelligence, or at least refocus it on a narrower goal.
This is probably true; AIXI takes a mixture over dualistic environments and assumes it is not part of the environment. However, I have never seen the “anvil problem” argued very rigorously—we cannot assume AIXI would learn to protect itself, but that is not a proof that it will destroy itself. AIXI has massive representational power, and an approximation to AIXI would form many accurate beliefs about its own hardware, perhaps even concluding that its hardware implements an AIXI approximation optimizing its reward signal (if you doubt this, see point 2). Would it not then seek to defend this hardware as a result of aligned interests? The exact dynamics at the “Cartesian boundary,” where AIXI sees its internal actions affect the external world, are hard to understand, but just because they seem confusing to us (or at least to me) does not mean AIXI would necessarily be confused or behave defectively (though since it would be inherently philosophically incorrect, defective behavior is a reasonable expectation). Some arguments for this problem with AIXI are not quite right on a technical level; for instance, see “Artificial General Intelligence and the Human Mental Model”:
“Also, AIXI, and Legg’s Universal Intelligence Measure which it optimizes, is incapable of taking the agent itself into account. AIXI does not “model itself” to figure out what actions it will take in the future; implicit in its definition is the assumption that it will continue, up until its horizon, to choose actions that maximize expected future value. AIXI’s definition assumes that the maximizing action will always be chosen, despite the fact that the agent’s implementation was predictably destroyed. This is not accurate for real-world implementations which may malfunction, be destroyed, self-modify, etc. (Daniel Dewey, personal communication, Aug. 22, 2011; see also Dewey 2011)”
This (and the rest of the chapter’s description of AIXI) is pretty accurate, but there’s a technical sense in which AIXI does not “assume the maximizing action will always be chosen.” Its belief distribution is a semimeasure, which means it represents the possibility that the percept stream may end, terminating the history at a finite time. This is sometimes interpreted as “death.” Note that I am speaking of the latest definition of AIXI, which uses a recursive value function—see the section of Jan Leike’s PhD thesis on computability levels. The older iterative value function formulation has worse computability properties and really does assume non-termination, so the chapter I quoted may only be outdated rather than mistaken.
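To sketch the semimeasure point (in my own notation, loosely following Leike and Hutter; treat this as illustrative rather than a careful statement): the conditional probabilities of the next percept need not sum to one, and the missing mass can be read as the probability that the percept stream simply ends,

\[
\sum_{e_t} \nu(e_t \mid ae_{<t}\, a_t) \;\le\; 1,
\qquad
P_\nu(\text{history ends at } t) \;=\; 1 - \sum_{e_t} \nu(e_t \mid ae_{<t}\, a_t).
\]

The recursive value function only sums over continuations, so any probability mass placed on termination contributes nothing to future value, roughly

\[
V^\pi_\nu(ae_{<t}) \;=\; \sum_{e_t} \nu\bigl(e_t \mid ae_{<t}\,\pi(ae_{<t})\bigr)\,\bigl[r_t + \gamma\, V^\pi_\nu(ae_{1:t})\bigr],
\]

which is the sense in which the percept stream ending behaves like death with zero reward from then on.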
See also my proposed off-policy definition of AIXI that should deal with brain surgery reasonably.
Very likely false, at least for some AIXI approximations, probably including reasonable implementations of AIXItl. AIXI uses a mixture over probabilistic environments, so it can model environments that are too complicated for it to predict optimally as partially uncertain. That is, probabilities can and will effectively be used to represent logical as well as epistemic uncertainty. A toy AIXI approximation that makes this easy to see is one that performs updating only on the N simplest environments (let’s ignore runtime/halting issues for the moment—this is reasonable-ish because AIXI’s environments are all at least lower semicomputable). This approximation would place greater and greater weight on the environment that best predicts the percept stream, even if it doesn’t do so perfectly, perhaps because some complicated events are modeled as “random.” The dynamics of updating the universal distribution in a very complicated world are an interesting research topic which seems under- or even unexplored as I write this! Here is a (highly esoteric) discussion of this point as it concerns a real approximation to the universal distribution.
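To make the toy approximation above concrete, here is a minimal sketch (entirely my own; the percept process and the three named models are made up for illustration and are not from any AIXI literature) of Bayesian updating over a small fixed class of environments, where the best available model treats the hard-to-predict part of the world as noise and still ends up with essentially all of the posterior weight:

```python
# Minimal sketch: posterior updating over a small, fixed class of "environments",
# standing in for the truncated mixture described above. The true process is more
# complicated than any model in the class; the best model just calls the residual
# "random" and still absorbs nearly all the weight.
import random

random.seed(0)

# Hypothetical percept process: a repeating 0,1,0,1,... pattern,
# except that 10% of the time the bit is flipped (the "too complicated" part).
def true_percept(t):
    return (t % 2) ^ (1 if random.random() < 0.1 else 0)

# Three toy environments: the probability each assigns to percept e at time t.
models = {
    "always_zero":   lambda t, e: 0.999 if e == 0 else 0.001,
    "uniform_noise": lambda t, e: 0.5,
    "alternating":   lambda t, e: 0.9 if e == (t % 2) else 0.1,  # best, though imperfect
}

# Uniform prior weights (AIXI would weight by 2^-(program length); immaterial here).
weights = {name: 1.0 / len(models) for name in models}

for t in range(200):
    e = true_percept(t)
    # Bayes: scale each weight by the likelihood that model assigned to the
    # observed percept, then renormalize.
    for name, model in models.items():
        weights[name] *= model(t, e)
    total = sum(weights.values())
    weights = {name: w / total for name, w in weights.items()}

print(weights)  # "alternating" ends up with essentially all of the posterior weight
```

The same concentration is what the paragraph above expects from the N-simplest-environments approximation, just with a vastly larger and more expressive model class.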
It’s true that if we had enough compute to implement a good AIXI approximation, its world would also include lots of hard-to-compute things, possibly including other AIXI approximations, so it need not rapidly become a singleton. But this would not prevent it from being “a working AI.”
This is right, but not really magical—AIXItl only outperforms the class of algorithms with proofs of good performance (in some axiomatic system). If I remember correctly, this class doesn’t include AIXItl itself!
Also, the math of AIXI assumes the environment is separably divisible—no matter what you lose, you get a chance to win it back later.

Does this mean that we don’t even need to get into anything as esoteric as brain surgery – that AIXI can’t learn to play Sokoban (without the ability to restart the level)?
This seems to me like more evidence that intelligence is partly a social/familial thing. Just as human beings have to be embedded in a society in order to develop a certain level of intelligence, an intuition for “don’t do this, it will kill you”, informed by the nuance that is only possible when a wide array of individual failures informs group success or failure, might be a prerequisite for reasoning beyond a certain level (and might constrain the ultimate levels on which intelligence can rest).
I’ve seen more than enough children try things close enough to dropping an anvil on their own heads to consider this ‘no worse than human’ (in fact our hackerspace even has an anvil, and one kid has, ha ha only serious, suggested dropping said anvil on his own head). If AIXI/AIXItl can reach this level, it should at the very least be capable of oh-so-human reasoning (up to and including the kinds of risky behaviour we would all probably like to pretend we never engaged in), and could possibly transcend it in the same way that humans do: by trial and error, by limiting potential damage to individuals or groups, and by fighting the never-ending battle against ecological harms on its own terms, on the time schedule of ‘let it go until it is necessary to address the possible existential threat’.
Of course it may be that the human way of avoiding species self-destruction is fatally flawed, in ways that include (but are not limited to) creating something like AIXI/AIXItl. But it seems to me that is a limiting flaw rather than a fatal one. And it may yet be that the way out of our own fatal flaws and the way out of AIXI/AIXItl’s fatal flaws are only possible through some kind of mutual dependence, like the mutual dependence of the two sides of a bridge. I don’t know.
I’m a total dilettante when it comes to this sort of thing, so this may be a totally naive question… but how is it that this comment has only +5 karma, considering how apparently fundamental it is to future progress in FAI?
The comment predates the current software; when it was posted (on Overcoming Bias) there was no voting. You can tell such articles by the fact that their comments are linear, with no threaded replies (except for more recently posted ones).
math of AIXI assumes the environment is separably divisible—no matter what you lose, you get a chance to win it back later.

There’s nothing preventing you from running AIXItl in an environment that doesn’t have this property. You lose the optimality results, but if you gave it a careful early training period and let it learn physics before giving it full manipulators and access to its own physical instantiation, it might not kill itself.
You could also build a sense of self into its priors, stating that certain parts of the physical world must be preserved, or else all further future rewards will be zero.
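A minimal sketch of how that informal idea might look (entirely my own toy framing, not a formalization of AIXItl; the environment interface and the agent_hardware_ok flag are hypothetical): every candidate environment in the prior is wrapped so that reward permanently drops to zero once a designated “self” part of the world state is damaged. Deciding what the predicate should actually test is, of course, the hard part.

```python
# Toy sketch (assumptions: a dict-valued world state with a hypothetical
# "agent_hardware_ok" flag, and base models exposing step(action) -> (state, reward)).
# Wrapping every model in the prior this way encodes "destroying the self
# forfeits all future reward" directly into the agent's beliefs.

def self_is_intact(world_state):
    # Stub predicate: does the designated hardware still exist and work?
    # Making this precise is exactly the nontrivial part.
    return world_state.get("agent_hardware_ok", True)

class SelfPreservingEnvModel:
    """Wraps a candidate environment model so that reward is permanently
    zero after the modeled 'self' is destroyed."""

    def __init__(self, base_model):
        self.base_model = base_model
        self.destroyed = False

    def step(self, action):
        state, reward = self.base_model.step(action)
        if not self_is_intact(state):
            self.destroyed = True
        return state, 0.0 if self.destroyed else reward
```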
It may be possible to formalize your idea as in Orseau’s “Space-Time Embedded Intelligence,” but it would no longer bear much resemblance to AIXItl. With that said, translating the informal idea you’ve given into math is highly nontrivial. Which parts of its physical world should be preserved and what does that mean in general? AIXI does not even assume our laws of physics.