I guess you mean something like “by not being able to think about itself, AIXI acts as if it assumed that it cannot be changed in the future”.
That is correct.
AIXI does not understand that the machine with the letters “AIXI” on it is itself, but it could still predict that the machine can be changed, and that the changed machine will have an impact on the rest of the universe.
If AIXI notices the existence of a machine labeled “AIXI” in the universe that happens to act a lot like itself, then yes, it will be able to reason about changes to that machine. However, since it cannot identify itself with such a machine, it will act as if it will be able to act independently of that machine in the future, without any changes to its own source code.
However, since it cannot identify itself with such a machine, it will act as if it will be able to act independently of that machine in the future, without any changes to its own source code.
And this is where I always get hung up. Okay, you’re a new audience (I think), so one more time:
The claim is that AIXI cannot identify itself with its “noticeable” self, and will therefore predict that it will be able to act independently of that machine in the future. Since the part of AIXI that predicts the future is Solomonoff induction (i.e., a probability distribution over all terminating universe-modeling programs), for the claim to be true, it must be the case that either:
1. no (terminating) universe-modeling program can make that identification; xor,
2. such universe-modeling programs can never accumulate enough probability mass to dominate AIXI’s predictions.
Which do you assert is the case? (No one in this conversation thinks #1 is the case, right?)
Have I missed something? It’s entirely possible, since entire MIRI-fulls of smart people make the claim… but no one has been able to explain what I’ve missed.
It’s closer to #1. AIXI’s hypothesis space does not contain any possible worlds at all in which the agent is not separated from the environment by a Cartesian barrier. I don’t see any reason that it should be impossible to construct a universe-modeling program that can make the identification (in fact, I’m under the impression that MIRI intends to do something like this), but it would not be one of the hypotheses that AIXI is capable of considering. AIXI does not consider all hypotheses about the agent-environment system; instead, it uses a fixed model for its own internal operation and the interaction between itself and its environment, and considers all computable hypotheses about the internal operation of the environment.
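To make the setup concrete, here is a minimal toy sketch of the Cartesian loop being described (the class names and dynamics are mine, purely illustrative): the agent and the environment exchange symbols across a fixed interface, only the environment is drawn from the hypothesis space, and the agent's own machinery is never part of any hypothesis.

```python
# Toy sketch of the Cartesian protocol AIXI assumes: agent and environment
# exchange symbols across a fixed interface.  Only the environment is drawn
# from the hypothesis space; the agent's machinery sits outside it.

class ToyEnvironment:
    """Stand-in for one computable environment hypothesis."""
    def step(self, action):
        observation = 0
        reward = 1.0 if action == 1 else 0.0
        return observation, reward

class ToyAgent:
    """Stand-in for the fixed, non-hypothesized agent machinery."""
    def act(self, history):
        return 1  # a real AIXI would run expectimax over its mixture here

def interaction_loop(agent, env, horizon):
    history = []
    for _ in range(horizon):
        a = agent.act(history)
        o, r = env.step(a)         # everything behind this call is "environment"
        history.append((a, o, r))  # the Cartesian barrier is this interface
    return history

print(interaction_loop(ToyAgent(), ToyEnvironment(), 3))
```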
I don’t see any reason that it should be impossible to construct a universe-modeling program that can make the identification (in fact, I’m under the impression that MIRI intends to do something like this)
Me neither! That’s kinda my point—that plus the fact that
it would not be one of the hypotheses that AIXI is capable of considering… AIXI does not consider all hypotheses about the agent-environment system; instead, it… [only] considers all computable hypotheses about the internal operation of the environment
does not seem sufficient to me to demonstrate that there are rather simple facts about the physical instantiation of the AIXI bot that it is incapable of learning.
AIXI uses a fixed, unmodifiable expectimax expression to calculate rewards and choose the next action; fine. Some of the programs that the Solomonoff induction probability distribution encompasses do things like simulate an embodied bot running around an environment making predictions and taking actions arbitrarily close to what an AIXI bot would do in that situation, and then return the predicted environment state time series to the expectimax expression. As AIXI takes action and makes observations, more and more probability mass is going to concentrate on the subset of the programs that make accurate predictions, and since by assumption there really is an AIXI bot taking action in the real environment, the Solomonoff induction part will learn that fact.
There’s a bit of an issue with the “arbitrarily close” part, but all that means is that Solomonoff induction can’t learn that there’s an agent around that implements Solomonoff induction exactly. That, in and of itself, is no bar to Solomonoff induction placing high probability on the set of universe-models in which its observations are and will continue to be identical to that of some simulated embodied agent in a simulated environment. Another conceptual hiccup is that Solomonoff induction will continue to simulate the environment past the simulated destruction of its avatar, but a bit of reflection reveals that this is no more mysterious than a human’s ability to make predictions about the world past the point of his or her death.
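To spell out the “probability mass concentrates” step, here is a toy Bayesian mixture in the Solomonoff style (the three hypotheses and their description lengths are invented for illustration): each program starts with weight proportional to 2^-length, and programs that mispredict the deterministic observation stream are zeroed out, so whatever survives ends up dominating regardless of how small its prior was.

```python
# Toy illustration of a Solomonoff-style mixture concentrating on the
# hypotheses that keep predicting the observation stream correctly.
# The hypothesis class is trivial; only the update rule matters here.

hypotheses = {
    "all_zeros":   (lambda t: 0, 4),     # (predictor, description length in bits)
    "all_ones":    (lambda t: 1, 4),
    "alternating": (lambda t: t % 2, 7),
}

weights = {name: 2.0 ** -length for name, (_, length) in hypotheses.items()}
observations = [0, 1, 0, 1, 0, 1]        # the "true" environment alternates

for t, obs in enumerate(observations):
    for name, (predict, _) in hypotheses.items():
        if predict(t) != obs:
            weights[name] = 0.0          # a deterministic misprediction is fatal
    total = sum(weights.values())
    print(t, {name: w / total for name, w in weights.items()})

# After two steps all surviving mass sits on "alternating", even though it
# started with the smallest prior weight.
```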
On a practical level, obviously we don’t want an actual AI to have to infer the fact of its embodiment using observed data; we want it to start off knowing that it is physically instantiated. But here’s why I continue to raise this point: I’m in a position where many people who work on FAI full time—who I have heretofore considered FAI experts, or as close to such as exists currently—assert a claim that I think is bogus. Either (1) I am wrong and stupid; or (2) they are wrong and stupid. Even after all of the above argument, on the outside view I consider 1 more likely, but no one has been able to rule out 2 to my satisfaction yet and it’s making me nervous. I’d much rather find myself in a world where I’ve made a silly error about strong AI than one where MIRI researchers have overlooked something obvious.
Either (1) I am wrong and stupid; or (2) they are wrong and stupid.
People can all be smart and still fail to resolve a disagreement. A lot of disagreement among smart people can be seen right here on LW. Or on SSC, or OB, and I expect wherever rational people are gathered. Or consider Bayesians and Frequentists, or Pearl vs. Rubin.
Smart people don’t disagree less often, they just run into more advanced things to disagree about. There’s probably an Umeshism in there. If we all agree, we’re not thinking about hard enough problems, maybe. Or the better the four wheel drive, the further out you get stuck.
I don’t disagree with what you’ve written, but I don’t think it applies here. The situation is very asymmetric: unlike Nate Soares and Robby Bensinger, this isn’t my area of expertise—or at least, I’m not a MIRI-affiliated researcher writing posts about it on LW. Any objection I can raise ought to have been thought of and dispensed with already. I really do think that it must be the case that either I’m being particularly obtuse and crankish, or I’m mistaken about the caliber of the work on this topic.
I agree with everything you said before the last paragraph.
But here’s why I continue to raise this point: I’m in a position where many people who work on FAI full time—who I have heretofore considered FAI experts, or as close to such as exists currently
Just because somebody works on a subject full time and claims expertise in it doesn’t mean that they necessarily have any real expertise. Think of theologians as the textbook example.
Either (1) I am wrong and stupid; or (2) they are wrong and stupid.
I think this dichotomy is too strong.
People can be smart and still suffer from groupthink when there are significant social incentives to conform. MIRI’s technical work hasn’t so far been recognized by anyone in the academic community who does research on these specific topics and is independent of them. Also, MIRI is not a random sample of high-IQ people: membership is largely a matter of self-selection on traits that include beliefs strongly correlated with the stuff they work on (e.g. “Löbian obstacles”, etc.).
So we have the observation that MIRI endorses counterintuitive beliefs about certain mathematical constructions and generally fails to persuade expert outsiders despite detailed technical arguments. Explanation (1) is that these counterintuitive beliefs are true but MIRI people are poor at communicating them; explanation (2) is that these beliefs are false and MIRI people believe them because of groupthink and/or prior biases and errors that were selected for by the group’s self-selecting formation process.
I believe that (2) is more likely, but even if (1) is actually true, it is useful to challenge MIRI until they can come up with a strong, understandable argument. My suggestion to them is to try to publish in peer-reviewed conferences or, preferably, journals. Interaction with referees will likely settle the question.
Even after all of the above argument, on the outside view I consider 1 more likely, but no one has been able to rule out 2 to my satisfaction yet and it’s making me nervous. I’d much rather find myself in a world where I’ve made a silly error about strong AI than one where MIRI researchers have overlooked something obvious.
All of this is fair—the problem may simply be that I had unrealistically lofty expectations of MIRI’s recent hires. The only note of doubt I can sound is that I know that so8res and Rob Bensinger are getting this idea from EY, and I’m willing to credit him with enough acuity to have thought of, and disposed of, any objection that I might come up with.
(I have made no claims here about whether I believe an embodied AIXI could get a good-enough approximation of a universe including itself using Solomonoff induction. Please refrain from putting words into my mouth, and from projecting your disagreements with others onto me.)
Ok, I’ll concede that AIXI does consider hypotheses in which the environment contains a computable approximation to AIXI and in the near future, the universe will start ignoring AIXI and paying attention to the approximation’s output in the same way it had previously been paying attention to AIXI’s output. Counting that as “identifying itself with the approximation” seems overly generous, but if we grant that, I still don’t see any reason that AIXI would end up considering such a hypothesis likely, or that it would be likely to understand any mechanism for it to self-modify correctly in terms of its model of modifying an external approximation to itself. AIXI does well in situations in which it interacts with a computable environment through its input and output channels and nothing else, but that doesn’t mean that it will do well in an environment such that there exists a different environment that interacts with AIXI only through its input and output channels, and looks kind of like the actual environment if you squint at it.
Another conceptual hiccup is that Solomonoff induction will continue to simulate the environment past the simulated destruction of its avatar, but a bit of reflection reveals that this is no more mysterious than a human’s ability to make predictions about the world past the point of his or her death.
Actually, it is weirder than that, because AIXI considers what decisions it will make after its “avatar” is destroyed. Most humans know it doesn’t work that way.
Ok, I’ll concede that AIXI does consider hypotheses in which the environment contains a computable approximation to AIXI and in the near future, the universe will start ignoring AIXI and paying attention to the approximation’s output in the same way it had previously been paying attention to AIXI’s output. Counting that as “identifying itself with the approximation” seems overly generous,
It’s not that AIXI thinks that “the universe will start ignoring AIXI”—the Solomonoff induction part starts by giving weight to an infinite set of models in which AIXI’s actions have no effect whatsoever. It’s that AIXI is learning that there’s this agent running around the universe doing stuff and the universe is responding to it. The identification part happens because the program specifies that the set of bits in the simulated agent’s input registers is the predicted observation stream.
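For concreteness, here is a toy version of the kind of world-program I mean (entirely my own construction, with trivial dynamics): it simulates a little world containing an embedded agent, and what it emits as its predicted observation stream is, by construction, whatever ends up in that simulated agent's input register.

```python
# Sketch of a world-model program whose predicted observation stream is,
# by construction, the contents of a simulated embedded agent's input
# register.  This is the "identification" step described above.

def embedded_agent_policy(input_register):
    # crude stand-in for "something arbitrarily close to an AIXI bot";
    # here it just echoes its last input
    return input_register

def world_model(num_steps):
    world_state = 0
    input_register = 0               # the simulated agent's sensor bits
    predicted_observations = []
    for _ in range(num_steps):
        action = embedded_agent_policy(input_register)
        world_state = (world_state + action + 1) % 5   # toy world dynamics
        input_register = world_state                   # world writes the sensors
        predicted_observations.append(input_register)  # what the program predicts
    return predicted_observations

print(world_model(6))
```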
I still don’t see any reason that AIXI would end up considering such a hypothesis likely
Because hypotheses of smaller K-complexity have failed to predict the observation stream. (Not that this is a claim whose truth I’m asserting—just that this is the only reason that Solomonoff induction ever considers a hypothesis likely. I leave open the possibility that a K-simpler universe model that does not defeat Cartesian dualism might exist.)
Actually, it is weirder than that, because AIXI considers what decisions it will make after its “avatar” is destroyed. Most humans know it doesn’t work that way.
AIXI learns, e.g., that the simulated agent has an actuator, and that all of the effects of the simulated agent’s decisions are mediated through the actuator. It can also predict that if the actuator is destroyed, then the simulated agent’s decisions stop having effects. That’s really all that’s necessary.
It’s not that AIXI thinks that “the universe will start ignoring AIXI”—the Solomonoff induction part starts by giving weight to an infinite set of models in which AIXI’s actions have no effect whatsoever. It’s that AIXI is learning that there’s this agent running around the universe doing stuff and the universe is responding to it. The identification part happens because the program specifies that the set of bits in the simulated agent’s input registers is the predicted observation stream.
Hypotheses in which AIXI’s actions already have no effect on the environment are useless for action guidance; all actions have the same utility.
Because hypotheses of smaller K-complexity have failed to predict the observation stream.
Well yes, I know that is how Solomonoff induction works. But the (useless for action guidance) hypothesis you just suggested is ridiculously high K-complexity, and the hypothesis I suggested has even higher K-complexity. Even worse: these are actually families of hypotheses, parameterized by the AIXI approximation algorithm being used (and in the case of the hypothesis I suggested, also the time-step on which the switch occurs), and as the number of observations increases, the required accuracy of the AIXI approximation, and thus its K-complexity, also increases. I’m skeptical that this sort of thing could ever end up as a leading hypothesis.
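The arithmetic behind that skepticism, with made-up numbers (only the scaling matters): under a 2^-K prior, every extra bit of description length is a factor-of-two handicap, so a hypothesis that has to embed an increasingly accurate AIXI approximation starts absurdly far behind and can only win after every consistent simpler hypothesis has been falsified.

```python
# Rough arithmetic behind the 2^-K handicap.  The bit counts are invented;
# only the exponential scaling is the point.

k_cartesian_world_model = 1_000        # bits: some world model with a Cartesian barrier
k_embedded_agent_model = 1_000 + 300   # bits: same world plus an AIXI-approximation simulator

prior_ratio = 2.0 ** -(k_embedded_agent_model - k_cartesian_world_model)
print(f"The embedded-agent hypothesis starts with {prior_ratio:.1e} of the "
      "simpler hypothesis's weight, and can only dominate once every "
      "consistent simpler hypothesis has been falsified.")
```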
So I have responses, but they’re moot—I found the Cartesian boundary.
Hypotheses in which AIXI’s actions already have no effect on the environment are useless for action guidance; all actions have the same utility.
Fortunately they get falsified and zeroed out right away.
I’m skeptical that this sort of thing could ever end up as a leading hypothesis.
The leading hypothesis has to not get falsified; what you’ve described is the bare minimum required for a Solomonoff inductor to account for an AIXI agent in the environment.
It’s closer to #1. AIXI’s hypothesis space does not contain any possible worlds at all in which the agent is not separated from the environment by a Cartesian barrier.
Not really.
The hypothesis space contains worlds where the agent’s actions past a certain time point are ignored, and whatever part of the world the agent’s output channel used to drive becomes driven by another part of the world: the “successor”. Technically, the original agent still exists in the model, but since its output has been disconnected, it doesn’t directly affect the world after that specific time. The only way it can continue to exert an influence is by affecting the “successor” before the disconnection occurs.
These world models are clearly part of the hypothesis space, so the problem is whether the induction process can assign them sufficient probability mass. This is non-trivial because scenarios which allow self-modification are usually non-ergodic (e.g. they allow you to drop an anvil on your head), and AIXI, like most reinforcement learning paradigms, is provably optimal only under an assumption of ergodicity. But if you can get around the non-ergodicity issues (e.g. by bootstrapping the agent with self-preservation instincts and/or giving it a guardian which prevents it from harming itself until it has learned enough), there is in principle no reason why an AIXI-like agent should be incapable of reasoning about self-modification.
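Here is a toy rendering of that kind of world model (the switchover time and the successor policy below are arbitrary placeholders of mine): up to some time step the environment is driven by the external agent's output channel, and from then on that channel is ignored and an internal “successor” program drives it instead.

```python
# Sketch of a "successor" world model: up to T_SWITCH the environment is
# driven by the external agent's output channel; afterwards that channel
# is ignored and an internal successor program drives the world instead.

T_SWITCH = 3

def successor_policy(observation):
    return 1 - observation                 # placeholder for the modified agent

def successor_world(external_actions):
    observation, trace = 0, []
    for t, external_action in enumerate(external_actions):
        if t < T_SWITCH:
            effective_action = external_action                 # agent still wired in
        else:
            effective_action = successor_policy(observation)   # agent disconnected
        observation = (observation + effective_action) % 2     # toy dynamics
        trace.append((t, effective_action, observation))
    return trace

print(successor_world([1, 1, 1, 1, 1, 1]))
```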
The hypothesis space contains worlds where the agent’s actions past a certain time point are ignored, and whatever part of the world the agent’s output channel used to drive becomes driven by another part of the world: the “successor”.
That’s true, but in all such hypotheses, the successor is computable, and AIXI is not computable, so the successor agent cannot itself be AIXI.
Edit: Come to think of it, your response would probably start with something like “I know that, but...”, but I may have addressed some of what your further objections would have been in my reply to Cyan.
Edit: Come to think of it, your response would probably start with something like “I know that, but...”, but I may have addressed some of what your further objections would have been in my reply to Cyan.
...
Counting that as “identifying itself with the approximation” seems overly generous, but if we grant that, I still don’t see any reason that AIXI would end up considering such a hypothesis likely, or that it would be likely to understand any mechanism for it to self-modify correctly in terms of its model of modifying an external approximation to itself.
The simple answer is: because these models accurately predict the observations after self-modification actions are performed. Of course, the problem is not that simple, because of the non-ergodicity issues I’ve discussed before: when you fiddle with your source code without knowing what you are doing, it is easy to accidentally ‘drop an anvil’ on yourself. But this is a hard problem without any simple solution, IMHO.
The simple answer is: because these models accurately predict the observations after self-modification actions are performed.
For that to be true, the environment has to keep sending AIXI the same signals that it sends the approximation even after it stops paying attention to AIXI’s output. Even in that case, the fact that this model correctly predicts future observations doesn’t help at all. Prior to self-modifying, AIXI does not have access to information about what it will observe after self-modifying.
I agree with you that the non-ergodicity issues don’t have any simple solution. I haven’t been making a big deal about non-ergodicity because there don’t exist any agents that perform optimally in all non-ergodic environments (since one computable environment can permanently screw you for doing one thing, and another computable environment can permanently screw you for doing anything else), so it’s not a problem specific to AIXI-like agents, and AIXI actually seems like it should act fairly reasonably in non-ergodic computable environments separated from the agent by a Cartesian barrier, given the information available to it.
Then I don’t think we actually disagree. I mean, it has been well known since Hutter’s original paper that the AIXI optimality proof requires ergodicity.
AIXI does expectimax to decide upon actions, and works under the assumption that after the current action, the next action will also be decided by expectimax. That’s built into the source code.
Now, maybe you could “fake” a change to this assumption with a world-program that throws away AIXI’s output channel, and substitutes the action that would be taken by the modified AIXI. Of course, since AIXI itself is uncomputable, for any nontrivial modification (one that is not just a transformation of AIXI’s output, but leaves AIXI still uncomputable), this program doesn’t exist.
For AIXItl you may have the same problem manifest in the form of a program that simulates AIXItl “taking too long”. Not sure about that.
Either way, it’s not enough for such world programs to “ever” accumulate enough probability mass to dominate AIXI’s predictions. They’d have to dominate AIXI’s predictions before any modification event has been observed to be of any use to AIXI in deciding how to self-modify.
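To show what “built into the source code” means, here is a toy expectimax recursion in the AIXI mould (my own simplification, with a two-element percept set and a flat stand-in for the mixture): the agent's own future actions are produced by a max over this very recursion, while only the environment's replies are averaged over, so “a future in which my actions are produced some other way” isn't even expressible in it.

```python
# Toy expectimax in the AIXI mould.  Future actions are always chosen by
# max over this same recursion, never marginalised over: that is the fixed
# assumption "my future actions will also be decided by expectimax".

ACTIONS = [0, 1]
PERCEPTS = [(0, 0.0), (1, 1.0)]           # (observation, reward) pairs

def percept_prob(history, action, percept):
    # stand-in for the posterior-weighted mixture over environment programs
    return 0.5

def expectimax(history, depth):
    if depth == 0:
        return 0.0
    best = float("-inf")
    for action in ACTIONS:                # max over the agent's own actions
        value = 0.0
        for percept in PERCEPTS:          # sum over the environment's replies
            p = percept_prob(history, action, percept)
            _, reward = percept
            value += p * (reward + expectimax(history + [(action, percept)], depth - 1))
        best = max(best, value)
    return best

print(expectimax([], 3))                  # expected reward over a 3-step horizon
```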
I’m not trying to claim that AIXI is a good model in which to explore self-modification. My issue isn’t on the agent-y side at all—it’s on the learning side. It has been put forward that there are facts about the world that AIXI is incapable of learning, even though humans are quite capable of learning them. (I’m assuming here that the environment is sufficiently information-rich that these facts are within reach.) To be more specific, the claim is that humans can learn facts about the observable universe that Solomonoff induction can’t. To me, this claim seems to imply that human learning is not computable, and this implication makes my brain emit, “Error! Error! Does not compute!”
This is the place where an equation could be more convincing than verbal reasoning.
To be honest, I probably wouldn’t understand the equation, so someone else would have to check it. But I feel that this is one of those situations (similar to the group selectionism example) where humans can trust their reasonable-sounding words, but the math could show otherwise.
I am not saying that you are wrong; at this moment I am just confused. Maybe it’s obvious, and it’s my ignorance. I don’t know, and probably won’t spend enough time to find out, so it’s unfair to demand an answer to my question. But I think the advantage of AIXI is that it is a relatively simple (well, relative to other AIs) mathematical model, so claims about what the AI can or cannot do should be accompanied by equations. (And if I am completely wrong and the answer is really obvious, then perhaps the equation shouldn’t be complicated.) Also, sometimes the devil is in the details, and writing the equation could make those details explicit.
Just look at the AIXI equation itself (a standard rendering is reproduced below): $o_i$ (observations) and $r_i$ (rewards) are the signals sent from the environment to AIXI, and $a_i$ (actions) are AIXI’s outputs. Notice that the future $a_i$ are predicted by picking the one that would maximize expected reward through timestep $m$, just like AIXI does, and there is no summation over possible ways that the environment could make AIXI output actions computed some other way, as there is for the $o_i$ and $r_i$.
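A standard rendering, following Hutter's formulation (any notational slips here are mine):

$$
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \;\cdots\; \max_{a_m} \sum_{o_m r_m} \big[\, r_k + \cdots + r_m \,\big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$

where $U$ is a universal monotone Turing machine, $q$ ranges over environment programs, and $\ell(q)$ is the length of $q$ in bits.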