It’s closer to #1. AIXI’s hypothesis space does not contain any possible worlds at all in which the agent is not separated from the environment by a Cartesian barrier. I don’t see any reason that it should be impossible to construct a universe-modeling program that can make the identification (in fact, I’m under the impression that MIRI intends to do something like this), but it would not be one of the hypotheses that AIXI is capable of considering. AIXI does not consider all hypotheses about the agent-environment system; instead, it uses a fixed model for its own internal operation and the interaction between itself and its environment, and considers all computable hypotheses about the internal operation of the environment.
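For reference, AIXI’s decision rule can be written roughly as follows (after Hutter’s formulation); the outer expectimax machinery is fixed once and for all, and only the environment programs $q$ inside the innermost sum are treated as hypotheses:

$$
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \bigl(r_k + \cdots + r_m\bigr) \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$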
I don’t see any reason that it should be impossible to construct a universe-modeling program that can make the identification (in fact, I’m under the impression that MIRI intends to do something like this)
Me neither! That’s kinda my point—that plus the fact that
it would not be one of the hypotheses that AIXI is capable of considering… AIXI does not consider all hypotheses about the agent-environment system; instead, it… [only] considers all computable hypotheses about the internal operation of the environment
does not seem sufficient to me to demonstrate that there are rather simple facts about the physical instantiation of the AIXI bot that it is incapable of learning.
AIXI uses a fixed, unmodifiable expectimax expression to calculate rewards and choose the next action; fine. Some of the programs that the Solomonoff induction probability distribution encompasses do things like simulate an embodied bot running around an environment, making predictions and taking actions arbitrarily close to what an AIXI bot would do in that situation, and then return the predicted environment-state time series to the expectimax expression. As AIXI takes actions and makes observations, more and more probability mass is going to concentrate on the subset of programs that make accurate predictions, and since by assumption there really is an AIXI bot taking action in the real environment, the Solomonoff induction part will learn that fact.
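As a concrete (and deliberately computable) caricature of that mechanism, here is a toy Bayesian mixture over a few made-up “environment programs”, each weighted by 2^-(description length); programs that mispredict get zeroed out, and the surviving mass concentrates on whichever programs keep matching the observation stream:

```python
# Toy caricature of the mechanism above (not Solomonoff induction itself,
# which sums over all programs for a universal machine and is incomputable):
# a Bayesian mixture over a handful of invented "environment programs",
# weighted by 2^-(description length in bits).

def make_mixture(programs):
    # programs: list of (name, length_in_bits, predict_fn)
    weights = {name: 2.0 ** -length for name, length, _ in programs}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def update(mixture, programs, history, observation):
    # Zero out (falsify) programs that predicted the wrong observation,
    # then renormalise: Bayes with 0/1 likelihoods.
    post = {name: (mixture[name] if predict(history) == observation else 0.0)
            for name, _, predict in programs}
    z = sum(post.values())
    return {name: p / z for name, p in post.items()} if z > 0 else post

# Invented toy hypotheses about an observation stream that alternates 0,1,0,1,...
programs = [
    ("always_zero", 3, lambda h: 0),
    ("always_one",  3, lambda h: 1),
    ("alternate",   5, lambda h: len(h) % 2),
]

mixture = make_mixture(programs)
history = []
for t in range(4):
    obs = t % 2                      # the "true" environment alternates
    mixture = update(mixture, programs, history, obs)
    history.append(obs)
    print(t, mixture)                # mass concentrates on "alternate"
```

The real mixture ranges over all programs and is incomputable, but the concentration dynamic appealed to above is the same.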
There’s a bit of an issue with the “arbitrarily close” part, but all that means is that Solomonoff induction can’t learn that there’s an agent around that implements Solomonoff induction exactly. That, in and of itself, is no bar to Solomonoff induction placing high probability on the set of universe-models in which its observations are and will continue to be identical to that of some simulated embodied agent in a simulated environment. Another conceptual hiccup is that Solomonoff induction will continue to simulate the environment past the simulated destruction of its avatar, but a bit of reflection reveals that this is no more mysterious than a human’s ability to make predictions about the world past the point of his or her death.
On a practical level, obviously we don’t want an actual AI to have to infer the fact of its embodiment using observed data; we want it to start off knowing that it is physically instantiated. But here’s why I continue to raise this point: I’m in a position where many people who work on FAI full time—who I have heretofore considered FAI experts, or as close to such as exists currently—assert a claim that I think is bogus. Either (1) I am wrong and stupid; or (2) they are wrong and stupid. Even after all of the above argument, on the outside view I consider 1 more likely, but no one has been able to rule out 2 to my satisfaction yet and it’s making me nervous. I’d much rather find myself in a world where I’ve made a silly error about strong AI than one where MIRI researchers have overlooked something obvious.
Either (1) I am wrong and stupid; or (2) they are wrong and stupid.
People can all be smart and still fail to resolve a disagreement. A lot of disagreement among smart people can be seen right here on LW. Or on SSC, or OB, and I expect wherever rational people are gathered. Or consider Bayesians and Frequentists, or Pearl vs. Rubin.
Smart people don’t disagree less often, they just run into more advanced things to disagree about. There’s probably an Umeshism in there. If we all agree, we’re not thinking about hard enough problems, maybe. Or the better the four wheel drive, the further out you get stuck.
I don’t disagree with what you’ve written, but I don’t think it applies here. The situation is very asymmetric: unlike Nate Soares and Robby Bensinger, this isn’t my area of expertise—or at least, I’m not a MIRI-affiliated researcher writing posts about it on LW. Any objection I can raise ought to have been thought of and dispensed with already. I really do think that it must be the case that either I’m being particularly obtuse and crankish, or I’m mistaken about the caliber of the work on this topic.
I agree with everything you said before the last paragraph.
But here’s why I continue to raise this point: I’m in a position where many people who work on FAI full time—who I have heretofore considered FAI experts, or as close to such as exists currently
Just because somebody works on a subject full time and claims expertise in it doesn’t mean that they necessarily have any real expertise. Think of theologians as the textbook example.
Either (1) I am wrong and stupid; or (2) they are wrong and stupid.
I think this dichotomy is too strong.
People can be smart and still suffer from groupthink when there are significant social incentives to conform. MIRI’s technical work hasn’t been recognized so far by anyone in the academic community who does research in these specific topics and is independent of them. Also, MIRI is not a random sample of high-IQ people. Membership in MIRI is largely due to self-selection based on things which include beliefs strongly correlated with the stuff they work on (e.g. “Löbian obstacles”, etc.).
So we have the observation that MIRI endorses counterintuitive beliefs about certain mathematical constructions and generally fails at persuading expert outsiders despite detailed technical arguments. Explanation (1) is that these counterintuitive beliefs are true but MIRI people are poor at communicating them; explanation (2) is that these beliefs are false and MIRI people believe them because of groupthink and/or prior biases and errors that were selected for by the self-selection process that formed the group.
I believe that (2) is more likely, but even if (1) is actually true it is useful to challenge MIRI until they can come up with a strong, understandable argument. My suggestion to them is to try to publish in peer-reviewed conferences or, preferably, journals. Interaction with referees will likely settle the question.
Even after all of the above argument, on the outside view I consider 1 more likely, but no one has been able to rule out 2 to my satisfaction yet and it’s making me nervous. I’d much rather find myself in a world where I’ve made a silly error about strong AI than one where MIRI researchers have overlooked something obvious.
All of this is fair—the problem may simply be that I had unrealistically lofty expectations of MIRI’s recent hires. The only note of doubt I can sound is that I know that so8res and Rob Bensinger are getting this idea from EY, and I’m willing to credit him with enough acuity to have thought of, and disposed of, any objection that I might come up with.
(I have made no claims here about whether I believe an embodied AIXI could get a good-enough approximation of a universe including itself using Solomonoff induction. Please refrain from putting words into my mouth, and from projecting your disagreements with others onto me.)
Ok, I’ll concede that AIXI does consider hypotheses in which the environment contains a computable approximation to AIXI and in the near future, the universe will start ignoring AIXI and paying attention to the approximation’s output in the same way it had previously been paying attention to AIXI’s output. Counting that as “identifying itself with the approximation” seems overly generous, but if we grant that, I still don’t see any reason that AIXI would end up considering such a hypothesis likely, or that it would be likely to understand any mechanism for it to self-modify correctly in terms of its model of modifying an external approximation to itself. AIXI does well in situations in which it interacts with a computable environment through its input and output channels and nothing else, but that doesn’t mean that it will do well in an environment such that there exists a different environment that interacts with AIXI only through its input and output channels, and looks kind of like the actual environment if you squint at it.
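For concreteness, here is a minimal sketch (every name invented) of one member of the hypothesis family being conceded here: a world-program that is driven by the external action channel up to some step t_switch, and by an internal approximation of the agent afterwards:

```python
# Hypothetical sketch of one hypothesis in the family under discussion: the
# world is driven by the external (AIXI) action channel until step t_switch,
# and by an internal approximation of the agent from then on.

def make_switching_world(t_switch, approx_policy, world_step, init_state):
    """approx_policy: state -> action (the internal approximation)
    world_step: (state, action) -> (next_state, observation)"""
    state, t = init_state, 0
    def step(external_action):
        nonlocal state, t
        action = external_action if t < t_switch else approx_policy(state)
        state, observation = world_step(state, action)
        t += 1
        return observation
    return step

# Toy instantiation: the "world" simply echoes whichever action drove it.
world = make_switching_world(
    t_switch=2,
    approx_policy=lambda s: 9,           # stand-in for the approximation
    world_step=lambda s, a: (s, a),      # observation = action that was used
    init_state=None,
)
print([world(a) for a in [1, 2, 3, 4]])  # -> [1, 2, 9, 9]
```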
Another conceptual hiccup is that Solomonoff induction will continue to simulate the environment past the simulated destruction of its avatar, but a bit of reflection reveals that this is no more mysterious than a human’s ability to make predictions about the world past the point of his or her death.
Actually, it is weirder than that, because AIXI considers what decisions it will make after its “avatar” is destroyed. Most humans know it doesn’t work that way.
Ok, I’ll concede that AIXI does consider hypotheses in which the environment contains a computable approximation to AIXI and in the near future, the universe will start ignoring AIXI and paying attention to the approximation’s output in the same way it had previously been paying attention to AIXI’s output. Counting that as “identifying itself with the approximation” seems overly generous,
It’s not that AIXI thinks that “the universe will start ignoring AIXI”—the Solomonoff induction part starts by giving weight to an infinite set of models in which AIXI’s actions have no effect whatsoever. It’s that AIXI is learning that there’s this agent running around the universe doing stuff and the universe is responding to it. The identification part happens because the program specifies that the set of bits in the simulated agent’s input registers is the predicted observation stream.
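A minimal sketch of that identification step, again with invented names: the world-program simulates an embedded agent inside a simulated environment, and whatever gets written into that simulated agent’s input registers is, by construction, what the program reports as its predicted observation stream:

```python
# Hypothetical world-program sketch: it simulates an embedded agent, and its
# predicted observation stream is defined to be whatever the simulated world
# writes into that agent's input registers at each step.

class SimulatedAgent:
    def __init__(self):
        self.input_registers = None      # the simulated world writes here
    def act(self):
        return 0                         # stand-in for an approximate policy

def world_model(num_steps, env_step, env_state):
    agent = SimulatedAgent()
    predictions = []
    for _ in range(num_steps):
        action = agent.act()
        env_state, sensor_bits = env_step(env_state, action)
        agent.input_registers = sensor_bits   # world drives the agent's inputs
        predictions.append(sensor_bits)       # ...and those ARE the predictions
    return predictions

# Toy environment: the observation is just a counter, ignoring the action.
print(world_model(3, lambda s, a: (s + 1, s + 1), 0))   # -> [1, 2, 3]
```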
I still don’t see any reason that AIXI would end up considering such a hypothesis likely
Because hypotheses of smaller K-complexity have failed to predict the observation stream. (Not that this is a claim whose truth I’m asserting—just that this is the only reason that Solomonoff induction ever considers a hypothesis likely. I leave open the possibility that a K-simpler universe model that does not defeat Cartesian dualism might exist.)
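Spelled out, this is just the Solomonoff prior at work: roughly, each program $q$ starts with weight $2^{-\ell(q)}$, and only programs whose output is consistent with the observed history $x$ retain any posterior mass, so relative weight among the survivors is governed by program length:

$$
M(x) \;=\; \sum_{q\,:\,U(q)\ \text{outputs a string beginning with}\ x} 2^{-\ell(q)},
\qquad
\Pr(q \mid x) \;\propto\; 2^{-\ell(q)}\,\mathbf{1}\!\left[U(q)\ \text{is consistent with}\ x\right].
$$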
Actually, it is weirder than that, because AIXI considers what decisions it will make after its “avatar” is destroyed. Most humans know it doesn’t work that way.
AIXI learns, e.g., that the simulated agent has an actuator, and that all of the effects of the simulated agent’s decisions are mediated through the actuator. It can also predict that if the actuator is destroyed, then the simulated agent’s decisions stop having effects. That’s really all that’s necessary.
It’s not that AIXI thinks that “the universe will start ignoring AIXI”—the Solomonoff induction part starts by giving weight to an infinite set of models in which AIXI’s actions have no effect whatsoever. It’s that AIXI is learning that there’s this agent running around the universe doing stuff and the universe is responding to it. The identification part happens because the program specifies that the set of bits in the simulated agent’s input registers is the predicted observation stream.
Hypotheses in which AIXI’s actions already have no effect on the environment are useless for action guidance; all actions have the same utility.
Because hypotheses of smaller K-complexity have failed to predict the observation stream.
Well yes, I know that is how Solomonoff induction works. But the (useless for action guidance) hypothesis you just suggested is of ridiculously high K-complexity, and the hypothesis I suggested has even higher K-complexity. Even worse: these are actually families of hypotheses, parameterized by the AIXI approximation algorithm being used (and, in the case of the hypothesis I suggested, also the time-step on which the switch occurs), and as the number of observations increases, the required accuracy of the AIXI approximation, and thus its K-complexity, also increases. I’m skeptical that this sort of thing could ever end up as a leading hypothesis.
So I have responses, but they’re moot—I found the Cartesian boundary.
Hypotheses in which AIXI’s actions already have no effect on the environment are useless for action guidance; all actions have the same utility.
Fortunately they get falsified and zeroed out right away.
I’m skeptical that this sort of thing could ever end up as a leading hypothesis.
The leading hypothesis has to not get falsified; what you’ve described is the bare minimum required for a Solomonoff inductor to account for an AIXI agent in the environment.
It’s closer to #1. AIXI’s hypothesis space does not contain any possible worlds at all in which the agent is not separated from the environment by a Cartesian barrier.
Not really.
The hypothesis space contains worlds where the agent’s actions past a certain time point are ignored, and whatever part of the world the agent’s output channel used to drive becomes driven by another part of the world: the “successor”. Technically, the original agent still exists in the model, but since its output has been disconnected it doesn’t directly affect the world after that specific time. The only way it can continue to exert an influence is by affecting the “successor” before the disconnection occurs.
These world models are clearly part of the hypothesis space, so the problem is whether the induction process can assign them sufficient probability mass. This is non-trivial because scenarios which allow self-modification are usually non-ergodic (e.g. they allow you to drop an anvil on your head), and AIXI, like most reinforcement learning paradigms, is provably optimal only under an assumption of ergodicity. But if you can get around the non-ergodicity issues (e.g. by bootstrapping the agent with self-preservation instincts and/or giving it a guardian which prevents it from harming itself until it has learned enough), there is in principle no reason why an AIXI-like agent should be incapable of reasoning about self-modification.
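A toy illustration of the non-ergodicity worry (the anvil case), using a made-up environment: one action permanently kills the reward stream, so a single exploratory mistake can never be recovered from, which is exactly the kind of structure the ergodicity assumption behind the optimality results rules out:

```python
# Toy non-ergodic environment: action 1 ("drop the anvil") permanently ends
# the reward stream. No later behaviour can undo it, which is the property
# the ergodicity assumption in the optimality proofs excludes.

def anvil_env():
    alive = True
    def step(action):
        nonlocal alive
        if action == 1:
            alive = False
        return 1 if alive else 0         # reward
    return step

env = anvil_env()
print([env(a) for a in [0, 0, 1, 0, 0]])   # -> [1, 1, 0, 0, 0]
```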
The hypothesis space contains worlds where the agent’s actions past a certain time point are ignored, and whatever part of the world the agent’s output channel used to drive becomes driven by another part of the world: the “successor”.
That’s true, but in all such hypotheses, the successor is computable, and AIXI is not computable, so the successor agent cannot itself be AIXI.
Edit: Come to think of it, your response would probably start with something like “I know that, but...”, but I may have addressed some of what your further objections would have been in my reply to Cyan.
Edit: Come to think of it, your response would probably start with something like “I know that, but...”, but I may have addressed some of what your further objections would have been in my reply to Cyan.
...
Counting that as “identifying itself with the approximation” seems overly generous, but if we grant that, I still don’t see any reason that AIXI would end up considering such a hypothesis likely, or that it would be likely to understand any mechanism for it to self-modify correctly in terms of its model of modifying an external approximation to itself.
The simple answer is: because these models accurately predict the observations after self-modification actions are performed. Of course, the problem is not that simple, because of the non-ergodicity issues I’ve discussed before: when you fiddle with your source code without knowing what you are doing, it is easy to accidentally ‘drop an anvil’ on yourself. But this is a hard problem without any simple solution, IMHO.
The simple answer is: because these models accurately predict the observations after self-modification actions are performed.
For that to be true, the environment has to keep sending AIXI the same signals that it sends the approximation even after it stops paying attention to AIXI’s output. Even in that case, the fact that this model correctly predicts future observations doesn’t help at all. Prior to self-modifying, AIXI does not have access to information about what it will observe after self-modifying.
I agree with you that the non-ergodicity issues don’t have any simple solution. I haven’t been making a big deal about non-ergodicity because there don’t exist any agents that perform optimally in all non-ergodic environments (since one computable environment can permanently screw you for doing one thing, and another computable environment can permanently screw you for doing anything else), so it’s not a problem specific to AIXI-like agents, and AIXI actually seems like it should act fairly reasonably in non-ergodic computable environments separated from the agent by a Cartesian barrier, given the information available to it.
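A toy version of that last point, with invented environments: for any fixed first action there is a computable environment in which that action is a permanent trap, so no single agent can be optimal across all non-ergodic computable environments:

```python
# Two toy computable environments with opposite permanent traps: env_a ruins
# you forever if your first action is 0, env_b if your first action is 1.
# Whatever you do first is catastrophic in one of them, so no agent performs
# optimally in every non-ergodic computable environment.

def make_trap_env(bad_first_action):
    trapped, first = False, True
    def step(action):
        nonlocal trapped, first
        if first and action == bad_first_action:
            trapped = True
        first = False
        return 0 if trapped else 1       # reward
    return step

env_a, env_b = make_trap_env(0), make_trap_env(1)
print([env_a(a) for a in [0, 1, 1]])   # -> [0, 0, 0]  (trapped forever)
print([env_b(a) for a in [0, 1, 1]])   # -> [1, 1, 1]  (this one is fine)
```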
Then I don’t think we actually disagree. I mean, it has been well known since Hutter’s original paper that the AIXI optimality proof requires ergodicity.
My suggestion to them is to try to publish in peer-reviewed conferences or, preferably, journals. Interaction with referees will likely settle the question.

Beware wishful thinking :)