You may be right that AIXI can be thought of as an instance of CDT. Hutter himself cites “sequential decision theory” from a 1957 paper which certainly predates CDT, but CDT is general enough that SDT could probably fit into its formalism. (Like EDT can be considered an instance of CDT with the causal probability function set to be the same as the epistemic probability function.) I guess I hadn’t considered AIXI as a serious candidate due to its other major problems.
The first one is the claim that AIXI wouldn’t have a proper understanding of its body because its thoughts are defined mathematically. This is just wrong, IMO; my refutation, for a machine that’s similar enough to AIXI for this issue to work the same, is here. Nobody has engaged me in serious conversation about that, so I don’t know how well it will stand up. (If I’m right on this, then I’ve seen Eliezer, Tim Tyler, and you make the same error. What other false consensuses do we have?)
The second one is fixed if we do the tweak I mentioned in the grandparent of this comment.
If you take the fix described above for the second one, what’s left of the third one is the claim that instantaneous human (or AI) experience is too nuanced to fit in a single cell of a Turing machine. According to the original paper, page 8, the symbols on the reward tape are drawn from an alphabet R of arbitrary but fixed size. All you need is a very large alphabet and this one goes away.
I agree with the facts asserted in Tyler’s fourth problem, but I do not agree that it is a problem. He’s saying that Kolmogorov complexity is ill-defined because the programming language used is undefined. I agree that rational agents might disagree on priors because they’re using different programming languages to represent their explanations. In general, a problem may have multiple solutions. Practical solutions to the problems we’re faced with will require making indefensible arbitrary choices of one potential solution over another. Picking the programming language for priors is going to be one of those choices.
The first one is the claim that AIXI wouldn’t have a proper understanding of its body because its thoughts are defined mathematically. This is just wrong, IMO; my refutation, for a machine that’s similar enough to AIXI for this issue to work the same, is here.
I don’t see how your refutation applies to AIXI. Let me just try to explain in detail why I think AIXI will not properly protect its body. Consider an AIXI that arises in a simple universe, i.e., one computed by a short program P. AIXI has a probability distribution not over universes, but instead over environments where an environment is a TM whose output tape is AIXI’s input tape and whose input tape is AIXI’s output tape. What’s the simplest environment that fits AIXI’s past inputs/outputs? Presumably it’s E = P plus some additional pieces of code that injects E’s inputs into where AIXI’s physical output ports are located in the universe (that is, overrides the universe’s natural evolution using E’s inputs), and extracts E’s outputs from where AIXI’s physical input ports are located.
What happens when AIXI considers an action that destroys its physical body in the universe computed by P? As long as the input/output ports are not also destroyed, AIXI would expect that the environment E (with its “supernatural” injection/extraction code) will continue to receive its outputs and provide it with inputs.
Consider an AIXI that arises in a simple universe, i.e., one computed by a short program P.
An implementation of AIXI would be fairly complex. If P is too simple, then AIXI could not really have a body in the universe, so it would be correct in guessing that some irregularity in the laws of physics was causing its behaviors to be spliced into the behavior of the world.
However, if AIXI has observed enough of the inner workings of other similar machines, or enough of the laws of physics in general, or enough of its own inner workings, the simplest model will be that AIXI’s outputs really do emerge from the laws of physics in the real universe, since we are assuming that that is indeed the case and that Kolmogorov induction eventually works. At that point, imagining that AIXI’s behaviors are a consequence of a bunch of exceptions to the laws of physics is just extra complexity and won’t be part of the simplest hypothesis. It will be part of some less likely hypotheses, and the AI would have to take that risk into account when deciding whether to self-improve.
Tim, I think you’re probably not getting my point about the distinction between our concept of a computable universe, and AIXI’s formal concept of a computable environment. AIXI requires that the environment be a TM whose inputs match AIXI’s past outputs and whose outputs match AIXI’s past inputs. A candidate environment must have the additional code to inject/extract those inputs/outputs and place them on the input/output tapes, or AIXI will exclude it from its expected utility calculations.
The candidate environment must have the additional code to inject/extract those inputs/outputs and place them on the input/output tapes, or AIXI will exclude it from its expected utility calculations.
I agree that the candidate environment will need to have code to handle the inputs. However, if the candidate environment can compute the outputs on its own, without needing to be given the AI’s outputs, the candidate environment does not need code to inject the AI’s outputs into it.
Even if the AI can only partially predict its own behavior based on the behavior of the hardware it observes in the world, it can use that information to more efficiently encode its outputs in the candidate environment, so it can have some understanding of its position in the world even without being able to perfectly predict its own behavior from first principles.
If the AI manages to destroy itself, it will expect its outputs to be disconnected from the world and have no consequences, since anything else would violate its expectations about the laws of physics.
This back-and-forth appears to be useless. I should probably do some Python experiments and we then can change this from a debate to a programming problem, which would be much more pleasant.
However, if the candidate environment can compute the outputs on its own, without needing to be given the AI’s outputs, the candidate environment does not need code to inject the AI’s outputs into it.
If a candidate environment has no special code to inject AIXI’s outputs, then when AIXI computes expected utilities, it will find that all actions have equal utility in that environment, so that environment will play no role in its decisions.
I should probably do some Python experiments and we then can change this from a debate to a programming problem, which would be much more pleasant.
Ok, but try not to destroy the world while you’re at it. :) Also, please take a closer look at UDT first. Again, I think there’s a strong possibility that you’ll end up thinking “why did I waste my time defending CDT/AIXI?”
FYI, generating reward values internally—instead of them being observed in the environment—makes no difference whatsoever to the wirehead problem.
AIXI digging into its brains with its own mining claws is quite plausible. It won’t reason as you suggest—since it has no idea that it is instantiated in the real world. So, its exploratory mining claws may plunge in. Hopefully it will get suitably negatively reinforced for that—though much will depend on which part of its brain it causes damage too. It could find that ripping out its own inhibition circuits is very rewarding.
A larger set of symbols for rewards makes no difference—since the reward signal is a scalar. If you compare with an animal, that has millions of pain sensors that operate in parallel. The animal is onto something there—something to do with a-priori knowledge about the common causes of pain. Having lots of pain sensors has positive aspects—e.g. it saves you experimenting to figure out what hurts.
As for the reference machine issue, I do say: “This problem is also not very serious.”
Not very serious unless you are making claims about your agent being “the most intelligent unbiased agent possible”. Then this kind of thing starts to make a difference...
A larger set of symbols for rewards makes no difference—since the reward signal is a scalar. If you compare with an animal, that has millions of pain sensors that operate in parallel. The animal is onto something there—something to do with a-priori knowledge about the common causes of pain. Having lots of pain sensors has positive aspects—e.g. it saves you experimenting to figure out what hurts.
You can encode 16 64 bit integers in a 1024 bit integer. The scalar/parallel distinction is bogus.
(Edit: I original wrote “5 32 bit integers” when I meant “2**5 32 bit integers”. Changed to “16 64 bit integers” because “32 32 bit integers” looked too much like a typo.)
Not very serious unless you are making claims about your agent being “the most intelligent unbiased agent possible”. Then this kind of thing starts to make a difference...
Strawman argument. The only claim made is that it’s the most intelligent up to a constant factor, and a bunch of other conditions are thrown in. When Hutter’s involved, you can bet that some of the constant factors are large compared to the size of the universe.
You can encode 5 32 bit integers in a 1024 bit integer. The scalar/parallel distinction is bogus.
Er, not if you are adding the rewards together and maximising the results, you can’t! That is exactly what happens to the rewards used by AIXI.
Not very serious unless you are making claims about your agent being “the most intelligent unbiased agent possible”. Then this kind of thing starts to make a difference...
Strawman argument. The only claim made is that it’s the most intelligent up to a constant factor, and a bunch of other conditions are thrown in.
Actually Hutter says this sort of thing all over the place (I was quoting him above) - and it seems pretty irritating and misleading to me. I’m not saying the claims he makes in the fine print are wrong, but rather that the marketing headlines are misleading.
You can encode 5 32 bit integers in a 1024 bit integer. The scalar/parallel distinction is bogus.
Er, not if you are adding the rewards together and maximising the results, you can’t! That is exactly what happens to the rewards used by AIXI.
You’re right there, I’m confusing AIXI with another design I’ve been working with in a similar idiom. For AIXI to work, you have to combine together all the environmental stuff and compute a utility, make the code for doing the combining part of the environment (not the AI), and then use that resulting utility as the input to AIXI.
You may be right that AIXI can be thought of as an instance of CDT. Hutter himself cites “sequential decision theory” from a 1957 paper which certainly predates CDT, but CDT is general enough that SDT could probably fit into its formalism. (Like EDT can be considered an instance of CDT with the causal probability function set to be the same as the epistemic probability function.) I guess I hadn’t considered AIXI as a serious candidate due to its other major problems.
Four problems are listed there.
The first one is the claim that AIXI wouldn’t have a proper understanding of its body because its thoughts are defined mathematically. This is just wrong, IMO; my refutation, for a machine that’s similar enough to AIXI for this issue to work the same, is here. Nobody has engaged me in serious conversation about that, so I don’t know how well it will stand up. (If I’m right on this, then I’ve seen Eliezer, Tim Tyler, and you make the same error. What other false consensuses do we have?)
The second one is fixed if we do the tweak I mentioned in the grandparent of this comment.
If you take the fix described above for the second one, what’s left of the third one is the claim that instantaneous human (or AI) experience is too nuanced to fit in a single cell of a Turing machine. According to the original paper, page 8, the symbols on the reward tape are drawn from an alphabet R of arbitrary but fixed size. All you need is a very large alphabet and this one goes away.
I agree with the facts asserted in Tyler’s fourth problem, but I do not agree that it is a problem. He’s saying that Kolmogorov complexity is ill-defined because the programming language used is undefined. I agree that rational agents might disagree on priors because they’re using different programming languages to represent their explanations. In general, a problem may have multiple solutions. Practical solutions to the problems we’re faced with will require making indefensible arbitrary choices of one potential solution over another. Picking the programming language for priors is going to be one of those choices.
I don’t see how your refutation applies to AIXI. Let me just try to explain in detail why I think AIXI will not properly protect its body. Consider an AIXI that arises in a simple universe, i.e., one computed by a short program P. AIXI has a probability distribution not over universes, but instead over environments where an environment is a TM whose output tape is AIXI’s input tape and whose input tape is AIXI’s output tape. What’s the simplest environment that fits AIXI’s past inputs/outputs? Presumably it’s E = P plus some additional pieces of code that injects E’s inputs into where AIXI’s physical output ports are located in the universe (that is, overrides the universe’s natural evolution using E’s inputs), and extracts E’s outputs from where AIXI’s physical input ports are located.
What happens when AIXI considers an action that destroys its physical body in the universe computed by P? As long as the input/output ports are not also destroyed, AIXI would expect that the environment E (with its “supernatural” injection/extraction code) will continue to receive its outputs and provide it with inputs.
Does that make sense?
(Responding out of order)
Yes, but it makes some unreasonable assumptions.
An implementation of AIXI would be fairly complex. If P is too simple, then AIXI could not really have a body in the universe, so it would be correct in guessing that some irregularity in the laws of physics was causing its behaviors to be spliced into the behavior of the world.
However, if AIXI has observed enough of the inner workings of other similar machines, or enough of the laws of physics in general, or enough of its own inner workings, the simplest model will be that AIXI’s outputs really do emerge from the laws of physics in the real universe, since we are assuming that that is indeed the case and that Kolmogorov induction eventually works. At that point, imagining that AIXI’s behaviors are a consequence of a bunch of exceptions to the laws of physics is just extra complexity and won’t be part of the simplest hypothesis. It will be part of some less likely hypotheses, and the AI would have to take that risk into account when deciding whether to self-improve.
Tim, I think you’re probably not getting my point about the distinction between our concept of a computable universe, and AIXI’s formal concept of a computable environment. AIXI requires that the environment be a TM whose inputs match AIXI’s past outputs and whose outputs match AIXI’s past inputs. A candidate environment must have the additional code to inject/extract those inputs/outputs and place them on the input/output tapes, or AIXI will exclude it from its expected utility calculations.
I agree that the candidate environment will need to have code to handle the inputs. However, if the candidate environment can compute the outputs on its own, without needing to be given the AI’s outputs, the candidate environment does not need code to inject the AI’s outputs into it.
Even if the AI can only partially predict its own behavior based on the behavior of the hardware it observes in the world, it can use that information to more efficiently encode its outputs in the candidate environment, so it can have some understanding of its position in the world even without being able to perfectly predict its own behavior from first principles.
If the AI manages to destroy itself, it will expect its outputs to be disconnected from the world and have no consequences, since anything else would violate its expectations about the laws of physics.
This back-and-forth appears to be useless. I should probably do some Python experiments and we then can change this from a debate to a programming problem, which would be much more pleasant.
If a candidate environment has no special code to inject AIXI’s outputs, then when AIXI computes expected utilities, it will find that all actions have equal utility in that environment, so that environment will play no role in its decisions.
Ok, but try not to destroy the world while you’re at it. :) Also, please take a closer look at UDT first. Again, I think there’s a strong possibility that you’ll end up thinking “why did I waste my time defending CDT/AIXI?”
FYI, generating reward values internally—instead of them being observed in the environment—makes no difference whatsoever to the wirehead problem.
AIXI digging into its brains with its own mining claws is quite plausible. It won’t reason as you suggest—since it has no idea that it is instantiated in the real world. So, its exploratory mining claws may plunge in. Hopefully it will get suitably negatively reinforced for that—though much will depend on which part of its brain it causes damage too. It could find that ripping out its own inhibition circuits is very rewarding.
A larger set of symbols for rewards makes no difference—since the reward signal is a scalar. If you compare with an animal, that has millions of pain sensors that operate in parallel. The animal is onto something there—something to do with a-priori knowledge about the common causes of pain. Having lots of pain sensors has positive aspects—e.g. it saves you experimenting to figure out what hurts.
As for the reference machine issue, I do say: “This problem is also not very serious.”
Not very serious unless you are making claims about your agent being “the most intelligent unbiased agent possible”. Then this kind of thing starts to make a difference...
You can encode 16 64 bit integers in a 1024 bit integer. The scalar/parallel distinction is bogus.
(Edit: I original wrote “5 32 bit integers” when I meant “2**5 32 bit integers”. Changed to “16 64 bit integers” because “32 32 bit integers” looked too much like a typo.)
Strawman argument. The only claim made is that it’s the most intelligent up to a constant factor, and a bunch of other conditions are thrown in. When Hutter’s involved, you can bet that some of the constant factors are large compared to the size of the universe.
Er, not if you are adding the rewards together and maximising the results, you can’t! That is exactly what happens to the rewards used by AIXI.
Actually Hutter says this sort of thing all over the place (I was quoting him above) - and it seems pretty irritating and misleading to me. I’m not saying the claims he makes in the fine print are wrong, but rather that the marketing headlines are misleading.
You’re right there, I’m confusing AIXI with another design I’ve been working with in a similar idiom. For AIXI to work, you have to combine together all the environmental stuff and compute a utility, make the code for doing the combining part of the environment (not the AI), and then use that resulting utility as the input to AIXI.