You can cast this out formally, mathematically, and say that if you start with a uniform prior, [Model 2] seems to be simpler.
There are some issues: you could say that A can’t really model itself, tautologically, because it only has so much power.
But suppose there are other agents in the world similar to A, let’s call them B, C, etc., which are about as hard to model as A.
A is still going to need a good approximate model, using physics, of B and C, and therefore of its own behaviour.
So unless you do something special and go out of your way, it’s going to be similar to model A.
[15:02]
And even if it can’t predict exactly what it’s going to do, you can still say the reductionist model it knows is accurate.
You believe, as a mathematical fact, that the predictions of the model are in accordance with reality.
So any AI that is sophisticated enough to come up with Model 2 will adopt it over Model 1.
We could talk about the sort of philosophical difficulties that would stop A from being able to realize that, but for now let’s just assume that Model 2 wins.
So, what does A do if it believes Model 2?
This is like asking, “What would you do if you believed that your choices have no effect on the world and you’re just bringing in physics to see what you do?”
[15:59]
In the expectation [in the previously given algorithm A], it’s setting the kth action equal to a.
But in Model 2, reality doesn’t care what it’s equal to- it’s just going to keep running physics.
It’s like it’s ignoring these outputs, instead of putting them on the output wire.
So in Model 2, it doesn’t matter what A does, physics is just going to keep running ahead.
And A is thinking about its interventions in this very specific way- it’s taking the world model it’s learned, the mapping from outputs to inputs, and seeing what would happen if you fed it a different output.
But in Model 2, we’re ignoring A’s outputs entirely.
So if you’re only using this particular strategy of looking at different possible worlds, the worlds aren’t different at all, because reality is ignoring what you say and just using physics to generate inputs.
So, A’s decisions have no consequences.
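As a rough illustration of the point above, here is a minimal Python sketch of A's strategy of feeding different outputs into its world model. The function names (`model_1`, `model_2`, `evaluate`, `physics_output`) and the toy payoffs are assumptions made for this example, not anything given in the talk.

```python
# Toy illustration only: all names and numbers are hypothetical.

def physics_output():
    """What physics says A's output wire will carry (fixed in advance)."""
    return 1

def model_1(action):
    """Model 1: the world's next input depends on the action A feeds in."""
    return action  # the reward-like input equals the chosen action

def model_2(action):
    """Model 2: reality ignores the fed-in action and just runs physics."""
    return physics_output()

def evaluate(model, actions=(0, 1)):
    """A's strategy: feed each candidate output to the model and compare."""
    return {a: model(a) for a in actions}

values_1 = evaluate(model_1)  # the values differ by action
values_2 = evaluate(model_2)  # the values are identical for every action
print(values_1)
print(values_2)
```

Under `model_2` every candidate action maps to the same value, so comparing possible worlds this way gives A no reason to prefer one action over another.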
[17:06]
So that means that, for decision-making purposes, A can ignore Model 2, because in Model 2 it doesn’t matter what A does.
So, if you were uncertain about whether reductionism was true or not, and you were using this sort of algorithm to choose between actions, you could just ignore the possibility that reductionism was true, and concentrate on the non-reductionism possibility.
The problem is that there are these other models: e.g. Model 3, which is just like Model 2, except for outputs “1001010101”- some random, hardwired exception to Model 2.
So in this model it does matter what A does.
And there are perhaps more natural forms of this, where reductionism is true, and everything is physical, but I’m still going to be rewarded if I make the right decisions.
[18:15]
So in this exception, perhaps something good happens to A, perhaps something bad happens.
So, we’ve discarded Model 2 as irrelevant, but both Model 3 and Model 1 are still relevant.
The evidence we have so far can’t distinguish between any of these three models.
In Model 1, your decision matters in the intended way; in Model 2, your decision doesn’t matter; and in Model 3, your decision matters in this completely arbitrary way.
(There’s actually a whole, huge class of Model 3-like models.)
[19:10]
So we’d like you to believe Model 1 instead of Model 3, but the problem is that the part of Model 1 where it says “takes values on out” magically does surgery on the world to cause the output wire to agree with it.
That model’s actually really complicated.
Given that physics is already producing the charges on the wire, this clause not only isn’t supported by evidence, but is also really mechanically complicated.
It’s complicated to write down the program that looks in the universe, finds the output wire, and changes its value.
[20:00]
Whereas Model 3 isn’t that complicated- if you’re using a normal human programming language, stuff like this is quite straightforward- you just have one if statement.
So Model 3 just requires about one extra line of code, while Model 1 requires a whole lot of extra code.
So, since Model 3 is simpler, A winds up believing that reductionism is true, such that its actions are meaningless, and then it starts behaving completely erratically.
So, this is just a simple description of what’s wrong with trying to adapt AIXI to the world- it’s not really compatible with Occam’s Razor and normal inductive reasoning.
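The "one extra if statement" remark can be pictured with a small Python sketch. This is only an illustration: `model_2`, `model_3`, `EXCEPTION`, and the payoff values are hypothetical names and numbers, and real description length would be measured relative to A's prior rather than lines of Python.

```python
# Hypothetical sketch; the names and payoff values are illustrative assumptions.

EXCEPTION = "1001010101"  # the hardwired output string from the example above

def model_2(outputs):
    """Reductionist model: inputs come from physics, A's outputs are ignored."""
    return 0  # placeholder for "whatever physics produces"

def model_3(outputs, payoff=100):
    """Model 2 plus a single hardwired exception."""
    if outputs == EXCEPTION:  # the one extra if statement
        return payoff         # something arbitrary (good or bad) happens to A
    return model_2(outputs)

print(model_3("0000000000"))            # same as Model 2
print(model_3(EXCEPTION))               # the arbitrary exception fires
print(model_3(EXCEPTION, payoff=-100))  # an equally simple variant with a bad outcome
```

Because the payoff attached to the exception is arbitrary, there is a whole family of such models that are about equally simple, and A's choices matter only through whichever exception it happens to believe in.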
[21:00]
So, then we get to decision theory.
So, what do you do, as a human?
I don’t know about your philosophical orientation, but I assume that you are onboard with reductionism.
[Listener: Yeah.]
And yet you still decide that you should make choices in a certain way; you desire outcomes, and you make choices that systematically lead to those outcomes.
So, the question is, what is a good algorithmic solution?
In some sense you understand that your actions have no consequences, that everything that you’re going to do is already fixed.
This would be even more severe if you were an AI and you knew your own source code, because there’s a fact of the matter as to whether you would output zero or one, and no matter what you do, you couldn’t affect this, it’s just frozen.
So how do you reason in this context?
[21:57]
There’s a very simple answer.
So, in UDT [Updateless Decision Theory] you have uncertainty.
You are A, and you know your own source code. (Or, at least, you have a distribution over what your brain looks like; you know what it is, or you know how you would learn.)
But even once you know this, you are still uncertain about what you’re going to do.
You don’t know whether you’ll output zero or one- and you can’t know, in some strong sense.
So, you have uncertainty about your own output, even though it’s a mathematical fact.
[22:58]
So, in light of this, you have some distribution over all of these unknown facts about the world- a joint distribution over a bunch of things, including what you end up doing, and the utility you get.
[24:08]
Like A, you use a function that you’re aware of.
And there’s a known mathematical fact of the matter about how the universe evolves.
So there’s also a mathematical fact of the matter about what your utility is.
It is, in some sense, immutable, in the same way that your actions are immutable.
So you have this joint distribution over what you actually do, and what your utility is, and other stuff.
You have uncertainty; your uncertainty is correlated.
[25:04]
So you could know that if a=0, U=0, and if a=1, U=1.
And you could know this even if you didn’t know what a was.
A reasonable way to approach this is to take the action such that, once you condition on learning that you took that action, your expected utility is as high as possible.
(This is essentially Evidential Decision Theory [EDT].)
The only distinction to Updateless Decision Theory is that you treat [the actions and utilities] as mathematical facts.
So instead of saying “the action that you took,” where “you” is some vague indexical, we look at your code and condition on the fact that the mathematical algorithm you’re instantiating outputs this action.
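The conditioning rule described above can be sketched in a few lines of Python. This is a minimal sketch, assuming a tiny made-up joint distribution that matches the a=0, U=0 / a=1, U=1 example; the names `joint`, `conditional_expected_utility`, and `best_action` are hypothetical.

```python
# Joint distribution over (action, utility). The probabilities are made up;
# only the correlation (a=0 goes with U=0, a=1 goes with U=1) is from the text.
joint = {
    (0, 0): 0.5,
    (1, 1): 0.5,
}

def conditional_expected_utility(joint, action):
    """E[U | a = action], computed from the joint distribution."""
    p_action = sum(p for (a, _), p in joint.items() if a == action)
    if p_action == 0:
        raise ValueError("cannot condition on a zero-probability action")
    weighted = sum(p * u for (a, u), p in joint.items() if a == action)
    return weighted / p_action

def best_action(joint):
    """Pick the action whose conditional expected utility is highest."""
    actions = {a for (a, _) in joint}
    return max(actions, key=lambda a: conditional_expected_utility(joint, a))

print(conditional_expected_utility(joint, 0))
print(conditional_expected_utility(joint, 1))
print(best_action(joint))
```

In the variant described in the talk, the event being conditioned on would be the mathematical statement "this algorithm outputs a" rather than "you took action a"; the arithmetic of the conditional expectation stays the same.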
[26:06]
And conditioned on that mathematical fact being true, you can reason about these other mathematical facts.
I guess that was really EDT, and I’ll need some more discussion to explain why EDT has some problems, and why the best way to get around those problems is to talk about this algorithm that’s outputting the action, rather than about the vague “you” in EDT.
I guess there’s two natural ways to go.
One is to clarify why we’ve defined things the way we have, and another is to try to formalize it- because this isn’t formalized yet, and both EDT and UDT are going to run into a lot of problems when we try to formalize them.
[27:06]
So, the common place that EDT fails is something like the Smoking Lesion Problem.
Suppose there are two classes of agents, one of which likes candy, and always dies early, and the other of which likes candy less.
So then you get situations where, if you learn that you’ve eaten some candy, you can infer that you’re the sort of agent that likes candy, and therefore the sort of agent that will die early.
This seems incorrect, in the sense that your decision to take the candy doesn’t really have any influence on whether or not you die early.
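To make the candy example concrete, here is a small Python sketch with invented numbers. The probabilities below, and the assumption that the second class of agents never dies early, are illustrative choices for this example and are not given in the talk.

```python
# Hypothetical numbers for the two classes of agents.
P_TYPE = {"likes_candy": 0.5, "likes_less": 0.5}       # prior over agent type
P_EAT = {"likes_candy": 0.9, "likes_less": 0.1}        # chance each type eats candy
P_DIE_EARLY = {"likes_candy": 1.0, "likes_less": 0.0}  # depends on type only

def p_die_early_given(eats):
    """P(die early | eats), marginalizing over the unknown agent type."""
    numerator = 0.0
    denominator = 0.0
    for agent_type, p_type in P_TYPE.items():
        p_choice = P_EAT[agent_type] if eats else 1 - P_EAT[agent_type]
        denominator += p_type * p_choice
        numerator += p_type * p_choice * P_DIE_EARLY[agent_type]
    return numerator / denominator

print(p_die_early_given(True))   # roughly 0.9
print(p_die_early_given(False))  # roughly 0.1
```

Conditioning on "I ate the candy" raises the probability of dying early, so the EDT rule avoids the candy, even though `P_DIE_EARLY` depends only on the agent's type and eating has no effect within either class.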
[28:00]
The point is that by passing to this context, where you look at the algorithm we’re using directly, you don’t get any more information.
This is a good explanation of UDT, its relationship to EDT, and why UDT solves the Smoking Lesion Problem when EDT doesn’t. The Smoking Lesion Problem is actually a central motivation for UDT. (The story is that I was reading about the Smoking Lesion Problem, and thought something like “EDT would solve this correctly if the agent knew its own source code and conditioned on that first before deciding its action”, which doesn’t quite work, but helped me to later formulate UDT). I’ve been lazy about writing a post on this, so I’m glad Paul has figured it out and Randaly has put it in written form. (None of the previous discussions of the Smoking Lesion Problem I’ve seen on LW seems to have really “gotten it”.)
[sentence fragment]
Thanks a lot for doing this!