Yeah, in ASP the predictor looks inside the agent. But the problem still seems “fair” in a certain sense, because the predictor’s prediction is logically tied to the agent’s decision. A stupid agent that one-boxed no matter what would get rewarded, and a stupid agent that two-boxed would get punished. In other words, I’d still like a decision theory that does well on some suitably defined class of ASP-like problems, even if that class is wider than the class of “TDT-fair” problems that Eliezer envisioned. Of course we need a lot of progress to precisely define such classes of problems, too.
It would be useful to have a list of problems that TDT can handle, a list that current specifications of UDT can handle, and a list of those still in the grey area, not quite resolved. Among other things, that would make the difference between TDT and UDT far more intuitively clear!
The class of problems that are well-posed in this type system is exactly the class of problems that would not change if you gave the agent a chance to self-modify between receiving the problem statement and the start of the world program. Problems outside this class are neither fair, nor solvable in principle, nor interesting.
I work under the assumption that the problem statement is a world program or a prior over world programs. Maybe it’s a bad assumption. Can you suggest a better one?
I use that same assumption, with only the slight caveat that I keep the world program and the preference function separate for clarity's sake; you need both, but I often see them combined into one function.
The big difference, I think, is that the way I do it, the world program doesn't get a decision theory directly as input; instead, the world program's source is given to the decision theory, the decision theory outputs a strategy, and then the strategy is given to the world program as input. This is a better match for how we normally talk about decision theory problems, and it prevents a lot of shenanigans.
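A minimal sketch of that strategy-passing setup, with illustrative names (decision_theory, world, Strategy) that are my own and not anyone's actual formalism:

```python
from typing import Callable

Strategy = Callable[[str], str]  # maps an observation to an action

def decision_theory(world_source: str) -> Strategy:
    # Placeholder: a real decision theory would analyze world_source and
    # return the strategy it judges best; here one is hard-coded.
    return lambda observation: "one-box"

def world(strategy: Strategy) -> int:
    # The world runs the strategy on its observations and returns a utility.
    action = strategy("two boxes, one opaque and one transparent")
    return 1_000_000 if action == "one-box" else 1_000

world_source = "..."  # stands in for the source text of `world`
print(world(decision_theory(world_source)))  # 1000000
```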
Of course the world program shouldn't get the decision theory as input! In the formulation I always use, the world program doesn't have any inputs; it's a computation with no arguments that returns a utility value. You live in the world, so the world program contains your decision theory as a subroutine :-)
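A toy sketch of the zero-argument formulation, again with made-up names (agent, predictor, world); the agent sits inside the world as a subroutine:

```python
def agent() -> str:
    return "one-box"

def predictor() -> str:
    # Toy stand-in for "the prediction is logically tied to the decision":
    # the predictor simply reruns the agent's code.
    return agent()

def world() -> int:
    # No inputs: a closed computation that returns a utility value.
    opaque_box = 1_000_000 if predictor() == "one-box" else 0
    action = agent()
    return opaque_box if action == "one-box" else opaque_box + 1_000

print(world())  # 1000000
```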
That’s very wrong. You’ve given up the distinctions between question and answer, and between answer and solution procedure. This also lets in stupid problems, like “you get a dollar iff the source code to your decision theory is prime”, which I would classify as not a decision theory problem at all.
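For concreteness, here is a hedged sketch of that degenerate problem in the zero-argument style; AGENT_SOURCE and the primality test are placeholders, and the point is that the payoff depends only on a syntactic property of the agent's code, never on its decision:

```python
AGENT_SOURCE = "def agent(): return 'one-box'"  # placeholder source text

def probably_prime(n: int) -> bool:
    # Fermat test with a few bases; good enough for an illustration.
    if n < 2:
        return False
    for a in (2, 3, 5, 7):
        if n % a == 0:
            return n == a
        if pow(a, n - 1, n) != 1:
            return False
    return True

def world() -> int:
    # Pays a dollar iff the agent's source, read as a number, is prime.
    code_as_number = int.from_bytes(AGENT_SOURCE.encode(), "big")
    return 1 if probably_prime(code_as_number) else 0

print(world())
```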
Yeah, you can’t control arbitrary world programs that way, but remember that our world really is like that: a human mind runs within the same universe that it’s trying to control. One idea would be to define a class of “fair” world programs (perhaps those that can be factorized in appropriate ways) and care only about solving those. But I guess I’d better wait for your writeup, because I don’t understand what kind of formalism you prefer.
In this case, the outcome is the same for every action you take. A decision theory can still consider this problem and rightly conclude that it's irrelevant which action it takes (or maybe it infers that hopping on your right foot gives a slightly higher chance that the Dark Lords of the Matrix wrote your source code so that it's prime). I don't see how this is a problem.
But the world really is like that, and it is essential to be able to capture this.
I don’t know of any important aspect of the world that can’t be captured in question-answer-procedure format. Could you give an example?
But who does the capturing? (To push the analogy one step up, I know of no solvable question that doesn’t admit an answer written on a sheet of paper.)
See the section on “utility functions” in this post. UDT/ADT seeks exactly to infer functional dependencies that can then be used in the usual manner (at least, this is one way to look at what’s going on). It is intended to solve the very problem which you argue must always be solvable.
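One way to picture that, as a toy sketch only: infer the dependence of utility on the agent's action, then use it in the usual manner. Real UDT/ADT would obtain the dependence by logical deduction about its own source code; here it is cheated out of a world model by holding the action fixed, and the names are illustrative.

```python
ACTIONS = ["one-box", "two-box"]

def world_given(action: str) -> int:
    # The inferred dependence: the prediction moves together with the action.
    opaque_box = 1_000_000 if action == "one-box" else 0
    return opaque_box if action == "one-box" else opaque_box + 1_000

def choose() -> str:
    # Use the inferred dependence action -> utility in the usual way.
    return max(ACTIONS, key=world_given)

print(choose())  # one-box
```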
My examples of explicit dependence bias show that there are wrong and right ways of parsing the outcome as depending on the agent's decision. If we are guaranteed to be given a right parsing, then that part is indeed covered, and there is no need to worry about the wrong ones. Believing that a right parsing merely exists doesn't particularly help in finding one.
So I guess the problem you are having with UDT is that you assume the problem of finding a correct explicit dependence of outcome on action is already solved, and that we have a function World already specified in a way that doesn't in any way implicitly depend on the agent's actions. But UDT is intended to solve the problem without making this assumption, and instead tries to find an unbiased dependence on its own. Since you assume the goal of UDT as a prerequisite in your thinking about decision theory, you don't see the motivation for UDT, and indeed there would be none had we assumed this problem solved.
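As a toy illustration of the contrast (my own example, not one from the linked posts), here are a right and a wrong explicit parsing of Newcomb's problem as World(action):

```python
def world_right(action: str) -> int:
    # Right parsing: the prediction covaries with the action.
    prediction = action
    opaque_box = 1_000_000 if prediction == "one-box" else 0
    return opaque_box if action == "one-box" else opaque_box + 1_000

def world_wrong(action: str) -> int:
    # Wrong parsing: the box contents are treated as settled independently
    # of the action, so the explicit dependence always rewards two-boxing.
    opaque_box = 1_000_000  # frozen at its "actual" value
    return opaque_box if action == "one-box" else opaque_box + 1_000

# Optimizing world_right recommends one-boxing; optimizing world_wrong
# recommends two-boxing, though both purport to describe the same situation.
```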
We’re talking past each other, and this back-and-forth conversation isn’t going anywhere because we’re starting from very different definitions. Let’s restart this conversation after I’ve finished the post that builds up the definitions from scratch.
At least address this concern, which suggests that our difficulty is probably an easy one of technical confusion, and not one of communicating an intuitive understanding of why a certain question is worth studying.