This whole thing seems like an artifact of failing to draw a boundary between decision theories and strategies. In my own work, I have for deterministic problems:

World: Strategy->Outcome
Strategy: Information->Decision
DecisionTheory: WorldProgram->Strategy
Phrased this way, the predictor is part of the World function, which means it is only allowed to simulate a Strategy, not a DecisionTheory, and so the problem as stated is ill-posed. This structure is required for decision theory in general to have correct answers, because otherwise you could construct a problem for any decision theory, no matter how ridiculous, which only that decision theory can win at.
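For concreteness, a minimal sketch of how this type boundary could be rendered (the Python stand-ins and the toy Newcomb world below are illustrative assumptions, not part of the original scheme):

```python
from typing import Callable

# Illustrative stand-ins for the types named above.
Information = str                              # whatever the world shows the agent
Decision = str                                 # e.g. "one-box" or "two-box"
Outcome = int                                  # payoff
Strategy = Callable[[Information], Decision]   # Strategy: Information->Decision
World = Callable[[Strategy], Outcome]          # World: Strategy->Outcome

def newcomb_world(strategy: Strategy) -> Outcome:
    """The predictor is part of World, so all it can do is run the Strategy."""
    prediction = strategy("two boxes before you")           # simulate the response
    box_b = 1_000_000 if prediction == "one-box" else 0     # fill box B accordingly
    choice = strategy("two boxes before you")               # the actual choice
    return box_b if choice == "one-box" else box_b + 1_000
```

Under this signature a predictor can only call the strategy it is handed; an ASP-style predictor, which needs access to the agent's source, simply cannot be expressed, which is what "the problem as stated is ill-posed" is pointing at.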
I disagree. An agent can use any considerations whatsoever in making its decisions, and these considerations can refer to the world, or to its own algorithm, or to the way the world depends on the agent's algorithm, or to the way the dependence of the world on the agent's algorithm depends on the agent's decision in a counterfactual world.
You can object that it's not fair to pose to the agent problems that ask for recognition of facts outside some predefined class of non-ridiculous facts, but asking which situations we are allowed to present to an agent is the wrong perspective. It is wrong because making an agent with certain characteristics automatically determines the facts of its success or failure in all possible scenarios: fair, unfair, plausible, and ridiculous.
So the only thing that is allowed to dictate which considerations we may ignore is the agent's own preference. If the agent doesn't care about the influence of some fact, then it can ignore it. Typically, we won't be able to formally point out any class of facts to which the agent is guaranteed to be indifferent, even in principle. And so decision theory must not be typed.
(You can see a certain analogy with not demanding a particular kind of proof. The agent is not allowed to reject my argument that a certain action is desirable, or undesirable, on the grounds that the considerations I refer to don't belong to a particular privileged class, unless it really doesn't care about those considerations or any of their logical consequences (according to the agent's normative theory of inference). See also explicit dependence bias.)
I think you’ve misunderstood just what restrictions this type schema imposes on problems. Could you provide a specific example of something you think it excludes, that it shouldn’t?
The ASP problem described in the OP is just such an example. Are you perhaps not convinced that it represents an aspect of important real-world problems?
What's World? What do you mean by "World: Strategy->Outcome"? The problem is that if World is a function, it's given by some syntactic specification, and the agent, being part of the world, can control which function this syntactic specification refers to, or which value this function takes for a particular argument, in a way that has nothing to do with its type, with the passing of this Strategy thing as its argument. This is a typical example of explicit dependence bias: you declare that World (as a mathematical structure) depends on nothing, but it really does.
See "world3" in my post for an example based on Newcomb's problem. There, World is given by a program "world3" that takes the agent's action as a parameter, but also depends on the agent's action implicitly, in a way that isn't captured by the interface of the function. Viewed as a function, it in fact shows that two-boxing dominates one-boxing, and one can only see that two-boxing is really worse by recognizing that the function's value, when given the agent's actual action as a parameter, is controlled by the agent's action in a way that makes one-boxing preferable. So seeing the program "world3" merely as specifying a function is unhelpful.
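A sketch in the spirit of that construction (the bodies below are my own reconstruction; only the names world3, action, and action2 come from the post): action2 is meant to be a symbol-renamed copy of the agent's own action, so the box contents track the agent's choice even though world3 never consults the parameter it was passed.

```python
def action2() -> str:
    # Stands in for a symbol-renamed copy-paste of the agent's own action();
    # by construction its return value tracks whatever the agent actually decides.
    return "one-box"

def world3(action: str) -> int:
    box_b = 1_000_000 if action2() == "one-box" else 0   # implicit dependence, not via the parameter
    return box_b if action == "one-box" else box_b + 1_000
```

For either fixed value of box_b, world3("two-box") exceeds world3("one-box") by 1000, which is the spurious dominance argument; the agent's actual payoff is world3 evaluated at the very action that also determined box_b, and that is higher for a one-boxer.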
It means "World is the type of function that takes a Strategy as input and returns an Outcome."
In the linked article, you've defined world, world2, and world3 as things that aren't actually functions; they have unbound references to the agent, which are parameters in disguise. You then show that if you mix parameters-as-unbound-references with real parameters, you can get confused into thinking they're independent. Which just means you shouldn't use unbound references.
How do you know that a given syntactic specification of a function doesn't "use unbound references"? The presence of a logical dependence on the agent's action looks algorithmically impossible to detect. You can of course focus on a class of syntactic specifications that are known not to depend on the agent's action (for example, GLUTs), but this is too restrictive for an FAI-grade decision theory that can handle the actual world, and it ignores the problem of whether the process that specified that GLUT could itself logically depend on the agent's action. (The very setup of a thought experiment could be controlled by an agent acting inside it, for example, for thought experiments that are strange enough or sensitive enough to detail.)
Give it as input to a compiler and see if it gives an error message or not. Or apply the same trivial procedure compilers use: read it and look for a definition for every symbol that is not itself the left hand side of a definition.
It doesn't refer to any such symbols; see in particular the difference between "world" and "world2", and notice that "world3" doesn't refer to "action()", but instead to "action2()", which you can assume to be a copy-paste of "action()"'s source code with all the symbols renamed.
Ok, I think I see where our formalizations differ. In the formalization I’m using, the decision theory produces a strategy, which is a function that’s given as an argument to the world program. The world program invokes the strategy zero or more times, each time passing in some arguments that give whatever information is available to the agent at some point, and getting back a (real or predicted) decision. The world program is completely self-contained; other than through the argument it receives, it may not contain references to the agent’s choices at all. The strategy is similarly self-contained; it receives no information about the world except through the arguments the world program passes to it. Then separately from that, a “decision theory” is a function that takes a symbolic representation of a world program, and returns a symbolic representation of a strategy.
Ultimately, this amounts to a refactoring; results that hold in one system still hold in the other, if you map the definitions appropriately. However, I’ve found that structuring problems this way makes the theory easier to build on, and makes underspecified problems easier to notice.
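Concretely, the data flow of that formalization might be sketched as follows (the toy two-step world and the stub decision theory are my own illustrations, not anyone's actual proposal): the decision theory sees only the world program's source, and the world sees only the strategy it is handed.

```python
import inspect
from typing import Callable

Strategy = Callable[[str], str]   # information -> decision

def world(strategy: Strategy) -> float:
    """Self-contained world program: it may call the strategy any number of times,
    each time passing in whatever information the agent would have at that point."""
    total = 0.0
    if strategy("at the first intersection") == "continue":
        total += 1.0
        if strategy("at the second intersection") == "exit":
            total += 4.0
    return total

def decision_theory(world_source: str) -> Strategy:
    """Takes a symbolic representation of the world program, returns a strategy.
    A real theory would analyse world_source; this stub just returns a fixed policy."""
    policy = {"at the first intersection": "continue",
              "at the second intersection": "exit"}
    return lambda info: policy.get(info, "continue")

strategy = decision_theory(inspect.getsource(world))
print(world(strategy))   # 5.0
```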
Can you formalize this requirement? If I copy the agent's code, rename all the symbols, obfuscate it, simulate its execution in a source code interpreter that runs in a hardware emulator running on an emulated Linux box running on JavaScript inside a browser running on Windows running on a hardware simulator implemented (and then obfuscated again) in the same language as the world program, and insert this thing into the world program (along with a few billion people and a planet and a universe), how can you possibly make sure that there is no dependence?
You don’t get to do that, because when you’re writing World, the Strategy hasn’t been determined yet. Think of it as a challenge-response protocol; World is a challenge, and Strategy is a response. You can still do agent-copying, but you have to enlarge the scope of World to include the rules by which that copying was done, or else you get unrelated agents instead of copies.
To copy the agent's code, you don't need to know its strategy. The world naturally changes if you change it in this way, and the strategy might change as well if you run the agent on the changed world, but the agent's code is still the same, and you know that code. The new world will depend only on the new strategy, not the old one; but now we have a world that depends on its agent's strategy, and you won't be able to find out how it does if you don't already know.
In any case, all this copying is irrelevant, because the point is that there can exist very convoluted worlds that depend on the agent's action, where it's not feasible to know that they do, or how. And we don't get to choose the real world.
Yeah, in ASP the predictor looks inside the agent. But the problem still seems “fair” in a certain sense, because the predictor’s prediction is logically tied to the agent’s decision. A stupid agent that one-boxed no matter what would get rewarded, and a stupid agent that two-boxed would get punished. In other words, I’d still like a decision theory that does well on some suitably defined class of ASP-like problems, even if that class is wider than the class of “TDT-fair” problems that Eliezer envisioned. Of course we need a lot of progress to precisely define such classes of problems, too.
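One way to caricature the ASP setup in code (a sketch; the substring check on the agent's source is a crude stand-in for the predictor's resource-bounded proof search):

```python
import inspect

def asp_world(agent_source: str, agent_decision: str) -> int:
    """The predictor looks inside the agent: it inspects source code, not just behaviour.
    The substring check is a toy stand-in for a bounded proof search."""
    predicts_one_box = "return 'one-box'" in agent_source
    box_b = 1_000_000 if predicts_one_box else 0
    return box_b if agent_decision == "one-box" else box_b + 1_000

def stupid_one_boxer():
    return 'one-box'

def clever_two_boxer():
    # Out-reasons the weak predictor, concludes the boxes are already filled, and two-boxes.
    return 'two-box'

for agent in (stupid_one_boxer, clever_two_boxer):
    print(agent.__name__, asp_world(inspect.getsource(agent), agent()))
# stupid_one_boxer 1000000
# clever_two_boxer 1000
```

The prediction here is tied to the agent's source rather than obtained by calling its strategy, which is what places ASP outside the typed schema discussed above.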
It would be useful to have a list of problems that TDT can handle, a list that current specifications of UDT can handle, and a list of those that are still in the grey area of not being quite resolved. Among other things, that would make the difference between TDT and UDT far more intuitively clear!
The class of problems which are well-posed in this type system is exactly the class of problems that would not change if you gave the agent a chance to self-modify in between receiving the problem statement and the start of the world program. Problems outside this class are neither fair nor solvable in principle nor interesting.
I work under the assumption that the problem statement is a world program or a prior over world programs. Maybe it’s a bad assumption. Can you suggest a better one?
I use that same assumption, with only the slight caveat that I keep the world program and the preference function separate for clarity's sake; you need both, but I often see them combined into one function.
The big difference, I think, is that the way I do it, the world program doesn't get a decision theory directly as input; instead, the world program's source is given to the decision theory, the decision theory outputs a strategy, and then the strategy is given to the world program as input. This is a better match for how we normally talk about decision theory problems, and it prevents a lot of shenanigans.
Of course the world program shouldn’t get the decision theory as input! In the formulation I always use, the world program doesn’t have any inputs, it’s a computation with no arguments that returns a utility value. You live in the world, so the world program contains your decision theory as a subroutine :-)
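A minimal Newcomb-flavoured sketch of that formulation (my own toy rendering): the world program takes no arguments, returns a utility, and the agent appears only as a subroutine called from inside it.

```python
def agent() -> str:
    """The agent's decision procedure, which in this formulation is just part of the world."""
    return "one-box"

def world() -> int:
    """A closed computation with no arguments that returns a utility value."""
    prediction = agent()                                 # the predictor is just another call site
    box_b = 1_000_000 if prediction == "one-box" else 0
    return box_b if agent() == "one-box" else box_b + 1_000

print(world())   # 1000000
```

Controlling such a program means working out how its return value depends on your own output, which is the dependence-inference problem discussed further down.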
That’s very wrong. You’ve given up the distinctions between question and answer, and between answer and solution procedure. This also lets in stupid problems, like “you get a dollar iff the source code to your decision theory is prime”, which I would classify as not a decision theory problem at all.
Yeah, you can’t control arbitrary world programs that way, but remember that our world really is like that: a human mind runs within the same universe that it’s trying to control. One idea would be to define a class of “fair” world programs (perhaps those that can be factorized in appropriate ways) and care only about solving those. But I guess I’d better wait for your writeup, because I don’t understand what kind of formalism you prefer.
In this case, the outcome is the same for every action you take. A decision theory can still consider this problem, and rightly conclude that it's irrelevant which action it takes (or maybe it infers that jumping on its right foot gives a slightly higher chance that the Dark Lords of the Matrix wrote its source code so that it's prime). I don't see how this is a problem.
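For concreteness, the "dollar iff your source code is prime" world might be caricatured as below (a sketch; summing the bytes of the source is just a cheap stand-in for reading the source code as a number). The action parameter is never consulted, so every action yields the same outcome.

```python
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_source_world(decision_theory_source: str, action: str) -> int:
    # The payoff ignores `action` entirely; it depends only on the agent's source code.
    code_as_number = sum(decision_theory_source.encode())
    return 1 if is_prime(code_as_number) else 0
```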
But the world really is like that. It is essential to capture this.
I don’t know of any important aspect of the world that can’t be captured in question-answer-procedure format. Could you give an example?
But who does the capturing? (To push the analogy one step up, I know of no solvable question that doesn’t admit an answer written on a sheet of paper.)
See the section on “utility functions” in this post. UDT/ADT seeks exactly to infer functional dependencies that can then be used in the usual manner (at least, this is one way to look at what’s going on). It is intended to solve the very problem which you argue must always be solvable.
My examples of explicit dependence bias show that there are wrong and right ways of parsing the outcome as depending on the agent's decision. If we are guaranteed to be given a right parsing, then that part is indeed covered, and there is no need to worry about the wrong ones. But believing that a right parsing merely exists doesn't particularly help in finding one.
So I guess the problem you are having with UDT is that you assume the problem of finding a correct explicit dependence of outcome on action is already solved, and that we have a function World already specified in a way that doesn't implicitly depend on the agent's actions. But UDT is intended to solve the problem without making this assumption, and instead tries to find an unbiased dependence on its own. Since you assume the goal of UDT as a prerequisite in your thinking about decision theory, you don't see the motivation for UDT, and indeed there would be none had we assumed this problem solved.
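A common caricature of what inferring the dependence amounts to (my own sketch, not the ADT formalism): instead of being handed an explicit function from its action to an outcome, the agent searches for provable consequences of "my algorithm outputs a" and only then compares outcomes.

```python
from typing import Callable, Iterable, Optional

def udt_style_choice(actions: Iterable[str],
                     provable_utility: Callable[[str], Optional[int]]) -> Optional[str]:
    """provable_utility(a) stands in for a theorem prover: it returns u when
    "agent() = a implies world() = u" can be established within the search budget,
    and None otherwise.  The dependence of the world on the action is discovered,
    not given as part of the problem statement."""
    best_action, best_utility = None, None
    for a in actions:
        u = provable_utility(a)
        if u is not None and (best_utility is None or u > best_utility):
            best_action, best_utility = a, u
    return best_action

# Toy Newcomb instance: the discovered dependence already accounts for the predictor.
print(udt_style_choice(["one-box", "two-box"],
                       {"one-box": 1_000_000, "two-box": 1_000}.get))   # one-box
```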
We’re talking past each other, and this back-and-forth conversation isn’t going anywhere because we’re starting from very different definitions. Let’s restart this conversation after I’ve finished the post that builds up the definitions from scratch.
At least address this concern, which suggests that our difficulty is probably an easy one of technical confusion, and not one of communicating an intuitive understanding of the relevance of studying a certain question.