Thanks for the link! I appreciate your write-ups. A few points:
1. As you’ve already noticed, your anti-Newcomb problem is an instance of Dr. Nick Bone’s “problematic problems”. Benja actually gave a formalism of the general class of problems in the context of provability logic in a recent forum post. We dub these problems “evil problems,” and I’m not convinced that your XDT is a sane way to deal with evil problems.
For one thing, every decision theory has an evil problem. As shown in the links above, even if we restrict attention to “fair” games, there is always a problem that punishes a decision theory for acting like it does and rewards other decision theories for acting differently. XDT does not escape this. For example, consider the following scenario: there are two actions, 0 and 1. Any agent that takes the action which XDT does not take scores ten points; all other agents score zero points. In this scenario, CDT scores 10, but XDT scores 0.
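To make that payoff rule concrete, here is a minimal sketch in Python (the function and variable names are illustrative, not from either post):

```python
def evil_problem_payoff(agent_action: int, xdt_action: int) -> int:
    """Evil problem aimed at XDT: there are two actions, 0 and 1.
    Any agent whose action differs from the action XDT takes scores
    ten points; every other agent scores zero."""
    return 10 if agent_action != xdt_action else 0

# By construction, XDT's action equals xdt_action, so XDT always scores 0,
# while an agent (e.g. CDT) that happens to pick the other action scores 10.
```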
So while XDT two-boxes on its own anti-Newcomb problem, it is still sometimes outperformed by CDT. In other words, the sort of optimality that XDT seems to be searching for is not a very good one. (There are other notions of optimality that I’m more partial to, although they are not entirely satisfactory.) Finding the right notion of “optimality” is most of the problem, and I don’t think XDT’s implicit notion solves it.
Specifically, taking “a good decision theory two-boxes on its anti-Newcomb problem” as a design criterion strikes me as a terrible plan! Correct me if I’m wrong, but I think that the reasoning you’re using goes something like this: (a) UDT does not perform optimally on its anti-Newcomb problem. (b) The ideal decision theory would perform optimally on its anti-Newcomb problem. (c) But given how the anti-Newcomb problem is defined, that means that the ideal decision theory would two-box on its own Newcomb problem. (d) Therefore, I want to design an agent that two-boxes on its own anti-Newcomb problem.
But this doesn’t seem like the sort of reasoning that leads one to pick a sane decision theory: you can’t build an agent that wins its own anti-Newcomb problem (in the sense of getting $1,001,000), but you can build one that logically controls whether it gets $1,000,000 or $1,000. The above reasoning process selects a decision theory that logically-causes the worse outcome, and I don’t think that’s the right move.
2. All these agents reason by conditioning on statements which are false (such as “what if the predecessor wrote my code except with this line prepended?”). The resulting agents will obviously fail on a large class of problems; in particular, they’ll fail on problems where the payoff depends on precisely the facts that the conditioning violates.
For simple “unfair” games (in the sense defined in the links above) where this occurs, consider scenarios where the agent is paid if and only if its program has exactly a certain length: clearly, agents (5) and (6) could be severely misled in games like these. If you’re only trying to make agents that work well on “fair” games (where the obvious formalization of “fair” is “extensional” as defined above), then you should probably make that much more explicit :-)
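Here is a rough sketch of that kind of “unfair” game; the particular payout and required length below are made up for illustration:

```python
def length_dependent_payoff(agent_source: str, required_length: int = 1000) -> int:
    """Unfair game: the agent is paid iff its program is exactly a certain
    length. A counterfactual that imagines an extra line prepended to the
    agent's source changes exactly the fact that the payoff depends on."""
    return 100 if len(agent_source) == required_length else 0
```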
For “fair” games where the counterfactuals considered by these agents will be misleading, consider modal-combat-type scenarios, where the agent is reasoning about other agents that are reasoning about the first agent’s source code. In these cases, there seems to be no guarantee that the logical conditional (on a false statement) is going to give a sane counterfactual (e.g., one where the extra line of code was prepended both to the agent’s actual source and to the source code that the opponent is reading). See also my post on why conditionals are not counterfactuals.
To make this point slightly more general, it seems like all of these agents are depending pretty heavily on the “logical conditional” black box working out correctly. If you assume that conditioning on a false logical fact magically gets you all the right counterfactuals, then these decision theories make more sense. However, these decision theories all strike me as explorations of what happens when you put the logical-counterfactual black box in various new scenarios. (What happens when we condition on the parent’s output? What happens when we condition on the program having a line prepended? etc.) By contrast, the type of progress that we’ve been trying to make in decision theory is mostly geared towards opening the black box: how, in theory, could we design a logical-counterfactual box that reliably works as intended?
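Schematically, the shared shape of these proposals looks something like the following; this is a caricature rather than anyone’s actual formalism, and everything interesting is hidden inside the conditional-expectation box:

```python
def logical_conditional_expectation(premise: str, utility_expr: str) -> float:
    """The black box under discussion: an expected utility computed under a
    premise that may be logically false. Making this behave like a sane
    counterfactual is exactly the open problem."""
    raise NotImplementedError("this is the part nobody knows how to build yet")

def generic_agent(actions, utility_expr: str):
    """Caricature of the agents in the post: plug different premises into the
    same box (what if the parent output this? what if a line were prepended?)
    and pick the action with the best conditional expectation."""
    return max(
        actions,
        key=lambda a: logical_conditional_expectation(
            f"the agent takes action {a}", utility_expr
        ),
    )
```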
Your agents seem to be assuming that that part of the problem is solved, which doesn’t seem to be the case. As such, I have the impression that the agents you define in this post, while interesting, aren’t really attacking the core of the problem, which is this: how can one reason under false premises?
3. You say
It is often claimed that the use of logical uncertainty in UDT allows for agents in different universes to reach a Pareto optimal outcome using acausal trade. If this is the case, then agents which have the same utility function should cooperate acausally with ease.
but I’m very skeptical. First of all, it would sure be nice if we could formally show that UDT-type agents always end up making intuitively-good trades, but it turns out that that’s a big hairy problem (Wei Dai pointed out this comment thread).
Secondly, what makes you think that the agent defined by equation (6) is a UDT? I am not even convinced that it trades with itself (in, say, a counterfactual mugging), never mind other UDT agents.
You also said “this argument should also make the use of full input-output mappings redundant in usual UDT,” and I think this indicates a misunderstanding of updatelessness. UDT doesn’t have some magical trades-with-other-UDTs property; rather, UDT choosing strategies without regard for its inputs is the mechanism by which it is able to trade with counterfactual versions of itself. If you take UDT and alter it so that it considers its input (instead of all I/O maps), then you get TDT, which definitely fails to trade with counterfactual versions of itself.
You can’t say “updateless decision theory trades with counterfactual versions of itself, therefore it would still do so if we took away the updatelessness,” because the updatelessness is how it’s able to make those trades! For similar reasons, I’m quite unconvinced that the agents of equations (5) or (6) perform well in situations such as the counterfactual mugging.
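To illustrate the structural difference, here is a simplified sketch, not the formalism from either post; `expected_utility_of_policy` and `expected_utility_given_input` are hypothetical stand-ins for whatever logical-expectation machinery is assumed. The updateless agent optimizes over whole input-output maps before seeing anything, while the input-conditioning variant picks an action only for the input it actually received.

```python
from itertools import product

def updateless_choice(inputs, outputs, expected_utility_of_policy):
    """Sketch of the UDT-style move: choose one entire policy
    (a map from inputs to outputs) up front, never conditioning on
    which input was actually received."""
    policies = [dict(zip(inputs, assignment))
                for assignment in product(outputs, repeat=len(inputs))]
    return max(policies, key=expected_utility_of_policy)

def input_conditioned_choice(observed_input, outputs, expected_utility_given_input):
    """Sketch of the input-conditioning variant: observe the input first,
    then pick the best action for that input alone -- which, per the
    comment above, is what loses the counterfactual trades."""
    return max(outputs,
               key=lambda a: expected_utility_given_input(observed_input, a))
```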
4. And, finally, I’m not quite sure why you’re so concerned with avoiding quining here. The big problems, as I see them, are things like “what sort of logical conditioning mechanism gives us good logical counterfactuals?”, and “how do multiple slightly-asymmetric UDT agents actually divide trade gains?”, and “how do we resolve the problem where sometimes it seems like agents with less computing power have some sort of logical-first-mover-advantage?”, and so on.
The use of quining in UDT doesn’t seem to have any fundamental bearing on these questions (and indeed, I think there’s a pretty simple modification to Vladimir Slepnev’s original formalism that has the agent reason according to a distribution over its source code, instead of assuming it has a perfect quine), and therefore I don’t quite understand the discontent.
With all that out of the way, I’d also like to say: Nice work! You’re clearly doing lots of in-depth thinking about the big decision theory problems, and I definitely applaud the effort. There are certainly some places where our thinking has diverged, but it’s also clear that you’re able to think about these things on your own and generate novel & interesting ideas, and that’s definitely something I want to encourage!
Want to echo Nate’s points!

One particular thing that I wanted to emphasize, and that I think you can see as a running thread on this forum (in particular, the modal UDT work is relevant), is that it’s useful to make formal toy models where the math is fully specified, so that you can prove theorems about what exactly an agent would do (or, sometimes, write a program that figures it out for you). When you write things out that explicitly, then, for example, it becomes clearer that you need to assume that a decision problem is “fair” (extensional) to get certain results, as Nate points out (or, if you don’t assume it, someone else can look at your result and point out that it’s not true as stated).

In your post, you’re using “logical expectations” that condition on something being true, without defining exactly what all of this means. As a result, you can argue about what these agents will do, but not actually prove it; that’s certainly a reasonable part of the research process, but I’d like to encourage you to turn your work into models that are fully specified, so that you can actually prove theorems about them.
Hi Benja, thx for commenting!

I agree that it’s best to work on fully specified models. Hopefully, I will soon write about my own approach to logical uncertainty via complexity theory.
Hi Nate, thx for commenting!

It seems to me this problem can be avoided by allowing access to random bits. See my reply to KnaveOfAllTrades and my reply to V_V. Formally, we should allow π in (4′) to be a random algorithm.
...The above reasoning process selects a decision theory that logically-causes the worse outcome, and I don’t think that’s the right move.
I don’t think “logical causation” in the sense you are using here is the right way to think about the anti-Newcomb problem. From the precursor’s point of view, there is no loss in utility from choosing XDT over UDT.
If you’re only trying to make agents that work well on “fair” games (where the obvious formalization of “fair” is “extensional” as defined above), then you should probably make that much more explicit.
Of course. I didn’t attempt to formalize “fairness” in that post, but the idea is to approach optimality for decision-determined problems in the sense of Yudkowsky 2010.
I have the impression that the agents you define in this post, while interesting, aren’t really attacking the core of the problem, which is this: how can one reason under false premises?
I realize that the logical expectation values I’m using are so far mostly wishful thinking. However, I think there is benefit in attacking the problems from both ends: understanding the usage of logical probabilities may shed light on the desiderata they should satisfy.
...UDT choosing strategies without regard for its inputs is the mechanism by which it is able to trade with counterfactual versions of itself.
Consider two UDT agents A & B with identical utility functions living in different universes. Each of the agents is charged with making a certain decision, while receiving no input. If both agents are aware of each other’s existence, we expect [in the sense of “hope” rather than “are able to prove” :)] them to make decisions that will maximize overall utility, even though, on the surface, each agent is only maximizing over its own decisions rather than the decisions of both agents.
What is the difference between this scenario and the scenario of a single agent existing in both universes, which receives a single bit of input indicating which universe the given copy is in?
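Purely as an illustration of why the two scenarios look like relabelings of one another (the utility function and action set below are arbitrary placeholders): a pair of no-input agents choosing actions and a single agent choosing a policy on the one-bit “which universe am I in?” input range over exactly the same joint outcomes.

```python
from itertools import product

ACTIONS = [0, 1]  # placeholder action set shared by both universes

def shared_utility(action_in_A, action_in_B):
    """Placeholder for the common utility function over both universes."""
    return action_in_A + 2 * action_in_B

# Scenario 1: two agents, no input; the jointly best pair of actions.
best_pair = max(product(ACTIONS, ACTIONS), key=lambda pair: shared_utility(*pair))

# Scenario 2: one agent receiving a bit saying which universe its copy is in;
# a policy maps that bit to an action, so policies correspond exactly to pairs.
policies = [{0: a, 1: b} for a, b in product(ACTIONS, ACTIONS)]
best_policy = max(policies, key=lambda pi: shared_utility(pi[0], pi[1]))

assert (best_policy[0], best_policy[1]) == best_pair
```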
… I’m not quite sure why you’re so concerned with avoiding quining here.

See my reply to Wei Dai.
...how do we resolve the problem where sometimes it seems like agents with less computing power have some sort of logical-first-mover-advantage?
You’re referring to the agent-simulates-predictor problem? Actually, I think my (4′) may contain a clue for solving it. As I commented, the logical expectation values should only use about as much computing power as the precursor has, rather than as much computing power as the successor has. Therefore, if the predictor is at least as strong as the precursor, the successor wins by choosing a policy π which is a UDT agent symmetric to the predictor.
There are certainly some places where our thinking has diverged...
Hopefully, further discussion will lead us to a practical demonstration of Aumann’s agreement theorem :)