TDT’s reply to this is a bit more specific.
Informally: Since Omega represents a setup which rewards agents who make a certain decision X, and reality doesn’t care why or by what exact algorithm you arrive at X so long as you arrive at X, the problem is fair. Unfair would be “We’ll examine your source code and punish you iff you’re a CDT agent, but we won’t punish another agent who two-boxes as the output of a different algorithm even though your two algorithms had the same output.” The problem should not care whether you arrive at your decisions by maximizing expected utility or by picking the first option in English alphabetical order, so long as you arrive at the same decision either way.
More formally: TDT corresponds to maximizing on the class of problems whose payoff is determined by ‘the sort of decision you make in the world that you actually encounter, having the algorithm that you do’. CDT corresponds to maximizing over a fair problem class consisting of scenarios whose payoff is determined only by your physical act; it would be a good strategy in the real world if no other agent ever had an algorithm similar to yours (you must be the only CDT-agent in the universe, so that your algorithm only acts at one physical point) and if no other agent could gain any info about your algorithm except by observing your controllable physical acts (tallness being correlated with intelligence is not allowed). UDT allows for maximizing over classes of scenarios where your payoff can depend on actions you would have taken in universes you could have encountered but didn’t, e.g., the Counterfactual Mugging. (Parfit’s Hitchhiker is outside TDT’s problem class but inside UDT’s, because the car-driver asks “What will this hitchhiker do if I take them to town?”, so that a dishonorable hitchhiker who is left in the desert is getting a payoff which depends on what they would have done in a situation they did not actually encounter. Likewise the transparent Newcomb’s Box. We can clearly see how to maximize on the problem, but it’s in UDT’s class of ‘fair’ scenarios, not TDT’s class.)
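To make the Counterfactual Mugging point concrete, here is a minimal numerical sketch of the policy-level comparison. The stakes ($100 demanded when the coin comes up tails, $10,000 awarded on heads iff Omega predicts you would have paid) are the usual illustrative numbers, assumed here rather than taken from the comment:

```python
# Minimal sketch of the Counterfactual Mugging as a policy-level comparison.
# Payoff numbers are the conventional illustrative stakes, assumed for this sketch.

def policy_value(pays_when_asked: bool) -> float:
    """Expected value of a policy over Omega's fair coin,
    assuming Omega predicts the policy perfectly."""
    heads = 10_000 if pays_when_asked else 0  # the branch you never actually see
    tails = -100 if pays_when_asked else 0    # the branch where you are actually asked
    return 0.5 * heads + 0.5 * tails

print(policy_value(True))   # 4950.0 -- the policy selected before seeing the coin
print(policy_value(False))  # 0.0    -- the best act in the tails branch, taken in isolation
```

The payoff of the “pay” policy depends on a branch the agent never encounters, which is exactly what places the problem in UDT’s class rather than TDT’s.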
If the scenario handed to the TDT algorithm is that only one copy of your algorithm exists within the scenario, acting at one physical point, and no other agent in the scenario has any knowledge of your algorithm apart from acts you can maximize over, then TDT reduces to CDT and outputs the same action as CDT. This follows from CDT maximizing over its own problem class, together with TDT’s class of ‘fair’ problems strictly including all CDT-fair problems.
If Omega rewards having particular algorithms independently of their outputs, by examining the source code without running it, the only way to maximize is to have the most rewarded algorithm regardless of its output. But this is uninteresting.
If a setup rewards some algorithms more than others because of their different outputs, this is just life. You might as well claim that a cliff punishes people who rationally choose to jump off it.
This situation is interestingly blurred in modal combat where an algorithm may perhaps do better than another because its properties were more transparent (more provable) to another algorithm examining it. Of this I can only say that if, in real life, we end up with AIs examining each other’s source code and trying to prove things about each other, calling this ‘unfair’ is uninteresting. Reality is always the most important domain to maximize over.
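A toy sketch of the fair/unfair distinction above (the function names and payoffs are made up for illustration): a fair setup’s payoff is a function of the decision alone, while an unfair setup also inspects which algorithm produced it.

```python
# Toy illustration of 'fair' vs 'unfair' setups; names and numbers are illustrative.

def fair_omega(decision: str) -> int:
    # Payoff depends only on the decision actually made.
    return 1_000_000 if decision == "one-box" else 1_000

def unfair_omega(decision: str, algorithm: str) -> int:
    # Same decision, different payoff depending on which algorithm produced it.
    penalty = 1_000_000 if algorithm == "CDT" else 0
    return fair_omega(decision) - penalty

print(fair_omega("two-box"))                    # 1000: the same for every two-boxer
print(unfair_omega("two-box", "CDT"))           # -999000: punished for the source code
print(unfair_omega("two-box", "alphabetical"))  # 1000: same act, no punishment
```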
I’d just like to say that this comparison of CDT, TDT, and UDT was a very good explanation of the differences. Thanks for that.
Agreed. Found the distinction between TDT and UDT especially clear here.
This explanation makes UDT seem strictly more powerful than TDT (if UDT can handle Parfit’s Hitchhiker and TDT can’t).
If that’s the case, then is there a point in still focusing on developing TDT? Is it meant as just a stepping stone to an even better decision theory (possibly UDT itself) down the line? Or do you believe UDT’s advantages to be counterbalanced by disadvantages?
UDT doesn’t handle non-base-level maximization vantage points (previously “epistemic vantage points”) for blackmail—you can blackmail a UDT agent because it assumes your strategy is fixed, and doesn’t realize you’re only blackmailing it because you’re simulating it being blackmailable. As currently formulated UDT is also non-naturalistic and assumes the universe is divided into a not-you environment and a UDT algorithm in a Cartesian bubble, which is something TDT is supposed to be better at (though we don’t actually have a good fill-in for the general-logical-consequence algorithm TDT is supposed to call).
I expect the ultimate theory to look more like “TDT modded to handle UDT’s class of problems and blackmail and anything else we end up throwing at it” than “UDT modded to be naturalistic, etc.”, but I could be wrong—others have different intuitions about this.
UDT was designed to move away from the kind of Cartesian dualism represented in AIXI. I don’t understand where it’s assuming its own Cartesian bubble. Can you explain?
The version I saw involved a Universe computation which accepts an Agent function and then computes itself, with the Agent making its choices based on its belief about the Universe? That seemed to me like a pretty clean split.
No, the version we’ve been discussing for the last several years involves an argumentless Universe function that contains the argumentless Agent function as a part. Agent knows the source code of Agent (via quining) and the source code of Universe, but does not a priori know which part of the Universe is the Agent. The code of Universe might be mixed up so that it’s hard to pick out copies of Agent. Then Agent tries to prove logical statements of the form “if Agent returns a certain value, then Universe returns a certain value”. As you can see, that automatically takes into account the logical correlates of Agent as well.
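A toy sketch of that setup (all names are made up, and this is not the real formulation): a proof-based agent would search for proofs of statements like “Agent() == a → Universe() == u” over the two programs’ source code; the stand-in below replaces the proof search with running universe() while Agent’s return value is pinned, which only captures exact copies of the agent rather than the looser logical correlates mentioned above.

```python
# Toy sketch of the argumentless Universe/Agent setup, with the proof search
# replaced (as a simplifying assumption) by simulation with a pinned return value.

ACTIONS = ["one-box", "two-box"]
_pinned_action = None  # stand-in for "assume Agent() == a" during the search

def agent() -> str:
    if _pinned_action is not None:
        return _pinned_action
    # Choose the action whose assumed consequence is best.
    return max(ACTIONS, key=consequence_of)

def universe() -> int:
    # Newcomb-like payoff: Omega's "prediction" is just another call to the
    # same argumentless agent() -- a logical copy buried inside the universe.
    prediction = agent()
    action = agent()
    box_b = 1_000_000 if prediction == "one-box" else 0
    box_a = 1_000 if action == "two-box" else 0
    return box_a + box_b

def consequence_of(action: str) -> int:
    """Stand-in for proving 'Agent() == action -> Universe() == u'."""
    global _pinned_action
    _pinned_action = action
    try:
        return universe()
    finally:
        _pinned_action = None

print(agent())     # "one-box"
print(universe())  # 1000000
```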
I find it rather disappointing that the UDT people and the TDT people have seemingly not been communicating very efficiently with each other in the last few years...
I think what has happened is that most of the LW people working on decision theory in the past few years have been working with different variations on UDT, while Eliezer hasn’t participated much in the discussions due to being preoccupied with other projects. It seems understandable that he saw some ideas that somebody was playing with, and thought that everyone was assuming something similar.
Yes. And now, MIRI is planning a decision theory workshop (for September) so that some of this can be hashed out.
I honestly thought we’d been communicating. Posting all our work on LW and all that. Eliezer’s comment surprised me. Still not sure how to react...
UDT can be modeled with a Universe computation that takes no arguments.
I think you must have been looking at someone else’s idea. None of the versions of UDT that I’ve proposed are like this. See my original UDT post for the basic setup, which all of my subsequent proposals share.
“The answer is, we can view the physical universe as a program that runs S as a subroutine, or more generally, view it as a mathematical object which has S embedded within it.” A big computation with embedded discrete copies of S seems to me like a different concept from doing logical updates on a big graph with causal and logical nodes, some of which may correlate to you even if they are not exact copies of you.
The sentence you quoted was just trying to explain how “physical consequences” might be interpreted as “logical consequences” and therefore dealt with within the UDT framework (which doesn’t natively have a concept of “physical consequences”). It wasn’t meant to suggest that UDT only works if there are discrete copies of S in the universe.
In that same post I also wrote, “A more general class of consequences might be called logical consequences. Consider a program P’ that doesn’t call S, but a different subroutine S’ that’s logically equivalent to S. In other words, S’ always produces the same output as S when given the same input. Due to the logical relationship between S and S’, your choice of output for S must also affect the subsequent execution of P’. Another example of a logical relationship is an S’ which always returns the first bit of the output of S when given the same input, or one that returns the same output as S on some subset of inputs.”
I guess I didn’t explicitly write about parts of the universe that “correlate to you” as opposed to having more exact logical relationships with you, but given how UDT is supposed to work, it was meant to just handle them naturally. At least I don’t see why it wouldn’t do so as well as TDT (assuming it had access to your “general-logical-consequence algorithm”, which I’m guessing is the same thing as my “math intuition module”).
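As a concrete toy version of the quoted S/S′ relationship (the functions are made up for illustration): two syntactically different programs provably compute the same function, so fixing “what S outputs” also fixes what a program that only calls S′ does.

```python
# Toy illustration of the quoted point about logical consequences (made-up functions).
# S and S_prime are different programs computing the same function, so a choice of
# output for S is also a fact about P_prime, which never calls S at all.

def S(n: int) -> int:
    return sum(range(n + 1))       # 0 + 1 + ... + n

def S_prime(n: int) -> int:
    return n * (n + 1) // 2        # closed form; logically equivalent to S

def P_prime(n: int) -> str:
    # Depends on S only through the logically equivalent S_prime.
    return "even" if S_prime(n) % 2 == 0 else "odd"

assert all(S(n) == S_prime(n) for n in range(1000))
print(P_prime(10))  # "odd": S(10) == 55, and P_prime tracks that same mathematical fact
```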
FWIW, as far as I can remember I’ve always understood this the same way as Wei and cousin_it. (cousin_it was talking about the later logic-based work rather than Wei’s original post, but that part of the idea is common between the two systems.) If the universe is a Game of Life automaton initialized with some simple configuration which, when run with unlimited resources and for a very long time, eventually by evolution and natural selection produces a structure that is logically equivalent to the agent’s source code, that’s sufficient for falling under the purview of the logic-based versions of UDT, and Wei’s informal (underspecified) probabilistic version would not even require equivalence. There’s nothing Cartesian about UDT.
I’m not so sure about this one… It seems that UDT would be deciding “If blackmailed, pay or don’t pay” without knowing whether it actually will be blackmailed yet. Assuming it knows the payoffs the other agent receives, it would reason “If I pay if blackmailed... I get blackmailed, whereas if I don’t pay if blackmailed... I don’t get blackmailed. I therefore should never pay if blackmailed”, unless there’s something I’m missing.
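A minimal numerical sketch of this policy comparison (the payoffs are made-up illustrative numbers), with the blackmailer modeled as simulating the victim’s policy and issuing the threat only when that is profitable:

```python
# Minimal sketch of the comment's reasoning; payoff numbers are illustrative assumptions.

PAYMENT = 100        # what the victim hands over if it pays
HARM = 1_000         # damage to the victim if the threat is carried out
CARRY_OUT_COST = 10  # the blackmailer's cost of carrying out the threat

def blackmailer_threatens(victim_pays_if_threatened: bool) -> bool:
    # The blackmailer simulates the victim's policy and threatens only if profitable.
    expected_gain = PAYMENT if victim_pays_if_threatened else -CARRY_OUT_COST
    return expected_gain > 0

def victim_payoff(victim_pays_if_threatened: bool) -> int:
    if not blackmailer_threatens(victim_pays_if_threatened):
        return 0                                   # never threatened at all
    return -PAYMENT if victim_pays_if_threatened else -HARM

print(victim_payoff(True))   # -100: a payer gets threatened and pays
print(victim_payoff(False))  # 0: a committed refuser is never threatened
```

Under these assumptions the sketch reproduces the comment’s conclusion; it does not settle the earlier objection, which turns on how the formalism represents the blackmailer’s strategy rather than on the payoff arithmetic.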