There’s a difference between reasoning about your mind and actually reading your mind. CDT certainly faces situations in which it is advantageous to convince others that it does not follow CDT. On the other hand, this is simply behaving in a way that leads to the desired outcome. This is different from facing situations where you can only convince people of this by actually self-modifying. Those situations only occur when other people can actually read your mind.
Actually, I take it back. Depending on how you define things, UDT can still lose. Consider the following game:
I will clone you. One of the clones I paint red and the other I paint blue. The red clone I give $1000000 and the blue clone I fine $1000000. UDT clearly gets expectation 0 out of this. SMCDT, however, can replace its code with the following:
If you are painted blue: wipe your hard drive.
If you are painted red: change your code back to standard SMCDT.
Thus, SMCDT never actually has to play blue in this game, while UDT does.
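A toy accounting of this, as a sketch only. The key assumption is mine rather than anything established above: a clone that wipes itself before the fine lands no longer counts as an instance of the agent, so only surviving copies enter the average.

```python
# Toy accounting of the cloning game above.
# Assumption (mine): only copies still running the agent's code count toward its payoff.

REWARD, FINE = 1_000_000, -1_000_000

def udt_game():
    # Both clones keep running UDT; one is paid, one is fined.
    payoffs = [REWARD, FINE]
    return sum(payoffs) / len(payoffs)            # expectation 0

def smcdt_game():
    # The red clone reverts to standard SMCDT and collects the reward.
    # The blue clone self-erases first, so (under the assumption above)
    # no copy of SMCDT ever "plays blue".
    surviving_payoffs = [REWARD]
    return sum(surviving_payoffs) / len(surviving_payoffs)   # +1,000,000

print(udt_game(), smcdt_game())
```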
OK. Fine. I will grant you this:
UDT is provably optimal if it has correct priors over possible universes and the universe can read its mind only through determining its behavior in hypothetical situations (because UDT basically just finds the behavior pattern that optimizes expected utility and implements it).
On the other hand, SMCDT is provably optimal in situations where it has an accurate posterior probability distribution, and where the universe can read its mind but not its initial state (because it just instantly self-modifies to the optimally performing program).
I don’t see why the former set of restrictions is any more reasonable than the latter, and at least for SMCDT you can figure out what it would do in a given situation without first specifying a prior over possible universes.
I’m also not convinced that it is even worth spending so much effort trying to decide the optimal decision theory in situations where the universe can read your mind. This is not a realistic model to begin with.
Which is actually one of the annoying things about UDT. Your strategy cannot depend simply on your posterior probability distribution; it has to depend on your prior probability distribution. How you would even determine your priors for Newcomb vs. anti-Newcomb in practice is beyond me.
But in any case, assuming that one is more common, UDT does lose this game.
Yes. And likewise if you put an unconditional extortion-refuser in an environment populated by unconditional extortionists.
Fine. How about this: “Have $1000 if you would have two-boxed in Newcomb’s problem.”
Only if the adversary makes its decision to attempt extortion regardless of the probability of success.
And therefore the extortioner’s optimal strategy is to extort independently of the probability of success. Actually, this is probably true in a lot of real cases (say, ransomware) where the extortioner cannot actually ascertain the probability of success ahead of time.
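A rough back-of-the-envelope illustration of that last point; all the numbers here are made up for the example.

```python
# Toy expected-value calculation for an extortioner (e.g. ransomware) that
# cannot observe any particular victim's disposition ahead of time.
# Numbers are illustrative assumptions, not from the discussion.

COST_OF_ATTACK = 1       # cost of sending one threat
RANSOM = 1_000           # payout if the victim gives in

def expected_profit(p_give_in: float) -> float:
    return p_give_in * RANSOM - COST_OF_ATTACK

# Even if only a small fraction of victims give in, attacking is profitable,
# so "extort regardless of the probability of success" is the natural policy
# when that probability can't be ascertained per victim.
for p in (0.01, 0.05, 0.20):
    print(p, expected_profit(p))
```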
Well, if the universe cannot read your source code, both agents are identical and provably optimal. If the universe can read your source code, there are easy scenarios where one or the other does better. For example,
“Here, have $1000 if you are a CDT agent.” Or: “Here, have $1000 if you are a UDT agent.”
Eliezer thinks his TDT will refuse to give in to blackmail, because outputting another answer would encourage other rational agents to blackmail it.
This just means that TDT loses in honest one-off blackmail situations (in reality, you don’t give in to blackmail because it will cause other people to blackmail you, whether or not you then self-modify to never give in to blackmail again). TDT only does better if the potential blackmailers read your code in order to decide whether or not blackmail will be effective (and then only if your priors say that such blackmailers are more likely than anti-blackmailers who give you money if they think you would have given in to blackmail). Then again, if the blackmailers think that you might be a TDT agent, they just need to precommit to using blackmail whether or not they believe that it will be effective.
Actually, this suggests that blackmail is a game that TDT agents really lose badly at when playing against each other. The TDT blackmailer will decide to blackmail regardless of effectiveness and the TDT blackmailee will decide to ignore the blackmail, thus ending in the worst possible outcome.
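A minimal payoff sketch of that claim; the payoff numbers are illustrative assumptions, chosen only so that carrying out the threat hurts both sides.

```python
# Illustrative payoffs (blackmailer, blackmailee) for the game described above.
PAYOFFS = {
    ("dont_blackmail", "give_in"): (0, 0),
    ("dont_blackmail", "refuse"):  (0, 0),
    ("blackmail",      "give_in"): (100, -100),
    ("blackmail",      "refuse"):  (-50, -200),   # threat gets carried out
}

# Per the reasoning above: the TDT blackmailer precommits to blackmail
# regardless of effectiveness, and the TDT blackmailee precommits to refuse.
print(PAYOFFS[("blackmail", "refuse")])   # (-50, -200): the worst cell for both
```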
For analogous reasons, a CDT agent would self-modify to do well on all Newcomblike problems that it would face in the future (e.g., it would precommit generally)
I am not convinced that this is the case. A self-modifying CDT agent is not caused to self-modify in favor of precommitment by facing a scenario in which precommitment would have been useful, but instead by evidence that such scenarios will occur in the future (and in fact will occur with greater frequency than scenarios that punish you for such precommitments).
Anyone who can credibly claim to have knowledge of the agent’s original decision algorithm (e.g. a copy of the original source) can put the agent into such a situation, and in certain exotic cases this can be used to “blackmail” the agent in such a way that, even if it expects the scenario to happen, it still fails (for the same reason that CDT two-boxes even though it would precommit to one-boxing).
Actually, this seems like a bigger problem with UDT to me than with SMCDT (self-modifying CDT). Either type of program can be punished for being instantiated with the wrong code, but only UDT can be blackmailed into behaving differently by putting it in a Newcomb-like situation.
The story idea you had wouldn’t work. Against an SMCDT agent, all that getting the AI’s original code would allow people to do is laugh at it for having been instantiated with code that is punished by the scenario they are putting it in. You manipulate an SMCDT agent by threatening to get ahold of its future code and punishing it for not having self-modified. On the other hand, against a UDT agent you could do stuff. You just have to tell it “we’re going to simulate you and if the simulation behaves poorly, we will punish the real you”. This causes the actual instantiation to change its behavior if it’s a UDT agent but not if it’s a CDT agent.
On the other hand, all reasonable self-modifying agents are subject to blackmail. You just have to tell them “every day that you are not running code with property X, I will charge you $1000000”.
I guess I’ll see your later posts then, but I’m not quite sure how this could be the case. If self-modifying-CDT is considering making a self modification that will lead to a bad solution, it seems like it should realize this and instead not make that modification.
OK. I’ll bite. What’s so important about reflective stability? You always alter your program when you come across new data. Now sure we usually think about this in terms of running the same program on a different data set, but there’s no fundamental program/data distinction.
Acting differently when choosing a program than when actually in the scenario is perhaps worrying, but I think that it’s intrinsic to the situation we are in when your outcomes are allowed to depend on the behavior of counterfactual copies of you.
For example, consider the following pair of games. In REWARD, you are offered $1000. You can choose whether or not to accept. That’s it. In PUNISHMENT, you are penalized $1000000 if you accepted the money in REWARD. Thus programs win PUNISHMENT if and only if they lose REWARD. If you want to write a program to play one it will necessarily differ from the program you would write to play the other. In fact the program playing PUNISHMENT will behave differently than the program you would have written to play the (admittedly counterfactual) subgame of REWARD. How is this any worse than what CDT does with PD?
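A minimal sketch of the point that no single fixed response wins both games:

```python
# Two one-bit programs: either "accept" the $1000 in REWARD or don't. The same
# bit determines the PUNISHMENT outcome, so no fixed program wins both games.

def reward_payoff(accepts: bool) -> int:
    return 1_000 if accepts else 0

def punishment_payoff(accepts: bool) -> int:
    return -1_000_000 if accepts else 0

for accepts in (True, False):
    print(accepts, reward_payoff(accepts), punishment_payoff(accepts))
# accepts=True  wins REWARD, loses PUNISHMENT
# accepts=False loses REWARD, wins PUNISHMENT
```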
I think even TDT says that you should 2-box in Newcomb’s problem when the box is full if and only if false.
But more seriously, presumably in your scenario the behavior of a “perfectly rational agent” actually means the behavior of an agent whose behavior is specified by some fixed, known program. In this case, the participant can determine whether or not the box is full. Thus, either the box is always full or the box is always empty, and the participant knows which is the case. If you are playing Newcomb’s problem with the box always full, you 2-box. If you play Newcomb’s problem with the box always empty, you 2-box. Therefore you 2-box. Therefore, the perfectly rational agent 2-boxes. Therefore, the box is always empty.
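To make the dominance step concrete, here is a sketch of the argument (not anyone’s official formalization): once the box content is fixed and known, two-boxing pays $1000 more in either case.

```python
# Given a fixed, known box content, compare the two choices.
for box_full in (True, False):
    one_box = 1_000_000 if box_full else 0   # take only the opaque box
    two_box = one_box + 1_000                # also take the transparent $1000
    print(box_full, one_box, two_box)        # two-boxing dominates in both rows
```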
OK. OK. OK. You TDT people will say something like “but I am a perfectly rational agent and therefore my actions are non-causally related to whether or not the box is full, thus I should 1-box as it will cause the box to be full.” On the other hand, if I modify your code to 2-box in this type of Newcomb’s problem you do better and thus you were never perfectly rational to begin with.
On the other hand, if the universe can punish you directly (i.e. not simply via your behavior) for running the wrong program, the program that does best depends heavily on which universe you are in and thus there cannot be a “perfectly rational agent” unless you assume a fixed prior over possible universes.
Yes. I agree that CDT fails to achieve optimal results in circumstances where the program that you are running directly affects the outside universe. For example, in clone PD, where running a program causes the opponent to run the same program, or in Newcomb’s problem, where running a program that 2-boxes causes the second box to be empty. On the other hand, ANY decision theory can be made to fail in such circumstances. You could merely face a universe that determines whether you are running program X and charges you $100 if you are.
Are there circumstances where the universe does not read your mind where CDT fails?
Is there a principled reason to use the MMEU strategy in the face of Knightian uncertainty? Why not maximize maximum expected utility, or minimize expected regret (i.e. the difference between the expected utility obtained by your action and the best expected utility you could have achieved if you knew the results of the Knightian uncertainty ahead of time)?
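For concreteness, here is a toy Ellsberg-style table on which the three rules disagree. The setup and numbers are my own illustration, and I am reading “minimize expected regret” as minimax regret over the unknown states.

```python
# Toy illustration of how maximin expected utility (MMEU), maximax, and
# minimax regret can recommend different acts under Knightian uncertainty.
# Each entry is the expected utility of an act given an unknown "state"
# that we refuse to put a probability on. Numbers are assumptions.
ACTS = {
    "safe":   {"s1": 50, "s2": 50},
    "risky":  {"s1": 0,  "s2": 120},
    "middle": {"s1": 40, "s2": 80},
}
STATES = ["s1", "s2"]

def mmeu(acts):       # maximize the worst-case expected utility
    return max(acts, key=lambda a: min(acts[a].values()))

def maximax(acts):    # maximize the best-case expected utility
    return max(acts, key=lambda a: max(acts[a].values()))

def minimax_regret(acts):
    best_in_state = {s: max(acts[a][s] for a in acts) for s in STATES}
    worst_regret = lambda a: max(best_in_state[s] - acts[a][s] for s in STATES)
    return min(acts, key=worst_regret)

print(mmeu(ACTS), maximax(ACTS), minimax_regret(ACTS))
# -> safe, risky, middle: three rules, three different recommendations
```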
Also, if we have two different types of uncertainty, is there a good reason that there shouldn’t be more than that? Maybe:
1) Here’s a thing that I can confidently assign a probability to (e.g. the outcome of a coin flip).
2) Here’s a thing that I cannot usually assign a precise probability to, but that it should be possible in principle to make such an assignment for (e.g. the number of yellow balls in the bin).
3) Here’s a thing that I would have trouble even in principle assigning a meaningful probability to (e.g. the simulation hypothesis).
This isn’t a problem if you believe that there will only ever be finitely many people. Or if you exponentially discount (in some relativistically consistent manner) at an appropriate rate.
So utilitarianism has known paradoxes if you allow infinite positive/negative utilities (basically because infinite sums don’t always behave well). On the other hand, if you restrict yourself, say, to situations that only last finitely long, all these paradoxes go away. If both devices last for the same amount of subjective time, this holds true in all reference frames, and thus in all reference frames you can say that the situations are equally good.
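A quick numerical illustration of why exponential discounting keeps the totals finite even over infinitely many time steps; the per-step utility and discount factor are arbitrary assumptions.

```python
# With bounded per-step utility u and discount factor 0 < d < 1, the total
# discounted utility sum_{t>=0} u * d**t equals u / (1 - d), which is finite.
u, d = 1.0, 0.99    # assumed values for illustration

partial = sum(u * d**t for t in range(10_000))
print(partial, u / (1 - d))   # the partial sum approaches the finite limit 100.0
```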
Another effect: people on LW are massively more likely to describe themselves as effective altruists. My moral ideals were largely formed before I came into contact with LW, but not until I started reading was I introduced to the term “effective altruism”.
Well, if AIXI believes that its interactions with the physical world happen only via BRAIN, it might not model the destruction of BRAIN as destroying its input, output, and work streams (though in some sense that can’t actually happen, since these are idealized concepts anyway), but it does model it as causing its output stream to no longer be able to affect its input stream, which seems like reason enough to be careful about making modifications.
I disagree. CDT correctly solves all problems in which other agents cannot read your mind. Real world occurrences of mind reading are actually uncommon.