Only if it’s common knowledge that both players are human.
ETA: Since I got downvoted, maybe I wasn’t being clear. I think that the Warren Buffett quote applies to human psychology more than to game theory in general. If outright deception were easy, it would probably become a good strategy to keep your allies in some doubt about your intentions, as a bargaining chip. But we humans don’t seem to be good at pulling that off, and so ambivalence is a strong signal of opposition.
Now that you have clarified, I wish I could downvote a second time.
Tit-for-tat is a good strategy in the iterated prisoner’s dilemma regardless of whether the players are human and regardless of whether the other player is “on your side”. In fact, it is pretty much taken for granted that there are no sides in the PD. Dre was downvoted by me for a complete misunderstanding of how Tit-for-tat relates to “sides”. You were downvoted for continuing the confusion.
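For anyone who wants to check the tit-for-tat claim numerically, here is a minimal sketch of an iterated PD. The payoff values (5/3/1/0), the ten-round length, and the function names are assumptions chosen for illustration; nothing in the thread specifies them.

```python
# Payoff to the first player for (my_move, their_move).
# "C" = cooperate, "D" = defect. The 5/3/1/0 values are the
# conventional textbook ones and are an assumption here.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(opponent_history):
    """Cooperate on the first round, then copy the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Defect every round, whatever the opponent has done."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Return the total scores (a, b) after the given number of rounds."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each strategy sees only the other's history
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))      # (30, 30)
print(play(tit_for_tat, always_defect))    # (9, 14)
print(play(always_defect, always_defect))  # (10, 10)
```

Note that nothing in the strategy refers to "sides": tit-for-tat only looks at the opponent's previous move. It loses the single match against always-defect by a small margin, but two tit-for-tat players do far better together than two defectors do.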
Oh, you’re right- my response would have made sense talking about players in a one-shot PD with communication beforehand, but it’s a non sequitur to Dre’s mistaken comment. Don’t know how I missed that.
Upvoted, but even with communication beforehand, the rational move in a one-shot PD is to defect. Unless there is some way to make binding commitments, or unless there is some kind of weird acausal influence connecting the players. Regardless of whether the other player is human and rational, or silicon and dumb as a rock.
Upvoted, but even with communication beforehand, the rational move in a one-shot PD is to defect.
Taboo “rational”.
Unless there is some way to make binding commitments, or unless there is some kind of weird acausal influence connecting the players.
Acausal control is not something additional, it’s structure that already exists in a system if you know where to look for it. And typically, it’s everywhere, to some extent.
Highest-scoring move, adjective applied to the course that maximises fulfillment of desires.
The best move in a one-shot PD is to defect against a cooperator.
With no communication or precommitment, and with the knowledge that it is a one-shot PD, the overwhelming outcome is both defect. Adding communication to the mix creates a non-zero chance you can convince your opponent to cooperate—which increases the utility of defecting.
Adding communication to the mix creates a non-zero chance you can convince your opponent to cooperate—which increases the utility of defecting.
There is a question of what will actually happen, but also more relevant questions of what will happen if you do X, for various values of X. If you convince the opponent to cooperate, it’s one thing, not related to the case of convincing your opponent to cooperate if you cooperate.
the case of convincing your opponent to cooperate if you cooperate.
Determine what kinds of control influence your opponent, appear to also be influenced by the same, and then defect when they think you are forced into cooperating because they are forced into cooperating?
Is that a legitimate strategy, or am I misunderstanding what you mean by convincing your opponent to cooperate if you cooperate?
Determine what kinds of control influence your opponent, appear to also be influenced by the same, and then defect when they think you are forced into cooperating because they are forced into cooperating?
[W]hat [do] you mean by convincing your opponent to cooperate if you cooperate?
It’s not in general possible to predict what you’ll actually do, since if it were possible, you could take such predictions into consideration in deciding what to do, in particular you could decide differently as a result, invalidating the “prediction”. Similarly, it’s not in general possible to predict what will actually happen, without assuming what you’ll decide first. It’s better to ask, what is likely to happen if you decide X, than to ask just what is likely to happen. It’s more useful too, since it gives you information about (acausal) consequences of your actions that can be used as basis for making decisions.
In the case of Prisoner’s Dilemma, it’s not very helpful to ask, what will your opponent do. What your opponent will do generally depends on what you’ll do, and assuming that it doesn’t is a mistake that leads to the classical conclusion that defecting is always the better option (falsified by the case of identical players that always make the same decision, with cooperation the better one). If you ask instead, what will your opponent do (1) if you cooperate, and (2) if you defect, that can sometimes give you interesting answers, such that cooperating suddenly becomes the better option. When you talk to the opponent with the intention of “convincing” them, again you are affecting both predictions about what they’ll do, on both sides of your possible decision, and not just the monolithic prediction of what they’ll do unconditionally. In particular, you might want to influence the probability of your opponent cooperating with you if you cooperate, without similarly affecting the probability of your opponent cooperating with you if you defect. If you affect both probabilities in the same way, then you are correct, such influence makes the decision of defecting more profitable than before. But if you affect these probabilities to a different degree, then it might turn out that the opposite is true, that the influence in question makes cooperating more profitable.
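The arithmetic behind this can be sketched as follows. This is only an illustration under assumed numbers: the payoffs (5/3/1/0) and the example probabilities are hypothetical, and the function name is made up for the sketch.

```python
# Assumed one-shot PD payoffs to me: temptation, reward, punishment, sucker.
T, R, P, S = 5, 3, 1, 0


def expected_utilities(p_coop_if_i_cooperate, p_coop_if_i_defect):
    """Return (EU of cooperating, EU of defecting).

    The two arguments are my estimates of the probability that the
    opponent cooperates, conditional on each of my possible decisions.
    """
    eu_cooperate = p_coop_if_i_cooperate * R + (1 - p_coop_if_i_cooperate) * S
    eu_defect = p_coop_if_i_defect * T + (1 - p_coop_if_i_defect) * P
    return eu_cooperate, eu_defect


# Case 1: persuasion raises both conditional probabilities equally.
before = expected_utilities(0.5, 0.5)
after = expected_utilities(0.9, 0.9)
print(before, after)  # defecting stays ahead, and its payoff has gone up

# Case 2: persuasion raises only the "if I cooperate" probability.
eu_c, eu_d = expected_utilities(0.9, 0.1)
print(eu_c > eu_d)  # True: cooperating is now the better option
```

When the opponent's move is treated as independent of mine (both arguments equal), defecting wins for every probability, which is the classical conclusion. The ranking only flips once the two conditional probabilities are allowed to differ.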
Ah, I see! I have been butting my head against various ideas that lead to cooperating in one-shot PDs and the like and not making any progress. It was because, while I had the idea of splitting my actions into groups conditional on the opponent’s action, I didn’t have the concept of doing the same for my opponent.
With that in mind, I can no longer parse my previous comment either. I think I meant that I would increase their probability of cooperating if I cooperated, and have them increase my probability of cooperating if they cooperated (thus decreasing both of our probabilities of defecting if the other cooperates), and then when the probabilities have moved far enough to tell us both to cooperate, I would defect, knowing that I would score a defect-against-cooperate. But yeah, it doesn’t make any sense at all, because the probabilities tell us both to cooperate.
Thanks for taking the time to explain this concept to me.
(Note that probability of you making a given decision is not knowable, when you are considering it yourself while allowing this consideration to influence the decision.)
LW Wiki for timeless decision theory (start with the posts- Eliezer’s PDF is very long and spends more time justifying than explaining).
Essentially, this may be beyond the level of humans to implement, but there are decision theories for an AI which do strictly better than the usual causal decision theory, without being exploitable. Two of these would cooperate with each other on the PD, given a chance to communicate beforehand.
Perplexed, have you come across the decision theory posts here yet? You’ll find them pretty interesting, I think.
Yes, I have read them, and commented on them. Negatively, for the most part. If any of these ideas are ever published in the peer reviewed literature, I will be both surprised and eager to read more.
there are decision theories for an AI which do strictly better than the usual causal decision theory, without being exploitable. Two of these would cooperate with each other on the PD, given a chance to communicate beforehand.
I think that you may have been misled by marketing hype. Even the proponents of those theories admit that they do not do strictly better (or at least as good) on all problems. They do better on some problems, and worse on others. Furthermore, sharing source code only provides a guarantee that the observed source is current if that source code cannot be changed. In other words, an AI that uses this technique to achieve commitment has also forsaken (at least temporarily) the option of learning from experience.
I am intrigued by the analogy between these acausal decision theories and the analysis of Hamilton’s rule in evolutionary biology. Nevertheless, I am completely mystified as to the motivation that the SIAI has for pursuing these topics. If the objective is to get two AIs to cooperate with each other there are a plethora of ways to do that already well known in the game theory canon. An exchange of hostages, for example, is one obvious way to achieve mutual enforceable commitment. Why is there this fascination with the bizarre here? Why so little reference to the existing literature?
So far as I understand the situation, the SIAI is working on decision theory because they want to be able to create an AI that can be guaranteed not to modify its own decision function.
There are circumstances where CDT agents will self-modify to use a different decision theory (e.g. Parfit’s Hitchhiker). If this happens (they believe), it will present a risk of goal-distortion, which is unFriendly.
Put another way: the objective isn’t to get two AIs to cooperate, the objective is to make it so that an AI won’t need to alter its decision function in order to cooperate with another AI. (Or any other theoretical bargaining partner.)
Does that make any sense? As a disclaimer, I definitely do not understand the issues here as well as the SIAI folks working on them.
I don’t think that’s quite right- a sufficiently smart Friendly CDT agent could self-modify into a TDT (or higher decision theory) agent without compromising Friendliness (albeit with the ugly hack of remaining CDT with respect to consequences that happened causally before the change).
As far as I understand SIAI, the idea is that decision theory is the basis of their proposed AI architecture, and they think it’s more promising than other AGI approaches and better suited to Friendliness content.
I don’t think that’s quite right- a sufficiently smart Friendly CDT agent could self-modify into a TDT (or higher decision theory) agent without compromising Friendliness (albeit with the ugly hack of remaining CDT with respect to consequences that happened causally before the change).
That sounds intriguing also. Again, a reference to something written by someone who understands it better might be helpful so as to make some sense of it.
Maybe it would be helpful to you to think of self-modifications and alternative decision theories as unrestricted precommitment. If you had the ability to irrevocably precommit to following any decision rule in the future, which rule would you choose? Surely it wouldn’t be pure CDT, because you can tractably identify situations where CDT loses.
you can tractably identify situations where CDT loses.
“Tractably” is a word that I find a bit unexpected in this context. What do you mean by it?
“Situations where CDT loses.” Are we talking about real-world-ish situations here? Situations in which causality applies? Situations in which the agents are free rather than being agents whose decisions have already been made for them by a programmer at some time in the past? What kind of situations do you have in mind?
And what do you mean by “loses”? Loses to who or what? Loses to agents that can foresee their opponent’s plays? Agents that have access to information channels not available to the CDT agent? Just what information channels are allowed? Why those, and not others?
ETA: And that “Surely it wouldn’t be CDT … because you can identify …” construction simply begs for completion with “Surely it would be … because you can’t identify …”. Do you have a candidate? Do you have a proof of “you can’t identify situations where it loses”. If not, what grounds do you have for criticizing?
CDT still loses to TDT in Newcomb’s problem if Omega can predict your actions with better than a 50.05% chance of being right. You can’t get out of this by claiming that Omega has access to unrealistic information channels, because these chances seem fairly realistic to me.
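The 50.05% figure can be reproduced with a short calculation. This sketch assumes the usual box contents ($1,000,000 in the opaque box, $1,000 in the transparent one) and treats Omega's accuracy as the probability that its prediction matches whatever you actually choose; both are assumptions, since the comment above states neither.

```python
BIG = 1_000_000   # assumed contents of the opaque box if one-boxing was predicted
SMALL = 1_000     # assumed contents of the transparent box


def one_box_value(accuracy):
    """Expected winnings from one-boxing when Omega is right with this probability."""
    return accuracy * BIG


def two_box_value(accuracy):
    """Expected winnings from two-boxing: the big prize only if Omega was wrong."""
    return (1 - accuracy) * BIG + SMALL


def break_even_accuracy():
    """Solve accuracy * BIG = (1 - accuracy) * BIG + SMALL for accuracy."""
    return (BIG + SMALL) / (2 * BIG)


print(break_even_accuracy())                      # 0.5005
print(one_box_value(0.51) > two_box_value(0.51))  # True
print(one_box_value(0.50) > two_box_value(0.50))  # False
```

So any predictor that is right even slightly more often than a coin flip is enough to make one-boxing come out ahead under this way of computing expectations.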
Situations in which the agents are free rather than being agents whose decisions have already been made for them by a programmer at some time in the past?
Free from what? Causality? This sounds distressingly like you are relying on some notion of “free will”.
I understand that every normative decision theory adopts the assumption (convenient fiction if you prefer) that the agent being advised is acting of “his own free will”. Otherwise, why bother advising?
Being a compatibilist, as I understand Holy Scripture (i.e. The Sequences) instructs me to be, I see no incompatibility between this “fiction” of free will and the similar fiction of determinism. They model reality at different levels.
For certain purposes, it is convenient to model myself and other “free agents” as totally free in our decisions, but not completely free in carrying out those decisions. For example, my free will ego may decide to quit smoking, but my determined id has some probability of overruling that decision.
Why the distinction between agents which are free and agents which have had their decisions made for them by a programmer, then? Are you talking about cases in which specific circumstances have hard-coded behavioral responses? Every decision every agent makes is ultimately made for it by the agent’s programmer; I suppose I’m wondering where you draw the line.
As a side note, I feel very uncomfortable seeing the sequences referred to as inviolable scripture, even in jest. In my head, it just screams “oh my god how could anyone ever be doing it this wrong arghhhhhh.”
I’m still trying to figure out what I think of that reaction, and do not mention it as a criticism. I think.
Why the distinction between agents which are free and agents which have had their decisions made for them by a programmer, then? Are you talking about cases in which specific circumstances have hard-coded behavioral responses? Every decision every agent makes is ultimately made for it by the agent’s programmer; I suppose I’m wondering where you draw the line.
I make the distinction because the distinction is important. The programmer makes decisions at one point in time, with his own goals and/or utility functions, and his own knowledge of the world. The agent makes decisions at a different point in time, based on different values and different knowledge of the world. A decision theory which advises the programmer is not superior to a decision theory which advises the agent. Those two decision theories are playing different games.
“Totally free” sounds like too free. You’re not free to actually decide at time T to “decide X at time T+1” and then actually decide Y at time T+1, since that is against the laws of physics.
It’s my understanding that what goes through your head when you actually decide X at time T+1 is (approximately) what we call TDT. Or you can stick to CDT and not be able to make decisions for your future self.
I upvoted this because it seems to contain a grain of truth, but I’m nervous that someone before me had downvoted it. I don’t know whether that was because it actually is just completely wrong about what TDT is all about, or because you went a bit over the top with “against the laws of physics”.
Situations where CDT loses are precisely those situations where credible precommitment helps you, and inability to credibly precommit hurts you. There’s no shortage of those in game theory.
Ok, those are indeed a reasonable class of decisions to consider. Now, you say that CDT loses. Ok, loses to what? And presumably you don’t mean loses to opponents of your preferred decision theory. You mean loses in the sense of doing less well in the same situation. Now, presumably that means that both CDT and your candidate are playing against the same game opponent, right?
I think you see where I am going here, though I can spell it out if you wish. In claiming the superiority of the other decision theory you are changing the game in an unfair way by opening a communication channel that didn’t exist in the original game statement and which CDT has no way to make use of.
Well, yeah, kind of, that’s one way to look at it. Reformulate the question like this: what would CDT do if that communication channel were available? What general precommitment for future situations would CDT adopt and publish? That’s the question TDT people are trying to solve.
what would CDT do if that communication channel were available?
The simplest answer that moves this conversation forward would be “It would pretend to be a TDT agent that keeps its commitments, whenever that act of deception is beneficial to it. It would keep accurate statistics on how often agents claiming to be TDT agents actually are TDT agents, and adjust its priors accordingly.”
Now it is your turn to explain why this strategy violates the rules, whereas your invention of a deception-free channel did not.
I’m going to have to refer you to Eliezer’s TDT document for that. (If you’re OK with starting in medias res, the first mention of this is on pages 22-23, though there it’s just specialized to Newcomb’s Dilemmas; see pages 50-52 for an example of the limits of this hack. Elsewhere he’s argued for the more general nature of the hack.)
I’m coming to realize just how much of this stuff derives from Eliezer’s insistence on reflective consistency of a decision theory. Given any decision theory, Eliezer will find an Omega to overthrow it.
But doesn’t a diagonal argument show that no decision theory can be reflectively consistent over all test data presented by a malicious Omega? Just as there is no enumeration of the reals, isn’t there a game which can make any specified rational agent regret its rationality? Omega holds all the cards. He can always make you regret your choice of decision theory.
Just as there is no enumeration of the reals, isn’t there a game which can make any specified rational agent regret its rationality? Omega holds all the cards. He can always make you regret your choice of decision theory.
No. We can ensure that no such problem exists if we assume that (1) only the output decisions are used, not any internals; and (2) every decision is made with access to the full problem statement.
I’m not entirely sure what “every decision is made with full access to the problem statement” means, but I can’t see how it can possibly get around the diagonalisation argument. Basically, Omega just says “I simulated your decision on problem A, on which your algorithm outputs something different from algorithm X, and give you a shiny black ferrari iff you made the same decision as algorithm X”
As cousin_it pointed out last time I brought this up, Caspian made this argument in response to the very first post on the Counterfactual Mugging. I’ve yet to see anyone point out a flaw in it as an existence proof.
As far as I can see the only premise needed for this diagonalisation to work is that your decision theory doesn’t agree with algorithm X on all possible decisions, so just make algorithm X “whatever happens, recite the Bible backwards 17 times”.
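The construction can be sketched in a few lines. Everything named here is a hypothetical stand-in: `some_decision_theory` represents whatever algorithm is under test, and the string answers are placeholders rather than part of any real problem statement.

```python
def algorithm_x(problem):
    """The arbitrary reference algorithm: same answer whatever happens."""
    return "recite the Bible backwards 17 times"


def some_decision_theory(problem):
    """Hypothetical stand-in for the decision theory being tested."""
    return "take the sensible option"


def omega_payout(agent, reference=algorithm_x, problem="A"):
    """Omega simulates the agent on the problem and pays out only if
    the agent's answer matches the reference algorithm's answer."""
    if agent(problem) == reference(problem):
        return "shiny black ferrari"
    return "nothing"


print(omega_payout(some_decision_theory))  # nothing
print(omega_payout(algorithm_x))           # shiny black ferrari
```

The sketch only needs the two algorithms to disagree on one problem, which is the single premise the argument relies on. Whether such a payout rule counts as a fair decision problem is a separate question.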
I’m not entirely sure what “every decision is made with full access to the problem statement” means, but I can’t see how it can possibly get around the diagonalisation argument. Basically, Omega just says “I simulated your decision on problem A, on which your algorithm outputs something different from algorithm X, and give you a shiny black ferrari iff you made the same decision as algorithm X”
In that case, your answer to problem A is being used in a context other than problem A. That other context is the real problem statement, and you didn’t have it when you chose your answer to A, so it violates the assumption.
Yeah, that definitely violates the “every decision is made with full access to the problem statement” condition. The outcome depends on your decision on problem A, but when making your decision on problem A you have no knowledge that your decision will also be used for this purpose.
I don’t see how this is useful. Let’s take a concrete example, let’s have decision problem A, Omega offers you the choice of $1,000,000, or being slapped in the face with a wet fish. Which would you like your decision theory to choose?
Now, No-mega can simulate you, say, 10 minutes before you find out who he is, and give you 3^^^3 utilons iff you chose the fish-slapping. So your algorithm has to include some sort of prior on the existence of “fish-slapping”-No-megas.
My algorithm “always get slapped in the face with a wet fish where that’s an option”, does better than any sensible algorithm on this particular problem, and I don’t see how this problem is noticeably less realistic than any others.
In other words, I guess I might be willing to believe that you can get around diagonalisation by posing some stringent limits on what sort of all-powerful Omegas you allow (can anyone point me to a proof of that?) but I don’t see how it’s interesting.
Now, No-mega can simulate you, say, 10 minutes before you find out who he is, and give you 3^^^3 utilons iff you chose the fish-slapping. So your algorithm has to include some sort of prior on the existence of “fish-slapping” No-megas.
Actually, no, the probability of fish-slapping No-megas is part of the input given to the decision theory, not part of the decision theory itself. And since every decision theory problem statement comes with an implied claim that it contains all relevant information (a completely unavoidable simplifying assumption), this probability is set to zero.
Decision theory is not about determining what sorts of problems are plausible, it’s about getting from a fully-specified problem description to an optimal answer. Your diagonalization argument requires that the problem not be fully specified in the first place.
“I simulated your decision on problem A, on which your algorithm outputs something different from algorithm X, and give you a shiny black ferrari iff you made the same decision as algorithm X”
This is a no-choice scenario. If you say that the Bible-reciter is the one that will “win” here, you are using the verb “to win” with a different meaning from the one used when we say that a particular agent “wins” by making the choice that leads to the best outcome.
But doesn’t a diagonal argument show that no decision theory can be reflectively consistent over all test data presented by a malicious Omega?
With the strong disclaimer that I have no background in decision theory beyond casually reading LW...
I don’t think so. The point of simulation (Omega) problems, to me, doesn’t seem to be to judo your intelligence against yourself; rather, it is to “throw your DT off the scent”, building weird connections between events (weird, but still vaguely possible, at least for AIs), that a particular DT isn’t capable of spotting and taking into account.
My human, real-life decision theory can be summarised as “look at as many possible end-result worlds as I can, and at what actions will bring them into being; evaluate how much I like each of them; then figure out which actions are most efficient at leading to the best worlds”. But that doesn’t exactly fly when you’re programming a computer: you need something that can be fully formalised. That is where those strange Omega scenarios are useful, because your code must get it right “on autopilot”; it cannot improvise a smarter approach on the spot. The formula is on paper, and if it can’t solve a given problem but another one can, there is room for improvement.
In short, DT problems are just clever software debugging.
I agreed with everything you said after “I don’t think so”. So I am left confused as to why you don’t think so.
You analogize DT problems as test data used to determine whether we should accept or reject a decision theory. I am claiming that our requirements (i.e. “reflective consistency”) are so unrealistic that we will always be able to find test data forcing us to reject. Why do you not think so?
Because I suspect that there are only so many functionally different types of connections between events (at the very least, I see no hint that there must be infinitely many) and once you’ve found them all you will have the possibility of writing a DT that can’t be led to corner itself into suboptimal outcomes due to blind spots.
at the very least, I see no hint that there must be infinite ones
Am I correct in interpreting this as “infinitely many of them”? If so, I am curious as to what you mean by “functionally different types of connections between events”. Could you provide an example of some “types of connections between events”? Functionally different ones to be sure.
Presumably, the relevance must be your belief that decision theories differ in just how many of these different kinds of connections they handle correctly. Could you illustrate this by pointing out how the decision theory of your choice handles some types of connections, and why you have confidence that it does so correctly?
Am I correct in interpreting this as “infinitely many of them”?
Oops, yes. Fixed.
If so, I am curious as to what you mean by “functionally different types of connections between events”. Could you provide an example of some “types of connections between events”? Functionally different ones to be sure.
CDT can ‘see’ the classical, everyday causal connections that are marked in formulas with the symbol “>” (and I’d have to spend several hours reading at least the Stanford Encyclopaedia before I could give you a confident definition of that), but it cannot ‘see’ the connection in Newcomb’s problem between the agent’s choice of boxes and the content of the opaque box (sometimes called ‘retrocausality’).
Presumably, the relevance must be your belief that decision theories differ in just how many of these different kinds of connections they handle correctly. Could you illustrate this by pointing out how the decision theory of your choice handles some types of connections, and why you have confidence that it does so correctly?
I don’t have a favourite formal decision theory, because I am not sufficiently familiar with the underlying math and with the literature of discriminating scenarios to pick a horse. If you’re talking about the human decision “theory” of mine I described above, it doesn’t explicitly do that; the key hand-waving passage is “figure out which actions are most efficient at leading to the best worlds”, meaning I’ll use whatever knowledge I currently possess to estimate how big is the set of Everett branches where I do X and get A, compared to the set of those where I do X and get B. (For example, six months ago I hadn’t heard of the concept of acausal connections and didn’t account for them at all while plotting the likelihoods of possible futures, whereas now I do—at least technically; in practice, I think that between human agents they are a negligible factor. For another example, suppose that some years from now I became convinced that the complexity of human minds, and the variability between different ones, were much greater than I previously thought; then, given the formulation of Newcomb’s problem where Omega isn’t explicitly defined as a perfect simulator and all we know is that it has had a 100% success rate so far, I would suitably increase my estimation of the chances of Omega screwing up and making two-boxing profitable.)
CDT can ‘see’ the classical, everyday causal connections that are marked in formulas with the symbol “>” (and I’d have to spend several hours reading at least the Stanford Encyclopaedia before I could give you a confident definition of that), but it cannot ‘see’ the connection in Newcomb’s problem between the agent’s choice of boxes and the content of the opaque box (sometimes called ‘retrocausality’).
Ok, so if I understand you, there are only some finite number of valid kinds of connections between events and when we have all of them incorporated—when our decision theory can “see” each of them—we are then all done. We have the final, perfect decision theory (FPDT).
But what do you do then when someone—call him Yuri Geller—comes along and points out that we left out one important kind of connection: the “superspooky” connection. And then he provides some very impressive statistical evidence that this connection exists and sets up games in front of large (paying) audiences in which FPDT agents fail to WIN. He then proclaims the need for SSPDT.
Or, if you don’t buy that, maybe you will prefer this one. Yuri Geller doesn’t really exist. He is a thought experiment. Still the existence of even the possibility of superspooky connections proves that they really do exist and hence that we need to have SADT—Saint Anselm’s Decision Theory.
Ok, I’ve allowed my sarcasm to get the better of me. But the question remains—how are you ever going to know that you have covered all possible kinds of connections between events?
But the question remains—how are you ever going to know that you have covered all possible kinds of connections between events?
You can’t, I guess. Within an established mathematical model, it may be possible to prove that a list of possible configurations of event pairs {A, B} is exhaustive. But the model may always prove in need of expansion or refinement, whether because some element gets understood and modelled at a deeper level (e.g. the nature of ‘free’ will) or, more worryingly, because of paradigm shifts about physical reality (e.g. it turns out we can time travel).
Decision theories should usually be seen as normative, not descriptive. How “realistic” something is, is not very important, especially for thought experiments. Decision theory cashes out where you find a situation that can indeed be analyzed with it, and where you’ll secure a better outcome by following the theory’s advice. For example, noticing acausal control has advantages in many real-world situations (Parfit’s Hitchhiker variants). Eliezer’s TDT paper discusses this towards the end of Part I.
I believe you misinterpreted my “unrealistic requirements”. A better choice of words would have been “unachievably stringent requirements”. I wasn’t complaining that Omega and the like are unrealistic. At least not here.
The version I have of Eliezer’s TDT paper doesn’t have a “Part I”. It is dated “September 2010” and has 112 pages. Is there a better version available?
I don’t understand your other comments. Or, perhaps more accurately, I don’t understand what they were in response to.
Ah! Thank you. I see now. The circumstances in which a CDT agent will self-modify to use a different decision theory are that:
The agent was programmed by Eliezer Yudkowsky and hence is just looking for an excuse to self-modify.
The agent is provided with a prior leading it to be open to the possibility of omniscient, yet perverse agents bearing boxes full of money.
The agent is supplied with (presumably faked) empirical data leading it to believe that all such omniscient agents reward one-boxers.
Since the agent seeks reflective equilibrium (because programmed by aforesaid Yudkowsky), and since it knows that CDT requires two boxing, and since it has no reason to doubt that causality is important in this world, it makes exactly the change to its decision theory that seems appropriate. It continues to use CDT except on Newcomb problems, where it one boxes. That is, it self-modifies to use a different decision theory, which we can call CDTEONPWIOB.
Well, ok, though I wouldn’t have said that these are cases where CDT agents do something weird. These are cases where EYDT agents do something weird.
I apologize if it seems that the target of my sarcasm is you, WrongBot. It is not.
EY has deluded himself into thinking that reflective consistency is some kind of gold standard of cognitive stability. And then he uses reflective consistency as a lever by which completely fictitious data can uproot the fundamental algorithms of rationality.
Which would be fine, except that he has apparently convinced a lot of smart people here that he knows what he is talking about. Even though he has published nothing on the topic. Even though other smart people like Robin tell him that he is trying to solve an already solved problem.
I would say more but …
This manuscript was cut off here, but interested readers are suggested to look at these sources for more discussion:
Bibliography
Gibbard, A., and Harper, W. L. (1978), “Counterfactuals and Two Kinds of Expected Utility”, in C. A. Hooker, J. J. Leach, and E. F. McClennen (eds.), Foundations and Applications of Decision Theory, vol. 1, Reidel, Dordrecht, pp. 125-162.
Reflective consistency is not a “gold standard”. It is a basic requirement. It should be easy to come up with terrible, perverse decision theories that are reflectively consistent (EY does so, sort of, in his TDT outline, though it’s not exactly serious / thorough). The point is not that reflective consistency is a sign you’re on the right track, but that a lack of it is a sign that something is really wrong, that your decision theory is perverse. If using your decision theory causes you to abandon that same decision theory, it can’t have been a very good decision theory.
Consider it as being something like monotonicity in a voting system; it’s a weak requirement for weeding out things that are clearly bad. (Well, perhaps not everyone would agree IRV is “clearly bad”, but… it isn’t even monotonic!) It just happens that in this case evidently nobody noticed before that this would be a good condition to satisfy and hence didn’t try. :)
Am not sure that decision theory is an “already solved” problem. There’s the issue of what happens when agents can self-modify—and so wirehead themselves. I am pretty sure that is an unresolved “grand challenge” problem.
Because it can’t find a write-up that explains how to use it?
Perhaps you can answer the questions that I asked here: What play does TDT make in the game of Chicken? Can you point me to a description of TDT that would allow me to answer that question for myself?
Suppose I’m an agent implementing TDT. My decision in Chicken depends on how much I know about my opponent.
If I know my opponent implements the same decision procedure I do (because I have access to its source code, say), and my opponent has this knowledge about me, I swerve. In this case, my opponent and I are in symmetrical positions and its choice is fully determined by mine; my choice is between payoffs of (0,0) and (-10,-10).
Else, I act identically to a CDT agent.
As Eliezer says here, the one-sentence version of TDT is “Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.”
If I know my opponent implements the same decision procedure I do (because I have access to its source code, say), and my opponent has this knowledge about me, I swerve. In this case, my opponent and I are in symmetrical positions and its choice is fully determined by mine; my choice is between payoffs of (0,0) and (-10,-10).
I’m not sure this is right. Isn’t there a correlated equilibrium that does better?
I think we’re looking at different payoff matrices. I was using the formulation of Chicken that rewards
  |   C    |    D
C | +0, +0 | −1, +1
D | +1, −1 | −10, −10
which doesn’t have a correlated equilibrium that beats (C,C).
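For anyone who wants to check the numbers, here is a minimal sketch in Python that works through the matrix above. It is not taken from any TDT write-up. It assumes the symmetric-play reading described in this thread, where two agents known to run the same procedure compare only the diagonal outcomes. The function names are invented for illustration. It also computes the symmetric mixed Nash equilibrium of the same matrix for comparison. It does not search for correlated equilibria.

```python
from fractions import Fraction

# Row player's payoffs for the matrix above: PAYOFF[(my_move, their_move)].
# C = swerve, D = drive straight. The game is symmetric, so the column
# player's payoff is PAYOFF[(their_move, my_move)].
PAYOFF = {
    ("C", "C"): Fraction(0),
    ("C", "D"): Fraction(-1),
    ("D", "C"): Fraction(1),
    ("D", "D"): Fraction(-10),
}

def symmetric_choice(payoff):
    """Pick the move that is best when both players are certain to match.

    Assumption: this is only the 'same decision procedure, mutually known'
    case discussed in the thread, not a general description of TDT.
    """
    return max(("C", "D"), key=lambda move: payoff[(move, move)])

def mixed_equilibrium(payoff):
    """Return (p, value) for the symmetric mixed Nash equilibrium.

    p is the probability of playing C that leaves the opponent indifferent
    between C and D; value is the expected payoff at that point.
    """
    a, b = payoff[("C", "C")], payoff[("C", "D")]
    c, d = payoff[("D", "C")], payoff[("D", "D")]
    # Indifference condition: p*a + (1-p)*b == p*c + (1-p)*d
    p = (d - b) / (a - b - c + d)
    value = p * a + (1 - p) * b
    return p, value

if __name__ == "__main__":
    print("symmetric choice:", symmetric_choice(PAYOFF))
    p, value = mixed_equilibrium(PAYOFF)
    print("mixed equilibrium: P(C) =", p, "expected payoff =", value)
```

Under these assumptions the symmetric choice is C, and the mixed equilibrium has each player swerve with probability 9/10 for an expected payoff of −1/10, which is below the 0 that mutual swerving yields in this particular matrix.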
Using the payoff matrix Perplexed posted here, there is indeed a correlated equilibrium, which I believe the TDT agents would arrive at (given a source of randomness). My bad for not specifying the exact game I was talking about.
Why do you believe the TDT agents would find the correlated equilibrium? Your previous statement and Eliezer quote suggested that a pair of TDT agents would always play symmetrically in a symmetric game. No “spontaneous symmetry breaking”.
Even without a shared random source, there is a Nash mixed equilibrium that is also better than symmetric cooperation. Do you believe TDT would play that if there were no shared random input?
In a symmetric game, TDT agents choose symmetric strategies. Without a source of randomness, this entails playing symmetrically as well.
I’m not sure why you’re talking about shared random input. If both agents get the same input, they can both be expected to treat it in the same way and make the same decision, regardless of the input’s source. Each agent needs an independent source of randomness in order to play the mixed equilibrium; if my strategy is to play C 30% of the time, I need to know whether this iteration is part of that 30%, which I can’t do deterministically because my opponent is simulating me.
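The point about randomness can be illustrated with a small Python sketch. It assumes a toy agent that plays C 30% of the time (the figure used above) and models "the same input" as two copies seeded identically. The agent and helper names are hypothetical.

```python
import random

def mixed_agent(rng, p_cooperate=0.3):
    """Play C with probability p_cooperate, using the supplied random source."""
    return "C" if rng.random() < p_cooperate else "D"

def play_pair(seed_a, seed_b):
    """Run two copies of the same agent, each with its own seeded generator."""
    return (mixed_agent(random.Random(seed_a)),
            mixed_agent(random.Random(seed_b)))

# Copies fed the same random input always make the same move.
same_source = [play_pair(s, s) for s in range(200)]
always_match_shared = all(a == b for a, b in same_source)

# Copies with independent inputs can end up on different moves.
independent = [play_pair(s, s + 1000) for s in range(200)]
ever_differ_independent = any(a != b for a, b in independent)

if __name__ == "__main__":
    print("shared input always matches:", always_match_shared)
    print("independent inputs sometimes differ:", ever_differ_independent)
```

With a shared seed the two copies are locked together, which is why a shared input cannot by itself break the symmetry between identical agents.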
Yeah, I think any use of correlated equilibrium here is wrong—that requires a shared random source. I think in this case we just get symmetric strategies, i.e., it reduces to superrationality, where they each just get their own private random source.
I’m not sure why you’re talking about shared random input.
Sorry if this was unclear. It was a reference to the correlated pair of random variables used in a correlated equilibrium. I was saying that even without such a correlated pair, you may presume the availability of independent random variables which would allow a Nash equilibrium—still better than symmetric play in this game.
Gah, wait. I feel dumb. Why would TDT find correlated equilibria? I think I had the “correlated equilibrium” concept confused. A correlated equilibrium would require a public random source, which two TDTers won’t have.
Ignoring the whole pi-is-not-known-to-be-normal thing, how do you determine which digit of pi to use when you can’t actually communicate and you have no idea how many digits of pi the other player may already know?
Thank you. I hope you realize that you have provided an example of a game in which CDT does better than TDT. For example, in the game with the payoff matrix shown below, there is a mixed strategy Nash equilibrium which is better than the symmetric cooperative result.
So TDT is different from CDT only in cases where the game is perfectly symmetric? If you are playing a game that is roughly the symmetric PD, except that one guy’s payoffs are shifted by a tiny +epsilon, then they should both defect?
TDT is different from CDT whenever one needs to consider the interaction of multiple decisions made using the same TDT-based decision procedure. This applies both to competitions between agents, as in the case of Chicken, and to cases where an agent needs to make credible precommitments, as in Newcomb’s Problem.
In the case of an almost-symmetric PD, the TDT agents should still cooperate. To change that, you’d have to make the PD asymmetrical enough that the agents were no longer evaluating their options in the same way. If a change is small enough that a CDT agent wouldn’t change its strategy, TDT agents would also ignore it.
This doesn’t strike me as the world’s greatest explanation, but I can’t think of a better way to formulate it. Please let me know if there’s something that’s still unclear.
If a change is small enough that a CDT agent wouldn’t change its strategy, TDT agents would also ignore it.
This strikes me as a bit bizarre. You test whether a warped PD is still close enough to symmetric by asking whether a CDT agent still defects in order to decide whether a TDT agent should still cooperate? Are you sure you are not just making up these rules as you go?
Please let me know if there’s something that’s still unclear.
Much is unclear and very little seems to be coherently written down. What amazes me is that there is so much confidence given to something no one can explain clearly. So far, the only stable thing in your description of TDT is that it is better than CDT.
I have yet to see a description of TDT which allows me to calculate what TDT does on an arbitrary problem. But I do know that I have seen long lists from Eliezer of problems that TDT does not solve that he thinks it ought to be improved so as to solve.
The world isn’t sufficiently formalized for us to meet that standard for any decision theory (though we come closer with CDT and TDT than with EDT, in my opinion). However, cousin_it has a few recent posts on formalized situations where an agent of a more TDT (actually, UDT) type does strictly better than a CDT one in the same situation. I don’t know of any formalization (or any fuzzy real-world situation) where the opposite is true.
I apparently misled you by using that word “arbitrary”. I’m not asking for solutions to soft problems that are difficult to formalize. Simply solutions to the standard kinds of games already formalized in game theory. For example, the game of Chicken. Can anyone point me to a description that tells me what play TDT would make in this game? Or what mixed strategy it would use? Both assuming and not assuming the reading of each other’s code.
ETA: Slightly more interesting than the payoff matrix shown in the wikipedia article is the case when the payoff for a win is 2 units, with a loss still costing only −1. This means that in the iterated version, the negotiated solution would be to alternate wins. But we are interested in the one-shot case.
Can TDT find a correlated equilibrium? If not, which Nash equilibrium does it pick? Or does it always chicken out? Where can I learn this information?
But I do know that I have seen long lists from Eliezer of problems that TDT does not solve that he thinks it ought to be improved so as to solve.
Since CDT and EDT don’t solve those problems either, all this justifies saying is that TDT does better on some problems, and the same on others, not “worse on others”.
A “nemesis” environment that feeds misleading evidence to a decision theory’s underlying epistemology does not indicate the sort of problem illustrated by an environment in which a decision theory does something stupid with true information.
What you asked for was a case where a decision theory did worse than its rivals.
However, that seems pretty trivial if it behaves differently from them—you just consider an appropriate pathological environment set up to punish that decision theory.
Not necessarily. Various decision theories can come into play here. It depends precisely on what you mean by the prisoner’s paradox. If you are playing a true one shot where you have no information about the entity in question then that might be true. But if you are playing a true one shot where each player, before making the decision, has access to the other player’s source code, then defecting may not be the best solution. Some of the decision theory posts have discussed this. (Note that knowing each other’s source code is not nearly as strong an assumption as it might seem, since one common idea in game theory is to look at what happens when the other players know your strategy. (I’m oversimplifying some technical details here. I don’t fully understand all the issues. I’m not a game theorist. Add any other relevant disclaimers.))
No one on this thread has mentioned a “prisoner’s paradox”. We have been discussing the Prisoner’s Dilemma, a well known and standard problem in game theory which involves two players who must decide without prior knowledge of the other player’s decision.
A different problem in which neither player is actually making a decision, but instead is controlled by a deterministic algorithm, and in which both players, by looking at source, are able to know the other’s decision in advance, is certainly an interesting puzzle to consider, but it has next to nothing in common with the Prisoner’s Dilemma besides a payoff matrix.
No one on this thread has mentioned a “prisoner’s paradox”. We have been discussing the Prisoner’s Dilemma, a well known and standard problem in game theory which involves two players who must decide without prior knowledge of the other player’s decision.
Prisoner’s paradox is another term for the prisoner’s dilemma. See for example this Wikipedia redirect. You may want to reread what I wrote in that light. (Although there’s some weird bit of illusion of transparency going on here in that part of me has a lot of trouble understanding how someone wouldn’t be able to tell from context that they were the same thing.)
A different problem in which neither player is actually making a decision, but instead is controlled by a deterministic algorithm, and in which both players, by looking at source, are able to know the other’s decision in advance, is certainly an interesting puzzle to consider, but it has next to nothing in common with the Prisoner’s Dilemma besides a payoff matrix.
No. The problem of what to do is actually closely related when one has systems which are able to understand each other’s source code. It is in fact related to the problem of iterating the problem.
In general, given no information, the problem still has relevant decision theoretic considerations.
The problem of what to do is actually closely related when one has systems which are able to understand each other’s source code. It is in fact related to the problem of iterating the problem.
I’m curious why you assert this. Game theorists have a half dozen or so standard simple one-shot two person games which they use to illustrate principles. PD is one, matching pennies is another, Battle of the Sexes, Chicken, … the list is not that long.
They also have a handful of standard ways of taking a simple one-shot game and turning it into something else—iteration is one possibility, but you can also add signaling, bargaining with commitment, bargaining without commitment but with a correlated shared signal, evolution of strategies to an ESS, etc. I suppose that sharing source code can be considered yet another of these basic game transformations.
Now we have the assertion that for one (PD is the only one?) of these games, one (iteration is the only one?) of these transformations is closely related to this new code-sharing transformation. Why is this assertion made? Is there some kind of mathematical structure to this claimed relationship? Some kind of proof? Surely there is more evidence for this claimed relationship than just pointing out that both transformations yield the same prescription—“cooperate”—when there are only two possible prescriptions to choose among.
Is the code-sharing version of Chicken also closely related to the iterated version? How about Battle of the Sexes?
So I’m going to need to repeat my earlier disclaimer that I’m far from my area of expertise. But the basic idea is that iterating games gives you a probabilistic estimate for what the underlying code looks like (assuming some sort of nice distribution on potential source code such that in general simpler code is more likely than complicated code). Unfortunately, I don’t know any details of this approach beyond its existence but it should apply to other games like Chicken also.
Wouldn’t this be a problem for tit for tat players going up against other tit for tat players (but not knowing the strategy of their opponent)?
Only if it’s common knowledge that both players are human.
ETA: Since I got downvoted, maybe I wasn’t being clear. I think that the Warren Buffett quote applies to human psychology more than to game theory in general. If outright deception were easy, it would probably become a good strategy to keep your allies in some doubt about your intentions, as a bargaining chip. But we humans don’t seem to be good at pulling that off, and so ambivalence is a strong signal of opposition.
Now that you have clarified, I wish I could downvote a second time.
Tit-for-tat is a good strategy in the iterated prisoner’s dilemma regardless of whether the players are human and regardless of whether the other player is “on your side”. In fact, it is pretty much taken for granted that there are no sides in the PD. Dre was downvoted by me for a complete misunderstanding of how Tit-for-tat relates to “sides”. You were downvoted for continuing the confusion.
Oh, you’re right- my response would have made sense talking about players in a one-shot PD with communication beforehand, but it’s a non sequitur to Dre’s mistaken comment. Don’t know how I missed that.
Upvoted, but even with communication beforehand, the rational move in a one-shot PD is to defect. Unless there is some way to make binding commitments, or unless there is some kind of weird acausal influence connecting the players. Regardless of whether the other player is human and rational, or silicon and dumb as a rock.
Taboo “rational”.
Acausal control is not something additional, it’s structure that already exists in a system if you know where to look for it. And typically, it’s everywhere, to some extent.
Highest-scoring move, adjective applied to the course that maximises fulfillment of desires.
The best move in a one-shot PD is to defect against a cooperator.
With no communication or precommitment, and with the knowledge that it is a one-shot PD, the overwhelming outcome is both defect. Adding communication to the mix creates a non-zero chance you can convince your opponent to cooperate—which increases the utility of defecting.
There is a question of what will actually happen, but also more relevant questions of what will happen if you do X, for various values of X. If you convince the opponent to cooperate, it’s one thing, not related to the case of convincing your opponent to cooperate if you cooperate.
Determine what kinds of control influence your opponent, appear to also be influenced by the same, and then defect when they think you are forced into cooperating because they are forced into cooperating?
Is that a legitimate strategy, or am I misunderstanding what you mean by convincing your opponent to cooperate if you cooperate?
Couldn’t parse.
It’s not in general possible to predict what you’ll actually do, since if it were possible, you could take such predictions into consideration in deciding what to do, in particular you could decide differently as a result, invalidating the “prediction”. Similarly, it’s not in general possible to predict what will actually happen, without assuming what you’ll decide first. It’s better to ask, what is likely to happen if you decide X, than to ask just what is likely to happen. It’s more useful too, since it gives you information about (acausal) consequences of your actions that can be used as basis for making decisions.
In the case of Prisoner’s Dilemma, it’s not very helpful to ask, what will your opponent do. What your opponent will do generally depends on what you’ll do, and assuming that it doesn’t is a mistake that leads to the classical conclusion that defecting is always the better option (falsified by the case of identical players that always make the same decision, with cooperation the better one). If you ask instead, what will your opponent do (1) if you cooperate, and (2) if you defect, that can sometimes give you interesting answers, such that cooperating suddenly becomes the better option. When you talk to the opponent with the intention of “convincing” them, again you are affecting both predictions about what they’ll do, on both sides of your possible decision, and not just the monolithic prediction of what they’ll do unconditionally. In particular, you might want to influence the probability of your opponent cooperating with you if you cooperate, without similarly affecting the probability of your opponent cooperating with you if you defect. If you affect both probabilities in the same way, then you are correct, such influence makes the decision of defecting more profitable than before. But if you affect these probabilities to a different degree, then it might turn out that the opposite is true, that the influence in question makes cooperating more profitable.
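The argument about the two conditional predictions can be made concrete with a short Python sketch. The payoff values (T=5, R=3, P=1, S=0) are a common textbook choice and are an assumption here, not something given in this thread. The function names are likewise invented for illustration.

```python
from fractions import Fraction

# Assumed textbook PD payoffs (not from the thread), with T > R > P > S:
# T = I defect, they cooperate; R = both cooperate;
# P = both defect;              S = I cooperate, they defect.
T, R, P, S = 5, 3, 1, 0

def expected_utilities(q_if_c, q_if_d):
    """Expected payoff of cooperating and of defecting.

    q_if_c: probability the opponent cooperates if I cooperate.
    q_if_d: probability the opponent cooperates if I defect.
    """
    eu_cooperate = q_if_c * R + (1 - q_if_c) * S
    eu_defect = q_if_d * T + (1 - q_if_d) * P
    return eu_cooperate, eu_defect

def defect_advantage(q_if_c, q_if_d):
    """How much better defecting looks than cooperating (negative: cooperate)."""
    eu_cooperate, eu_defect = expected_utilities(q_if_c, q_if_d)
    return eu_defect - eu_cooperate

if __name__ == "__main__":
    tenth, half, nine_tenths = Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)
    # Opponent's move does not depend on mine, low chance of cooperation.
    print(defect_advantage(tenth, tenth))
    # Persuasion raises both conditional probabilities equally.
    print(defect_advantage(half, half))
    # Persuasion raises only the "if I cooperate" probability.
    print(defect_advantage(nine_tenths, tenth))
```

With these assumed payoffs, raising both probabilities together from 1/10 to 1/2 increases the advantage of defecting from 11/10 to 3/2, while raising only the "if I cooperate" probability to 9/10 flips the sign to −13/10, so cooperating comes out ahead.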
Ah, I see! I have been butting my head against various ideas that lead to cooperating in one-shot PDs and the like and not making any progress; it was because while I had the idea of splitting my actions into groups conditional on the opponent’s action, I didn’t have the concept of doing the same for my opponent.
With that in mind, I can no longer parse my previous comment either. I think I meant that I would increase their probability of cooperating if I cooperated, and have them increase my probability of cooperating if they cooperated (thus decreasing both of our probabilities of defecting if the other cooperates), and then when the probabilities have moved far enough to tell us both to cooperate, I would defect, knowing that I would score a defect-against-cooperate. But yeah, it doesn’t make any sense at all, because the probabilities tell us both to cooperate.
Thanks for taking the time to explain this concept to me.
(Note that probability of you making a given decision is not knowable, when you are considering it yourself while allowing this consideration to influence the decision.)
Perplexed, have you come across the decision theory posts here yet? You’ll find them pretty interesting, I think.
LW Wiki for the Prisoner’s Dilemma
LW Wiki for timeless decision theory (start with the posts- Eliezer’s PDF is very long and spends more time justifying than explaining).
Essentially, this may be beyond the level of humans to implement, but there are decision theories for an AI which do strictly better than the usual causal decision theory, without being exploitable. Two of these would cooperate with each other on the PD, given a chance to communicate beforehand.
Yes, I have read them, and commented on them. Negatively, for the most part. If any of these ideas are ever published in the peer reviewed literature, I will be both surprised and eager to read more.
I think that you may have been misled by marketing hype. Even the proponents of those theories admit that they do not do strictly better (or at least as good) on all problems. They do better on some problems, and worse on others. Furthermore, sharing source code only provides a guarantee that the observed source is current if that source code cannot be changed. In other words, an AI that uses this technique to achieve commitment has also forsaken (at least temporarily) the option of learning from experience.
I am intrigued by the analogy between these acausal decision theories and the analysis of Hamilton’s rule in evolutionary biology. Nevertheless, I am completely mystified as to the motivation that the SIAI has for pursuing these topics. If the objective is to get two AIs to cooperate with each other there are a plethora of ways to do that already well known in the game theory canon. An exchange of hostages, for example, is one obvious way to achieve mutual enforceable commitment. Why is there this fascination with the bizarre here? Why so little reference to the existing literature?
So far as I understand the situation, the SIAI is working on decision theory because they want to be able to create an AI that can be guaranteed not to modify its own decision function.
There are circumstances where CDT agents will self-modify to use a different decision theory (e.g. Parfit’s Hitchhiker). If this happens (they believe), it will present a risk of goal-distortion, which is unFriendly.
Put another way: the objective isn’t to get two AIs to cooperate, the objective is to make it so that an AI won’t need to alter its decision function in order to cooperate with another AI. (Or any other theoretical bargaining partner.)
Does that make any sense? As a disclaimer, I definitely do not understand the issues here as well as the SIAI folks working on them.
I don’t think that’s quite right- a sufficiently smart Friendly CDT agent could self-modify into a TDT (or higher decision theory) agent without compromising Friendliness (albeit with the ugly hack of remaining CDT with respect to consequences that happened causally before the change).
As far as I understand SIAI, the idea is that decision theory is the basis of their proposed AI architecture, and they think it’s more promising than other AGI approaches and better suited to Friendliness content.
That sounds intriguing also. Again, a reference to something written by someone who understands it better might be helpful so as to make some sense of it.
Maybe it would be helpful to you to think of self-modifications and alternative decision theories as unrestricted precommitment. If you had the ability to irrevocably precommit to following any decision rule in the future, which rule would you choose? Surely it wouldn’t be pure CDT, because you can tractably identify situations where CDT loses.
“Tractably” is a word that I find a bit unexpected in this context. What do you mean by it?
“Situations where CDT loses.” Are we talking about real-world-ish situations here? Situations in which causality applies? Situations in which the agents are free rather than being agents whose decisions have already been made for them by a programmer at some time in the past? What kind of situations do you have in mind?
And what do you mean by “loses”? Loses to who or what? Loses to agents that can foresee their opponent’s plays? Agents that have access to information channels not available to the CDT agent? Just what information channels are allowed? Why those, and not others?
ETA: And that “Surely it wouldn’t be CDT … because you can identify …” construction simply begs for completion with “Surely it would be … because you can’t identify …”. Do you have a candidate? Do you have a proof of “you can’t identify situations where it loses”? If not, what grounds do you have for criticizing?
CDT still loses to TDT in Newcomb’s problem if Omega can predict your actions with better than 50.05% accuracy. You can’t get out of this by claiming that Omega has access to unrealistic information channels, because these chances seem fairly realistic to me.
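The 50.05% figure can be reproduced with a few lines of Python, assuming the usual Newcomb amounts of $1,000,000 in the opaque box and $1,000 in the transparent one. Those amounts are not stated in this thread and are an assumption of the sketch, as are the function names.

```python
from fractions import Fraction

BIG = 1_000_000  # assumed: opaque box contents when one-boxing was predicted
SMALL = 1_000    # assumed: transparent box contents

def ev_one_box(accuracy):
    """Expected winnings from one-boxing against a predictor of given accuracy."""
    return accuracy * BIG

def ev_two_box(accuracy):
    """Expected winnings from two-boxing: the big box is full only on a miss."""
    return (1 - accuracy) * BIG + SMALL

def break_even_accuracy():
    """Accuracy at which both choices have the same expected winnings.

    Solves accuracy * BIG == (1 - accuracy) * BIG + SMALL.
    """
    return Fraction(BIG + SMALL, 2 * BIG)

if __name__ == "__main__":
    threshold = break_even_accuracy()
    print("break-even accuracy:", threshold, "=", float(threshold))
    print("at 51%:", ev_one_box(Fraction(51, 100)), "vs", ev_two_box(Fraction(51, 100)))
```

Under these assumed amounts the break-even accuracy is 1001/2000, that is 50.05%, and any predictor more accurate than that makes one-boxing the higher expected-value choice.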
Free from what? Causality? This sounds distressingly like you are relying on some notion of “free will”.
(Apologies if I’m misreading you.)
I am relying on a notion of free will.
I understand that every normative decision theory adopts the assumption (convenient fiction if you prefer) that the agent being advised is acting of “his own free will”. Otherwise, why bother advising?
Being a compatibilist, as I understand Holy Scripture (i.e. The Sequences) instructs me to be, I see no incompatibility between this “fiction” of free will and the similar fiction of determinism. They model reality at different levels.
For certain purposes, it is convenient to model myself and other “free agents” as totally free in our decisions, but not completely free in carrying out those decisions. For example, my free will ego may decide to quit smoking, but my determined id has some probability of overruling that decision.
Why the distinction between agents which are free and agents which have had their decisions made for them by a programmer, then? Are you talking about cases in which specific circumstances have hard-coded behavioral responses? Every decision every agent makes is ultimately made for it by the agent’s programmer; I suppose I’m wondering where you draw the line.
As a side note, I feel very uncomfortable seeing the sequences referred to as inviolable scripture, even in jest. In my head, it just screams “oh my god how could anyone ever be doing it this wrong arghhhhhh.”
I’m still trying to figure out what I think of that reaction, and do not mention it as a criticism. I think.
I make the distinction because the distinction is important. The programmer makes decisions at one point in time, with his own goals and/or utility functions, and his own knowledge of the world. The agent makes decisions at a different point in time, based on different values and different knowledge of the world. A decision theory which advises the programmer is not superior to a decision theory which advises the agent. Those two decision theories are playing different games.
“Totally free” sounds like too free. You’re not free to actually decide at time T to “decide X at time T+1” and then actually decide Y at time T+1, since that is against the laws of physics.
It’s my understanding that what goes through your head when you actually decide X at time T+1 is (approximately) what we call TDT. Or you can stick to CDT and not be able to make decisions for your future self.
I upvoted this because it seems to contain a grain of truth, but I’m nervous that someone before me had downvoted it. I don’t know whether that was because it actually is just completely wrong about what TDT is all about, or because you went a bit over the top with “against the laws of physics”.
Situations where CDT loses are precisely those situations where credible precommitment helps you, and inability to credibly precommit hurts you. There’s no shortage of those in game theory.
Ok, those are indeed a reasonable class of decisions to consider. Now, you say that CDT loses. Ok, loses to what? And presumably you don’t mean loses to opponents of your preferred decision theory. You mean loses in the sense of doing less well in the same situation. Now, presumably that means that both CDT and your candidate are playing against the same game opponent, right?
I think you see where I am going here, though I can spell it out if you wish. In claiming the superiority of the other decision theory you are changing the game in an unfair way by opening a communication channel that didn’t exist in the original game statement and which CDT has no way to make use of.
Well, yeah, kind of, that’s one way to look at it. Reformulate the question like this: what would CDT do if that communication channel were available? What general precommitment for future situations would CDT adopt and publish? That’s the question TDT people are trying to solve.
The simplest answer that moves this conversation forward would be “It would pretend to be a TDT agent that keeps its commitments, whenever that act of deception is beneficial to it. It would keep accurate statistics on how often agents claiming to be TDT agents actually are TDT agents, and adjust its priors accordingly.”
Now it is your turn to explain why this strategy violates the rules, whereas your invention of a deception-free channel did not.
I’m going to have to refer you to Eliezer’s TDT document for that. (If you’re OK with starting in medias res, the first mention of this is on pages 22-23, though there it’s just specialized to Newcomb’s Dilemmas; see pages 50-52 for an example of the limits of this hack. Elsewhere he’s argued for the more general nature of the hack.)
Ok thanks.
I’m coming to realize just how much of this stuff derives from Eliezer’s insistence on reflective consistency of a decision theory. Given any decision theory, Eliezer will find an Omega to overthrow it.
But doesn’t a diagonal argument show that no decision theory can be reflectively consistent over all test data presented by a malicious Omega? Just as there is no enumeration of the reals, isn’t there a game which can make any specified rational agent regret its rationality? Omega holds all the cards. He can always make you regret your choice of decision theory.
No. We can ensure that no such problem exists if we assume that (1) only the output decisions are used, not any internals; and (2) every decision is made with access to the full problem statement.
I’m not entirely sure what “every decision is made with full access to the problem statement” means, but I can’t see how it can possibly get around the diagonalisation argument. Basically, Omega just says “I simulated your decision on problem A, on which your algorithm outputs something different from algorithm X, and give you a shiny black Ferrari iff you made the same decision as algorithm X”.
As cousin_it pointed out last time I brought this up, Caspian made this argument in response to the very first post on the Counterfactual Mugging. I’ve yet to see anyone point out a flaw in it as an existence proof.
As far as I can see the only premise needed for this diagonalisation to work is that your decision theory doesn’t agree with algorithm X on all possible decisions, so just make algorithm X “whatever happens, recite the Bible backwards 17 times”.
In that case, your answer to problem A is being used in a context other than problem A. That other context is the real problem statement, and you didn’t have it when you chose your answer to A, so it violates the assumption.
Yeah, that definitely violates the “every decision is made with full access to the problem statement” condition. The outcome depends on your decision on problem A, but when making your decision on problem A you have no knowledge that your decision will also be used for this purpose.
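The construction and the objection to it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `sensible_agent`, `wins_ferrari`, and the placeholder statement `PROBLEM_A` are hypothetical names, not anything defined in the thread.

```python
from typing import Callable

# An agent maps the problem statement it is shown to a decision.
Agent = Callable[[str], str]

# Placeholder: the thread never specifies the content of problem A here.
PROBLEM_A = "problem A"


def algorithm_x(statement: str) -> str:
    # The arbitrary reference algorithm suggested above.
    return "recite the Bible backwards 17 times"


def sensible_agent(statement: str) -> str:
    # Hypothetical agent that gives some ordinary answer to problem A.
    return "ordinary answer to problem A"


def wins_ferrari(agent: Agent, reference: Agent) -> bool:
    # Omega simulates the agent on problem A only. The reward rule below is
    # never part of the statement the agent sees, which is the objection.
    return agent(PROBLEM_A) == reference(PROBLEM_A)
```

Any agent that differs from `algorithm_x` on problem A loses the Ferrari, but only because its answer is reused in a problem it was never shown.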
I don’t see how this is useful. Let’s take a concrete example: in decision problem A, Omega offers you the choice of $1,000,000 or being slapped in the face with a wet fish. Which would you like your decision theory to choose?
Now, No-mega can simulate you, say, 10 minutes before you find out who he is, and give you 3^^^3 utilons iff you chose the fish-slapping. So your algorithm has to include some sort of prior on the existence of “fish-slapping”-No-megas.
My algorithm “always get slapped in the face with a wet fish where that’s an option”, does better than any sensible algorithm on this particular problem, and I don’t see how this problem is noticeably less realistic than any others.
In other words, I guess I might be willing to believe that you can get around diagonalisation by posing some stringent limits on what sort of all-powerful Omegas you allow (can anyone point me to a proof of that?) but I don’t see how it’s interesting.
Actually, no, the probability of fish-slapping No-megas is part of the input given to the decision theory, not part of the decision theory itself. And since every decision theory problem statement comes with an implied claim that it contains all relevant information (a completely unavoidable simplifying assumption), this probability is set to zero.
Decision theory is not about determining what sorts of problems are plausible, it’s about getting from a fully-specified problem description to an optimal answer. Your diagonalization argument requires that the problem not be fully specified in the first place.
This is a no-choice scenario. If you say that the Bible-reciter is the one that will “win” here, you are using the verb “to win” with a different meaning from the one used when we say that a particular agent “wins” by making the choice that leads to the best outcome.
With the strong disclaimer that I have no background in decision theory beyond casually reading LW...
I don’t think so. The point of simulation (Omega) problems, to me, doesn’t seem to be to judo your intelligence against yourself; rather, it is to “throw your DT off the scent”, building weird connections between events (weird, but still vaguely possible, at least for AIs), that a particular DT isn’t capable of spotting and taking into account.
My human, real-life decision theory can be summarised as “look at as many possible end-result worlds as I can, and at what actions will bring them into being; evaluate how much I like each of them; then figure out which actions are most efficient at leading to the best worlds”. But that doesn’t exactly fly when you’re programming a computer, you need something that can be fully formalised, and that is where those strange Omega scenarios are useful, because your code must get it right “on autopilot”, it cannot improvise a smarter approach on the spot—the formula is on paper, and if it can’t solve a given problem, but another one can, it means that there is room for improvement.
In short, DT problems are just clever software debugging.
I agreed with everything you said after “I don’t think so”. So I am left confused as to why you don’t think so.
You analogize DT problems as test data used to determine whether we should accept or reject a decision theory. I am claiming that our requirements (i.e. “reflective consistency”) are so unrealistic that we will always be able to find test data forcing us to reject. Why do you not think so?
Because I suspect that there are only so many functionally different types of connections between events (at the very least, I see no hint that there must be infinitely many) and once you’ve found them all you will have the possibility of writing a DT that can’t be led to corner itself into suboptimal outcomes due to blind spots.
Am I correct in interpreting this as “infinitely many of them”? If so, I am curious as to what you mean by “functionally different types of connections between events”. Could you provide an example of some “types of connections between events”? Functionally different ones to be sure.
Presumably, the relevance must be your belief that decision theories differ in just how many of these different kinds of connections they handle correctly. Could you illustrate this by pointing out how the decision theory of your choice handles some types of connections, and why you have confidence that it does so correctly?
Oops, yes. Fixed.
CDT can ‘see’ the classical, everyday causal connections that are marked in formulas with the symbol “>” (and I’d have to spend several hours reading at least the Stanford Encyclopaedia before I could give you a confident definition of that), but it cannot ‘see’ the connection in Newcomb’s problem between the agent’s choice of boxes and the content of the opaque box (sometimes called ‘retrocausality’).
I don’t have a favourite formal decision theory, because I am not sufficiently familiar with the underlying math and with the literature of discriminating scenarios to pick a horse. If you’re talking about the human decision “theory” of mine I described above, it doesn’t explicitly do that; the key hand-waving passage is “figure out which actions are most efficient at leading to the best worlds”, meaning I’ll use whatever knowledge I currently possess to estimate how big is the set of Everett branches where I do X and get A, compared to the set of those where I do X and get B. (For example, six months ago I hadn’t heard of the concept of acausal connections and didn’t account for them at all while plotting the likelihoods of possible futures, whereas now I do—at least technically; in practice, I think that between human agents they are a negligible factor. For another example, suppose that some years from now I became convinced that the complexity of human minds, and the variability between different ones, were much greater than I previously thought; then, given the formulation of Newcomb’s problem where Omega isn’t explicitly defined as a perfect simulator and all we know is that it has had a 100% success rate so far, I would suitably increase my estimation of the chances of Omega screwing up and making two-boxing profitable.)
Ok, so if I understand you, there are only some finite number of valid kinds of connections between events and when we have all of them incorporated—when our decision theory can “see” each of them—we are then all done. We have the final, perfect decision theory (FPDT).
But what do you do then when someone—call him Yuri Geller—comes along and points out that we left out one important kind of connection: the “superspooky” connection. And then he provides some very impressive statistical evidence that this connection exists and sets up games in front of large (paying) audiences in which FPDT agents fail to WIN. He then proclaims the need for SSPDT.
Or, if you don’t buy that, maybe you will prefer this one. Yuri Geller doesn’t really exist. He is a thought experiment. Still the existence of even the possibility of superspooky connections proves that they really do exist and hence that we need to have SADT—Saint Anselm’s Decision Theory.
Ok, I’ve allowed my sarcasm to get the better of me. But the question remains—how are you ever going to know that you have covered all possible kinds of connections between events?
You can’t, I guess. Within an established mathematical model, it may be possible to prove that a list of possible configurations of event pairs {A, B} is exhaustive. But the model may always prove in need of expansion or refinement, whether because some element gets understood and modelled at a deeper level (e.g. the nature of ‘free’ will) or, more worryingly, because of paradigm shifts about physical reality (e.g. it turns out we can time travel).
Decision theories should usually be seen as normative, not descriptive. How “realistic” something is, is not very important, especially for thought experiments. Decision theory cashes out where you find a situation that can indeed be analyzed with it, and where you’ll secure a better outcome by following theory’s advice. For example, noticing acausal control has advantages in many real-world situations (Parfit’s Hitchhiker variants). Eliezer’s TDT paper discusses this towards the end of Part I.
I believe you misinterpreted my “unrealistic requirements”. A better choice of words would have been “unachievably stringent requirements”. I wasn’t complaining that Omega and the like are unrealistic. At least not here.
The version I have of Eliezer’s TDT paper doesn’t have a “Part I”. It is dated “September 2010” and has 112 pages. Is there a better version available?
I don’t understand your other comments. Or, perhaps more accurately, I don’t understand what they were in response to.
“Part I” is chapters 1-9. (This concept is referred to in the paper itself.)
Not to me. But a reference might repair that deficiency on my part.
See Eliezer’s posts on Newcomb’s Problem and regret of rationality and TDT problems he can’t solve.
(Incidentally, I found those references in about 30 seconds, starting from the LW Wiki page on Parfit’s Hitchhiker.)
Ah! Thank you. I see now. The circumstances in which a CDT agent will self-modify to use a different decision theory are that:
The agent was programmed by Eliezer Yudkowsky and hence is just looking for an excuse to self-modify.
The agent is provided with a prior leading it to be open to the possibility of omniscient, yet perverse agents bearing boxes full of money.
The agent is supplied with (presumably faked) empirical data leading it to believe that all such omniscient agents reward one-boxers.
Since the agent seeks reflective equilibrium (because programmed by aforesaid Yudkowsky), and since it knows that CDT requires two boxing, and since it has no reason to doubt that causality is important in this world, it makes exactly the change to its decision theory that seems appropriate. It continues to use CDT except on Newcomb problems, where it one boxes. That is, it self-modifies to use a different decision theory, which we can call CDTEONPWIOB.
Well, ok, though I wouldn’t have said that these are cases where CDT agents do something weird. These are cases where EYDT agents do something weird.
I apologize if it seems that the target of my sarcasm is you WrongBot. It is not.
EY has deluded himself into thinking that reflective consistency is some kind of gold standard of cognitive stability. And then he uses reflective consistency as a lever by which completely fictitious data can uproot the fundamental algorithms of rationality. Which would be fine, except that he has apparently convinced a lot of smart people here that he knows what he is talking about. Even though he has published nothing on the topic. Even though other smart people like Robin tell him that he is trying to solve an already solved problem.
I would say more but …
Reflective consistency is not a “gold standard”. It is a basic requirement. It should be easy to come up with terrible, perverse decision theories that are reflectively consistent (EY does so, sort of, in his TDT outline, though it’s not exactly serious / thorough). The point is not that reflective consistency is a sign you’re on the right track, but that a lack of it is a sign that something is really wrong, that your decision theory is perverse. If using your decision theory causes you to abandon that same decision theory, it can’t have been a very good decision theory.
Consider it as being something like monotonicity in a voting system; it’s a weak requirement for weeding out things that are clearly bad. (Well, perhaps not everyone would agree IRV is “clearly bad”, but… it isn’t even monotonic!) It just happens that in this case evidently nobody noticed before that this would be a good condition to satisfy and hence didn’t try. :)
I am not sure that decision theory is an “already solved” problem. There’s the issue of what happens when agents can self-modify, and so wirehead themselves. I am pretty sure that is an unresolved “grand challenge” problem.
TDT gets better outcomes than CDT when faced with Newcomb’s Problem, Parfit’s Hitchhiker, and the True Prisoner’s Dilemma.
When does CDT outperform TDT? If the answer is “never”, as it currently seems to be, why wouldn’t a CDT agent self-modify to use TDT?
Because it can’t find a write-up that explains how to use it?
Perhaps you can answer the questions that I asked here: What play does TDT make in the game of Chicken? Can you point me to a description of TDT that would allow me to answer that question for myself?
Suppose I’m an agent implementing TDT. My decision in Chicken depends on how much I know about my opponent.
If I know my opponent implements the same decision procedure I do (because I have access to its source code, say), and my opponent has this knowledge about me, I swerve. In this case, my opponent and I are in symmetrical positions and its choice is fully determined by mine; my choice is between payoffs of (0,0) and (-10,-10).
Else, I act identically to a CDT agent.
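The symmetric case described above can be sketched as follows. This is a rough illustration, not a definition of TDT: it uses only the two same-action payoffs quoted in the comment, and `choose_against_copy` is a hypothetical name.

```python
# Payoff to each player when both take the same action, as quoted above.
SAME_ACTION_PAYOFF = {"swerve": 0, "straight": -10}


def choose_against_copy(same_action_payoff: dict[str, int]) -> str:
    # If the opponent's output is known to equal mine, only the diagonal of
    # the payoff matrix is reachable, so pick the best diagonal entry.
    return max(same_action_payoff, key=same_action_payoff.get)
```

The off-diagonal outcomes never enter the calculation, which is why the choice reduces to (0,0) versus (-10,-10).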
As Eliezer says here, the one-sentence version of TDT is “Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.”
I’m not sure this is right. Isn’t there a correlated equilibrium that does better?
I think we’re looking at different payoff matrices. I was using the formulation of Chicken that rewards
which doesn’t have a correlated equilibrium that beats (C,C).
Using the payoff matrix Perplexed posted here, there is indeed a correlated equilibrium, which I believe the TDT agents would arrive at (given a source of randomness). My bad for not specifying the exact game I was talking about.
...and, this is what I get for not actually checking things before I post them.
Two questions:
Why do you believe the TDT agents would find the correlated equilibrium? Your previous statement and Eliezer quote suggested that a pair of TDT agents would always play symmetrically in a symmetric game. No “spontaneous symmetry breaking”.
Even without a shared random source, there is a Nash mixed equilibrium that is also better than symmetric cooperation. Do you believe TDT would play that if there were no shared random input?
In a symmetric game, TDT agents choose symmetric strategies. Without a source of randomness, this entails playing symmetrically as well.
I’m not sure why you’re talking about shared random input. If both agents get the same input, they can both be expected to treat it in the same way and make the same decision, regardless of the input’s source. Each agent needs an independent source of randomness in order to play the mixed equilibrium; if my strategy is to play C 30% of the time, I need to know whether this iteration is part of that 30%, which I can’t do deterministically because my opponent is simulating me.
Yeah, I think any use of correlated equilibrium here is wrong—that requires a shared random source. I think in this case we just get symmetric strategies, i.e., it reduces to superrationality, where they each just get their own private random source.
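To make the superrational reading concrete, here is a small sketch that finds the best strategy when both players independently go straight with the same probability q. The payoffs passed in the usage note (tie 0, win 2, loss -1, crash -10) are assumptions pieced together from numbers mentioned elsewhere in the thread, and `best_symmetric_mix` is a hypothetical helper.

```python
from fractions import Fraction


def best_symmetric_mix(tie, win, lose, crash):
    """Return (q, value): the probability q of going straight that maximizes
    each player's expected payoff when both independently use the same q."""
    tie, win, lose, crash = (Fraction(v) for v in (tie, win, lose, crash))

    def value(q):
        # Both swerve, exactly one goes straight (win or lose), both crash.
        return ((1 - q) ** 2 * tie
                + q * (1 - q) * (win + lose)
                + q ** 2 * crash)

    # value(q) is the quadratic a*q^2 + b*q + tie.
    a = tie - (win + lose) + crash
    b = (win + lose) - 2 * tie
    candidates = [Fraction(0), Fraction(1)]
    if a < 0:
        stationary = -b / (2 * a)
        if 0 <= stationary <= 1:
            candidates.append(stationary)
    best = max(candidates, key=value)
    return best, value(best)
```

With the assumed payoffs, `best_symmetric_mix(0, 2, -1, -10)` gives q = 1/22 and an expected payoff of 1/44 each, slightly better than always swerving. A different payoff matrix would give a different answer.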
Sorry if this was unclear. It was a reference to the correlated pair of random variables used in a correlated equilibrium. I was saying that even without such a correlated pair, you may presume the availability of independent random variables which would allow a Nash equilibrium—still better than symmetric play in this game.
Gah, wait. I feel dumb. Why would TDT find correlated equilibria? I think I had the “correlated equilibrium” concept confused. A correlated equilibrium would require a public random source, which two TDTers won’t have.
Digits of pi are kind of like a public random source.
Ignoring the whole pi-is-not-known-to-be-normal thing, how do you determine which digit of pi to use when you can’t actually communicate and you have no idea how many digits of pi the other player may already know?
Same way you meet up in New York with someone you’ve never talked to: something like Schelling points. I’m not sure that answer works in practice.
Thank you. I hope you realize that you have provided an example of a game in which CDT does better than TDT. For example, in the game with the payoff matrix shown below, there is a mixed strategy Nash equilibrium which is better than the symmetric cooperative result.
Looks like we’re talking about different versions of Chicken. Please see my reply to Sniffnoy.
So TDT is different from CDT only in cases where the game is perfectly symmetric? If you are playing a game that is roughly the symmetric PD, except that one guy’s payoffs are shifted by a tiny +epsilon, then they should both defect?
TDT is different from CDT whenever one needs to consider the interaction of multiple decisions made using the same TDT-based decision procedure. This applies both to competitions between agents, as in the case of Chicken, and to cases where an agent needs to make credible precommitments, as in Newcomb’s Problem.
In the case of an almost-symmetric PD, the TDT agents should still cooperate. To change that, you’d have to make the PD asymmetrical enough that the agents were no longer evaluating their options in the same way. If a change is small enough that a CDT agent wouldn’t change its strategy, TDT agents would also ignore it.
This doesn’t strike me as the world’s greatest explanation, but I can’t think of a better way to formulate it. Please let me know if there’s something that’s still unclear.
This strikes me as a bit bizarre. You test whether a warped PD is still close enough to symmetric by asking whether a CDT agent still defects in order to decide whether a TDT agent should still cooperate? Are you sure you are not just making up these rules as you go?
Much is unclear and very little seems to be coherently written down. What amazes me is that there is so much confidence given to something no one can explain clearly. So far, the only stable thing in your description of TDT is that it is better than CDT.
Do you have an example of a problem on which CDT or EDT does better than TDT?
I have yet to see a description of TDT which allows me to calculate what TDT does on an arbitrary problem. But I do know that I have seen long lists from Eliezer of problems that TDT does not solve that he thinks it ought to be improved so as to solve.
The world isn’t sufficiently formalized for us to meet that standard for any decision theory (though we come closer with CDT and TDT than with EDT, in my opinion). However, cousin_it has a few recent posts on formalized situations where an agent of a more TDT (actually, UDT) type does strictly better than a CDT one in the same situation. I don’t know of any formalization (or any fuzzy real-world situation) where the opposite is true.
I apparently misled you by using that word “arbitrary”. I’m not asking for solutions to soft problems that are difficult to formalize. Simply solutions to the standard kinds of games already formalized in game theory. For example, the game of Chicken. Can anyone point me to a description that tells me what play TDT would make in this game? Or what mixed strategy it would use? Both assuming and not assuming the reading of each other’s code.
ETA: Slightly more interesting than the payoff matrix shown in the Wikipedia article is the case when the payoff for a win is 2 units, with a loss still costing only −1. This means that in the iterated version, the negotiated solution would be to alternate wins. But we are interested in the one-shot case.
Can TDT find a correlated equilibrium? If not, which Nash equilibrium does it pick? Or does it always chicken out? Where can I learn this information?
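For reference, the symmetric mixed Nash equilibrium of a 2x2 Chicken game follows from the usual indifference condition. The sketch below assumes a tie payoff of 0 and a crash payoff of -10, neither of which is fixed by this comment, and `symmetric_mixed_nash` is a hypothetical helper. It says nothing about what TDT itself would play.

```python
from fractions import Fraction


def symmetric_mixed_nash(tie, win, lose, crash):
    """Return (p_swerve, value) for the symmetric mixed Nash equilibrium of a
    2x2 Chicken game, assuming win > tie > lose > crash."""
    tie, win, lose, crash = (Fraction(v) for v in (tie, win, lose, crash))
    # The opponent swerves with probability p. I am indifferent when
    #   p*tie + (1-p)*lose == p*win + (1-p)*crash
    p = (crash - lose) / (tie - lose - win + crash)
    # At equilibrium both of my actions earn the same expected payoff.
    value = p * tie + (1 - p) * lose
    return p, value
```

With win 2, loss -1, and the assumed tie 0 and crash -10, `symmetric_mixed_nash(0, 2, -1, -10)` returns a swerve probability of 9/11 and an expected payoff of -2/11 per player. Other crash or tie payoffs would change both numbers.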
Since CDT and EDT don’t solve those problems either, all this justifies saying is that TDT does better on some problems, and the same on others, not “worse on others”.
For every possible decision theory, there is a “nemesis” environment—where it does extremely badly. That is no-free-lunch fall out.
A “nemesis” environment that feeds misleading evidence to a decision theory’s underlying epistemology does not indicate the sort of problem illustrated by an environment in which a decision theory does something stupid with true information.
What you asked for was a case where a decision theory did worse than its rivals.
However, that seems pretty trivial if it behaves differently from them—you just consider an appropriate pathological environment set up to punish that decision theory.
Yes, in the context of Perplexed dismissing examples of TDT doing better than CDT because CDT was being stupid with true information.
Not necessarily. Various decision theories can come into play here. It depends precisely on what you mean by the prisoner’s paradox. If you are playing a true one-shot where you have no information about the entity in question, then that might be true. But if you are playing a true one-shot where each player, before making the decision, has access to the other player’s source code, then defecting may not be the best solution. Some of the decision theory posts have discussed this. (Note that knowing each other’s source code is not nearly as strong an assumption as it might seem, since one common idea in game theory is to look at what happens when the other players know your strategy. (I’m oversimplifying some technical details here. I don’t fully understand all the issues. I’m not a game theorist. Add any other relevant disclaimers.))
No one on this thread has mentioned a “prisoner’s paradox”. We have been discussing the Prisoner’s Dilemma, a well known and standard problem in game theory which involves two players who must decide without prior knowledge of the other player’s decision.
A different problem in which neither player is actually making a decision, but instead is controlled by a deterministic algorithm, and in which both players, by looking at source, are able to know the other’s decision in advance, is certainly an interesting puzzle to consider, but it has next to nothing in common with the Prisoner’s Dilemma besides a payoff matrix.
Prisoner’s paradox is another term for the prisoner’s dilemma. See for example this Wikipedia redirect. You may want to reread what I wrote in that light. (Although there’s some weird bit of illusion of transparency going on here in that part of me has a lot of trouble understanding how someone wouldn’t be able to tell from context that they were the same thing.)
No. The problem of what to do is actually closely related when one has systems which are able to understand each other’s source code. It is in fact related to the problem of iterating the problem.
In general, given no information, the problem still has relevant decision theoretic considerations.
I’m curious why you assert this. Game theorists have a half dozen or so standard simple one-shot two person games which they use to illustrate principles. PD is one, matching pennies is another, Battle of the Sexes, Chicken, … the list is not that long.
They also have a handful of standard ways of taking a simple one-shot game and turning it into something else—iteration is one possibility, but you can also add signaling, bargaining with commitment, bargaining without commitment but with a correlated shared signal, evolution of strategies to an ESS, etc. I suppose that sharing source code can be considered yet another of these basic game transformations.
Now we have the assertion that for one (PD is the only one?) of these games, one (iteration is the only one?) of these transformations is closely related to this new code-sharing transformation. Why is this assertion made? Is there some kind of mathematical structure to this claimed relationship? Some kind of proof? Surely there is more evidence for this claimed relationship than just pointing out that both transformations yield the same prescription—“cooperate”—when there are only two possible prescriptions to choose among.
Is the code-sharing version of Chicken also closely related to the iterated version? How about Battle of the Sexes?
So I’m going to need to repeat my earlier disclaimer that I’m far from my area of expertise. But the basic idea is that iterating games gives you a probabilistic estimate for what the underlying code looks like (assuming some sort of nice distribution on potential source code such that in general simpler code is more likely than complicated code). Unfortunately, I don’t know any details of this approach beyond its existence but it should apply to other games like Chicken also.