Wei Dai comments on Ingredients of Timeless Decision Theory

Wei Dai 19 Aug 2009 7:08 UTC
36 points
Today I finally came up with a simple example where TDT clearly loses and CDT clearly wins, and as a bonus, proves that TDT isn’t reflectively consistent.

Omega comes to you and says

I’m hosting a game with 3 players. Two players are AIs I created running TDT but not capable of self-modification, one being a paperclip maximizer, the other being a staples maximizer. The last player is an AI you will design. When the game starts, my two AIs will first get the source code of your AI (which is only fair since you know the design of my AIs). Then 2 of the 3 players will be chosen randomly to play a one-shot true PD, without knowing who they are facing. What AI do you submit?

Say the payoffs of the PD are
- ⁵⁄₅ ⁰⁄₆
- ⁶⁄₀ ¹⁄₁
Suppose you submit an AI running CDT. Then, Omega’s AIs will reason as follows: “I have ¹⁄₂ chance of playing against a TDT, and ¹⁄₂ chance of playing against a CDT. If I play C, then my opponent will play C if it’s a TDT, and D if it’s a CDT, therefore my expected payoff is 5/2+0/2=2.5. If I play D, then my opponent will play D, so my payoff is 1. Therefore I should play C.” Your AI then gets a payoff of 6, since it will play D.

Suppose you submit an AI running TDT instead. Then everyone will play C, so your AI will get a payoff of 5.

So you submit a CDT, whether you are running CDT or TDT. That’s because explicitly giving the source code of your submitted AI to the other AIs makes the consequences of your decision the same under CDT and under TDT.

Suppose you have to play this game yourself instead of delegating it, you can self-modify, and the payoffs are large enough, you’d modify yourself from running TDT to running some other DT that plays D in this game! (Notice that I specified that Omega’s AIs can’t self-modify, so your decision to self-modify won’t have the logical consequence that they also self-modify.)

It seems that I’ve given a counter-example to the claim that

the behavior of TDT corresponds to reflective consistency on a problem class in which your payoff is determined by the type of decision you make, but not sensitive to the exact algorithm you use apart from that

Or does my example fall outside of the specified problem class?
What links here?
- Eliezer Yudkowsky 19 Aug 2009 8:01 UTC
  12 points
  Parent
  
  Or does my example fall outside of the specified problem class?
  
  If I wanted to defend the original thesis, I would say yes, because TDT doesn’t cooperate or defect depending directly on your decision, but cooperates or defects depending on how your decision depends on its decision (which was one of the open problems I listed—the original TDT is for cases where Omega offers you straightforward dilemmas in which its behavior is just a direct transform of your behavior). So where one algorithm has one payoff matrix for defection or cooperation, the other algorithm gets a different payoff matrix for defection or cooperation, which breaks the “problem class” under which the original TDT is automatically reflectively consistent.
  
  Nonetheless it’s certainly an interesting dilemma.
  
  Your comment here is actually pre-empting a comment that I’d planned to make after providing some of the background for the content of TDT. I’d thought about your dilemmas, and then did manage to translate into my terms a notion about how it might be possible to unilaterally defect in the Prisoner’s Dilemma and predictably get away with it, provided you did so for unusual reasons. But the conditions on “unusual reasons” are much more difficult than your posts seem to imply. We can’t all act on unusual reasons and end up doing the same thing, after all. How is it that these two TDT AIs got here, if not by act of Omega, if the sensible thing to do is always to submit a CDT AI?
  
  To introduce yet another complication: What if the TDTs that you’re playing against, decide to defect unconditionally if you submit a CDT player, in order to give you an incentive to submit a TDT player? Given that your reason for submitting a CDT player involves your expectation about how the TDT players will respond, and that you can “get away with it”? It’s the TDT’s responses that make them “exploitable” by your decision to submit a CDT player—so what if they employ a different strategy instead? (This is another open problem—“who acts first” in timeless negotiations.)
  
  There might be a certain sense in which being in a “small subgroup internally correlated but not correlated with larger groups” could possibly act as a sort of resource for getting away with defection in the true PD, because if you’re in a large group then defecting shifts the probability of an opponent likewise defecting by a lot, but if you’re in a small subgroup then it shifts the probability of the opponent defecting by a little, so there’s a lower penalty for defection, so in marginal cases a small subgroup might play defection while a large subgroup plays cooperate. (But again, the conditions on this are difficult. If all small subgroups reason this way, then all small subgroups form a large correlated group!)
  
  Anyway—you can’t end up in a small subgroup if you start out in a large one, because if you decide to deliberately condition on noise in order to decrease the size of your subgroup, that itself is a correlated sort of decision with a clear line of reasoning and motive, and others in your correlated group will try doing the same thing, with predictable results. So to the extent that lots of AI designers in distant parts of Reality are discussing this same issue with the same logic, we are already in a group of a certain minimum size.
  
  But this does lead to an argument for CEV (values extrapolating / Friendly AI) algorithms that don’t automatically, inherently correlate us with larger groups than we already started out being in. If uncorrelation is a nonrenewable resource then FAI programmers should at least be careful not to wantonly burn it. You can’t deliberately add noise, but you might be able to preserve existing uncorrelation.
  
  Also, other TDTs can potentially set their “minimum cooperator frequency threshold” at just the right level that if any group of noticeable size chooses to defect, all the TDTs start defecting—though this itself is a possibility I am highly unsure of, and once again it has to do with “who goes first” in timeless strategies, which is an open problem.
  
  But these are issues in which my understanding is still shaky, and it very rapidly gets us into very dangerous territory like trying to throw the steering wheel out the window while playing chicken.
  
  So far as evolved biological organisms go, I suspect that the ones who create successful Friendly AIs (instead of losing control and dying at the hands of paperclip maximizers), would hardly start out seeing only the view from CDT—most of them/us would be making the decision “Should I build TDT, knowing that the decisions of other biological civilizations are correlated to this one?” and not “Should I build TDT, having never thought of that?” In other words, we may already be part of a large correlated subgroup—though I sometimes suspect that most of the AIs out there are paperclip maximizers born of experimental accidents, and in that case, if there is no way of verifying source code, nor of telling the difference between SIs containing bio-values-preserving civs and SIs containing paperclip maximizers, then we might be able to exploit the relative smallness of the “successful biological designer” group...
  
  ...but a lot of this presently has the quality of “No fucking way would I try that in real life”, at least based on my current understanding. The closest I would get might be trying for a CEV algorithm that did not inherently add correlation to decision systems with which we were not already correlated.
  What links here?
  - Wei Dai 19 Aug 2009 12:42 UTC
    9 points
    Parent
    
    This is another open problem—“who acts first” in timeless negotiations.
    
    You’re right, I failed to realize that with timeless agents, we can’t do backwards induction using the physical order of decisions. We need some notion of the logical order of decisions.
    
    Here’s an idea. The logical order of decisions is related to simulation ability. Suppose A can simulate B, meaning it has trustworthy information about B’s source code and has sufficient computing power to fully simulate B or sufficient intelligence to analyze B using reliable shortcuts, but B can’t simulate A. Then the logical order of decisions is B followed by A, because when B makes his decision, he can treat A’s decision as conditional on his. But when A makes her decision, she has to take B’s decision as a given.
    
    Does that make sense?
    What links here?
    Wei Dai's comment on Updatelessness doesn’t solve most problems by Martín Soto (6 Mar 2024 18:05 UTC; 7 points)
    Steve_Rayhawk's comment on The Danger of Stories by Matt_Simpson (9 Nov 2009 4:17 UTC; 0 points)
    - Eliezer Yudkowsky 19 Aug 2009 15:05 UTC
      12 points
      Parent
      Moving second is a disadvantage (at least it seems to always work out that way, counterexamples requested if you can find them) and A can always use less computing power. Rational agents should not regret having more computing power (because they can always use less) or more knowledge (because they can always implement the same strategy they would use with less knowledge) - this sort of thing is a sure sign of reflective inconsistency.
      
      To see why moving logically second is a disadvantage, consider that it lets an opponent playing Chicken always toss their steering wheel out the window and get away with it.
      
      That both players desire to move “logically first” argues strongly that neither one will; that the resolution here does not involve any particular fixed global logical order of decisions.
      
      (I should comment in the future about the possibility that bio-values-derived civs, by virtue of having evolved to be crazy, can succeed in moving logically first using crazy reasoning, but that would be a whole ’nother story, and of course also falls into the “Way the fuck too dangerous to try in real life” category relative to my present knowledge.)
      
      With timeless agents, we can’t do backwards induction using the physical order of decisions. We need some notion of the logical order of decisions.
      
      BTW, thanks for this compact way of putting it.
      - rwallace 19 Aug 2009 19:33 UTC
        4 points
        Parent
        Being logically second only keeps being a disadvantage because examples keep being chosen to be of the kind that make it so.
        
        One category of counterexample comes from warfare, where if you know what the enemy will do and he doesn’t know what you will do, you have the upper hand. (The logical versus temporal distinction is clear here: being temporally the first to reach an objective can be a big advantage.)
        
        Another counterexample is in negotiation where a buyer and seller are both uncertain about fair market price; each may prefer the other to be first to suggest a price. (In practice this is often resolved by the party with more knowledge, or more at stake, or both—usually the seller—being first to suggest a price.)
        Wei Dai 20 Aug 2009 0:18 UTC
        3 points
        Parent
        
        Being logically second only keeps being a disadvantage because examples keep being chosen to be of the kind that make it so.
        
        You’re right. Rock-paper-scissors is another counter-example. In these cases, the relationship between between the logical order of moves and simulation ability seems pretty obvious and intuitive.
        Eliezer Yudkowsky 20 Aug 2009 0:19 UTC
        4 points
        Parent
        Except that the analogy to rock-paper-scissors would be that I get to move logically first by deciding my conditional strategy “rock if you play scissors” etc., and simulating you simulating me without running into an apparently non-halting computation (that would otherwise have to be stopped by my performing counterfactual surgery on the part of you that simulates my own decision), then playing rock if I simulate you playing scissors.
        
        At least I think that’s how the analogy would work.
        Vladimir_Nesov 20 Aug 2009 0:36 UTC
        4 points
        Parent
        I suspect that this kind of problems will run into computational complexity issues, not clever decision theory issues. Like with a certain variation on St. Petersburg paradox (see the last two paragraphs), where you need to count to the greatest finite number to which you can count, and then stop.
        Wei Dai 20 Aug 2009 0:29 UTC
        1 point
        Parent
        Suppose I know that’s your strategy, and decide to play the move equal to (the first googleplex digits of pi mod 3), and I can actually compute that but you can’t. What are you going to do?
        
        If you can predict what I do, then your conditional strategy works, which just shows that move order is related to simulation ability.
        Eliezer Yudkowsky 20 Aug 2009 3:32 UTC
        4 points
        Parent
        In this zero-sum game, yes, it’s possible that whoever has the most computing power wins, if neither can access unpredictable random or private variables. But what if both sides have exactly equal computing power? We could define a Timeless Paper-Scissors-Rock Tournament this way—standard language, no random function, each program gets access to the other’s source code and exactly 100 million ticks, if you halt without outputting a move then you lose 2 points.
        Wei Dai 20 Aug 2009 9:13 UTC
        3 points
        Parent
        This game is pretty easy to solve, I think. A simple equilibrium is for each side to do something like iterate x = SHA-512(x), with a random starting value, using an optimal implementation of SHA-512, until time is just about to run out, then output x mod 3. SHA-512 is easy to optimize (in the sense of writing the absolutely fastest implementation), and It seems very unlikely that there could be shortcuts to computing (SHA-512)^n until n gets so big (around 2^256 unless SHA-512 is badly designed) that the function starts to cycle.
        
        I think I’ve answered your specific question, but the answer doesn’t seem that interesting, and I’m not sure why you asked it.
        Paul Crowley 23 May 2012 11:51 UTC
        3 points
        Parent
        Schneier et al here prove that being able to calculate H^n(x) quickly leads to a faster way of finding collisions in H. http://www.schneier.com/paper-low-entropy.html
        Eliezer Yudkowsky 20 Aug 2009 22:11 UTC
        1 point
        Parent
        Well, it’s probably not all that interesting from a purely theoretical perspective, but if the prize money was divided up among only the top fifth of players, you’d actually have to try to win, and that would be an interesting challenge for computer programmers.
      - Wei Dai 19 Aug 2009 19:31 UTC
        1 point
        Parent
        
        Moving second is a disadvantage (at least it seems to always work out that way, counterexamples requested if you can find them) and A can always use less computing power.
        
        But if you are TDT, you can’t always use less computing power, because that might be correlated with your opponents also deciding to use less computing power, or will be distrusted by your opponent because it can’t simulate you.
        
        But if you simply don’t have that much computing power (and opponent knows this) then you seem to have the advantage of logically moving first.
        
        (I should comment in the future about the possibility that bio-values-derived civs, by virtue of having evolved to be crazy, can succeed in moving logically first using crazy reasoning, but that would be a whole ’nother story, and of course also falls into the “Way the fuck too dangerous to try in real life” category relative to my present knowledge.)
        
        Lack of computing power could be considered a form of “crazy reasoning”...
        
        Why does TDT lead to the phenomenon of “stupid winners”? If there’s a way to explain this as a reasonable outcome, I’d feel a lot better. But is that like a two-boxer asking for an explanation of why, when the stupid (from their perspective) one-boxers keep winning, that’s a reasonable outcome?
        What links here?
        Wei Dai's comment on The Commitment Races problem by Daniel Kokotajlo (23 Aug 2019 6:02 UTC; 5 points)
        Eliezer Yudkowsky 19 Aug 2009 19:55 UTC
        1 point
        Parent
        
        But if you are TDT, you can’t always use less computing power, because that correlates with your opponents also deciding to use less computing power.
        
        Substitute “move logically first” for “use less computing power”? Using less computing power seems like a red herring to me. TDT on simple problems (with the causal / logical structure already given) uses skeletally small amounts of computing power. “Who moves first” is a “battle”(?) over the causal / logical structure, not over who can manage to run out of computing power first. If you’re visualizing this using lots of computing power for the core logic, rather than computing the 20th decimal place of some threshold or verifying large proofs, then we’ve got different visualizations.
        
        The idea of “if you do this, the opponent does the same” might apply to trying to move logically first, but in my world this has nothing to do with computing power, so at this point I think it’d be pretty odd if the agents were competing to be stupider.
        
        Besides, you don’t want to respond to most logical threats, because that gives your opponent an incentive to make logical threats; you only want to respond to logical offers that you want your opponent to have an incentive to make. This gets into the scary issues I was hinting at before, like determining in advance that if you see your opponent predetermine to destroy the universe in a mutual suicide unless you pay a ransom, you’ll call their bet and die with them, even if they’ve predetermined to ignore your decision, etcetera; but if they offer to trade you silver for gold at a Ricardian-advantageous rate, you’ll predetermine to cooperate, etc. The point, though, is that “If I do X, they’ll do Y” is not a blank check to decide that minds do X, because you could choose a different form of responsiveness.
        
        But anyway, I don’t see in the first place that agents should be having these sorts of contests over how little computing power to use. That doesn’t seem to me like a compelling advantage to reach for.
        
        But if you simply don’t have that much computing power then you seem to have the advantage of logically moving first.
        
        If you’ve got that little computing power then perhaps you can’t simulate your opponent’s skeletally small TDT decision, i.e., you can’t use TDT at all. If you can’t close the loop of “I simulate you simulating me”—which isn’t infinite, and actually terminates rather quickly in the simple cases I know how to analyze at all, because we perform counterfactual surgery inside the loop—then you can’t use TDT at all.
        
        Lack of computing power could be considered a form of “crazy reasoning”...
        
        No, I mean much crazier than that. Like “This doesn’t follow, but I’m going to believe it anyway!” That’s what it takes to get “unusual reasons”—the sort of madness that only strictly naturally selected biological minds would find compelling in advance of a timeless decision to be crazy. Like “I’M GOING TO THROW THE STEERING WINDOW OUT THE WHEEL AND I DON’T CARE WHAT THE OPPONENT PREDETERMINES” crazy.
        
        Why does TDT lead to the phenomenon of “stupid winners”?
        
        It has not been established to my satisfaction that it does. It is a central philosophical intuition driving my decision theory that increased computing power, knowledge, or self-control, should not harm a rational agent.
      - Eliezer Yudkowsky 20 Aug 2009 0:11 UTC
        0 points
        Parent
        
        That both players desire to move “logically first” argues strongly that neither one will; that the resolution here does not involve any particular fixed global logical order of decisions.
        
        ...possibly employing mixed strategies, by analogy to the equilibrium of games where neither agent gets to go first and both must choose simultaneously? But I haven’t done anything with this idea, yet.
      - RickJS 29 Aug 2009 3:04 UTC
        −9 points
        Parent
        First of all, congratulations, Eliezer! That’s great work. When I read your 3-line description, I thought it would never be computable. I’m glad to see you can actually test it.
        
        Eliezer_Yudkowsky wrote on 19 August 2009 03:05:15PM
        
        … Moving second is a disadvantage (at least it seems to always work out that way, counterexamples requested if you can find them)
        
        Rock-paper-scissors ?
        Negotiating to buy a car?
        
        I would like to begin by saying that I don’t believe my own statements are True, and I suggest you don’t either. I do request that you try thinking WITH them before attacking them. It’s really hard to think with an idea AFTER you’ve attacked it. I’ve been told my writing sounds preachy or even fanatical. I don’t say “In My Opinion” enough. Please imagine “IMO” in front of every one of my statements. Thanks!
        
        Having more information (not incorrect “information”) on the opponent’s decisions is beneficial.
        
        Let’s distinguish Secret Commit & Simultaneous Effect (SCSE) from Commit First & Simultaneous Effect (CFSE) and from Act & Effect First (AEF). That’s just a few categories from a coarse categorization of board war games.
        
        The classic gunfight at high noon is AEF (to a first approximation, not counting watching his face & guessing when his reaction time will be lengthened). The fighter who draws first has a serious advantage, the fighter who hits first has a tremendous advantage, but not certain victory. (Hollywood not withstanding, people sometimes keep fighting after taking handgun hits, even a dozen of them.) I contend that all AEFs give advantage to the first actor. Chess is AEF.
        
        My understanding of the Prisoner’s Dilemma is that it is SCSE as presented. On this thread, it seems to have mutated into a CFSE (otherwise, there just isn’t any “first”, in the ordinary, inside-the-Box-Universe, timeful sense). If Prisoner A has managed to get information on Prisoner B’s commitment before he commits, this has to be useful. Even if PA is a near-Omega, it can be a reality check on his Visualization of the Cosmic All. In realistic July 2009 circumstances, it identifies PB as one of the 40% of humans who choose ‘cooperate’ in one-shot PD. PA now has a choice whether to be an economist or a friend.
        
        And now we get down to something fundamental. Some humans are better people than the economic definition of rationality, which ” … assume that each player cares only about minimizing his or her own time in jail”. ” … cooperating is strictly dominated) by defecting … ” even with leaked information.
        
        “I don’t care what happens to my partner in crime. I don’t and I won’t. You can’t make me care. On the advice of my economist… ” That gets both prisoners a 5-year sentence when they could have had 6 months.
        
        That is NOT wisdom! That will make us extinct. (In My Opinion)
        
        Now try on “an injury to one is an injury to all”. Or maybe “an injury to one is an (discounted) injury to ME”. We just might be able to see that the big nuclear arsenals are a BAD IDEA!
        
        Taking that on, the payoff matrix offered by Wei Dai’s Omega (19 August 2009 07:08:23AM)
        
        * cooperate 5/5 0/6 * defect 6/0 1/1
        is now transformed into PA’s Internal Payoff Matrix (IPM)
        
        * cooperate 5+5κ/5 0+6κ/6 * defect 6+0κ/0 1+1κ/1
        In other words, his utility function has a term for the freedom of Prisoner B. (Economists be damned! Some of us do, sometimes.)
        
        “I’ll set κ=0.3 ,” Says PA (well, he is a thief). Now PA’s IPM is:
        
        * cooperate 6.5/5 1.8/6 * Defect 6/0 1.3/1
        Lo and behold! ‘cooperate’ now strictly dominates!
        
        When over 6 billion people are affected, it doesn’t take much of a κ to swing my decisions around. If I’m not working to save humanity, I must have a very low κ for each distant person unknown to me.
        
        People say, “Human life is precious!” Show it to me in results. Show it to me in how people budget their time and money. THAT is why Friendly AI is our only hope. We will ‘defect’ our way into thwarting any plan that requires a lot of people to change their beliefs or actions. That sub-microscopic κ for unknown strangers is evolved-in, it’s not going away. We need a program that can be carried out by a tiny number of people.
        
        .
        
        .
        
        .
        
        IMO.
        
        ---=
        
        Maybe I missed the point. Maybe the whole point of TDT is to derive some sort of reduced-selfishness decision norm without an ad-hoc utility function adjustment (is that what “rained down from heaven” means?). I can derive the κ needed in order to save humanity, if there were a way to propagate it through the population. I cannot derive The One True κ from absolute principles, nor have I shown a derivation of “we should save humanity”. I certainly fell short of ” … looking at which agents walk away with huge heaps of money and then working out how to do it systematically … ”. I would RATHER look at which agents get their species through their singularity alive. Then, and only then, can we look at something grander than survival. I don’t grok in fullness “reflective consistency”, but from extinction we won’t be doing a lot of reflecting on what went wrong.
        
        IMO.
        
        Now, back to one-shot PD and “going first”. For some values of κ and some external payoff matrices (not this one), the resulting IPM is not strictly dominated, and having knowledge of PB’s commitment actually determines whether ‘cooperate’ or ‘defect’ produces a better world in PA’s internal not-quite-so-selfish world-view. Is that a disadvantage? (That’s a serious, non-rhetorical question. I’m a neophyte and I may not see some things in the depths where Eliezer & Wei think.)
        
        Now let’s look at that game of chicken. Was “throw out the steering wheel” in the definition of the thought experiment? If not, that player just changed the universe-under-consideration, which is a fairly impressive effect in an AEF, not a CFSE.
        
        If re-engineering was included, then Driver A may complete his wheel-throwing (while in motion!) only to look up and see Driver B’s steering gear on a ballistic trajectory. Each will have a few moments to reflect on “always get away with it.”
        
        If Driver A successfully defenestrates first, is Driver B at a disadvantage? Among humans, the game may be determined more by autonomic systems than by conscious computation, and B now knows that A won’t be flinching away. However, B now has information and choices. One that occurs to me is to stop the car and get out. “Your move, A.” A truly intelligent player (in which category I do not, alas, qualify) would think up better, or funnier, choices.
        
        Hmmm… to even play Chicken you have to either be irrational or have a damned strange IPM. We should establish that before proceeding further.
        
        I challenge anyone to show me a CFSE game that gives a disadvantage to the second player.
        
        I’m not too proud to beg: I request your votes. I’ve got an article I’d like to post, and I need the karma.
        
        Thanks for your time and attention.
        
        RickJS
        Saving Humanity from Homo Sapiens
        
        08/28/2009 ~20:10 Edit: formatting … learning formatting … grumble … GDSOB tab-deleter … Fine. I’ll create the HTML for tables, but this is a LOT of work for 3 simple tables … COMMENT TOO LONG!?!? … one last try … now I can’t quit, I’m hooked! … NAILED that sucker! … ~22:40 : added one more example *YAWN*
        Vladimir_Nesov 29 Aug 2009 9:09 UTC
        5 points
        Parent
        It’s incomprehensible. Try debugging individual ideas first, written up more carefully.
    - Eliezer Yudkowsky 19 Aug 2009 15:31 UTC
      1 point
      Parent
      
      With timeless agents, we can’t do backwards induction using the physical order of decisions. We need some notion of the logical order of decisions.
      
      BTW, thanks for this compact way of putting it.
    - [deleted] 13 Jun 2014 6:52 UTC
      −2 points
      Parent
      This reminds me of logical Fatalism and the Argument from Bivalence
  - cousin_it 20 Jun 2013 15:47 UTC
    3 points
    Parent
    
    What if the TDTs that you’re playing against, decide to defect unconditionally if you submit a CDT player, in order to give you an incentive to submit a TDT player?
    
    That’s a good point, but what if the process that gives birth to CDT doesn’t listen to the incentives you give it? For example, it could be evolution or random chance.
    
    Here’s an example, similar to Wei’s example above. Imagine two parallel universes, both containing large populations of TDT agents. In both universes, a child is born, looking exactly like everyone else. The child in universe A is a TDT agent named Alice. The child in universe B is named Bob and has a random mutation that makes him use CDT. Both children go on to play many blind PDs with their neighbors. It looks like Bob’s life will be much happier than Alice’s, right?
    
    We can’t all act on unusual reasons and end up doing the same thing, after all.
    
    What force will push against evolution and keep the number of Bobs small?
    What links here?
    Wei Dai's comment on Funk-tunul’s Legacy; Or, The Legend of the Extortion War by Zack_M_Davis (24 Dec 2019 18:05 UTC; 8 points)
- Vladimir_Nesov 19 Aug 2009 11:22 UTC
  3 points
  Parent
  The problem is that “source code of your AI” is not a complete story, since your decisions as AI programmer also depended on the Omega AIs’ code, and so what you give as the source of AI is already only one of the possible worlds that presupposes the behavior of Omega AIs.
  What links here?
  - Vladimir_Nesov's comment on Another attempt to explain UDT by cousin_it (14 Nov 2010 22:55 UTC; 7 points)
  - Wei Dai 19 Aug 2009 12:33 UTC
    2 points
    Parent
    Yes, I think Eliezer made a similar point:
    
    What if the TDTs that you’re playing against, decide to defect unconditionally if you submit a CDT player, in order to give you an incentive to submit a TDT player?
    
    So if you run TDT, then there are at least two equilibria in this game, only one of which involves you submitting a CDT. Can you think of a way to select between these two equilibria?
    
    If not, I can fix this by changing the game a bit. Omega will now create his TDT AIs after you design yours, and hard code the source code of your AI into it as givens. His AIs won’t even know about you, the real player.
    - Eliezer Yudkowsky 19 Aug 2009 14:47 UTC
      8 points
      Parent
      
      Omega will now create his TDT AIs after you design yours, and hard code the source code of your AI into it as givens. His AIs won’t even know about you, the real player.
      
      They might simply infer you, the real player. You might as well tell the TDT AIs that they’re up against a hardcoded Defect move as the “other player”, but they won’t know if that player has been selected. In fact, that pretty much is what you’re telling them, if you show them a CDT player. The CDT player is a red herring—the decision to defect was made by you, in the moment of submitting a CDT player. There is no law against TDT players realizing this after Omega codes them.
      
      I should note that in matters such as these, the phrase “hard code” should act as a warning sign that you’re trying to fix something that, at least in your own mind, doesn’t want to be fixed. (E.g. “hard code obedience into AIs, build it into the very circuitry!”) Where you are tempted to say “hard code” you may just need to accept whatever complex burden you were trying to get rid of by saying “fix it in place with codes of iron!”
      What links here?
      Vladimir_Nesov's comment on Another attempt to explain UDT by cousin_it (14 Nov 2010 22:55 UTC; 7 points)
      - Wei Dai 19 Aug 2009 21:03 UTC
        2 points
        Parent
        By hard code, I meant code it into the TDT’s probability distribution. (Even TDT isn’t meta enough to say “My prior is wrong!”) But that does make the example less convincing, so let me try something else.
        
        Have Omega’s AIs physically go first and you play for yourself. They get a copy of your source code, then make their moves in the 3-choose-2 PD game first. You learn their move, then make your choice. Now, if you follow CDT, you’ll reason that your decision has no causal effect on the TDT’s decisions, and therefore choose D. The TDTs, knowing this, will play C.
        
        And I think I can still show that if you run TDT, you will decide to self-modify into CDT before starting this game. First, if Omega’s AIs know that you run TDT at the beginning, then they can use that “play D if you self-modify” strategy to deter you from self-modifying. But you can also use “I’ll self-modify anyway” to deter them from doing that. So who wins this game? (If someone moves first logically, then he wins, but what if everyone moves simultaneously in the logical sense, which seems to be the case in this game?)
        
        Suppose it’s common knowledge that Omega mostly chooses CDT agents to participate in this game, then “play D if you self-modify” isn’t very “credible”. That’s because they only see your source code after you self-modify so they’d have to play D if they predict that a TDT agent would self-modify, even if the actual player started with CDT. Given that, your “I’ll self-modify anyway” would be highly credible.
        
        I’m not sure how to formalize this notion of “credibility” among TDTs, but it seems to make intuitive sense.
        Eliezer Yudkowsky 19 Aug 2009 21:37 UTC
        6 points
        Parent
        
        And I think I can still show that if you run TDT, you will decide to self-modify into CDT before starting this game
        
        Well that should never happen. Anything that would make a TDT want to self-modify into CDT should make it just want to play D, no need for self-modification. It should give the same answer at different times, that’s what makes it a timeless decision theory. If you can break that without direct explicit dependence on the algorithm apart from its decisions, then I am in trouble! But it seems to me that I can substitute “play D” for “self-modify” in all cases above.
        
        First, if Omega’s AIs know that you run TDT at the beginning, then they can use that “play D if you self-modify” strategy to deter you from self-modifying.
        
        E.g., “play D if you play D to deter you from playing D” seems like the same idea, the self-modification doesn’t add anything.
        
        So who wins this game? (If someone moves first logically, then he wins, but what if everyone moves simultaneously in the logical sense, which seems to be the case in this game?)
        
        Well… it partially seems to me that, in assuming certain decisions are made without logical consequences—because you move logically first, or because the TDT agents have fixed wrong priors, etc. - you are trying to reduce the game to a Prisoner’s Dilemma in which you have a certain chance of playing against a piece of cardboard with “D” written on it. Even a uniform population of TDTs may go on playing C in this case, of course, if the probability of facing cardboard is low enough. But by the same token, the fact that the cardboard sometimes “wins” does not make it smarter or more rational than the TDT agents.
        
        Now, I want to be very careful about how I use this argument, because indeed a piece of cardboard with “only take box B” written on it, is smarter than CDT agents on Newcomb’s Problem. But who writes that piece of cardboard, rather than a different one?
        
        An authorless piece of cardboard genuinely does go logically first, but at the expense of being a piece of cardboard, which makes it unable to adapt to more complex situations. A true CDT agent goes logically first, but at the expense of losing on Newcomb’s Problem. And your choice to put forth a piece of cardboard marked “D” relies on you expecting the TDT agents to make a certain response, which makes the claim that it’s really just a piece of cardboard and therefore gets to go logically first, somewhat questionable.
        
        Roughly, what I’m trying to reply is that you’re reasoning about the response of the TDT agents to your choosing the CDT algorithm, which makes you TDT, but you’re also trying to force your choice of the CDT algorithm to go logically first, but this is begging the question.
        
        I would, perhaps, go so far as to agree that in an extension of TDT to cases in which certain agents magically get to go logically first, then if those agents are part of a small group uncorrelated with yet observationally indistinguishable from a large group, the small group might make a correlated decision to defect “no matter what” the large group does, knowing that the large group will decide to cooperate anyway given the payoff matrix. But the key assumption here is the ability to go logically first.
        
        It seems to me that the incompleteness of my present theory when it comes to logical ordering is the real key issue here.
        Wei Dai 19 Aug 2009 22:01 UTC
        1 point
        Parent
        
        Well that should never happen. Anything that would make a TDT want to self-modify into CDT should make it just want to play D, no need for self-modification. It should give the same answer at different times, that’s what makes it a timeless decision theory. If you can break that without direct explicit dependence on the algorithm apart from its decisions, then I am in trouble! But it seems to me that I can substitute “play D” for “self-modify” in all cases above.
        
        The reason to self-modify is to make yourself indistinguishable from players who started as CDT agents, so that Omega’s AIs can’t condition their moves on the player’s type. Remember that Omega’s AIs get a copy of your source code.
        
        A true CDT agent goes logically first, but at the expense of losing on Newcomb’s Problem.
        
        But a CDT agent would self-modify into something not losing on Newcomb’s problem if it expects to face that. On the other hand, if TDT doesn’t self-modify into something that wins my game, isn’t that worse? (Is it better to be reflectively consistent, or winning, if you had to choose one?)
        
        It seems to me that the incompleteness of my present theory when it comes to logical ordering is the real key issue here.
        
        Yes, I agree that’s a big piece of the puzzle, but I’m guessing the solution to that won’t fully solve the “stupid winner” problem.
        
        ETA: And for TDT agents that move simultaneously, there remains the problem of “bargaining” to use Nesov’s term. Lots of unsolved problems… I wish you started us working on this stuff earlier!
        Vladimir_Nesov 19 Aug 2009 22:19 UTC
        3 points
        Parent
        
        The reason to self-modify is to make yourself indistinguishable from players who started as CDT agents, so that Omega’s AIs can’t condition their moves on the player’s type.
        
        Being (or performing an action) indistinguishable from X doesn’t protect you from the inference that X probably resulted from such a plot. That you can decide to camouflage like this may even reduce X’s own credibility (and so a lot of platonic/possible agents doing that will make the configuration unattractive). Thus, the agents need to decide among themselves what to look like: first-mover configurations is a limited resource.
        
        (This seems like a step towards solving bargaining.)
        Wei Dai 19 Aug 2009 22:25 UTC
        0 points
        Parent
        Yes, I see that your comment does seem like a step towards solving bargaining among TDT agents. But I’m still trying to argue that if we’re not TDT agents yet, maybe we don’t want to become them. My comment was made in that context.
        Vladimir_Nesov 19 Aug 2009 22:47 UTC
        1 point
        Parent
        Let’s pick up Eliezer’s suggestion and distinguish now-much-less-mysterious TDT from the different idea of “updateless decision theory”, UDT, that describes choice of a whole strategy (function from states of knowledge to actions) rather than choice of actions in each given state of knowledge, of which latter class TDT is an example. TDT isn’t a UDT, and UDT is a rather vacuous statement, as it only achieves reflective consistency pretty much by definition, but doesn’t tell much about the structure of preference and how to choose the strategy.
        
        I don’t want to become a TDT agent, as in UDT sense, TDT agents aren’t reflectively consistent. They could self-modify towards more UDT-ish look, but this is the same argument as with CDT self-modifying into a TDT.
        Eliezer Yudkowsky 19 Aug 2009 22:59 UTC
        0 points
        Parent
        Dai’s version of this is a genuine, reflectively consistent updateless decision theory, though. It makes the correct decision locally, rather than needing to choose a strategy once and for all time from a privileged vantage point.
        
        That’s why I referred to it as “Dai’s decision theory” at first, but both you and Dai seem to think your idea was the important one, so I compromised and referred to it as Nesov-Dai decision theory.
        Expand this thread
        Vladimir_Nesov 19 Aug 2009 23:12 UTC
        2 points
        Parent
        Well, as I see UDT, it also makes decisions locally, with understanding that this local computation is meant to find the best global solution given other such locally computed decisions. That is, each local computation can make a mistake, making the best global solution impossible, which may make it very important for the other local computations to predict (or at least notice) this mistake and find the local decisions that together with this mistake constitute the best remaining global solution, and so on. The structure of states of knowledge produced by the local computations for the adjacent local computations is meant to optimize the algorithm of local decision-making in those states, giving most of the answer explicitly, leaving the local computations to only move the goalpost a little bit.
        
        The nontrivial form of the decision-making comes from the loop that makes local decisions maximize preference given the other local decisions, and those other local decisions do the same. Thus, the local decisions have to coordinate with each other, and they can do that only through the common algorithm and logical dependencies between different states of knowledge.
        
        At which point the fact that these local decisions are part of the same agent seems to become irrelevant, so that a more general problem needs to be solved, one of cooperation of any kinds of agents, or even more generally processes that aren’t exactly “agents”.
        What links here?
        Vladimir_Nesov's comment on Outlawing Anthropics: An Updateless Dilemma by Eliezer Yudkowsky (20 Sep 2009 14:33 UTC; 2 points)
        Wei Dai 19 Aug 2009 23:38 UTC
        4 points
        Parent
        One thing I don’t understand is that both you and Eliezer talk confidently about how agents would make use of logical dependencies/correlations. You guys don’t seem to think this is a really hard problem.
        
        But we don’t even know how to assign a probability (or whether it even makes sense to do so) to a simple mathematical statement like P=NP. How do we calculate and/or represent the correlation between one agent and another agent (except in simple cases like where they’re identical or easily proven to be equivalent)? I’m impressed by how far you’ve managed to push the idea of updatelessness, but it’s hard for me to process what you say, when the basic concept of logical uncertainty is still really fuzzy.
        Eliezer Yudkowsky 19 Aug 2009 23:53 UTC
        3 points
        Parent
        I can argue pretty forcefully that (1) a causal graph in which uncertainty has been factored into uncorrelated sources, must have nodes or some kind of elements corresponding to logical uncertainty; (2) that in presenting Newcomblike problems, the dilemma-presenters are in fact talking of such uncertainties and correlations; (3) that human beings use logical uncertainty all the time in an intuitive sense, to what seems like good effect.
        
        Of course none of that is actually having a good formal theory of logical uncertainty—I just drew a boundary rope around a few simple logical inferences and grafted them onto causal graphs. Two-way implications get represented by the same node, that sort of thing.
        
        I would be drastically interested in a formal theory of logical uncertainty for non-logically-omniscient Bayesians.
        
        Meanwhile—you’re carrying out logical reasoning about whole other civilizations starting from a vague prior over their origins, every time you reason that most superintelligences (if any) that you encounter in faraway galaxies, will have been built in such a way as to maximize a utility function rather than say choosing the first option in alphabetical order, on the likes of true PDs.
        Wei Dai 20 Aug 2009 10:54 UTC
        2 points
        Parent
        I want to try to understand the nature of logical correlations between agents a bit better.
        
        Consider two agents who are both TDT-like but not perfectly correlated. They play a one-shot PD but in turn. First one player moves, then the other sees the move and makes its move.
        
        In normal Bayesian reasoning, once the second player sees the first player’s move, all correlation between them disappears. (Does this happen in your TDT?) But in UDT, the second player doesn’t update, so the correlation is preserved. So far so good.
        
        Now consider what happens if the second player has more computing power than the first, so that it can perfectly simulate the first player and compute its move. After it finishes that computation and knows the first player’s move, the logical correlation between them disappears, because no uncertainty implies no correlation. So, given there’s no logical correlation, it ought to play D. The first player would have expected that, and also played D.
        
        Looking at my formulation of UDT, this may or may not happen, depending on what the “mathematical intuition subroutine” does when computing the logical consequences of a choice. If it tries to be maximally correct, then it would do a full simulation of the opponent when it can, which removes logical correlation, which causes the above outcome. Maybe the second player could get a better outcome if it doesn’t try to be maximally correct, but the way my theory is formulated, what strategy the “mathematical intuition subroutine” uses is not part of what’s being optimized.
        
        So, I’m not sure what to do about this, except to add it to the pile of unsolved problems.
        What links here?
        “UDT2” and “against UD+ASSA” by Wei Dai (12 May 2019 4:18 UTC; 50 points)
        Wei Dai's comment on Ingredients of Timeless Decision Theory by Eliezer Yudkowsky (23 Aug 2009 10:12 UTC; 2 points)
        ESRogs 26 Apr 2014 9:26 UTC
        2 points
        Parent
        Coming to this a bit late :), but I’ve got a basic question (which I think is similar to Nesov’s, but I’m still confused after reading the ensuing exchange). Why would it be that,
        
        The first player would have expected that, and also played D.
        
        If the second player has more computer power (so that the first player cannot simulate it), how can the first player predict what the second player will do? Can the first player reason that since the second player could simulate it, the second player will decide that they’re uncorrelated and play D no matter what?
        
        That dependence on computing power seems very odd, though maybe I’m sneaking in expectations from my (very rough) understanding of UDT.
        Vladimir_Nesov 20 Aug 2009 11:51 UTC
        0 points
        Parent
        
        Now consider what happens if the second player has more computing power than the first, so that it can perfectly simulate the first player and compute its move. After it finishes that computation and knows the first player’s move, the logical correlation between them disappears, because no uncertainty implies no correlation. So, given there’s no logical correlation, it ought to play D. The first player would have expected that, and also played D.
        
        The first player’s move could depend on the second player’s, in which case the second player won’t get the answer is a closed form, the answer must be a function of the second player’s move...
        What links here?
        ESRogs's comment on Ingredients of Timeless Decision Theory by Eliezer Yudkowsky (26 Apr 2014 9:26 UTC; 2 points)
        Wei Dai 20 Aug 2009 11:57 UTC
        0 points
        Parent
        But if the second player has more computational power, it can just keep simulating the first player until the first player runs out of clock cycles and has to output something.
        Vladimir_Nesov 20 Aug 2009 12:23 UTC
        0 points
        Parent
        I don’t understand your reply: exact simulation is brute force that isn’t a good idea. You can prove general statements about the behavior of programs on runs of unlimited or infinite length in finite time. But anyway, why would the second player provoke mutual defection?
        Wei Dai 20 Aug 2009 12:39 UTC
        0 points
        Parent
        
        But anyway, why would the second player provoke mutual defection?
        
        In my formulation, it doesn’t have a choice. Whether or not it does exact simulation of the first player is determined by its “mathematical intuition subroutine”, which I treated as a black box. If that module does an exact simulation, then mutual defection is the result. So this also ties in with my lack of understanding regarding logical uncertainty. If we don’t treat the thing that reasons about logical uncertainty as a black box, what should we do?
        
        ETA: Sometimes exact simulation clearly is appropriate, for example in rock-paper-scissors.
        Vladimir_Nesov 19 Aug 2009 23:49 UTC
        0 points
        Parent
        Conceptually, I treat logical uncertainty as I do prior+utility, a representation of preference, in this more general case over mathematical structures. The problems of representing this preference compactly and extracting human preference don’t hinder these particular explorations.
        Wei Dai 20 Aug 2009 0:02 UTC
        0 points
        Parent
        I don’t understand this yet. Can you explain in more detail what is a general (noncompact) way to representing logical uncertainty?
        What links here?
        Wei Dai's comment on Agree, Retort, or Ignore? A Post From the Future by Wei Dai (25 Nov 2009 0:20 UTC; 12 points)
        Vladimir_Nesov 19 Aug 2009 22:26 UTC
        0 points
        Parent
        If you are a CDT agent, you can’t (or simply won’t) become a normal TDT agent. If you are a human, who knows what that means.
      - MichaelVassar 19 Aug 2009 17:08 UTC
        0 points
        Parent
        After all, for anything you can hard code, the AI can build a new AI that lacks your hard coding and sacrifice its resources to that new AI.
- RickJS 28 Aug 2009 3:38 UTC
  −1 points
  Parent
  Wei_Dai wrote on 19 August 2009 07:08:23AM :
  
  … Omega’s AIs will reason as follows: “I have ¹⁄₂ chance of playing against a TDT, and ¹⁄₂ chance of playing against a CDT. If I play C, then my opponent will play C if it’s a TDT, and D if it’s a CDT …
  
  That seems to violate the secrecy assumptions of the Prisoner’s Dilemma problem! I thought each prisoner has to commit to his action before learning what the other one did. What am I missing?
  
  Thanks!