I also had thoughts along these lines—variants of TDT could logically separate themselves, so that T-0 one-boxes when it is simulated, but T-1 has proven that T-0 will one-box, and hence T-1 two-boxes when T-0 is the sim.
But a couple of difficulties arise. The first is that if TDT variants can logically separate from each other (i.e. can prove that their decisions aren’t linked) then they won’t co-operate with each other in Prisoner’s Dilemma. We could end up with a bunch of CliqueBots that only co-operate with their exact clones, which is not ideal.
The second difficulty is that for each specific TDT variant, one with algorithm T’ say, there will be a specific problematic problem on which T’ does worse than CDT (and indeed worse than all the other variants of TDT): namely, the problem in which T’ is the exact algorithm running in the sim. So we still don’t get the—desirable—property that there is some sensible decision theory called TDT that is optimal across fair problems.
The best suggestion I’ve heard so far is that we try to adjust the definition of “fairness”, so that these problematic problems also count as “unfair”. I’m open to proposals on that one...
I think this is avoidable. Let’s say that there are two TDT programs called Alice and Bob, which are exactly identical except that Alice’s source code contains a comment identifying it as Alice, whereas Bob’s source code contains a comment identifying it as Bob. Each of them can read their own source code. Suppose that in problem 1, Omega reveals that the source code it used to run the simulation was Alice. Alice has to one-box. But Bob faces a different situation than Alice does, because he can find a difference between his own source code and the one Omega simulated, whereas Alice could not. So Bob can two-box without affecting what Alice would do.
However, if Alice and Bob play the prisoner’s dilemma against each other, the situation is much closer to symmetric. Alice faces a player identical to itself except with the “Alice” comment replaced with “Bob”, and Bob faces a player identical to itself except with the “Bob” comment replaced with “Alice”. Hopefully, their algorithm would compress this information down to “The other player is identical to me, but has a comment difference in its source code”, at which point each player would be in an identical situation.
You might want to look at my follow-up article which discusses a strategy like this (among others). It’s worth noting that slight variations of the problem remove the opportunity for such “sneaky” strategies.
Ah, thanks. I had missed that, somehow.
In a prisoner’s dilemma, Alice and Bob affect each other’s outcomes. In the Newcomb problem, Alice affects Bob’s outcome, but Bob doesn’t affect Alice’s outcome. That’s why it’s OK for Bob to consider himself different in the second case as long as he knows he is definitely not Alice (because otherwise he might actually be in a simulation), but not OK for him to consider himself different in the prisoner’s dilemma.
Why doesn’t that happen when dealing with Omega?
Because if Omega uses Alice’s source code, then Alice sees that the source code of the simulation is exactly the same as hers, whereas Bob sees that there is a comment difference, so the situation is not symmetric.
So why doesn’t that happen in the prisoner’s dilemma?
Because Alice sees that Bob’s source code is the same as hers except for a comment difference, and Bob sees that Alice’s source code is the same as his except for a comment difference, so the situation is symmetric.
Newcomb:
Because if Omega uses Alice’s source code, then Alice sees that the source code of the simulation is exactly the same as hers, whereas Bob sees that there is a comment difference, so the situation is not symmetric.
Prisoner’s Dilemma:
Because Alice sees that Bob’s source code is the same as hers except for a comment difference, and Bob sees that Alice’s source code is the same as his except for a comment difference, so the situation is symmetric.
Do you see the contradiction here?
Newcomb, Alice: The simulation’s source code and available information are literally exactly the same as Alice’s, so if Alice two-boxes, the simulation will too. There’s no way around it. So Alice one-boxes.
Newcomb, Bob: The simulation was in the situation described above. Bob thus predicts that it will one-box. Bob himself is in an entirely different situation, since he can see a source code difference, so if he two-boxes, it does not logically imply that the simulation will two-box. So Bob two-boxes and the simulation one-boxes.
Prisoner’s Dilemma: Alice sees Bob’s source code, and summarizes it as “identical to me except for a different comment”. Bob sees Alice’s source code, and summarizes it as “identical to me except for a different comment”. Both Alice and Bob run the same algorithm, and they now have the same input, so they must produce the same result. They figure this out, and cooperate.
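Here is a minimal sketch of the decision rule being described in these three cases, in Python. Every name in it (strip_comments, newcomb_choice, prisoners_dilemma_choice) is invented for illustration, and a real TDT would need a much more careful notion of logical linkage than string comparison:

    # Illustrative sketch only; TDT is not actually specified at this level of detail.

    def strip_comments(source: str) -> str:
        """Drop comment lines, keeping only the functional part of the code."""
        return "\n".join(
            line for line in source.splitlines()
            if not line.lstrip().startswith("#")
        )

    def newcomb_choice(my_source: str, sim_source: str) -> str:
        """Alice's and Bob's behaviour in the Newcomb variant described above."""
        if sim_source == my_source:
            # Alice's case: two-boxing would logically imply that the sim
            # two-boxes too, emptying box B, so she one-boxes.
            return "one-box"
        # Bob's case: he can see a difference, predicts the sim one-boxes,
        # and takes both boxes.
        return "two-box"

    def prisoners_dilemma_choice(my_source: str, opponent_source: str) -> str:
        """Cooperate iff the opponent is the same algorithm, comments aside."""
        if strip_comments(opponent_source) == strip_comments(my_source):
            return "cooperate"  # same algorithm, same input, so linked decisions
        return "defect"

On this sketch both Alice and Bob cooperate in the prisoner’s dilemma, while only Bob two-boxes in the Newcomb variant, which is exactly the asymmetry being claimed.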
Ignore Alice’s perspective for a second. Why is Bob acting differently? He’s seeing the same code both times.
Don’t ignore Alice’s perspective. Bob knows what Alice’s perspective is, so since there is a difference in Alice’s perspective, there is by extension a difference in Bob’s perspective.
Bob looks at the same code both times. In the PD, he treats it as identical to his own. In NP, he treats it as different. Why?
The source code that Bob is looking at is the same in each case, but the source code that [the source code that Bob is looking at] is looking at is different in the two situations.
NP: Bob is looking at Alice, who is looking at Alice, who is looking at Alice, …
PD: Bob is looking at Alice, who is looking at Bob, who is looking at Alice, …
Clarifying edit: In both cases, Bob concludes that the source code he is looking at is functionally equivalent to his own. But in NP, Bob treats the input to the program he is looking at as different from his input, whereas in PD, Bob treats the input to the program he is looking at as functionally equivalent to his input.
But you said Bob concludes that their decision theories are functionally identical, and thus it reduces to:
PD: TDT is looking at TDT, who is looking at TDT, who is looking at TDT, …
And yet this does not occur in NP.
EDIT:
The source code that Bob is looking at is the same in each case, but the source code that [the source code that Bob is looking at] is looking at is different in the two situations.
The point is that his judgement of the source code changes, from “some other agent” to “another TDT agent”.
Looks like my edit was poorly timed.
One way of describing it is that the comment is extra information that is distinct from the decision agent, and that Bob can make use of this information when making his decision.
Oops, didn’t see that.
What’s the point of adding comments if Bob’s just going to conclude their code is functionally identical anyway? Doesn’t that mean that you might as well use the same code for Bob and Alice, and call it TDT?
In NP, the comments are there to provide Bob an excuse to two-box that does not result in the simulation two-boxing. In PD, the comments are there to illustrate that TDT needs a sophisticated algorithm for identifying copies of itself, one that can recognize different implementations of the same algorithm.
Do you understand why Bob acts differently in the two situations, now?
I was assuming Bob was an AI, lacking a ghost to look over his code for reasonableness. If he’s not, then he isn’t strictly implementing TDT, is he?
Bob is an AI. He’s programmed to look for similarities between other AIs and himself so that he can treat their action and his as logically linked when it is to his advantage to do so. I was arguing that a proper implementation of TDT should consider Bob’s and Alice’s decisions linked in PD and nonlinked in the NP variant. I don’t really understand your objection.
My objection is that an AI looking at the same question—is Alice functionally identical to me?—can’t look for excuses for why they’re not really the same whenever that would be useful, if they actually behave the same way. His answer should be the same in both cases, because they are either functionally identical or not.
The proper question is “In the context of the problems each of us face, is there a logical connection between my actions and Alice’s actions?”, not “Is Alice functionally identical to me?”
I think those terms both mean the same thing.
For reference, by “functionally identical” I meant “likely to choose the same way I do”. Thus, an agent that will abandon the test to eat beans is functionally identical when beans are unavailable.
I guess my previous response was unhelpful. Although “Is Alice functionally identical to me?” is not the question of primary concern, it is a relevant question. But another relevant question is “Is Alice facing the same problem that I am?” Two functionally identical agents facing different problems may make different choices.
In the architecture I’ve been envisioning, Alice and Bob can classify other agents as “identical to me in both algorithm and implementation” or “identical to me in algorithm, with differing implementation”, or one of many other categories. For each of the two categories I named, they would assume that an agent in that category will make the same decision as they would when presented with the same problem (so they would both be subcategories of “functionally identical”). In both situations, each agent classifies the other as identical in algorithm and differing in implementation.
In the prisoners’ dilemma, each agent is facing the same problem, that is, “I’m playing a prisoner’s dilemma with another agent that is identical to me in algorithm but differing in implementation”. So they treat their decisions as linked.
In the Newcomb’s problem variant, Alice’s problem is “I’m in Newcomb’s problem, and the predictor used a simulation that is identical to me in both algorithm and implementation, and which faced the same problem that I am facing.” Bob’s problem is “I’m in Newcomb’s problem, and the predictor used a simulation that is identical to me in algorithm but differing in implementation, and which faced the same situation as Alice.” There was a difference in the two problem descriptions even before the part about what problem the simulation faced, so when Bob notes that the simulation faced the same problem as Alice, he finds a difference between the problem that the simulation faced and the problem that he faces.
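A rough sketch of how this classification plus problem-description comparison might be wired up (again, every name here is invented for illustration, not part of any actual TDT specification):

    # Illustrative sketch of the classification architecture described above.

    def strip_comments(source: str) -> str:
        return "\n".join(
            line for line in source.splitlines()
            if not line.lstrip().startswith("#")
        )

    def classify(my_source: str, other_source: str) -> str:
        """Coarse relation between me and another agent."""
        if other_source == my_source:
            return "same algorithm, same implementation"
        if strip_comments(other_source) == strip_comments(my_source):
            return "same algorithm, different implementation"
        return "other agent"

    def decisions_linked(my_problem: tuple, other_problem: tuple) -> bool:
        """Treat another agent's decision as linked to mine only when our
        problem descriptions match exactly."""
        return my_problem == other_problem

    # Prisoner's dilemma: both agents describe their problem as
    #   ("PD", "same algorithm, different implementation"),
    # so decisions_linked(...) is True and they cooperate.
    #
    # Newcomb variant: Alice's description is
    #   ("NP", "same algorithm, same implementation", "sim faced my problem"),
    # while Bob's is
    #   ("NP", "same algorithm, different implementation", "sim faced Alice's problem").
    # The descriptions already differ, so Bob does not treat his choice as
    # linked to the simulation's, and he two-boxes.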
Then why are we talking about “Bob” and “Alice” when they’re both just TDT agents?
Because if Bob does not ignore the implementation difference, he ends up with more money in the Newcomb’s problem variant.
But there is no difference between “Bob looking at Alice looking at Bob” and “Alice looking at Alice looking at Alice”. That’s the whole point of TDT.
There is a difference. In the first one, the agents have a slight difference in their source code. In the second one, the source code of the two agents is identical.
If you’re claiming that TDT does not pay attention to such differences, then we only have a definitional dispute, and by your definition, an agent programmed the way I described would not be TDT. But I can’t think of anything about the standard descriptions of TDT that would indicate such a restriction. It is certainly not the “whole point” of TDT.
For now, I’m going to call the thing you’re telling me TDT is “TDT1”, and I’m going to call the agent architecture I was describing “TDT2”. I’m not sure if this is good terminology, so let me know if you’d rather call them something else.
Anyway, consider the four programs Alice1, Bob1, Alice2, and Bob2. Alice1 and Bob1 are implementations of TDT1, and are identical except for having a different identifier in the comments (and this difference changes nothing). Alice2 and Bob2 are implementations of TDT2, and are identical except for having a different identifier in the comments.
Consider the Newcomb’s problem variant with the first pair of agents (Alice1 and Bob1). Alice1 is facing the standard Newcomb’s problem, so she one-boxes and gets $1,000,000. As far as Bob1 can tell, he also faces the standard Newcomb’s problem (there is a difference, but he ignores it), so he one-boxes and gets $1,000,000.
Now consider the same problem, but with all instances of Alice1 replaced with Alice2, and all instances of Bob1 replaced with Bob2. Alice2 still faces the standard Newcomb’s problem, so she one-boxes and gets $1,000,000. But Bob2 two-boxes and gets $1,001,000.
The problem seems pretty fair; it doesn’t specifically reference either TDT1 or TDT2 in an attempt to discriminate. However, when we replace the TDT1 agents with TDT2 agents, one of them does better and neither of them does worse, which seems to indicate a pretty serious deficiency in TDT1.
Either TDT decides if something is identical based on its actions, in which case I am right, or its source code, in which case you are wrong, because such an agent would not cooperate in the Prisoner’s Dilemma.
They decide using the source code. I already explained why this results in them cooperating in the Prisoner’s Dilemma.
Wait! I think I get it! In a Prisoner’s Dilemma, both agents are facing another agent, whereas in Newcomb’s Problem, Alice is facing an infinite chain of herself, while Bob is facing an infinite chain of someone else. It’s like the “favorite number” example in the follow-up post.
Yes.
Well that took embarrassingly long.
The right place to introduce the separation is not in between TDT and TDT-prime, but in between TDT-prime’s output and TDT-prime’s decision. If its output is a strategy, rather than a number of boxes, then that strategy can include a byte-by-byte comparison; and if TDT and TDT-prime both do it that way, then they both win as much as possible.
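A tiny sketch of what “output a strategy rather than a number of boxes” might look like (hypothetical code, not a detailed statement of anyone’s actual proposal):

    # Illustrative sketch: every TDT variant returns the *same* strategy object;
    # the byte-by-byte comparison only runs when the strategy is executed.

    from typing import Callable

    def tdt_output() -> Callable[[str, str], str]:
        def strategy(my_source: str, sim_source: str) -> str:
            # One-box exactly when I am, byte for byte, the code Omega simulated.
            return "one-box" if my_source == sim_source else "two-box"
        return strategy

    # TDT and TDT-prime both emit this identical strategy, so their outputs are
    # logically linked; when executed, the simulated variant one-boxes (filling
    # box B) and every other variant two-boxes.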
But doesn’t that make cliquebots, in general?
I’m thinking hard about this one…
Can all the TDT variants adopt a common strategy, but with different execution results, depending on source-code self-inspection and sim-inspection? Can that approach really work in general without creating CliqueBots? Don’t know yet without detailed analysis.
Another issue is that Omega is not obliged to reveal the source code of the sim; it could instead provide some information about the method used to generate / filter the sim code (e.g. a distribution the sim was drawn from) and still present a well-defined problem. Each TDT variant would then not know whether it was the sim.
I’m aiming for a follow-up article addressing this strategy (among others).
Can all the TDT variants adopt a common strategy, but with different execution results, depending on source-code self-inspection and sim-inspection?
This sounds equivalent to asking “can a Turing machine generate non-deterministic random numbers?” Unless you’re thinking about coding TDT agents one at a time and setting some constant differently in each one.
Well, I’ve had a think about it, and I’ve concluded that it would matter how great the difference between TDT and TDT-prime is. If TDT-prime is almost the same as TDT, but has an extra stage in its algorithm in which it converts all dollar amounts to yen, it should still be able to prove that it is isomorphic to Omega’s simulation, and therefore will not be able to take advantage of “logical separation”.
But if TDT-prime is different in a way that makes it non-isomorphic, i.e. it sometimes gives a different output given the same inputs, that may still not be enough to “separate” them. If TDT-prime acts the same as TDT, except when there is a walrus in the vicinity, in which case it tries to train the walrus to fight crime, it is still the case in this walrus-free problem that it makes exactly the same choice as the simulation (?). It’s as if you need the ability to prove that two agents necessarily give the same output for the particular problem you’re faced with, without proving what output those agents actually give, and that sure looks crazy-hard.
EDIT: I mean crazy-hard for the general case, but much, much easier for all the cases where the two agents are actually the same.
EDIT 2: On the subject of fairness, my first thoughts: A fair problem is one in which if you had arrived at your decision by a coin flip (which is as transparently predictable as your actual decision process—i.e. Omega can predict whether it’s going to come down heads or tails with perfect accuracy), you would be rewarded or punished no more or less than you would be using your actual decision algorithm (and this applies to every available option).
EDIT 3: Sorry to go on like this, but I’ve just realised that won’t work in situations where some other agent bases their decision on whether you’re predicting what their decision will be, i.e. Prisoner’s Dilemma.
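For what it’s worth, here is one way to write down EDIT 2’s coin-flip criterion. The notation (U_P, C_a, A_P) is invented here, and EDIT 3’s caveat about other predicting agents still stands:

    % Tentative formalisation of the coin-flip fairness criterion.
    % U_P(D, a): the payoff in problem P when the agent runs decision algorithm D
    %            and ends up choosing option a
    % C_a:       a "coin" agent that transparently (predictably to Omega) outputs a
    % A_P:       the set of options available in P
    \[
      \mathrm{Fair}_D(P) \;\iff\; \forall a \in A_P :\; U_P(D, a) = U_P(C_a, a)
    \]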