I’m pretty sure we’ve had this anti-Newcomb problem discussion already. Short version: this is a constructive proof that any decision algorithm has to lose on some problems, because Omega could diagonalize against any algorithm, and then any agent implementing that algorithm is hosed. Therefore, it is fruitless to demand that a good decision algorithm or good agent never lose on any problem; one needs a more generous criterion of goodness. As for XDT, I don’t see why it shouldn’t get the $1M when playing an anti-Newcomb problem. 1,000,000 is bigger than 1,000, after all.
I’m not totally sure why you need the further hardware in this post either. From what I can tell, the real trouble is that logical implication is not what you want here—we’re still waiting on a reasonable system of logical counterfactuals. 3, 4′, and if I understand it right, 5, are just not addressing that core problem.
Hi Manfred, thx for commenting!
...this is a constructive proof that any decision algorithm has to lose on some problems, because Omega could diagonalize against any algorithm, and then any agent implementing that algorithm is hosed.
See my reply to KnaveOfAllTrades.
As for XDT, I don’t see why it shouldn’t get the $1M when playing an anti-Newcomb problem. 1,000,000 is bigger than 1,000, after all.
Which anti-Newcomb problem? In the XDT anti-Newcomb problem, 1000 is the maximal payoff. No decision theory gets more. In the UDT anti-Newcomb problem, XDT gets 1,001,000 while UDT remains with 1,000,000.
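To make the arithmetic behind these numbers explicit, here is a minimal sketch assuming the standard Newcomb box values ($1,000 in the transparent box, $1,000,000 or nothing in the opaque one). The function and constant names are mine, and the box contents in each case are taken from the claims in this thread (XDT two-boxes on its own anti-Newcomb problem, UDT one-boxes on its own), not re-derived from the post.

```python
# Payoff arithmetic for the anti-Newcomb problems discussed above.
# Omega has already fixed the opaque box's contents by the time the
# agent chooses, so the payoff depends only on those contents and on
# whether the agent one-boxes or two-boxes.

TRANSPARENT = 1_000      # the transparent box always holds $1,000
OPAQUE_FULL = 1_000_000  # the opaque box holds $1,000,000 if Omega filled it

def payoff(opaque_full: bool, two_box: bool) -> int:
    """Payoff once Omega has acted: two-boxers add the transparent $1,000."""
    opaque = OPAQUE_FULL if opaque_full else 0
    if two_box:
        return opaque + TRANSPARENT
    return opaque

# XDT anti-Newcomb problem: per the thread, XDT two-boxes here, so the
# opaque box is empty and $1,000 is the most any player can walk away with.
assert payoff(opaque_full=False, two_box=True) == 1_000
assert payoff(opaque_full=False, two_box=False) == 0

# UDT anti-Newcomb problem: per the thread, UDT one-boxes here, so the
# opaque box is full; a two-boxer such as XDT collects both boxes.
assert payoff(opaque_full=True, two_box=True) == 1_001_000   # XDT
assert payoff(opaque_full=True, two_box=False) == 1_000_000  # UDT
```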
...we’re still waiting on a reasonable system of logical counterfactuals. 3, 4′, and if I understand it right, 5, are just not addressing that core problem.
Well, there is more than one remaining problem :) Regarding logical counterfactuals, I think that the correct approach is going to be via complexity theory. I hope to write about it later; in the meantime you can check out this. By now I have discovered some problems with the formalism I used there (and a possible path to fixing them), but I think the general direction is right.
As for XDT, I don’t see why it shouldn’t get the $1M when playing an anti-Newcomb problem. 1,000,000 is bigger than 1,000, after all.
Which anti-Newcomb problem? In the XDT anti-Newcomb problem, 1000 is the maximal payoff. No decision theory gets more.
Right, the XDT ANP. Because this is in fact a decision-controlled problem, only from the perspective of an XDT agent. And so they can simply choose to receive $1M on this problem if they know that that’s what they’re facing. $1M being bigger than $1000, I think they should do so.
But you do raise a good point, which is that there might be some way to avoid being beaten by other agents on decision-controlled problems, if you give up on maximizing payoff. It might depend on what metric of success you optimize the decision procedure for. If you take the view logically upstream of filling the boxes, the maximum is $1.001M, and success is relative to that. If you take the view downstream, you might be satisfied with $1000 because that’s the maximum.
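A self-contained sketch of the two benchmarks being contrasted here; the names regret_upstream and regret_downstream are my own labels, not terminology from the post. The "upstream" benchmark is the $1,001,000 available before Omega fixes the boxes; the "downstream" benchmark is the best payoff available given the contents Omega actually chose.

```python
# Two ways to score an agent on a Newcomb-like problem, corresponding to
# the "upstream" and "downstream" views described above.

def payoff(opaque_full: bool, two_box: bool) -> int:
    opaque = 1_000_000 if opaque_full else 0
    return opaque + (1_000 if two_box else 0)

def regret_upstream(opaque_full: bool, two_box: bool) -> int:
    """Shortfall relative to the best outcome over all box contents,
    i.e. the $1,001,000 ceiling fixed logically upstream of Omega's move."""
    return 1_001_000 - payoff(opaque_full, two_box)

def regret_downstream(opaque_full: bool, two_box: bool) -> int:
    """Shortfall relative to the best outcome given the box contents
    Omega actually chose."""
    best = max(payoff(opaque_full, choice) for choice in (True, False))
    return best - payoff(opaque_full, two_box)

# With an empty opaque box (the XDT anti-Newcomb case as discussed below),
# two-boxing has zero downstream regret but is still $1,000,000 short of
# the upstream ceiling.
assert regret_downstream(False, True) == 0
assert regret_upstream(False, True) == 1_000_000
```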
Right, the XDT ANP. Because this is in fact a decision-controlled problem, only from the perspective of an XDT agent.
It is decision-determined from the perspective of any agent. The payoff only depends on the agent’s decision: namely, it’s $1000 for two-boxing and $0 for one-boxing.
And so they can simply choose to receive $1M on this problem if they know that that’s what they’re facing. $1M being bigger than $1000, I think they should do so.
Look at the problem from the perspective of the precursor. The precursor knows XDT two-boxes on the problem. There is no way to change this fact. So one box is going to be empty. Therefore building an XDT agent in this situation is no worse than building any other agent.
It is decision-determined from the perspective of any agent. The payoff only depends on the agent’s decision: namely, it’s $1000 for two-boxing and $0 for one-boxing.
Yeah, sorry, I misspoke. The contents of the boxes are controlled by the agent’s decision only for an XDT agent.
Look at the problem from the perspective of the precursor. The precursor knows XDT two-boxes on the problem. There is no way to change this fact. So one box is going to be empty. Therefore building an XDT agent in this situation is no worse than building any other agent.
I am using XDT here in the sense of “the correct decision algorithm (whatever it is).” An XDT agent, if faced with the XDT anti-Newcomb problem, can, based on its decision, either get $1M or $1k. If it takes the $1M, it loses in the sense that it does worse on this problem than a CDT agent. If it takes the $1k, it loses in the sense that it just took $1k over $1M :P
And because of XDT’s decision controlling the contents of the box, when you say “the payoff is $1000 for two-boxing and $0 for one-boxing,” you’re begging the question about what you think the correct decision algorithm should do.
And because of XDT’s decision controlling the contents of the box, when you say “the payoff is $1000 for two-boxing and $0 for one-boxing,” you’re begging the question about what you think the correct decision algorithm should do.
The problem is in the definition of “correct”. From my point of view, the “correct” decision algorithm is the algorithm that a rational precursor should build. That is, it is the algorithm whose instantiation by the precursor yields at least as much payoff as instantiating any other algorithm.
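In symbols (my own notation, not the post’s): writing U(A) for the payoff the precursor obtains by instantiating algorithm A, this criterion reads as a maximization over algorithms.

```latex
\mathrm{XDT} \in \arg\max_{A} U(A),
\qquad \text{i.e.,} \qquad
U(\mathrm{XDT}) \;\ge\; U(A) \quad \text{for every algorithm } A.
```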
Well, I agree with you there :P But I think you’re cashing this out as the fixed point of a process, rather than as the maximization I have in mind.