Do you think your idea is applicable to multi-player games, which is ultimately what we’re after? (I don’t see how to do it myself.) Take a look at this post, which I originally wrote for another mailing list:
In http://lesswrong.com/lw/1s5/explicit_optimization_of_global_strategy_fixing_a/ I gave an example of a coordination game for two identical agents with the same (non-indexical) preferences and different inputs. The two agents had to choose different outputs in order to maximize their preferences, and I tried to explain why it seemed to me that they couldn’t do this by logical-correlation-type reasoning alone.
A harder version of this problem involves two agents who have different preferences but are otherwise identical. For simplicity let’s assume they both care only about what happens in one particular world program (and therefore have no uncertainty about each other’s source code). This may not be the right way to frame the question, which is part of my confusion. But anyway, let the choices be C and D, and consider this payoff matrix (and suppose randomized strategies are not possible):
         C      D
   C    0,0    4,5
   D    5,4    0,0
Here’s the standard PD matrix for comparison:
         C      D
   C    3,3    0,5
   D    5,0    1,1
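As a minimal sketch of the contrast, using nothing but the two matrices (the code and labels below are illustrative, not part of the original message): in the PD matrix the jointly best outcome is unique and symmetric, while here the two good outcomes are mirror images of each other, so symmetric reasoning alone has nothing with which to break the tie.

    # Payoffs are (row player, column player); choices are C and D.
    coordination = {('C', 'C'): (0, 0), ('C', 'D'): (4, 5),
                    ('D', 'C'): (5, 4), ('D', 'D'): (0, 0)}
    pd           = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
                    ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

    def best_outcomes(game):
        # Outcomes that maximize the total payoff.
        top = max(sum(v) for v in game.values())
        return [k for k, v in game.items() if sum(v) == top]

    print(best_outcomes(pd))            # [('C', 'C')]: one symmetric optimum
    print(best_outcomes(coordination))  # [('C', 'D'), ('D', 'C')]: two mirror-image optima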
Nesov’s intuitions at http://lesswrong.com/lw/1vv/the_blackmail_equation/1qk9 make sense to me in this context. It seems that if these two agents are to achieve the 4,5 or 5,4 outcome, it has to be through some sort of “jumbles of wires” consideration, since there is no “principled” way to decide between the two, as far as I can tell. But what is that reasoning exactly? Does anyone understand acausal game theory (is this a good name?) well enough to walk me through how these two agents might arrive at one of the intuitively correct answers (and also show that the same type of reasoning gives an intuitively correct answer for PD)?
If my way of framing the question is not a good one, I’d like to see any kind of worked-out example in this vein.
It’s tempting to take a step back and consider the coordination game from the point of view of the agent before-observation, as it gives a nice equivalence between the copies: control over the consequences for both copies from a common source. This comes with a simple algorithm, an actual explanation. But as I suspect you intended to communicate in this comment, this is not very interesting, because it’s not the general case: in two-player games the other player is not your copy, and wasn’t one at any previous time. But if we try to consider the actions of the agent after-observation, once the two copies have diverged, there seems to be no nice solution anymore.
It’s clear how the agent before-observation controls the copies after it, and so how its decisions about the strategy of reacting to future observations control both copies and coordinate them. It’s far from clear how a copy that received one observation can control a copy that received the other observation. Parts control the whole, but not conversely. Yet the coordination problem could be posed about two agents that have nothing in common, and we’d expect there to be a solution to that as well. Thus I expect the coordination problem with two copies to have a local solution, apart from the solution of deciding in advance, as you describe in the post.
The comment of mine that you linked to is clearly flawed in at least one respect: it assumes that to control a structure B with an agent A, B has to be defined in terms of A. This is still an explicit-control mindset, what I call acausal control, but it’s wrong: it’s not as general as ambient control, where you are allowed to discover new dependencies, or UDT, where the discovery of new dependencies is implicit in mathematical intuition.
It’ll take a much better understanding of theories of consequences, of the process of their exploration, and of preference defined over them, to give specific examples, and I don’t expect these examples to be transparent (but maybe there is a simple proof that the decisions will be correct, one that doesn’t point out the specific details of the decision-making process).
Do you think your idea is applicable to multi-player games, which is ultimately what we’re after? (I don’t see how to do it myself.) Take a look at this post, which I originally wrote for another mailing list:
In http://lesswrong.com/lw/1s5/explicit_optimization_of_global_strategy_fixing_a/ I gave an example of a coordination game for two identical agents with the same (non-indexical) preferences and different inputs. The two agents had to choose different outputs in order to maximize their preferences, and I tried to explain why it seemed to me that they couldn’t do this by logical-correlation-type reasoning alone.
I think that there may have been a communication failure here. The comment that you’re replying to is specifically about that exact game, the one in your post Explicit Optimization of Global Strategy (Fixing a Bug in UDT1). The communication failure is my fault, because I had assumed that you had been following along with the conversation.
Here is the relevant context:
In this comment, I re-posed your game from the “explicit optimization” post in the notation of my write-up of UDT. In that comment, I gave an example of a mathematical intuition such that a UDT1 agent with that mathematical intuition would win the game.
In reply, Vladimir pointed out that the real problem is not to show that there exists a winning mathematical intuition. Rather, the problem is to give a general formal decision procedure that picks out a winning mathematical intuition. Cooking up a mathematical intuition that “proves” what I already believe to be the correct conclusion is “cheating”.
The purpose of the comment that you’re replying to was to answer Vladimir’s criticism. I show that, for this particular game (the one in your “explicit optimization” post), the winning mathematical intuitions are the only ones that meet certain reasonable criteria. The point is that these “reasonable criteria” do not involve any assumption about what the agent should do in the game.
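To make the notation concrete, here is a rough sketch, with illustrative numbers of my own rather than those of the original post, of a UDT1-style agent for that game. It assumes, purely for illustration, that the world pays 10 when the two copies output different symbols and 0 otherwise, and it hands the agent one hand-picked “winning” mathematical intuition M(input, output, history), exactly the sort of cooked-up certainty that the objection above is about:

    # Illustrative payoff: reward the copies for producing different outputs.
    def utility(history):
        out1, out2 = history          # outputs of the copies given inputs 1 and 2
        return 10 if out1 != out2 else 0

    HISTORIES = [(a, b) for a in 'AB' for b in 'AB']

    # One hand-picked "winning" mathematical intuition: it is certain that the
    # copy with input 1 outputs A and the copy with input 2 outputs B.
    def M(inp, out, history):
        predicted = (out if inp == 1 else 'A',
                     out if inp == 2 else 'B')
        return 1.0 if history == predicted else 0.0

    # UDT1: given its input, pick the output with the highest expected utility,
    # where the expectation over execution histories is taken using M.
    def udt1_choice(inp):
        return max('AB', key=lambda out:
                   sum(M(inp, out, h) * utility(h) for h in HISTORIES))

    print(udt1_choice(1), udt1_choice(2))   # A B: the two copies coordinate

Nothing in this sketch says where such an M would come from; that is the force of the objection.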
Actually, I had been following your discussion with Nesov, but I’m not sure if your comment adequately answered his objection. So rather than commenting on that, I wanted to ask whether your approach of using “reasonable criteria” to narrow down mathematical intuitions can be generalized to deal with the harder problem of multi-player games. (If it can’t, then perhaps the discussion is moot.)
I see. I misunderstood the grandparent to be saying that your “explicit optimization” LW post had originally appeared on another mailing list, and I thought that you were directing me to it to see what I had to say about the game there. I was confused because this whole conversation already centered around that very game :).
I show that, for this particular game (the one in your “explicit optimization” post), the winning mathematical intuitions are the only ones that meet certain reasonable criteria.
(1) Which one of them will actually be given? (2) If there is no sense in which some of these “reasonable” conclusions are better than each other, why do you single them out, rather than a mathematical intuition that expresses uncertainty about the outcomes, which would reflect the lack of priority of some of these outcomes over others?
I don’t find the certainty of conclusions a reasonable assumption, in particular because, as you can see, you can’t unambiguously decide which of the conclusions is the right one, and neither can the agent.
I claim to be giving, at best, a subset of “reasonable criteria” for mathematical intuition functions. Any UDT1-builder who uses a superset of these criteria, and who has enough decision criteria to decide which UDT1 agent to write, will write an agent who wins Wei’s game. In this case, it would suffice to have the criteria I mentioned plus a lexicographic tie-breaker (as in UDT1.1). I’m not optimistic that that will hold in general.
(I also wouldn’t be surprised to see an example showing that my “counterfactual accuracy” condition, as stated, rules out all winning UDT1 algorithms in some other game. I find it pretty unlikely that it suffices to deal with mathematical counterfactuals in such a simple way, even given the binary certainty and accuracy conditions.)
My point was only that the criteria above already suffice to narrow the field of options for the builder down to winning options. Hence, whatever superset of these criteria the builder uses, this superset doesn’t need to include any knowledge about which possible UDT1 agent would win.
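A small sketch of what that tie-breaker buys, using the same illustrative pay-10-for-differing-outputs payoff as in the earlier sketch (none of these numbers come from the thread): optimizing over whole input-to-output mappings, as UDT1.1 does, leaves two equally good mappings, and a fixed lexicographic order lets every copy settle on the same one.

    from itertools import product

    INPUTS, OUTPUTS = (1, 2), ('A', 'B')

    def mapping_utility(mapping):
        # Same illustrative payoff as before: reward differing outputs.
        return 10 if mapping[1] != mapping[2] else 0

    mappings = [dict(zip(INPUTS, outs)) for outs in product(OUTPUTS, repeat=2)]
    best = max(mapping_utility(m) for m in mappings)
    winners = [m for m in mappings if mapping_utility(m) == best]

    # Both {1:'A', 2:'B'} and {1:'B', 2:'A'} are optimal; every copy breaks the
    # tie the same way by taking the lexicographically first output tuple.
    chosen = min(winners, key=lambda m: tuple(m[i] for i in INPUTS))
    print(chosen)   # {1: 'A', 2: 'B'}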
(2) If there is no sense in which some of these “reasonable” conclusions are better than each other, why do you single them out, rather than a mathematical intuition that expresses uncertainty about the outcomes, which would reflect the lack of priority of some of these outcomes over others?
I don’t follow. Are you suggesting that I could just as reasonably have made it a condition of any acceptable mathematical intuition function that M(1, A, E) = 0.5?
I don’t find the certainty of conclusions a reasonable assumption, in particular because, as you can see, you can’t unambiguously decide which of the conclusions is the right one, and neither can the agent.
If I (the builder/writer) really couldn’t decide which mathematical intuition function to use, then the agent won’t come to exist in the first place. If I can’t choose among the two options that remain after I apply the described criteria, then I will be frozen in indecision, and no agent will get built or written. I take it that this is your point.
But if I do have enough additional criteria to decide (which in this case could be just a lexicographic tie-breaker), then I don’t see what is unreasonable about the “certainty of conclusions” assumption for this game.
If I (the builder/writer) really couldn’t decide which mathematical intuition function to use, then the agent won’t come to exist in the first place.
You don’t pick the output of mathematical intuition in a particular case; mathematical intuition is a general algorithm that works on world programs, outcomes, and your proposed decisions. It’s computationally intensive, and its results are not specified in advance based on intuition; on the contrary, the algorithm is what stands for intuition. With more resources, this algorithm will produce different probabilities, as it comes to understand the problem better. And you just pick the algorithm. What you can say about its outcome is a matter of understanding the requirements for such a general algorithm, and predicting what it must therefore compute. Absolute certainty of the algorithm, for example, would imply that the algorithm managed to logically infer that the outcome would be so and so, and I don’t see how it’s possible to do that, given the problem statement. If it’s unclear how to infer what will happen, then mathematical intuition should be uncertain (but it can know something that tilts the balance one way a little bit, perhaps enough to decide the coordination problem!).
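(For instance, with made-up numbers: if, on input 1, the mathematical intuition assigned probability 0.6 rather than 1 to the history in which the outputs differ when the agent outputs A, and 0.4 when it outputs B, then with a payoff of 10 for differing outputs the expected utilities would be 6 versus 4. The choice would still be determinate even though the intuition is far from certain; the specific numbers are illustrative only.)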
Okay, I understand you to be saying this: There is a single ideal mathematical intuition, which, given a particular amount of resources and a particular game, determines a unique function M mapping {inputs} x {outputs} x {execution histories} --> [0,1] for a UDT1 agent in that game. This ideal mathematical intuition (IMI) is defined by the very nature of logical or mathematical inference under computational limitation. So, in particular, it’s not something that you can talk about choosing using some arbitrary tie-breaker like lexicographic order.
Now, maybe the IMI requires that the function M be binary in some particular game with some particular amount of resources. Or maybe the IMI requires a non-binary function M for all amounts of computational resources in that game. Unless you can explain exactly why the IMI requires a binary function M for this particular game, you haven’t really made progress on the kinds of questions that we’re interested in. Is that right?
More or less. Of course there is no point in going for a “single” mathematical intuition, but the criteria for choosing one shouldn’t be specific to a particular game. Mathematical intuition primarily works with the world program, trying to estimate how plausible it is that this world program will be equivalent to a given history definition, under the condition that the agent produces a given output.