If standard game theory has nothing to say about what to do in situations where you don’t have access to an unpredictable randomization mechanism, so much the worse for standard game theory, I say!
I thought the ability to deploy mixed strategies was a pretty standard part of CDT. Is this not the case, or are you considering a non-standard formulation of CDT?
I think some people may have their pet theories which they call CDT and which require randomization. But CDT as it is usually/traditionally described doesn’t ever insist on randomizing (unless randomizing has a positive causal effect). In this particular case, even if a randomization device were made available, CDT would either uniquely favor one of the boxes or be indifferent between all distributions over {buy box B1,buy box B2}. Compare Section IV.1 of the paper.
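To make this concrete, here is a minimal sketch of the CDT reasoning in the Adversarial Offer. The numbers (price 1, prize 3, prediction accuracy 0.75) are illustrative assumptions, not quoted from the paper:

```python
# Adversarial Offer, illustrative numbers (assumptions, not from the paper):
# each box costs `price`; the seller put `prize` in every box she predicted
# the buyer NOT to buy; her prediction is right with probability `acc`.
acc = 0.75
price, prize = 1.0, 3.0

# CDT's view: the contents are causally fixed before the choice. The boxes
# jointly hold at least `prize` (at least one box was predicted not-bought),
# so some box has expected content >= prize / 2 > price, and CDT buys it.
cdt_expected_content = prize / 2
assert cdt_expected_content > price

# But the box a CDT agent actually buys is empty whenever the prediction
# was right, so the buyer's true expected profit is negative.
ev_buy = (1 - acc) * prize - price
print(ev_buy)  # -0.25
```

Under these assumed numbers, CDT buys a box and loses a quarter in expectation, which is the money pump the paper describes.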
What you’re referring to are probably so-called ratificationist variants of CDT. These would indeed require randomizing 50-50 between the two boxes. But one can easily construct scenarios which trip these theories up. For example, the seller could put no money in any box if she predicts that the buyer will randomize. Then no distribution is ratifiable. See Section IV.4 for a discussion of Ratificationism.
This is a bit unsatisfying, because in my view of decision theory you don’t get to predict things like “the agent will randomize” or “the agent will take one box but feel a little wistful about it” and so on. This is unfair, in the same way as predicting that “the agent will use UDT” and punishing for it is unfair. No, you just predict the agent’s output. Or if the agent can randomize, you can sample (as many times as you like, but finitely many) from the distribution of the agent’s output. A bit more on this here, though the post got little attention.
Can your argument be extended to this case?
Why not? You surely agree that sometimes people can in fact predict such things. So your objection must be that it’s unfair when they do and that it’s not a strike against a decision theory if it causes you to get money-pumped in those situations. Well… why? Seems pretty bad to me, especially since some extremely high-stakes real-world situations our AIs might face will be of this type.
I see where you are coming from. But I think the reason we are interested in CDT (or any DT) in the first place is that we want to know which one works best. However, if we allow the outcomes to be judged not just on the decision we make, but also on the process used to reach that decision, then I don’t think we can learn anything useful.
Or, to put it from a different angle: if a process P is used to reach decision X, but my “score” depends not just on X but also on P, then that can be mapped to a different problem where my decision is “P and X”, and I use some other process (P’) to decide which P to use.
For example, if a student on a maths paper is told they will be marked not just on the answer they give but also on the working they write on the paper (with points deducted for crossings-out or mistakes), we could easily imagine the student using other sheets of paper (or the inside of their head) to first work out the working they are going to show and the answer that goes with it. Here the decision problem’s “output” is the entire exam paper, not just the answer.
I don’t think I understand this yet, or maybe I don’t see how it’s a strong enough reason to reject my claims, e.g. my claim “If standard game theory has nothing to say about what to do in situations where you don’t have access to an unpredictable randomization mechanism, so much the worse for standard game theory, I say!”
I think we might be talking past each other. I will try and clarify what I meant.
Firstly, I fully agree with you that standard game theory should give you access to randomization mechanisms. I was just saying that hypotheticals where you are judged on the process you use to decide, and not on your final decision, are a bad way of working out which processes are good, because the hypothetical can just declare any process to be the one it rewards by fiat.
Related to the randomization mechanisms: in the kinds of problems people worry about, with predictors guessing your actions in advance, it’s very important to distinguish between [1] (pseudo-)randomization processes that the predictor can predict, and [2] ones that it cannot.
[1] Randomisation that can be predicted by the predictor is (I think) a completely uncontroversial resource to give agents in these problems. In this case we don’t need to make predictions like “the agent will randomise”, because we can instead make the stronger prediction “the agent will randomise, and the seed of their RNG is this, so they will take one box”, which is just a longer way of saying “they will one-box”. We don’t need the predictor to show its working by mentioning the RNG intermediate step.
[2] Randomisation that is beyond the predictor’s power is (I think) not the kind of thing that can sensibly be included in these thought experiments. We cannot simultaneously assume that the predictor is pretty good at predicting our actions and useless at predicting a random number generator we might use to choose our actions. The premises: “Alice has a perfect quantum random number generator that is completely beyond the power of Omega to predict. Alice uses this machine to make decisions. Omega can predict Alice’s decisions with 99% accuracy” are incoherent.
So I don’t see how randomization helps. The first kind, [1], doesn’t change anything, and the second kind, [2], cannot be consistently combined with the premise of the question. Perfect predictors and perfect random number generators cannot exist in the same universe.
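The incoherence in [2] can be checked with a toy simulation (the setup is an assumption for illustration): against a genuinely unobservable fair coin, no guessing rule, however clever, can beat 50% accuracy on the randomized choices, so “99% accuracy” is unsatisfiable:

```python
# If Alice picks a box by a fair coin the predictor cannot observe, any
# fixed (or arbitrarily clever) guess by Omega matches her choice about
# half the time. So "Omega predicts Alice's coin-driven decisions with
# 99% accuracy" contradicts the coin being unpredictable.
import random

random.seed(0)
trials = 100_000
hits = sum(
    random.choice(["B1", "B2"]) == "B1"  # Alice's true coin vs Omega's guess
    for _ in range(trials)
)
accuracy = hits / trials
assert abs(accuracy - 0.5) < 0.01  # pinned near 50%, far from 99%
print(accuracy)
```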
There might be interesting nearby problems where you imagine that the predictor is 100% effective at determining the agent’s algorithm, but, because the agent has access to a perfect random number generator, it cannot predict their actions. Maybe this is what you meant? In this kind of situation I am still much happier with rules like “it will fill the box with gold if it knows there is a <50% chance of you picking it” [the closest we can get to “outcomes not processes” in probabilistic land] (or perhaps the alternative “the probability that it fills the box with gold is one minus the probability with which it predicts the agent will pick the box”). But rules like “it will fill the box with gold if the agent’s process uses either randomisation or causal decision theory” seem unhelpful to me.
Sure, and sometimes people can predict things like “the agent will use UDT” and use that to punish the agent. But this kind of prediction is “unfair” because it doesn’t lead to an interesting decision theory—you can punish any decision theory that way. So to me the boundaries of “fair” and “unfair” are also partly about mathematical taste and promising-ness, not just what will lead to a better tank and such.
Right, that kind of prediction is unfair because it doesn’t lead to an interesting decision theory… but I asked why you don’t get to predict things like “the agent will randomize.” All sorts of interesting decision theory comes out of considering situations where you do get to predict such things. (Besides, such situations are important in real life.)
I might suggest “not interesting” rather than “not fair” as the complaint. One can imagine an Omega that leaves the box empty if the player is unpredictable, or if the player doesn’t rigorously follow CDT, or that just always leaves it empty regardless. But there’s no intuition pump that it drives, and no analysis of why a formalization would or wouldn’t get the right answer.
When I’m in challenge-the-hypothetical mode, I defend CDT by making the agent believe Omega cheats: it’s a trick box that changes contents AFTER the agent chooses but BEFORE the contents are revealed. To any rational agent, this is far more probable than mind-reading or extreme predictability.
On the more philosophical points. My position is perhaps similar to Daniel K’s. But anyway...
Of course, I agree that problems that punish the agent for using a particular theory (or using float multiplication or feeling a little wistful or stuff like that) are “unfair”/“don’t lead to interesting theory”. (Perhaps more precisely: I don’t think our theory needs to give algorithms that perform optimally in such problems in the way I want my theory to “perform optimally” in Newcomb’s problem. Maybe we should still expect our theory to say something about them, in the way that causal decision theorists feel that CDT has interesting/important/correct things to say about Newcomb’s problem, despite Newcomb’s problem being designed to (unfairly, as they allege) reward non-CDT agents.)
But I don’t think these are particularly similar to problems with predictions of the agent’s distribution over actions. The distribution over actions is behavioral, whereas performing floating point operations or whatever is not. When randomization is allowed, the subject of your choice is which distribution over actions you play. So to me, which distribution over actions you choose in a problem with randomization allowed, is just like the question of which action you take when randomization is not allowed. (Of course, if you randomize to determine which action’s expected utility to calculate first, but this doesn’t affect what you do in the end, then I’m fine with not allowing this to affect your utility, because it isn’t behavioral.)
I also don’t think this leads to uninteresting decision theory. But I don’t know how to argue for this here, other than by saying that CDT, EDT, UDT, etc. don’t really care whether they choose from/rank a set of distributions or a set of three discrete actions. I think ratificationism-type concepts are the only ones that break when allowing discontinuous dependence on the chosen distribution and I don’t find these very plausible anyway.
To be honest, I don’t understand the arguments against predicting distributions and predicting actions that you give in that post. I’ll write a comment on this to that post.
Let’s start with the technical question:
>Can your argument be extended to this case?
No, I don’t think so. Consider the following class of problems: the agent can pick any distribution over actions, and the final payoff is determined only as a function of the implemented action and some finite number of samples generated by Omega from that distribution. Note that the expectation is continuous in the chosen distribution. It can therefore be shown (using, e.g., Kakutani’s fixed-point theorem) that there is always at least one ratifiable distribution. See Theorem 3 at https://users.cs.duke.edu/~ocaspar/NDPRL.pdf .
(Note that the above is assuming the agent maximizes expected vNM utility. If, e.g., the agent maximizes some lexical utility function, then the predictor can just take, say, two samples and if they differ use a punishment that is of a higher lexicality than the other rewards in the problem.)
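As a toy instance of this class (the payoffs and the one-sample predictor are assumptions for illustration, not from the paper), a grid search confirms that a ratifiable distribution exists when the payoff depends only on the implemented action and finitely many samples:

```python
# Toy game: the agent picks a distribution p over {B1, B2}; Omega draws
# ONE sample from p and empties the sampled box; the agent gets 1 iff the
# implemented action is the other box. Given that Omega samples from p:
#   U(B1 | p) = 1 - p   (B1 pays iff the sample was B2)
#   U(B2 | p) = p
# Both are continuous in p, so (as the fixed-point argument predicts) a
# ratifiable distribution exists; here it is exactly p = 1/2.

def is_ratifiable(p, tol=1e-9):
    u = {"B1": 1 - p, "B2": p}
    best = max(u.values())
    # ratifiable: every action actually in the support is optimal given p
    support = [a for a, q in [("B1", p), ("B2", 1 - p)] if q > tol]
    return all(u[a] >= best - tol for a in support)

fixed_points = [k / 1000 for k in range(1001) if is_ratifiable(k / 1000)]
print(fixed_points)  # [0.5]
```

Note that pure strategies (p = 0 or p = 1) are not ratifiable here, which matches the intuition that the sampling predictor punishes predictable play.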
Thanks! That’s what I wanted to know. Will reply to the philosophical stuff in the comments to the other post.
How often do you encounter a situation where an unpredictable randomization mechanism is unavailable?
I agree with both of Daniel Kokotajlo’s points (both of which we also make in the paper in Sections IV.1 and IV.2): Certainly for humans it’s normal to not be able to randomize; and even if it was a primarily hypothetical situation without any obvious practical application, I’d still be interested in knowing how to deal with the absence of the ability to randomize.
Besides, as noted in my other comment, insisting on the ability to randomize doesn’t get you that far (cf. Sections IV.1 and IV.4 on Ratificationism): even if you always have access to some nuclear decay noise channel, your choice of whether to consult that channel (or of whether to factor the noise into your decision) is still deterministic. So you can set up scenarios where you are punished for randomizing. In the particular case of the Adversarial Offer, the seller might remove all money from both boxes if she predicts the buyer to randomize.
The reason why our main scenario just assumes that randomization isn’t possible is that our target of attack in this paper is primarily CDT, which is fine with not being allowed to randomize.
Every day. But even if it was only something that happened in weird hypotheticals, my point would still stand.
Care to elaborate on the every day thing? Aside from literal coins, your cell phone is perfectly capable of generating pseudorandom numbers, and I’m almost never without mine.
I guess whether your point stands depends on whether we are more concerned with abstract theory or practical decision making.
Here are some circumstances where you don’t have access to an unpredictable random number generator:
--You need to make a decision very quickly and so don’t have time to flip a coin
--Someone is watching you and will behave differently towards you if they see you make the decision via randomness, so consulting a coin isn’t a random choice between options but rather an additional option with its own set of payoffs
--Someone is logically entangled with you and if you randomize they will no longer be.
--You happen to be up against someone who is way smarter than you and can predict your coin / RNG / etc.
Admittedly, while in some sense these things happen literally every day to all of us, they typically don’t happen for important decisions.
But there are important decisions having to do with acausal trade that fit into this category, which either we or our AI successors will face one day.
And even if that wasn’t true, decision theory is decision THEORY. If one theory outperforms another in some class of cases, that’s a point in its favor, even if the class of cases is unusual.
EDIT: See Paul Christiano’s example below; it’s excellent because it condenses Caspar’s paper into a very down-to-earth, probably-has-actually-happened-to-someone-already example.
I’ve picked up my game theory entirely informally. But in real world terms, perhaps we’re imagining a situation where a randomization approach isn’t feasible for some other reason than a random number generator being unavailable.
This connects slightly with the debate over whether or not to administer untested COVID vaccines en masse. Picking randomly “feels scary” compared to picking “for a reason,” but picking “for a reason” when there isn’t yet an actual evidence base undermines the authority of regulators, so regulators don’t pick anything until they have a “good reason” to do so. Their political calculus, in short, makes them unable to use a randomization scheme.
So in terms of real world applicability, the constraint on a non-randomizing strategy seems potentially relevant, although the other aspects of this puzzle don’t map onto COVID vaccine selection specifically.