Jeremy Gillen comments on Why Not Subagents?

Jeremy Gillen 18 Jul 2024 7:20 UTC
2 points
0
I made a mistake again. As described above, complete only pseudodominates incomplete.
But this is easily patched with the trick described in the OP. We need the “complete” choice to make two changes to the downstream decisions: first, change Decision 1 to always choose up (as before); second, change the distribution of Decision 2 to { $1 - q (1 - p)$ , $q (1 - p)$ }, which keeps the probability of B constant. Fixed diagram:
Now the lottery for complete is {B: $q (1 - p)$ , A+: $1 - q (1 - p)$ , A: $0$ }, and the lottery for incomplete is {B: $q (1 - p)$ , A+: $p q$ , A: $(1 - q)$ }. So overall, there is a pure shift of probability from A to A+.
[Edit 23/7: hilariously, I still had the probabilities wrong, so fixed them, again].
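The arithmetic above can be sanity-checked with a short script. This is only a sketch: the function name `lotteries` and the example values of $p$ and $q$ are made up for illustration, and the two lotteries are copied directly from the text rather than derived from the decision tree.

```python
from fractions import Fraction


def lotteries(p, q):
    """Return the (complete, incomplete) lotteries exactly as stated in the comment."""
    complete = {"B": q * (1 - p), "A+": 1 - q * (1 - p), "A": Fraction(0)}
    incomplete = {"B": q * (1 - p), "A+": p * q, "A": 1 - q}
    return complete, incomplete


# Arbitrary example values (assumptions, not taken from the thread).
p, q = Fraction(1, 3), Fraction(1, 4)
complete, incomplete = lotteries(p, q)

# Both lotteries should be proper probability distributions.
assert sum(complete.values()) == 1
assert sum(incomplete.values()) == 1

# Difference in probability mass, outcome by outcome.
shift = {outcome: complete[outcome] - incomplete[outcome] for outcome in complete}
print(shift)
```

If the stated lotteries are right, B's probability is unchanged and A+ gains exactly the $1 - q$ that A loses, which is what "a pure shift of probability from A to A+" requires.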
- Jeremy Gillen 2 Oct 2024 12:16 UTC
  5 points
  0
  Parent
  I think the above money pump works, if the agent sometimes chooses the A path, but I was incorrect in thinking that the caprice rule sometimes chooses the A path.
  I misinterpreted one of EJT’s comments as saying it might choose the A path. The last couple of days I’ve been reading through some of the sources he linked to in the original “there are no coherence theorems” post and one of them (Gustafsson) made me realize I was interpreting him incorrectly, by simplifying the decision tree in a way that doesn’t make sense. I only realized this yesterday.
  
  Now I think that the caprice rule is essentially equivalent to updatelessness. If I understand correctly, it would be equivalent to 1. choosing the best policy by ranking them in the partial order of outcomes (randomizing over multiple maxima), then 2. implementing that policy without further consideration. And this makes it immune to money pumps and renders any self-modification pointless. It also makes it behaviorally indistinguishable from an agent with complete preferences, as far as I can tell.
  The same updatelessness trick seems to apply to all money pump arguments. It’s what Scott uses in this post to avoid the independence money pump.
  
  So currently I’m thinking updatelessness removes most of the justification for the VNM axioms (including transitivity!). But I’m confused because updateless policies still must satisfy local properties like “doesn’t waste resources unless it helps achieve the goal”, which is intuitively what the money pump arguments represent. So there must be some way to recover properties like this. Maybe via John’s approach here.
  But I’m only maybe 80% sure of my new understanding; I’m still trying to work through it all.
  - dxu 1 Nov 2024 23:27 UTC
    8 points
    0
    Parent
    It looks to me like the “updatelessness trick” you describe (essentially, behaving as though certain non-local branches of the decision tree are still counterfactually relevant even though they are not — although note that I currently don’t see an obvious way to use that to avoid the usual money pump against intransitivity) recovers most of the behavior we’d see under VNM anyway; and so I don’t think I understand your confusion re: VNM axioms.
    
    E.g. can you give me a case in which (a) we have an agent that exhibits preferences against whose naive implementation there exists some kind of money pump (not necessarily a repeatable one), (b) the agent can implement the updatelessness trick in order to avoid the money pump without modifying their preferences, and yet (c) the agent is not then representable as having modified their preferences in the relevant way?
    - Jeremy Gillen 5 Nov 2024 19:42 UTC
      2 points
      0
      Parent
      Good point.
      What I meant by “updatelessness removes most of the justification” is the reason given here at the very beginning of “Against Resolute Choice”. In order to make a money pump that leads the agent in a circle, the agent has to continue accepting trades around a full preference loop. But if it has decided on the entire plan beforehand, it will just do any plan that involves <1 trip around the preference loop. (Although it’s unclear how it would settle on such a plan; maybe it just stops its search after a given time.) It won’t (I think?) choose any plan that does multiple loops, because they are strictly worse.
      After choosing this plan though, I think it is representable as VNM rational, as you say. And I’m not sure what to do with this. It does seem important.
      However, I think Scott’s argument here satisfies (a), (b), and (c). I think the independence axiom might be special in this respect, because the money pump for independence is exploiting an update on new information.
      - EJT 19 Nov 2024 13:46 UTC
        3 points
        0
        Parent
        I don’t think agents that avoid the money pump for cyclicity are representable as satisfying VNM, at least holding fixed the objects of preference (as we should). Resolute choosers with cyclic preferences will reliably choose B over A- at node 3, but they’ll reliably choose A- over B if choosing between these options ex nihilo. That’s not VNM representable, because it requires that the utility of A- be greater than the utility of B and that the utility of B be greater than the utility of A-.
  - EJT 19 Nov 2024 13:37 UTC
    3 points
    0
    Parent
    “It also makes it behaviorally indistinguishable from an agent with complete preferences, as far as I can tell.”
    That’s not right. As I say in another comment:
    “And an agent abiding by the Caprice Rule can’t be represented as maximising utility, because its preferences are incomplete. In cases where the available trades aren’t arranged in some way that constitutes a money-pump, the agent can prefer (/reliably choose) A+ over A, and yet lack any preference between (/stochastically choose between) A+ and B, and lack any preference between (/stochastically choose between) A and B. Those patterns of preference/behaviour are allowed by the Caprice Rule.”
    Or consider another example. The agent trades A for B, then B for A, then declines to trade A for B+. That’s compatible with the Caprice Rule, but not with complete preferences.
    Or consider the pattern of behaviour that (I elsewhere argue) can make agents with incomplete preferences shutdownable. Agents abiding by the Caprice Rule can refuse to pay costs to shift probability mass between A and B, and refuse to pay costs to shift probability mass between A and B+. Agents with complete preferences can’t do that.
    “The same updatelessness trick seems to apply to all money pump arguments.”
    [I’m going to use the phrase ‘resolute choice’ rather than ‘updatelessness.’ That seems like a more informative and less misleading description of the relevant phenomenon: making a plan and sticking to it. You can stick to a plan even if you update your beliefs. Also, in the posts on UDT, ‘updatelessness’ seems to refer to something importantly distinct from just making a plan and sticking to it.]
    That’s right, but the drawbacks of resolute choice depend on the money pump to which you apply it. As Gustafsson notes, if an agent uses resolute choice to avoid the money pump for cyclic preferences, that agent has to choose against their strict preferences at some point. For example, they have to choose B at node 3 in the money pump below, even though—were they facing that choice ex nihilo—they’d prefer to choose A-.
    There’s no such drawback for agents with incomplete preferences using resolute choice. As I note in this post, agents with incomplete preferences using resolute choice need never choose against their strict preferences. The agent’s past plan only has to serve as a tiebreaker: forcing a particular choice between options between which they’d otherwise lack a preference. For example, they have to choose B at node 2 in the money pump below. Were they facing that choice ex nihilo, they’d lack a preference between B and A-.
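The "plan as tiebreaker" point in the last comment can be sketched in code. This is a minimal illustration and not anyone's stated formalism: the diagrams referred to in the thread are not reproduced here, so the tree shape is an assumption (node 1: keep A or trade for B; node 2: keep B or trade for A-), and the names `STRICT`, `PLANS`, `maximal_plans`, and `resolute_choice` are hypothetical. The only strict preference encoded is A over A-, with B comparable to neither.

```python
import random

# Assumed strict preferences as (better, worse) pairs. With a single pair
# no transitive closure is needed; a larger relation would need one.
STRICT = {("A", "A-")}

# Assumed plans in a single-souring money pump, keyed by the moves taken.
PLANS = {
    ("keep",): "A",            # node 1: keep A
    ("trade", "keep"): "B",    # node 1: trade for B; node 2: keep B
    ("trade", "trade"): "A-",  # node 1: trade for B; node 2: trade for A-
}


def strictly_prefers(x, y):
    """True if outcome x is strictly preferred to outcome y."""
    return (x, y) in STRICT


def maximal_plans(plans):
    """Keep the plans whose outcome is not strictly dispreferred to any available outcome."""
    outcomes = set(plans.values())
    return {
        plan: outcome
        for plan, outcome in plans.items()
        if not any(strictly_prefers(other, outcome) for other in outcomes)
    }


def resolute_choice(plans, rng):
    """Pick one maximal plan at random, to be followed without reconsideration."""
    return rng.choice(sorted(maximal_plans(plans)))


# Planning over the whole tree: A- is dominated by A, so it is never planned.
print(maximal_plans(PLANS))

# The node-2 choice taken "ex nihilo": neither B nor A- is strictly preferred,
# so both are maximal and the earlier plan only has to break the tie.
print(maximal_plans({("keep",): "B", ("trade",): "A-"}))
```

Under these assumptions, the whole-tree ranking excludes the plan ending in A-, while the isolated node-2 choice leaves both options open. Following the plan therefore never requires choosing against a strict preference, in contrast to the cyclic case described above.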