Vladimir_Nesov comments on Richard Ngo’s Shortform

Vladimir_Nesov Mar 20, 2024, 8:07 PM
4 points
0
In the days of UDT2, the background assumption was that agents should maintain an unchanging preference, which is separate from knowledge. One motivation for UDT is that updating makes an agent stop caring about updated-away possibilities, while UDT is not doing that. Going back to a previous epistemic state is a way of preserving the preference from that epistemic state; the “current” utility function is considered a bug and plays no role once UDT is adopted. The non-updated agent can in principle consider the information you currently have as one of the possibilities when formulating the general policy for all possibilities, though being bounded it won’t do a very good job.
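
To make the contrast concrete, here is a minimal sketch using counterfactual mugging, a standard thought experiment that is not named in this thread and is brought in only as an illustration. The payoffs (10,000 and 100), the fair coin, and every function name are assumptions made for the example. The updateless agent scores whole policies against the prior; the updated agent scores them against the posterior, so the heads branch no longer affects its choice.

```python
from fractions import Fraction

# Illustrative setup (an assumption, not from the thread): a fair coin.
# Heads: the agent receives 10_000, but only if its policy is to pay on tails.
# Tails: the agent is asked to pay 100.
PRIOR = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
POLICIES = ("pay", "refuse")


def payoff(world, policy):
    """Payoff of following `policy` in `world` (hypothetical numbers)."""
    if world == "heads":
        return 10_000 if policy == "pay" else 0
    return -100 if policy == "pay" else 0


def expected_utility(policy, beliefs):
    """Expected payoff of a policy under a distribution over worlds."""
    return sum(p * payoff(world, policy) for world, p in beliefs.items())


def updateless_choice():
    """Pick the policy that is best under the prior, before any observation."""
    return max(POLICIES, key=lambda pol: expected_utility(pol, PRIOR))


def updated_choice(observed):
    """Condition on the observed world first, then pick the best policy."""
    posterior = {world: Fraction(int(world == observed)) for world in PRIOR}
    return max(POLICIES, key=lambda pol: expected_utility(pol, posterior))


if __name__ == "__main__":
    print("updateless:", updateless_choice())
    print("updated after tails:", updated_choice("tails"))
```

In this toy setup the utility function is identical in both cases; only the distribution it is averaged over differs, which is the sense in which the updated agent behaves as if it no longer cared about the heads branch.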

Traditionally, UDT1.1 wants to make its decisions from very little knowledge and to apply the resulting policy everywhere and always. A more pragmatic approach is to make decisions from modestly less knowledge and to scope the policy to the medium-term future. Some form of this is useful for many thought experiments where the environment or other players also have the limited knowledge that our agent, deciding from the past, bases its decisions on, and so could know the policy the agent decides on before they need to prepare for it or make predictions about it.

The problem is commitment races (as in the game of chicken), where everyone wants to decide earlier and force the others to respond. But there is a need to remain bounded in making decisions, both to personally compute them and to make it possible for others to anticipate them and to coordinate. This creates a more reasonable equilibrium, motivating decisions from a less ignorant epistemic state that have a better chance of being relevant to the current situation, in balance with trying to decide from a more ignorant epistemic state where a general policy would enable more strategicness across possibilities. UDT1.1 can’t find such balance, but it’s possible that something UDT2-shaped might.
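
A toy version of the commitment race, as a sketch only: the payoff numbers and function names below are illustrative assumptions, not anything from UDT1.1 or UDT2. Whoever commits first to going straight gets the good outcome, because the other player's best response is to swerve; if both commit before seeing the other's commitment, they crash.

```python
ACTIONS = ("swerve", "straight")

# PAYOFF[(mine, theirs)] is my payoff; the numbers are hypothetical.
PAYOFF = {
    ("swerve", "swerve"): 0,
    ("swerve", "straight"): -1,
    ("straight", "swerve"): 1,
    ("straight", "straight"): -10,
}


def best_response(their_action):
    """Best action for a player who already knows the other's commitment."""
    return max(ACTIONS, key=lambda mine: PAYOFF[(mine, their_action)])


def outcome_when_first_mover_commits(commitment):
    """Payoffs (first mover, follower) when the follower best-responds."""
    follower = best_response(commitment)
    return PAYOFF[(commitment, follower)], PAYOFF[(follower, commitment)]


def outcome_when_both_commit_blind(a, b):
    """Payoffs when both players commit without seeing the other's choice."""
    return PAYOFF[(a, b)], PAYOFF[(b, a)]


if __name__ == "__main__":
    print(outcome_when_first_mover_commits("straight"))
    print(outcome_when_both_commit_blind("straight", "straight"))
```

The sketch only shows the incentive to commit earlier; it does not model the boundedness or the anticipation by others that pushes back against it.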
- Richard_Ngo Mar 20, 2024, 8:21 PM
  2 points
  0
  > One motivation for UDT is that updating makes an agent stop caring about updated-away possibilities, while UDT is not doing that.
  I think there’s an ambiguity here. UDT makes the agent stop considering updated-away possibilities, but I haven’t seen any discussion of UDT which suggests that it stops caring about them in principle (except for a brief suggestion from Paul that one option for UDT is to “go back to a position where I’m mostly ignorant about the content of my values”). Rather, when I’ve seen UDT discussed, the discussion has focused on updating or un-updating your epistemic state.
  I don’t think the shift I’m proposing is particularly important, but I do think the idea that “you have your prior and your utility function from the very beginning” is a kinda misleading frame to be in, so I’m trying to nudge a little away from that.
  - Vladimir_Nesov Mar 20, 2024, 8:45 PM
    2 points
    0
    
    > UDT makes the agent stop considering updated-away possibilities, but I haven’t seen any discussion of UDT which suggests that it stops caring about them in principle
    
    UDT specifically enables agents to consider the updated-away possibilities in a way relevant to decision making, while an updated agent (that’s not using something UDT-like) wouldn’t be able to do that in any circumstance, and so would be functionally indistinguishable from an agent that has different preferences or undefined preferences for those possibilities. Not caring about them seems like an apt informal description (even though this is compatible with keeping the same utility function outside the event of current knowledge). In a similar way, we could describe an agent after updating either as having changed its probability distribution or as keeping the original prior.
    
    > I do think the idea that “you have your prior and your utility function from the very beginning” is a kinda misleading frame to be in
    
    Historically it was overwhelmingly the frame until recently, so it’s the correct frame for interpreting the intended meaning of texts from that time. This is a simplifying assumption that still leaves many open questions about how to make decisions in sufficiently strange situations (where the mere existence of models of behavior makes these strange situations ubiquitous in practice). When an agent doesn’t know its own preference and needs to do something about that, this is an additional complication, one that usually wasn’t introduced.
    - Richard_Ngo Mar 20, 2024, 8:59 PM
      2 points
      0
      > UDT specifically enables agents to consider the updated-away possibilities in a way relevant to decision making, while an updated agent (that’s not using something UDT-like) wouldn’t be able to do that in any circumstance
      Agreed; apologies for the sloppy phrasing.
      > Historically it was overwhelmingly the frame until recently, so it’s the correct frame for interpreting the intended meaning of texts from that time.
      I agree; that’s why I’m trying to outline an alternative frame for thinking about it.