Daniel Kokotajlo comments on The Commitment Races problem

Daniel Kokotajlo 1 Mar 2021 17:47 UTC
If I read you correctly, you are suggesting that some portion of the problem can basically be solved: that it’s in some sense obviously a good idea to make a certain sort of commitment, e.g. “When I encounter future bargaining partners, we will (based on our models at that time) agree on a world-model according to some protocol and apply some solution concept (e.g. Nash or Kalai-Smorodinsky) to it in order to arrive at an agreement.” So the commitment races problem may still exist, but it’s about what other commitments to make besides this one, and when. Is this a fair summary?
I guess my response would be: “On the object level, this seems like maybe a reasonable commitment to me, though I’d have lots of questions about the details. We want it to be vague/general/flexible enough that we can get along nicely with various future agents with somewhat different protocols. And what about agents that are otherwise reasonable and cooperative but for some reason don’t want to agree on a world-model with us? On the meta level, though, I’m still feeling burned by the various things that seemed like good commitments to me and turned out to be dangerous, so I’d like to have some sort of stronger reason to think this is safe.”
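To make the two solution concepts named in the proposed commitment (Nash and Kalai-Smorodinsky) a bit more concrete, here is a minimal Python sketch of the standard two-player textbook versions. It assumes the world-model agreement step has already happened and produced a shared, finite list of outcomes with known utilities for both parties, plus a disagreement point; lotteries over outcomes are not modeled. The function names, the outcome list, and the disagreement point are hypothetical and chosen only for illustration, and the Kalai-Smorodinsky function is a discrete stand-in (maximize the smaller normalized gain) rather than the exact construction on a convex set.

```python
from typing import List, Tuple

# Hypothetical representation: (utility to player 1, utility to player 2).
Outcome = Tuple[float, float]


def individually_rational(outcomes: List[Outcome], d: Outcome) -> List[Outcome]:
    """Keep only outcomes that both players weakly prefer to disagreement point d."""
    return [o for o in outcomes if o[0] >= d[0] and o[1] >= d[1]]


def nash_solution(outcomes: List[Outcome], d: Outcome) -> Outcome:
    """Pick the outcome maximizing the Nash product (u1 - d1) * (u2 - d2)."""
    feasible = individually_rational(outcomes, d)
    return max(feasible, key=lambda o: (o[0] - d[0]) * (o[1] - d[1]))


def kalai_smorodinsky_solution(outcomes: List[Outcome], d: Outcome) -> Outcome:
    """Discrete stand-in for Kalai-Smorodinsky: maximize the smaller of the two
    players' gains, each normalized by that player's best feasible gain."""
    feasible = individually_rational(outcomes, d)
    ideal = (max(o[0] for o in feasible), max(o[1] for o in feasible))

    def normalized_min_gain(o: Outcome) -> float:
        ratios = []
        for i in range(2):
            span = ideal[i] - d[i]
            # If a player cannot gain anything at all, treat them as fully satisfied.
            ratios.append(1.0 if span == 0 else (o[i] - d[i]) / span)
        return min(ratios)

    return max(feasible, key=normalized_min_gain)


# Made-up example; the disagreement outcome (0, 0) is included so the
# feasible set is never empty.
outcomes = [(0, 0), (10, 2), (8, 5), (6, 6), (3, 8), (1, 9)]
d = (0, 0)
print(nash_solution(outcomes, d))               # (8, 5)
print(kalai_smorodinsky_solution(outcomes, d))  # (6, 6)
```

On this made-up example the two concepts pick different outcomes, which is one small illustration of why future agents committed to somewhat different protocols would not automatically land on the same agreement.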
- JesseClifton 1 Mar 2021 19:46 UTC
  Yeah I agree the details aren’t clear. Hopefully your conditional commitment can be made flexible enough that it leaves you open to being convinced by agents who have good reasons for refusing to do this world-model agreement thing. It’s certainly not clear to me how one could do this. If you had some trusted “deliberation module”, which engages in open-ended generation and scrutiny of arguments, then maybe you could make a commitment of the form “use this protocol, unless my counterpart provides reasons which cause my deliberation module to be convinced otherwise”. Idk.
  
  Your meta-level concern seems warranted. One would at least want to try to formalize the kinds of commitments we’re discussing and ask if they provide any guarantees, modulo equilibrium selection.
  - Daniel Kokotajlo 1 Mar 2021 22:14 UTC
    I think we are on the same page, then. I like the idea of a deliberation module; it seems similar to the “moral reasoning module” I suggested a while back. The key is to make sure it is not itself a coward or a bully: it should reason about Schelling points, universal principles, and the like, rather than about what-will-lead-to-the-best-expected-outcomes-given-my-current-credences.