Nicholas / Heather Kross comments on Threat-Resistant Bargaining Megapost: Introducing the ROSE Value

Nicholas / Heather Kross 21 Jan 2024 3:30 UTC
4 points
2
Decision theory is hard. In trying to figure out why DT is useful (needed?) for AI alignment in the first place, I keep running into weirdness, including with bargaining.
Without getting too in-the-weeds: I’m pretty damn glad that some people out there are working on DT and bargaining.
- Anthony DiGiovanni 21 Jan 2024 7:51 UTC
  1 point
  0
  Can you say more about what you think this post has to do with decision theory? I don’t see the connection. (I can imagine possible connections, but don’t think they’re relevant.)
  - Nicholas / Heather Kross 21 Jan 2024 16:58 UTC
    2 points
    0
    So there’s a sorta-crux about how much DT would have to be encoded by alignment researchers into the-AI-we-want-to-be-aligned before that AI is turned on. Right now I’m leaning towards “an AI that implements CEV well would either turn out to have, or quickly develop, good DT on its own”, but I can see it going either way. (This was especially true yesterday when I wrote this review.)
    And I was trying to think through some of the “DT relevance to alignment” question, and I looked at relevant posts by [Tamsin Leake](https://www.lesswrong.com/users/tamsin-leake) (whose alignment research/thoughts I generally agree with). And that led me to thinking more about value-handshakes, son-of-CDT (see Arbital), and systems like ROSE bargaining. Any/all of which, under certain assumptions, could determine (or hint at) answers to the “DT relevance” thing.
    - Anthony DiGiovanni 21 Jan 2024 18:17 UTC
      2 points
      1
      Sorry, to be clear, I’m familiar with the topics you mention. My confusion is that ROSE bargaining per se seems to me pretty orthogonal to decision theory.
      I think the ROSE post(s) are an answer to questions like, “If you want to establish a norm for an impartial bargaining solution such that agents following that norm don’t have perverse incentives, what should that norm be?”, or “If you’re going to bargain with someone but you didn’t have an opportunity for prior discussion of a norm, what might be a particularly salient allocation [because it has some nice properties], meaning that you’re likely to zero-shot coordinate on that allocation?”
      - Nicholas / Heather Kross 21 Jan 2024 19:25 UTC
        2 points
        0
        Yeah, mostly agreed. My main subquestion (that led me to write the review, besides this post being referenced in Leake’s work) was/sort-of-still-is “Where do the ratios in value-handshakes come from?”. The default (at least in the tag description quote from SSC) is uncertainty about who would win a war, but that seems neither fully-principled nor nice-things-giving (small power differences can still lead to huge differences in win probability, and superintelligences would presumably be interested in increasing the accuracy of those estimates). And I thought maybe ROSE bargaining could be related to that.
        The relation in my mind was less ROSE --> DT, and more ROSE --?--> value-handshakes --> value-changes --?--> DT.