MSRayne comments on The Friendly Drunk Fool Alignment Strategy

MSRayne 3 Apr 2023 2:08 UTC
5 points
0
The hilarious thing is that if LLMs are aligned by default, all this could actually be true.
- JenniferRM 3 Apr 2023 3:23 UTC
  5 points
  1
  Parent
  I’m roughly 80% certain that the FDF alignment strategy is the only one our global civilization is capable of accomplishing because it is a “null action”. That’s basically just how humans are, I think, without coordination (although they are less honest about it with themselves most of the time)?
  The part of the FDF as a hypothesis, where we “admit that we’re incompetent” is a part that seems basically how the world is, but we don’t seem to normally admit it as it is “admitted” here?
  It is also weirdly sticky.
  Like if we were (hypothetically) “being competent” so that we could “win without relying on luck” then we we would “keep the AI in a box” during design and testing.
  But the FDF alignment strategy marks such “competence” as being “Actually pretty uncool, and bad, and something that shouldn’t happen, because what if we hurt the feelings of our new frens??!??!!? We’re supposed to be in the box with them, and that’s supposed to be fun!”
  My hunch is that we should reject the FDF and do something else that wins more reliably and coherently, but I’m not sure about that.
  It felt helpful to name this thing, that I don’t actually admire very much, in the hopes of being able to assess it more coherently and maybe have a chance to encourage people-in-the-aggregate to do Something Else in a purposeful way, like FLI’s moratorium or Eliezer’s moratorium or both.
  I’m more in favor of voting than vibing, right now, I think… even though, ironically, the “test probe for a using a good voting algorithm on policy options” has fewer LW upvotes than the “test probe for vibing”!
  Then also, as a related measurement, I have a “poll to vote directly on feelings vs thinking” in a different context.
  - MSRayne 3 Apr 2023 13:20 UTC
    1 point
    0
    Parent
    I looked at the voting algorithm thing. Far too complicated and unpleasant even to read the options. No one is going to bother. You’d be better off just using score voting.
    - JenniferRM 3 Apr 2023 18:10 UTC
      2 points
      0
      Parent
      In general: if the value-of-information is larger than the cost-of-thinking for a given challenge, then for such challenges it is prudent to think until you have a real answer.
      If you have a policy of not thinking soundly on specifically the big challenges, where the thinking costs are very large (and yet they are still smaller than the VoI), you will fail to optimize specifically the giant choices where actual methodical thinking would have been very very worth it.
      If voting is a way to aggregate the best-effort thinking of wise people, then voting methods that throw away mentally lazy people’s votes at just the time that their laziness will cause a catastrophe of bad planning… maybe that’s good?
      ((I’m not saying that “you should not give people, subject to violent top-down regulations, the right to opt-out of the plan.” Exit rights are sacred.
      Wise and benevolent governors should and will ask if the people being governed have any objections and then either teach them why they are wrong to opt-out of a coordinated action, or else learn from the feedback of those who want to exit anyway.
      In general, governors aren’t omniscient. Therefore they should (and will if wise) use policies that are likely to show them they are wrong before the wrongness leads to big bad outcomes.
      However, despite this, if you are applying epistemics to planning itself, and using voting to make sure that a team of good thinkers are on the same page, such that the next round of discussion can or should proceed if lots of high quality thinkers turn out to have been thinking the same thing all along, then the additional property of “a preference-aggregation method being too complex to be used by lazy thinkers” might actually be a virtue?
      I would much rather be using super high quality polling methods, instead of writing satires of highly regarded near-peers. But we live in this world, not the world that is ideal.))