Jiro comments on What The Lord of the Rings Teaches Us About AI Alignment

Jiro 1 Aug 2023 0:39 UTC
2 points
−4

A few hours seems like a really short amount of time to erode someone’s commitments, especially when “Just Say No” is a winning strategy.

No, it isn’t. The human has to keep talking to the AI. He’s not permitted to just ignore it.
- Jeffrey Heninger 2 Aug 2023 19:25 UTC
  1 point
  0
  Parent
  One of the tactics listed on RationalWiki’s description of the AI-box experiment is:
  Jump out of character, keep reminding yourself that money is on the line (if there actually is money on the line), and keep saying “no” over and over
  - Richard_Kennaway 2 Aug 2023 20:18 UTC
    2 points
    0
    Parent
    RationalWiki is not a reliable source on any subject.
    
    Jumping out of character ignores the entire point of the AI-box exercise. It’s like a naive chess player just grabbing the opponent’s king and claiming victory.
    - Jeffrey Heninger 2 Aug 2023 23:09 UTC
      1 point
      0
      Parent
      From Yudkowsky’s description of the AI-Box Experiment:
      The Gatekeeper party may resist the AI party’s arguments by any means chosen – logic, illogic, simple refusal to be convinced, even dropping out of character – as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.
      - Jiro 8 Aug 2023 15:19 UTC
        4 points
        2
        Parent
        If that meant what you interpret it to mean, “does not actually stop talking” would be satisfied by the Gatekeeper typing any string of characters to the AI every so often regardless of whether it responds to the AI or whether he is actually reading what the AI says.
        
        All this shows is that the rules contradict themselves. There is a requirement that the Gatekeeper stay engaged with the AI, and a requirement that the Gatekeeper “actually talk with the AI”. The straightforward reading of those requirements does not allow for a Gatekeeper who ignores everything and just types “no” every time; only a weird literal Internet guy would consider that to be staying engaged and actually talking.
      - Richard_Kennaway 3 Aug 2023 5:36 UTC
        2 points
        1
        Parent
        Ok.