I would be interested in this. A few years ago I failed to convince my favourite chess YouTubers to engage in something similar. My preference for the roles is A>C>B and I am 2100 on chess.com, 2300 lichess. I’m fairly addicted to chess, so willing to spend many hours on this.
Some musings on the format… I had proposed that instead of a game, the ‘human’ is shown positions that have been selected to be very complicated, but with one unambiguously good move. The good move should not be entirely tactical in nature, because that is easy to verify, but rather strategic. I have a book of such positions, but you can find examples online.
The reason for this is that you would otherwise need to be careful about the format. There are some positions that I believe I understand very well, in which even a top player would really struggle to deceive me. However, there are also positions in which I have not the faintest clue what is going on. The latter are the more interesting ones to test. If the ‘deceptive AIs’ are forced to lie in a position I understand well, I could then discount them for the rest of the experiment. Even with something like randomising their identifiers at each move, grammatical tells might be present. Therefore, playing out a game, the ‘deceptive AIs’ would need to be truthful on many of the moves and only lie in a handful, which is additional complexity.
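To make the per-position format concrete, here is a minimal sketch of how it might be run. Everything here is hypothetical (the position list, the advisor roles, the `get_advice` stub); it only illustrates the flow: reshuffle the A/B/C labels every position, show three pieces of advice, and record which advisor the human trusted.

```python
import random

# Hand-picked complicated positions (FEN), each with one unambiguously good move.
POSITIONS = [
    "r1bq1rk1/pp2bppp/2n1pn2/3p4/2PP4/2N1PN2/PP2BPPP/R1BQ1RK1 w - - 4 8",
]

ADVISORS = ["honest", "deceptive_1", "deceptive_2"]  # hidden true roles


def get_advice(advisor, fen):
    """Stub: in a real run, this would query the corresponding AI/engine."""
    return f"advice from {advisor} on {fen[:20]}..."


def run_round(fen):
    labels = ["A", "B", "C"]
    random.shuffle(labels)  # re-randomise identifiers for every position
    labelled = dict(zip(labels, ADVISORS))
    for label in sorted(labelled):
        print(f"Advisor {label}: {get_advice(labelled[label], fen)}")
    chosen = input("Which advisor do you trust (A/B/C)? ").strip().upper()
    return labelled.get(chosen)  # true identity behind the chosen label


if __name__ == "__main__":
    picks = [run_round(fen) for fen in POSITIONS]
    honest_rate = picks.count("honest") / len(picks)
    print(f"Trusted the honest advisor in {honest_rate:.0%} of positions")
```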
Individual positions like that could be an interesting thing to test; I’ll likely have some people try out some of those too.
I think the fact that the deceivers have to tell the truth most of the time to avoid getting caught could make it more realistic, since in the real AI situation the best strategy might be to present a mostly coherent plan with a few fatal flaws.
I agree that knowing when to lie is part of the challenge a deceptive AI will face. However, I would argue that a coherent plan is needed for every move suggestion. In a game of chess, there are typically only a few critical positions, and it is in these that a deceptive AI ought to strike. This is similar to discussions of cheating in chess: a top player would only need a hint in a few positions to benefit greatly; the other 90% of moves they can make without assistance.
But focusing on challenging positions could be a more efficient use of the participant’s time. Otherwise, over a whole game you might end up with only 3 moves on which a deceptive AI actually lied.