GoteNoSente answers Lying to chess players for alignment

GoteNoSente 26 Oct 2023 12:10 UTC
2 points
0
I could be interested in trying this, in any configuration. Preferred time control would be one move per day. My lichess rating is about 2200.

Are the advisors allowed computer assistance? Do the honest and dishonest advisors know who is who in this experiment? And are the advisors allowed to coordinate? I think those parameters could make a large difference to the outcome of this type of experiment.
- Zane 26 Oct 2023 14:10 UTC
  1 point
  0
  Parent
  No computers, because the advisors should be reporting their own reasoning (or, ²⁄₃ of the time, a lie that they claim is their own reasoning). I would prefer to avoid explicit coordination between the advisors, because the AIs might not have access to each other in the real world, but I’m not sure at the moment whether player A can show the advisors each other’s suggestions and ask for critiques. I would prefer not to give either dishonest advisor information on who the other two are, since the real-world AIs probably can’t read each other’s source code.
  - GoteNoSente 27 Oct 2023 23:00 UTC
    5 points
    4
    Parent
    As an additional thought regarding computers, it seems to me that participant B could be replaced by a weak computer in order to provide a consistent experimental setting. For instance, Leela Zero running just the current T2 network (no look-ahead) would probably play at master-level strength and should easily be able to crush most human opponents who are playing unassisted, yet it would still be a perfectly reproducible and beatable opponent.
    - Zane 27 Oct 2023 23:41 UTC
      1 point
      0
      Parent
      [facepalms] Thanks! That idea did not occur to me and drastically simplifies all of the complicated logistics I was previously having trouble with.
  - GoteNoSente 27 Oct 2023 22:53 UTC
    1 point
    0
    Parent
    I think having access to computer analysis would allow the advisors (both honest and malicious) to provide analysis far better than their normal level of play, and allow the malicious advisors in particular to set very deep traps. The honest advisor, on the other hand, could use the computer analysis to find convincing refutations of any traps the dishonest advisors are likely to set, so I am not sure whether the task of the malicious side becomes harder or easier in that setup. I don’t think reporting reasoning is much of a problem here, as a centaur (a chess player consulting an engine) can most certainly give reasons for their moves (even though sometimes they won’t understand their own advice and will be wrong about why their suggested move is good).
    
    It does make the setup more akin to working with a superintelligence than working with an AGI, though, as the gulf between engine analysis and the analysis that most/all humans can do unassisted is vast.
    - Zane 27 Oct 2023 23:46 UTC
      1 point
      0
      Parent
      The problem is that while the human can give some rationalizations as to “ah, this is probably why the computer says it’s the best move,” it’s not the original reasoning that generated those moves as the best option, because that took place inside the engine. Some of the time, looking ahead with computer analysis is enough to reproduce the original reasoning—particularly when it comes to tactics—but sometimes they would just have to guess.