I like this analogy, but there are a couple of features that I think make it hard to think about:
1. The human wants to play, not just to win. You stipulated that “the human aims to win, and instructs their AI teammate to prioritise winning above all else”. The dilemma then arises because the aim to win cuts against the human having agency and control. Your takeaway is “Even perfectly aligned systems, genuinely pursuing human goals, might naturally evolve to restrict human agency.”
So in this analogy, it seems that “winning” stands for the human’s true goals. But (as you acknowledge) it seems like the human doesn’t just want to win, but actually wants both some “winning” and some “agency”. You’ve implicitly tried to factor the entirety of the human’s goals into the outcome of the game, but you have left some of the agency behind, outside of this objective, and this is what creates the dilemma.
For an AI system that is truly ‘perfectly aligned’ (truly pursuing the human’s goals), it seems like either
(A) the AI partner would not pursue winning above all else, but would allow some human control at the cost of some ‘winning’, or
(B) if it were possible to actually factor the human’s meta-preference for having agency into ‘winning’, then we shouldn’t care if the AI plays to win above all else, because that already accounts for the human’s desired amount of agency.
For an AI system that is not perfectly aligned, this becomes a different game (in the sense of game theory). It’s a three-player game between the AI partner, the human partner, and the opponent, each of whom has a different objective (the difference between the AI and human partners is that the human wants some combination of ‘winning’ and ‘agency’ while the AI just wants ‘winning’; probably the opponent just wants both of them to lose). One interesting dynamic that could then arise is that the human partner could threaten and punish the AI partner, if the AI doesn’t give them enough control, by deliberately making worse moves than the best moves they can see. To stop the human from doing this, the AI either has to
(C) negotiate to give the human some control, or
(D) remove all control from the human (e.g. force the queen to have no bad moves or no moves at all).
In particular, (D) seems like it would be expensive for the AI partner as it requires playing without the queen (against an opponent with no such restriction), so maybe the AI will let the human play sometimes.
2. I don’t think it needs to be a stochastic chess variant. The game is set up so that the human gets to play whenever they roll a 6 on a (presumably six-sided) die. You said this stands in for the idea that in the real world, the AI system makes decisions on a faster timescale than the human. But implementing the speed differential with this particular game mechanic comes at the cost of making the chess variant stochastic. I think that determinism is an important feature of standard chess. In theory, you can solve chess with an adversarial look-ahead search (minimax, alpha-beta pruning, etc.). But as soon as a die becomes involved, all of the players have to switch to expectiminimax. Rolling a six can suddenly throw off the tempo in your delicate exchange or your whirlwind manoeuvre, and so on.
I’m a novice at chess, so it’s not like this is going to make a difference to how I think about the analogy (I will struggle to think strategically in both cases). And maybe a sufficiently accomplished chess player is familiar with stochastic variants already. But for someone in-between who is familiar with deterministic chess, maybe it’s easier to consider a non-stochastic variant of the chess game, for example where the human gets the option to play every 6 turns (deterministically), which gives the same speed differential in expectation.
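To make the minimax versus expectiminimax contrast concrete, here is a minimal sketch on a toy game tree (not chess). Everything in it is an assumption for illustration: the tuple-based tree encoding, the `leaf` helper, the leaf values, and the reading that the die is rolled once per team turn, so the human moves with probability 1/6 and the AI with probability 5/6.

```python
from fractions import Fraction

def leaf(value):
    # Hypothetical leaf: a position with a fixed payoff for the human/AI team.
    return ("leaf", value)

def expectiminimax(node):
    """Value of a toy tree with max, min, and chance nodes."""
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":      # the human/AI team picks the best child
        return max(expectiminimax(child) for child in payload)
    if kind == "min":      # the opponent picks the worst child for the team
        return min(expectiminimax(child) for child in payload)
    if kind == "chance":   # the die: average over children, weighted by probability
        return sum(p * expectiminimax(child) for p, child in payload)
    raise ValueError(f"unknown node kind: {kind}")

# Assumed toy position: the die decides who moves for the team,
# then the opponent replies.
human_moves = ("max", [("min", [leaf(0), leaf(2)]), ("min", [leaf(-1), leaf(4)])])
ai_moves = ("max", [("min", [leaf(1), leaf(5)]), ("min", [leaf(3), leaf(6)])])
tree = ("chance", [(Fraction(1, 6), human_moves), (Fraction(5, 6), ai_moves)])

print(expectiminimax(tree))  # 5/2
```

In the deterministic every-sixth-turn variant, the chance node would disappear: whose turn it is would be fixed by the move number, and plain minimax over `max` and `min` nodes would be enough.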
This law sounds super enticing and I want to understand it more. Could you spell out how the law suggests this?
I did a quick search of LessWrong and Wikipedia regarding this law.
“… Ashby’s ‘Law of requisite variety’, which roughly speaking states that a system can only remain in homeostasis if it has more internal states than the external states it encounters.” from Yuxi_Liu, “Cybernetic dreams”.
“Either the AI is too simple to be an independent robust agent in human society, or it needs to be approximately as complex as humans themselves. Cf. the law of requisite variety.” from Roman Leventov, “For alignment, we should simultaneously use multiple theories of cognition and value”.
“This law (of which Shannon’s theorem 10 relating to the suppression of noise is a special case) says that if a certain quantity of disturbance is prevented by a regulator from reaching some essential variables, then that regulator must be capable of exerting at least that quantity of selection.” from W. R. Ashby (1960), “Design for a Brain”, p. 229, quoted via Wikipedia page.
Enough testimonials. The Wikipedia page itself describes the law as based on the following observation. Consider a two-player game between the environment (disturber) and a system trying to maintain stasis (regulator). If the environment has D moves that all lead to different outcomes (given any fixed move from the system), and the system has R possible responses, then the best the system can do is restrict the number of distinct outcomes to D/R.
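As a sanity check on the D/R bound, here is a small brute-force sketch. The outcome table `(d - r) % D` is my own choice of example, not something from the Wikipedia page; it is picked only because, for any fixed regulator move, the D disturber moves lead to D different outcomes. I also assume the regulator sees the disturber's move before responding, so a regulator strategy is a choice of response for each disturber move.

```python
from itertools import product
from math import ceil

def outcome(d, r, D):
    # Assumed outcome table: for a fixed regulator move r,
    # different disturber moves d always give different outcomes.
    return (d - r) % D

def min_distinct_outcomes(D, R):
    """Fewest distinct outcomes the regulator can force, by brute force."""
    best = D
    # A strategy assigns one of R responses to each of the D disturber moves.
    for strategy in product(range(R), repeat=D):
        outcomes = {outcome(d, strategy[d], D) for d in range(D)}
        best = min(best, len(outcomes))
    return best

print(min_distinct_outcomes(6, 2))  # 3
print(ceil(6 / 2))                  # 3
```

With 6 disturber moves and 2 regulator responses, no strategy gets below 3 distinct outcomes, which matches D/R (rounded up when R does not divide D). The search is exponential in D, so this only works for toy sizes.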
I can see the link between this and the descriptions from Yuxi_Liu, Roman Leventov, and Ashby. Your reading is a couple of steps removed. How did you get from D/R outcomes in this game to “fundamental limits to human control over more capable systems”? My guess is that you simply mean that if the more capable system is more complex / has more available moves / has more “variety” than humans, then the law will apply with the human as the regulator and the AI as the disturber. Is that right? Could you comment on how you see capability in terms of variety?