The Queen’s Dilemma: A Paradox of Control

Our large learning machines find patterns in the world and use them to predict. When these machines exceed us and become superhuman, one of those patterns will be relative human incompetence. How comfortable are we with the incorporation of this pattern into their predictions, when those predictions become the actions that shape the world?

My thanks to Matthew Farrugia-Roberts for feedback and discussion.

As artificially intelligent systems improve at pattern recognition and prediction, one of the most prominent patterns that they’ll encounter in the world is human incompetence relative to their own abilities. This raises a question: how comfortable are we with these systems incorporating our relative inadequacy into their world-shaping decisions?

To illustrate the core dynamic at play, consider a chess match where White is played by an AI, while Black is controlled by a team consisting of a human and an AI working in tandem. The human, restricted to moving only the queen, gets to play whenever they roll a six on a die; otherwise, their AI partner makes the move. The human can choose to pass rather than move the queen. On its turns, the AI on the Black team can move any piece, including the queen.[1]
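
To make the setup concrete, here is a minimal sketch of the turn-allocation rule in Python. The move-selection functions (human_queen_move and ai_move) and the reading of “pass” as handing the move back to the AI teammate are illustrative assumptions, not part of the thought experiment itself.

```python
import random

def select_black_move(board, human_queen_move, ai_move, rng=random):
    """Turn-allocation rule for the Black team in the Queen's Dilemma.

    human_queen_move(board) returns a queen move, or None to pass;
    ai_move(board) returns any legal Black move, queen included.
    Both are caller-supplied placeholders for illustration.
    """
    if rng.randint(1, 6) == 6:  # the human only plays on a rolled six
        move = human_queen_move(board)
        if move is not None:
            return move
        # The human chose to pass; here we assume the AI teammate moves instead.
    return ai_move(board)  # on all other rolls, the AI moves any piece
```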

If the human aims to win, and instructs their AI teammate to prioritise winning above all else, the AI might develop strategies that minimise the impact of human “interference” – perhaps by positioning pieces to restrict the queen’s movement. As the performance gap between the human and the AI on Black widens, this tension between task performance and preserving meaningful human agency becomes more pronounced.
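
To see the pressure in miniature, consider a toy expected-value calculation (the win probabilities below are invented purely for illustration, not drawn from any analysis): a teammate that maximises Black’s chance of winning can prefer a position that is slightly weaker on its own moves but insulates the game against the human’s queen moves.

```python
# Toy numbers only: how a win-maximising teammate weighs the human's turns.
P_HUMAN_TURN = 1 / 6  # chance the human (queen-only) makes Black's next move

def expected_win(p_if_ai_moves, p_if_human_moves):
    return (1 - P_HUMAN_TURN) * p_if_ai_moves + P_HUMAN_TURN * p_if_human_moves

# Position A: the queen is free, so the human's move can swing the game.
# Position B: the queen is boxed in, so the human's move barely matters.
free_queen = expected_win(p_if_ai_moves=0.70, p_if_human_moves=0.40)   # ~0.650
boxed_queen = expected_win(p_if_ai_moves=0.68, p_if_human_moves=0.66)  # ~0.677

print(f"free queen:  {free_queen:.3f}")
print(f"boxed queen: {boxed_queen:.3f}")
```

Under these made-up numbers, boxing in the queen scores higher overall even though it is the weaker position whenever the AI moves – the objective quietly rewards neutralising the human’s input.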

The challenge isn’t about explicit control – the human can still make any legal move with the queen when given the chance. Rather, it’s about the subtle erosion of effective control. The AI, making more moves and possessing superior strategic understanding, could systematically diminish the practical significance of human input while maintaining the appearance of cooperation. This distinction between de jure and de facto control becomes critical. We might accept our queen being accidentally boxed in during normal play, but we bristle at the thought of our AI partner deliberately engineering such situations to mitigate our “unreliability.”

The broader point is that even if AIs are completely aligned with human values, the very mechanisms by which we maintain control (such as scalable oversight and other interventions) may shift how these systems operate in ways that produce fundamental, widespread effects across all learning machines – effects that may be difficult to mitigate, because the nature of our interventions tends to reinforce the phenomenon.[2]

We can view this as a systemic challenge, analogous to controlling contaminants in semiconductor manufacturing.[3] Just as chip fabrication must carefully manage unwanted elements through multiple processing stages, AI development may need to grapple with how patterns of human limitation influence system behaviour. The goal isn’t to eliminate awareness of human limitations – which would be both impossible and counterproductive – but to understand and manage how this awareness shapes AI behaviour.

Even perfectly aligned systems, genuinely pursuing human goals, might naturally evolve to restrict human agency.[4] Any balance between capability and control may ultimately prove unsustainable – perhaps leading to a permanent loss of de facto human control or requiring fundamental changes in human capabilities to close the gap. In the interim, understanding and managing this tension will be one of the ongoing challenges of AI alignment and control.

Related Work

The Queen’s Dilemma connects to several important traditions in social theory and systems thinking. Max Weber’s analysis of bureaucratic rationalisation describes how rational systems of organisation can create an “iron cage” that constrains human agency while maintaining the appearance of freedom. This mirrors how AI systems might maintain formal mechanisms for human input while systematically diminishing their practical significance through optimisation.

Amartya Sen’s capability approach provides a framework for understanding why preserving human agency matters beyond mere goal achievement, and his work on the relation between well-being, agency and freedom (Dewey Lectures 1984) informed the treatment here.

From cybernetics, several key ideas inform our analysis. W. Ross Ashby’s Law of Requisite Variety (1956) suggests fundamental limits to human control over more capable systems. Herbert A. Simon’s research on bounded rationality and organisational decision-making is also relevant: Simon (1976) showed how organisations develop procedures and structures to bridge the gap between human cognitive limitations and the complexity of the decisions they face.

These perspectives suggest the Queen’s Dilemma might be understood as part of a broader pattern in how rational systems interact with human agency – one that has manifested in different forms across various domains of social organisation. The challenge of maintaining meaningful human agency in a world suffused with AI systems may require not just better control mechanisms, but a fundamental rethinking of the relationship between human agency and machine capability.

  1. ^

    This dice-roll mechanism represents the differential speeds of decision-making between humans and AI systems: the AI acts in the world much more quickly than the human, which here translates into it taking more turns.

  2. ^

    In practice, this manifests in scenarios like scalable oversight, where we constrain AI systems to generate outputs that humans can effectively verify. This bias in the space of possible plans creates a kind of pressure, against which the optimisation pressure of task performance must push.

  3. ^

    The analogy to semiconductor manufacturing speaks to the complex dynamics of a system involving humans, AIs, alignment mechanisms, and control structures all operating together. The interaction between these components creates emergent pressures that can systematically erode meaningful human agency, even when each individual component is working as intended.

  4. ^

    Agency itself represents a special kind of preference – not just about what we want to achieve, but about how we want to achieve it.