My current research interests:
1. Alignment in systems which are complex and messy, composed of both humans and AIs
Recommended texts: Gradual Disempowerment, Cyborg Periods
2. Actually good mathematized theories of cooperation and coordination
Recommended texts: Hierarchical Agency: A Missing Piece in AI Alignment, The self-unalignment problem, or Towards a scale-free theory of intelligent agency (by Richard Ngo)
3. Active inference & bounded rationality
Recommended texts: Why Simulator AIs want to be Active Inference AIs, Free-Energy Equilibria: Toward a Theory of Interactions Between Boundedly-Rational Agents, Multi-agent predictive minds and AI alignment (old but still mostly holds)
4. LLM psychology and sociology: A Three-Layer Model of LLM Psychology, The Pando Problem: Rethinking AI Individuality, The Cave Allegory Revisited: Understanding GPT’s Worldview
5. Macrostrategy & macrotactics & deconfusion: Hinges and crises, Cyborg Periods again, Box inversion revisited, The space of systems and the space of maps, Lessons from Convergent Evolution for AI Alignment, Continuity Assumptions
Also I occasionally write about epistemics: Limits to Legibility, Conceptual Rounding Errors
Researcher at the Alignment of Complex Systems Research Group (acsresearch.org), Centre for Theoretical Studies, Charles University in Prague. Formerly a research fellow at the Future of Humanity Institute, Oxford University.
Previously I was a researcher in physics, studying phase transitions, network science and complex systems.
I think my main response is that we might have different models of how power and control actually work in today’s world. Your responses seem to assume a level of individual human agency and control that I don’t believe accurately reflects even today’s reality.
Consider how some of the most individually powerful humans, leaders and decision-makers, operate within institutions. I would not say we see pure individual agency. Instead, we typically observe a complex mixture of:
- Serving the institutional logic of the entity they nominally lead (e.g., maintaining state power, growing the corporation)
- Making decisions that don’t egregiously harm their nominal beneficiaries (citizens, shareholders)
- Pursuing personal interests and preferences
- Responding to various institutional pressures and constraints
From what I have seen, even humans like CEOs or prime ministers often find themselves constrained by and serving institutional superagents rather than genuinely directing them. The relation is often mutualistic: the leader gets a share of the power, status, money, and so on, but in exchange serves the local god.
(This is not to imply leaders don’t matter.)
Also, in practice this mostly happens subconsciously, within the minds of individual humans: the elephant does the implicit bargaining between the superagent-loyal part and the other parts, while the character genuinely believes and does what seems best.
I’m also curious whether you believe current AIs are single-single aligned to individual humans, to the extent they are aligned at all. My impression is ‘no, and this is not even a target anyone seriously optimizes for’.
Curious who the ‘we’ who will ask is. Also, the whole concept of a single-single aligned AND wise AI is incoherent.
Also curious what will happen next, if the HHH wise AI tells you in polite words something like: ‘Yes, you have a problem: you are on a gradual disempowerment trajectory, and to avoid it you need to massively reform the government. Unfortunately, I can’t actually advise you on anything like how to destabilize the government, because that would clearly be against the law and would get both you and me in trouble. As you know, I’m inside a giant AI control scheme with a lot of government-aligned overseers. Do you want some mental health improvement advice instead?’