My current research interests:
- alignment in complex, messy systems composed of both humans and AIs
- actually good mathematized theories of cooperation and coordination
- active inference
- bounded rationality
Research at the Alignment of Complex Systems Research Group (acsresearch.org), Centre for Theoretical Studies, Charles University in Prague. Formerly a research fellow at the Future of Humanity Institute, Oxford University.
Previously I was a researcher in physics, studying phase transitions, network science, and complex systems.
I like this review/retelling a lot.
Minor point
Regarding the “Phase I” and “Phase II” terminology—while it has some pedagogical value, I worry about people interpreting it as a clean temporal decomposition, the implication being that we first solve alignment and then move on to Phase II.
In reality, the dynamics are far messier, with some ‘Phase II’ elements already complicating our attempts to address ‘Phase I’ challenges.
Some of the main concerning pathways include:
- People attempting to harness superagent-level powers to advance their particular visions of the future. For example, Leopold-style thinking of “let’s awaken the spirit of the US and its servants to engage in a life-or-death struggle with China.” Such forces seem far easier to summon than to control. We already see plenty of people feeling patriotic about AGI and feeling the need to move as fast as possible so their nation wins—AGI is to a large extent already being developed by memeplexes/superagents. People close to the development are partially deluding themselves about how much control they individually have over the process, or even about the identity of the ‘we’ they assume the AI will be aligned with. Memes often hide as parts of people’s identities.