
Jan_Kulveit

Karma: 5,571

My current research interests:

1. Alignment in systems that are complex and messy, composed of both humans and AIs
Recommended texts: Gradual Disempowerment, Cyborg Periods

2. Actually good mathematized theories of cooperation and coordination
Recommended texts: Hierarchical Agency: A Missing Piece in AI Alignment, The self-unalignment problem, or Towards a scale-free theory of intelligent agency (by Richard Ngo)

3. Active inference & bounded rationality
Recommended texts: Why Simulator AIs want to be Active Inference AIs, Free-Energy Equilibria: Toward a Theory of Interactions Between Boundedly-Rational Agents, Multi-agent predictive minds and AI alignment (old but still mostly holds)

4. LLM psychology and sociology: A Three-Layer Model of LLM Psychology, The Pando Problem: Rethinking AI Individuality, The Cave Allegory Revisited: Understanding GPT’s Worldview

5. Macrostrategy & macrotactics & deconfusion: Hinges and crises, Cyborg Periods again, Box inversion revisited, The space of systems and the space of maps, Lessons from Convergent Evolution for AI Alignment, Continuity Assumptions

I also occasionally write about epistemics: Limits to Legibility, Conceptual Rounding Errors

Researcher at the Alignment of Complex Systems Research Group (acsresearch.org), Centre for Theoretical Studies, Charles University in Prague. Formerly a research fellow at the Future of Humanity Institute, Oxford University.

Previously I was a researcher in physics, studying phase transitions, network science, and complex systems.

The Pando Problem: Rethinking AI Individuality

Jan_Kulveit · Mar 28, 2025, 9:03 PM
127 points
13 comments · 13 min read · LW link

Conceptual Rounding Errors

Jan_Kulveit · Mar 26, 2025, 7:00 PM
148 points
15 comments · 3 min read · LW link
(boundedlyrational.substack.com)

Announcing EXP: Experimental Summer Workshop on Collective Cognition

Mar 15, 2025, 8:14 PM
36 points
2 comments · 4 min read · LW link

AI Control May Increase Existential Risk

Jan_Kulveit · Mar 11, 2025, 2:30 PM
97 points
13 comments · 1 min read · LW link

Gradual Disempowerment, Shell Games and Flinches

Jan_Kulveit · Feb 2, 2025, 2:47 PM
124 points
36 comments · 6 min read · LW link

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

Jan 30, 2025, 5:03 PM
160 points
52 comments · 2 min read · LW link
(gradual-disempowerment.ai)

AI Assistants Should Have a Direct Line to Their Developers

Jan_Kulveit · Dec 28, 2024, 5:01 PM
57 points
6 comments · 2 min read · LW link

A Three-Layer Model of LLM Psychology

Jan_Kulveit · Dec 26, 2024, 4:49 PM
216 points
13 comments · 8 min read · LW link

“Alignment Faking” frame is somewhat fake

Jan_Kulveit · Dec 20, 2024, 9:51 AM
150 points
13 comments · 6 min read · LW link

“Charity” as a conflationary alliance term

Jan_Kulveit · Dec 12, 2024, 9:49 PM
34 points
2 comments · 5 min read · LW link

Hierarchical Agency: A Missing Piece in AI Alignment

Jan_Kulveit · Nov 27, 2024, 5:49 AM
112 points
21 comments · 11 min read · LW link

Jan_Kulveit’s Shortform

Jan_Kulveit · Nov 14, 2024, 12:00 AM
7 points
5 comments · LW link

You should go to ML conferences

Jan_Kulveit · Jul 24, 2024, 11:47 AM
112 points
13 comments · 4 min read · LW link

The Living Planet Index: A Case Study in Statistical Pitfalls

Jan_Kulveit · Jun 24, 2024, 10:05 AM
24 points
0 comments · 4 min read · LW link
(www.nature.com)

Announcing Human-aligned AI Summer School

May 22, 2024, 8:55 AM
50 points
0 comments · 1 min read · LW link
(humanaligned.ai)

InterLab – a toolkit for experiments with multi-agent interactions

Jan 22, 2024, 6:23 PM
69 points
0 comments · 8 min read · LW link
(acsresearch.org)

Box inversion revisited

Jan_Kulveit · Nov 7, 2023, 11:09 AM
40 points
3 comments · 8 min read · LW link

[Question] Snapshot of narratives and frames against regulating AI

Jan_Kulveit · Nov 1, 2023, 4:30 PM
36 points
19 comments · 3 min read · LW link

We don’t understand what happened with culture enough

Jan_Kulveit · Oct 9, 2023, 9:54 AM
87 points
22 comments · 6 min read · LW link · 1 review

Elon Musk announces xAI

Jan_Kulveit · Jul 13, 2023, 9:01 AM
75 points
35 comments · 1 min read · LW link
(www.ft.com)