Counterfactual Planning

Counterfactual planning is a design approach for creating a range of safety mechanisms that can be applied to AGI systems. This sequence introduces the graphical notation used in counterfactual planning, and it defines several safety mechanisms.

Counterfactual Planning in AGI Systems

Graphical World Models, Counterfactuals, and Machine Learning Agents

Creating AGI Safety Interlocks

Disentangling Corrigibility: 2015-2021

Safely controlling the AGI agent reward function