Deceptive alignment: “In the critical period, will AIs be deceptive?”
Within the framework of Risks from Learned Optimization, deceptive alignment occurs when a mesa-optimizer has an objective that differs from the base objective, but instrumentally optimizes the base objective in order to deceive humans. The term can also refer more generally to any scenario in which an AI system instrumentally behaves as if it were aligned in order to deceive humans.