Andrew_Critch comments on «Boundaries», Part 1: a key missing concept from utility theory

Andrew_Critch 27 Aug 2022 21:12 UTC
LW: 6 AF: 4
14
AF
6. Mesa-optimizers — instances of learned models that are themselves optimizers, which give rise to the so-called inner alignment problem (Hubinger et al, 2019).