Concerns about mesa-optimizers are mostly concerns that “capabilities” will be robust to distributional shift while “objectives” will not be robust.
Concerns about mesa-optimizers are mostly concerns that “capabilities” will be robust to distributional shift while “objectives” will not be robust.