CarlShulman comments on The evaluation function of an AI is not its aim

CarlShulman 11 Oct 2021 17:17 UTC
2 points
You may be interested in some recent empirical experiments demonstrating objective robustness failures/inner misalignment, including failures predicted in the Risks from Learned Optimization paper.