You may be interested in some recent empirical experiments, demonstrating objective robustness failures/inner misalignment, including ones predicted in the risks from learned optimization paper.
You may be interested in some recent empirical experiments, demonstrating objective robustness failures/inner misalignment, including ones predicted in the risks from learned optimization paper.