This is what I came here to say! I think you point out a crisp reason why some task settings make alignment harder than others, and why we get catastrophically optimized against by some kinds of smart agents but not others (like Deep Blue).