I agree with you—and yes we ignore this problem by assuming goal-alignment. I think there’s a lot riding on the pre-SLT model having “beneficial” goals.
To the extent that this framing is correct, the “sharp left turn” concept does not seem all that decision-relevant, since most of the work of aligning the system (at least on the human side) should’ve happened way before that point.