Vladimir_Nesov comments on Reward is not the optimization target

Vladimir_Nesov 1 Aug 2022 20:02 UTC
LW: 2 AF: 1
I brought up the conjecture that deceptive alignment relies on selected policies being optimizers. That suggests to me that something similar to your argument (where the target of optimization wouldn’t matter, only the fact of optimization for anything at all) would imply that deceptive alignment is less likely to happen. I didn’t mean to claim that you make this implication in the post, or that you believe it’s true or relevant; it’s an implication I’m describing in my own comment.