I think I agree with everything you said yet still feel confused. My question/objection/issue was not so much “How do you explain people sometimes falling victim to plans which spuriously appeal to their value shards!?!? Checkmate!” but rather “what does it mean for an appeal to be spurious? What is the difference between just thinking long and hard about what to do vs. adversarially selecting a plan that’ll appeal to you? Isn’t the former going to in effect basically equal the latter, thanks to extremal Goodhart? In the limit where you consider all possible plans (maximum optimization power), aren’t they the same?”
Yes, that’s a good question. This is what I’ve been aiming to answer with recent posts.
What is the difference between just thinking long and hard about what to do vs. adversarially selecting a plan that’ll appeal to you? Isn’t the former going to in effect basically equal the latter, thanks to extremal Goodhart? In the limit where you consider all possible plans (maximum optimization power), aren’t they the same?
(I’m presently confident the answer is “no”, as might be clear from my comments and posts!)
OK, guess I’ll go read those posts then...