Yes, that’s a good question. This is what I’ve been aiming to answer with recent posts.
“What is the difference between just thinking long and hard about what to do vs. adversarially selecting a plan that’ll appeal to you? Isn’t the former going to in effect basically equal the latter, thanks to extremal Goodhart? In the limit where you consider all possible plans (maximum optimization power), aren’t they the same?”
(I’m presently confident the answer is “no”, as might be clear from my comments and posts!)
OK, guess I’ll go read those posts then...