TurnTrout comments on A shot at the diamond-alignment problem

TurnTrout 28 Nov 2022 22:56 UTC
LW: 4 AF: 4
Yes, that’s a good question. This is what I’ve been aiming to answer with recent posts.
“What is the difference between just thinking long and hard about what to do vs. adversarially selecting a plan that’ll appeal to you? Isn’t the former going to in effect basically equal the latter, thanks to extremal Goodhart? In the limit where you consider all possible plans (maximum optimization power), aren’t they the same?”
(I’m presently confident the answer is “no”, as might be clear from my comments and posts!)
- Daniel Kokotajlo 28 Nov 2022 23:13 UTC
  LW: 2 AF: 2
  OK, guess I’ll go read those posts then...