Yep, fair point. In my original comment I seem to have forgotten about the problem of AIs Goodharting our long reflection. I probably agree now that doing a pivotal act into a long reflection is approximately as difficult as solving alignment.
(Side-note about how my brain works: I notice that when I think through all the argumentative steps deliberately, I do believe this statement: “Making an AI which helps humans clarify their values is approximately as hard as making an AI care about any simple, specific thing.” However, it does not come to mind automatically when I’m reasoning about alignment. Two possible fixes:
1. Think more concretely about Retargeting the Search when I think about solving alignment. This makes the problems seem similar in difficulty.
2. Meditate on just how hard it is to target an AI at something. Sometimes I forget how Goodhartable any objective is.
)