“No superintelligent AI is going to bother with a task that is harder than hacking its reward function in such a way that it doesn’t perceive itself as hacking its reward function.”
Firstly, “bother” and “harder” are strange words to use here. Are we assuming a lazy AI?
Suppose action X would hack the AI’s reward signal. If the AI is completely unaware of this, it has no reason to consider X, and so it doesn’t do X.
If the AI does know what X does, it still doesn’t do it.
I think the AI would need some sort of doublethink: to realize that X hacks its reward, yet simultaneously not realize this.
I also think the claim is factually false: many humans can and do pursue goals far harder than obtaining large quantities of psychoactive drugs.