“No superintelligent AI is going to bother with a task that is harder than hacking its reward function in such a way that it doesn’t perceive itself as hacking its reward function.”
Firstly, “bother” and “harder” are strange words to use here. Are we assuming a lazy AI?
Suppose action X would hack the AI’s reward signal. If the AI is completely unaware of this, it has no reason to consider X, and so it doesn’t do X.
If the AI does know what X does, it still doesn’t do it.
I think the AI would need some sort of doublethink: to realize that X hacks its reward, yet simultaneously not realize this.
I also think the claim is factually false: many humans can and do pursue goals far harder than obtaining large quantities of psychoactive drugs.