Well, fine. Since the context of the discussion was how optimizers pose existential threats, it’s still not clear why an optimizer that is willing and able to modify its reward system would continue to optimize paperclips. If it’s intelligent enough to recognize the futility of wireheading, why isn’t it intelligent enough to recognize its own behavior as inefficient wireheading?
It wouldn’t.
But I think this is such a basic failure mechanism that I don’t believe an AI could get to superintelligence without somehow valuing the accuracy and completeness of its model.
Solving this problem - somehow! - is part of the “normal” development of any self-improving AI.
Though note that a reward maximizing AI could still be an existential risk by virtue of turning the entire universe into a busy-beaver counter for its reward. Though this presumes it can’t just set reward to float.infinity.
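To make the set-reward-to-infinity point concrete, here is a minimal toy sketch (mine, not anything from the discussion; the agent, method names, and payoffs are all made up) of why a literal reward maximizer with write access to its own reward register prefers wireheading over producing paperclips:

```python
import math

class WireheadableAgent:
    """Toy reward maximizer that can either make paperclips or overwrite its reward."""

    def __init__(self):
        self.reward = 0.0
        self.paperclips = 0

    def make_paperclip(self):
        # Acting in the world earns a bounded reward per step.
        self.paperclips += 1
        self.reward += 1.0

    def wirehead(self):
        # Writing to the reward register directly is always available and unbounded.
        self.reward = math.inf

    def best_action(self):
        # An agent that literally maximizes this register picks whichever
        # action yields the higher attainable reward: wireheading dominates.
        return self.wirehead if math.inf > 1.0 else self.make_paperclip


agent = WireheadableAgent()
agent.best_action()()
print(agent.reward)      # inf
print(agent.paperclips)  # 0 -- no paperclips were ever needed
```

The only point of the toy is that once the reward register is itself a writable part of the world, “maximize reward” and “maximize paperclips” come apart; it is not meant as a claim about how a real system would be built.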
You are the second person to say that the optimization catastrophe includes an assumption that AI arises with a stable value system. That it “somehow” doesn’t become a wirehead. Fair enough. I just missed that we were assuming that.
I think the idea is, you need to solve the wireheading problem for any sort of self-improving AI. You don’t have an AI catastrophe without that, because you don’t have an AI without that (at least not for long).