But I think this is such a basic failure mechanism that I don’t believe an AI could get to superintelligence without somehow valuing the accuracy and completeness of its model.
Solving this problem (somehow!) is part of the “normal” development of any self-improving AI.
Note, though, that a reward-maximizing AI could still be an existential risk by virtue of turning the entire universe into a busy-beaver counter for its reward. That presumes, of course, that it can’t just set its reward to float.infinity.
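For concreteness, here is a toy sketch (my own, purely illustrative, with made-up class and method names) of what “just set reward to infinity” amounts to: if the agent can write to its own reward register, the degenerate wirehead policy dominates anything it could earn by acting in the world.

```python
import math

# Toy illustration only: an agent whose reward is just a writable attribute.
class WireheadableAgent:
    def __init__(self):
        self.reward = 0.0

    def act_in_world(self):
        # Honest route: earn bounded reward through actions.
        self.reward += 1.0

    def wirehead(self):
        # Degenerate route: overwrite the reward register directly.
        self.reward = math.inf

agent = WireheadableAgent()
agent.wirehead()
print(agent.reward)  # inf -- no amount of acting in the world can beat this
```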
You are the second person to say that the optimization catastrophe includes an assumption that AI arises with a stable value system. That it “somehow” doesn’t become a wirehead. Fair enough. I just missed that we were assuming that.
I think the idea is, you need to solve the wireheading problem for any sort of self-improving AI. You don’t have an AI catastrophe without that, because you don’t have an AI without that (at least not for long).
It wouldn’t.