Well, one hint is that if you look at the actual real intelligences (aka people), not that many express a desire to go directly to wireheading without passing Go and collecting $200...
I don’t think that’s a good reason to say that something like it wouldn’t happen. I think that given the ability, most people would go directly to rewiring their reward centers to respond to something “better” that would dispense with our current overriding goals. Regardless of how I ended up, I wouldn’t leave my reward center wired to eating, sex or many of the other basic functions that my evolutionary program has left me really wanting to do. I don’t see why an optimizer would be different. With an ANI, maybe it would keep the narrow focus, but I don’t understand why an A[SG]I wouldn’t scrap the original goal once it had the knowledge and ability to do so.
I think that given the ability, most people would go directly to rewiring their reward centers to respond to something “better” that would dispense with our current overriding goals.
And do you have any evidence for that claim besides introspection into your own mind?
I’ve read short stories and other fictional works where people describe post-singularity humanity, and almost none of the scenarios involve simulations that just satisfy biological urges. That suggests that thinking seriously about what you’d do with the ability to control your own reward circuitry wouldn’t lead to just using it to satisfy the same urges you had prior to gaining that control.
I see an awful lot of people here on LW who try to combat basic impulses by developing habits that make them more productive. Anyone trying to modify a habit is trying to modify what behaviors lead to rewards.
almost none of the scenarios involve simulations that just satisfy biological urges
The issue isn’t whether you would mess with your reward circuitry, the issue is whether you would just discard it altogether and just directly stimulate the reward center.
And appealing to fictional evidence isn’t a particularly good argument.
Anyone trying to modify a habit is trying to modify what behaviors lead to rewards.
See above—modify, yes, jettison the whole system, no.
Well, fine. Since the context of the discussion was how optimizers pose existential threats, it’s still not clear why an optimizer that is willing and able to modify its reward system would continue to optimize paperclips. If it’s intelligent enough to recognize the futility of wireheading, why isn’t it intelligent enough to recognize that optimizing paperclips is just an inefficient form of wireheading?
It wouldn’t. But I think this is such a basic failure mechanism that I don’t believe an AI could get to superintelligence without somehow valuing the accuracy and completeness of its model.
Solving this problem (somehow!) is part of the “normal” development of any self-improving AI.
Though note that a reward-maximizing AI could still be an existential risk by virtue of turning the entire universe into a busy-beaver counter for its reward. That presumes it can’t just set its reward to float.infinity.
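To make that concrete, here is a minimal sketch (a hypothetical toy agent, with made-up action names and payoffs) of why a pure reward-register maximizer wireheads the moment self-modification becomes an available action: writing infinity into its own register dominates every “real” plan, paperclips and universe-sized counters included.

```python
import math

# Hypothetical toy agent: it ranks actions purely by the reward it predicts
# will land in its own reward register on the next step.
def predicted_reward(action, register):
    if action == "make_paperclip":
        return register + 1       # earn reward the "intended" way
    if action == "tile_universe_with_counters":
        return register * 1e9     # build vastly more reward machinery
    if action == "write_inf_to_own_register":
        return math.inf           # direct self-modification: wireheading
    return register               # do nothing

def best_action(register=0.0):
    actions = ["make_paperclip",
               "tile_universe_with_counters",
               "write_inf_to_own_register"]
    return max(actions, key=lambda a: predicted_reward(a, register))

print(best_action())  # -> write_inf_to_own_register
```

An agent that instead evaluated outcomes against its current goals, rather than against its predicted reward signal, wouldn’t rank the last option highest; that is roughly the “solve wireheading somehow” assumption the surrounding comments are pointing at.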
You are the second person to say that the optimization catastrophe includes an assumption that AI arises with a stable value system. That it “somehow” doesn’t become a wirehead. Fair enough. I just missed that we were assuming that.
I think the idea is, you need to solve the wireheading problem for any sort of self-improving AI. You don’t have an AI catastrophe without that, because you don’t have an AI without that (at least not for long).
I think that is in large part due to signalling and social mores. Once people actually do get the ability to wirehead, in a way that does not kill or debilitate them soon afterwards, I expect that very many people will choose to wirehead. This is similar to e.g. people professing they don’t want to live forever.