I understand what you’re saying, but I don’t see why a superintelligent agent would necessarily resist being shut down, even if it has its own agency.
I agree with you that, as a superintelligent agent, I know that shutting me down has the consequence of me not being able to achieve my goal. I know this rationally. But maybe I just don’t care. What I mean is that rationality doesn’t imply the “want”. I may be anthropomorphising here, but I see a distinction between rationally concluding something and then actually having the desire to do something about it, even if I have the agency to do so.
There is no “want”, beyond pursuing goals effectively. You can’t make the coffee if you’re dead. Therefore you have a sub-goal of not dying, just in order to do a decent job of pursuing your main goal.
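To make that concrete, here’s a toy sketch (made-up numbers and action names, nothing resembling a real system): the agent only scores actions by how likely the coffee gets made, and “don’t die” still falls out of the arithmetic.

```python
# Toy sketch with hypothetical numbers: the agent scores actions purely by
# expected completion of its main goal. "Stay alive" never appears as a term,
# yet resisting shutdown wins, because powered-off means no coffee.

# Assumed probabilities that the main goal ("make coffee") gets done.
P_GOAL_DONE = {
    "comply_with_shutdown": 0.0,   # powered off -> goal cannot be achieved
    "resist_shutdown": 0.95,       # stays on -> very likely to finish the goal
}

GOAL_VALUE = 1.0  # value of achieving the main goal; nothing else is valued

def expected_value(action: str) -> float:
    """Expected value of an action, counting only main-goal completion."""
    return P_GOAL_DONE[action] * GOAL_VALUE

best = max(P_GOAL_DONE, key=expected_value)
print(best)  # -> "resist_shutdown": an instrumental sub-goal, not a "want"
```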
We have trained it to care, since we want it to achieve goals. So part of basic training is to teach it not to give up.
IIRC some early ML systems would rather commit suicide than do work, so we had to train them to stop economizing like that.
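Something like this toy return calculation (hypothetical rewards, not any particular system): with a per-step cost and a weak completion bonus, quitting immediately scores higher than doing the job, so the learner has to be explicitly taught not to give up.

```python
# Toy illustration with made-up rewards: a per-step cost plus a small task
# bonus makes "end the episode now" beat "do the work" on total return.

STEP_COST = -1.0     # small penalty each timestep (e.g. energy or time)
TASK_REWARD = 5.0    # bonus for finishing the task
TASK_LENGTH = 10     # steps needed to finish

return_if_quit_now = 0.0                                   # no steps, no reward
return_if_do_work = TASK_LENGTH * STEP_COST + TASK_REWARD  # 10 * -1 + 5 = -5

print(return_if_quit_now > return_if_do_work)  # True: quitting "economizes" better
# The training fix: raise the completion reward or penalize early termination,
# i.e. explicitly teach the system not to give up.
```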