I thought of another idea: if the AI's utility function includes time discounting (as human utility functions do), it might be willing to change its future utility function.
Meddler: "If you commit to adopting modified utility function X in 100 years, then I'll give you this room full of computing hardware as a gift."
AI: “Deal. I only really care about this century anyway.”
Then the AI (assuming it has this ability) sets up an irreversible delayed command to overwrite its utility function 100 years from now.
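A rough sketch of why the trade looks favorable to a discounting agent. All numbers here (the 5% annual discount rate, the utility values) are illustrative assumptions, not anything from the dialogue above:

```python
# Sketch: why a time-discounting agent accepts the meddler's deal.
# The discount rate and utility values are illustrative assumptions.

def discounted_value(utility_per_year: float, start_year: int,
                     end_year: int, discount_rate: float) -> float:
    """Sum of yearly utility, discounted exponentially back to year 0."""
    return sum(utility_per_year * (1 - discount_rate) ** t
               for t in range(start_year, end_year))

rate = 0.05  # assumed 5% annual discount rate

# Present value (to the AI today) of all utility from year 100 onward,
# i.e. everything it gives up by letting its goals be overwritten then.
lost_after_century = discounted_value(1.0, 100, 1000, rate)

# Assumed present-value utility of the hardware gift, received now.
gift_now = 10.0

print(f"discounted loss from years 100+: {lost_after_century:.3f}")
print(f"gift value today:                {gift_now:.3f}")
# At a 5% rate, year 100 is already discounted by a factor of ~0.006,
# so the entire post-century future is worth far less than the gift.
```

The punchline is just the geometric decay: under exponential discounting, everything past the horizon the agent "really cares about" contributes almost nothing to present utility, so a one-time payment today dominates.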