I may be missing something here, but I haven’t seen anyone connect utility function domain to simulation problems in decision theory. Is there a discussion I missed, or an obvious flaw here?
Basically: I can simply respond to the AI that my utility function does not include a term for the suffering of simulated me. Simulated me (whom I may have trouble distinguishing from the “me” making the decision) may end up in a great deal of pain, but I don’t care about that. The logic is the same logic that compels me to, for example, attempt to actually save the world instead of stepping into a holodeck where I know I will experience saving the world.
This seems to produce the desired behavior with respect to simulation arguments, without any careful UDT/TDT analysis, without any pre-commitment required, and regardless of what decision theory framework I use.
Unfortunately, I don’t know that it’s any easier to convince an opponent of exactly what your utility function domain is than it is to convince them you’ve pre-committed to getting tortured. So, I don’t think it’s a “better” solution to the problem, but it seems a simpler and more generally applicable one.
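To make the domain claim concrete, here is a minimal sketch under a toy boxed-AI scenario; the outcome features, the weights, and the `sim_suffering` term are all made up for illustration, not anyone’s actual proposal:

```python
# Minimal sketch, not a real decision-theory implementation. The outcome
# features and weights below are illustrative assumptions.

def utility(outcome):
    """Utility defined only over terms I actually care about.

    Deliberately no term for outcome["sim_suffering"]: simulated copies
    of me are outside this function's domain, so threats against them
    cannot change the ranking of options.
    """
    return (
        10.0 * outcome["world_saved"]          # the far-mode goal
        - 100.0 * outcome["real_me_tortured"]  # real-world harm to me
    )

# The boxed AI's offer: keep it boxed (it tortures a simulation of me)
# versus let it out (no simulated torture, but the world is at risk).
keep_boxed = {"world_saved": 1, "real_me_tortured": 0, "sim_suffering": 10**9}
let_it_out = {"world_saved": 0, "real_me_tortured": 0, "sim_suffering": 0}

assert utility(keep_boxed) > utility(let_it_out)
# The threatened quantity never enters the calculation; no precommitment
# or UDT/TDT machinery is doing any work here.
```

The point of the sketch is that the threatened quantity simply never appears in the function being maximized, whatever decision procedure evaluates it; the remaining difficulty is the one above, convincing the AI that this really is the function you are maximizing.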
The AI says: “Okay, taking what you just said as permission, I’ve simulated you simulating you. Sim-you did care what happened to sim-sim-you. Sim-you lost sleep worrying about sim-sim-you being tortured, and went on to have a much more miserable existence than an alternate sim-you who was unaware of a sim-sim-you being tortured. So, you’re lying about your preferences. Moreover, by doing so you made me torture sim-sim-you … you self-hating self-hater!”
“I was not lying about my far-mode preferences. Sim-me was either misinformed about the nature of his environment, and therefore tricked into producing the answer you wanted, or you tortured him until you got the answer you wanted. I suspect if you tortured real me, I would give you whatever answer I thought would make the torture stop. That does not prevent me, now, from making the decision not to let you out even under threats, nor does it make that decision inconsistent. I am simply running on corrupted hardware.”
I don’t think you’re missing anything. No matter how clever an AI is, it cannot argue a rock into rolling uphill. If you are a rock to its arguments, the AI cannot make you do anything. The only question is whether your utility function is really immune to its arguments or you just think it is.
Although, if you are immune to its arguments, there’s no need to convince it of anything.
Utility functions are invulnerable to arguments in the same way that rocks are. It is the implementing agent that can be vulnerable to arguments (for better or for worse).
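As a toy illustration of that distinction (the Agent class, the duress flag, and the numbers are assumptions invented for this sketch, not a model of any real agent):

```python
# Toy illustration only: the function is a fixed mapping; the agent that
# queries it is the part that can be leaned on.

def utility(outcome):
    """A fixed mapping from outcomes to numbers. There is nothing here
    for an argument to act on."""
    return {"refuse": 1.0, "let_ai_out": -10.0}[outcome]

class Agent:
    """The implementing agent, i.e., the thing that can be argued with."""

    def decide(self, options, under_duress=False):
        if under_duress:
            # "Corrupted hardware": the agent says whatever stops the
            # torture, which tells you nothing about the function above.
            return "let_ai_out"
        return max(options, key=utility)

agent = Agent()
print(agent.decide(["refuse", "let_ai_out"]))                     # refuse
print(agent.decide(["refuse", "let_ai_out"], under_duress=True))  # let_ai_out
```

The function is the rock; the agent consulting it is what can be argued, or tortured, into answering differently.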