“I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads—can I have $1000?”
Obviously, the only reflectively consistent answer in this case is “Yes—here’s the $1000”, because if you’re an agent who expects to encounter many problems like this in the future, you will self-modify to be the sort of agent who answers “Yes” to this sort of question—just like with Newcomb’s Problem or Parfit’s Hitchhiker.
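To make the "expects to encounter many problems like this" intuition concrete, here is a minimal sketch (my own illustration, not part of the original post) of the expected value of the two dispositions, computed before the coin is flipped. It assumes a perfectly accurate predictor and utility linear in dollars.

```python
# Minimal sketch (not from the original post): ex ante value of being an agent
# that pays versus one that refuses, assuming a perfectly accurate predictor
# and utility linear in dollars.

p_heads = 0.5
ask, reward = 1_000, 1_000_000

ev_pay    = p_heads * (-ask) + (1 - p_heads) * reward   # 499500.0
ev_refuse = p_heads * 0      + (1 - p_heads) * 0        # 0.0
print(ev_pay, ev_refuse)
```

Before the flip, the paying disposition is worth $499,500 in expectation and the refusing disposition $0, which is the sense in which a self-modifying agent would choose to be the sort of agent that answers "Yes".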
I don’t understand why “Yes” is the right answer. It seems to me that an agent that self-modified to answer “Yes” to this sort of question in the future but said “No” this time would generate more utility than an agent that already implemented the policy of saying yes.
If I were going to insert an agent into the universe at the moment the question was posed, after the coin flip had occurred, I would place one that answered “No” this time but “Yes” in the future. (Assuming I have no information other than the information provided in the problem description.)
I don’t understand why “Yes” is the right answer. It seems to me that an agent that self-modified to answer “Yes” to this sort of question in the future but said “No” this time would generate more utility than an agent that already implemented the policy of saying yes.
If that first agent (that answers no, then self-modifies to answer yes) had been in the situation where the coin had come up tails, then it would not have got the million dollars, because the predictor would have predicted its refusal in the heads case; whereas an agent that can “retroactively precommit” to answering yes would have got the million dollars. So having a “retroactively precommit” algorithm seems like a better choice than having an “answer what gets the biggest reward, and then self-modify for future cases” algorithm.
If that first agent (that answers no, then self-modifies to answer yes) had been in the situation where the coin had come up tails, then it would not have got the million dollars, because the predictor would have predicted its refusal in the heads case; whereas an agent that can “retroactively precommit” to answering yes would have got the million dollars.
But we know that didn’t happen. Why do we care about utility we know we can’t obtain?
So having a “retroactively precommit” algorithm seems like a better choice than having an “answer what gets the biggest reward, and then self-modify for future cases” algorithm.
For what goal is this a better choice? Utility generation?
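To make the disagreement in this exchange concrete, here is a small sketch (again my own, not from the thread) that scores the two decision rules being compared: “refuse now, then self-modify to pay in future instances” versus “retroactively precommit and pay now”. It assumes a perfectly accurate predictor and dollars as utility. Scored only in the world where the coin actually came up heads, refusing wins by $1000; averaged over both ways the fair coin could have landed, precommitting wins by $499,500. Whether the ex post or the ex ante score is the one that matters is exactly the point in dispute.

```python
# Sketch (my own, not from the thread): score the two decision rules discussed
# above, assuming a perfectly accurate predictor and dollars as utility.

ASK, REWARD = 1_000, 1_000_000

def payoff(pays_when_asked: bool, coin: str) -> int:
    """Payoff of an agent with the given disposition, for a given coin outcome."""
    if coin == "heads":
        return -ASK if pays_when_asked else 0
    # tails: the predictor pays out iff it predicts the agent would pay on heads
    return REWARD if pays_when_asked else 0

refuse_then_self_modify = False   # says "No" in this problem instance
retroactive_precommit   = True    # says "Yes" in this problem instance

# Ex post: score only the world that actually happened (heads).
print(payoff(refuse_then_self_modify, "heads"))   # 0     -> keeps the $1000
print(payoff(retroactive_precommit, "heads"))     # -1000

# Ex ante: average over both ways the fair coin could have landed.
for rule in (refuse_then_self_modify, retroactive_precommit):
    ev = 0.5 * payoff(rule, "heads") + 0.5 * payoff(rule, "tails")
    print(ev)   # 0.0 for refusing, 499500.0 for precommitting
```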