The relevant question is: how does surrendering, or not surrendering, control the probability of the ultimatum having been given? If it doesn’t, we should surrender. If the aliens would be sufficiently more likely not to make the ultimatum given that we wouldn’t surrender if they did, we shouldn’t surrender. Furthermore, we should look for third options whose choice could also control the aliens’ actions.
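For concreteness, here is a minimal sketch of that criterion with made-up numbers (the story supplies none of them; only the ordering of outcomes and the ratio of the two probabilities matter):

```python
# A toy comparison with made-up numbers; nothing in the story fixes them.
# Only the ordering (surrendering is less bad than the captives being
# tortured) and the ratio of the two probabilities matter.
U_SURRENDER = -40     # utility if the ultimatum is given and we surrender
U_TORTURE = -100      # utility if the ultimatum is given and we refuse
U_NO_ULTIMATUM = 0    # utility if the ultimatum is never given

def prior_eu(p_ultimatum, utility_if_given):
    """Prior expected utility of adopting a policy, where p_ultimatum is the
    probability that the aliens issue the ultimatum *given that policy*."""
    return p_ultimatum * utility_if_given + (1 - p_ultimatum) * U_NO_ULTIMATUM

# Case A: our disposition does not control the probability -> surrendering wins.
print(prior_eu(0.5, U_SURRENDER), prior_eu(0.5, U_TORTURE))   # -20.0 -50.0

# Case B: a known refuser is sufficiently less likely to be given the
# ultimatum at all -> refusing wins. "Sufficiently" here means
# p_refuse / p_surrender < U_SURRENDER / U_TORTURE (0.4 with these numbers).
print(prior_eu(0.5, U_SURRENDER), prior_eu(0.1, U_TORTURE))   # -20.0 -10.0
```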
Since this information is not given in the story, and the only thing we can go on is the anthropomorphic intuition that we shouldn’t give in to blackmail (since having the property of not giving in really does control the probability of being blackmailed by humans), the correct answer wasn’t suggested, which defeats part of the appeal of a puzzle like this and can pose some unnecessary memetic hazard.
For the same reason, focusing on whether one “cares about simulations” in this context is misleading, a false dilemma, since this is not the most relevant consideration. It’s like asking whether you should cross a road on prime-numbered minutes, and pointing out examples of people who did cross the road on prime-numbered minutes and were run over, instead of focusing on how you should respond to traffic lights.
Is this the correct interpretation of the first three sentences?
If the aliens are sufficiently less likely to present the ultimatum when they believe that we would not surrender upon being presented with the ultimatum, then we should not surrender.
That is, our decision procedures should not return “surrender” in a situation where having a decision procedure that returns “surrender” increases the counterfactual prior probability of being presented the ultimatum, even after we have been given the ultimatum.
This correct decision of not surrendering when given the ultimatum (a decision which results in a loss of utility via the captives being tortured), provided it is known or expected with sufficient certainty that “the aliens are more likely not to present the ultimatum if they think we will not surrender upon being presented it”, is analogous to the correct decision of paying the counterfactual mugger when given the losing outcome of a bet (a decision which results in a loss of utility via paying money), provided it is known or expected with sufficient certainty that “the counterfactual mugger would have paid us if we had won the bet, given that the counterfactual mugger thought that we would pay em upon losing the bet”.
That is, in the same way that we act in accordance with how we would have precommitted to paying after losing the counterfactual mugging bet, since doing so would have maximized our counterfactual prior expected utility, we should now act in accordance with how we would have precommitted to not surrendering upon being presented the ultimatum, since doing so would have increased our counterfactual prior expected utility.
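The analogy can be made concrete in the same style; the $100/$10,000 stakes below are the usual illustrative ones for the counterfactual mugging, not figures from this thread:

```python
# The counterfactual mugging has the same policy-level structure: pay $100 on
# a loss, receive $10,000 on a win, but the prize is paid only if the mugger
# predicts that we would pay on a loss.
PAY, PRIZE = -100, 10_000

def mugging_prior_eu(would_pay):
    """Prior expected utility of the disposition, evaluated before the coin flip."""
    if would_pay:
        return 0.5 * PRIZE + 0.5 * PAY   # paid on wins, pay up on losses
    return 0.0                           # no prize on wins, nothing paid on losses

print(mugging_prior_eu(True), mugging_prior_eu(False))   # 4950.0 0.0
```

Once the coin has already come up “lose”, paying still costs $100, just as refusing still gets the captives tortured once the ultimatum is already on the table; in both cases the correct act is the output of the disposition that maximized prior expected utility.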
That is, the reflectively consistent algorithm to which a friendly AI would self-modify in advance of being presented with this situation is such that it would choose to let the captives be tortured in order to decrease the prior expectation of captives being tortured.
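Under the same toy assumptions as before, the self-modification step amounts to fixing the policy by prior expected utility in advance and then not reopening the question:

```python
# A sketch of the reflective-consistency step, reusing the toy numbers above:
# the disposition is fixed ex ante by prior expected utility, and that same
# answer is returned after the ultimatum actually arrives.
U_SURRENDER, U_TORTURE = -40, -100

def prior_eu(p_ultimatum, utility_if_given):
    return p_ultimatum * utility_if_given

PRIOR_EU_BY_POLICY = {
    "surrender": prior_eu(0.5, U_SURRENDER),   # -20.0
    "refuse": prior_eu(0.1, U_TORTURE),        # -10.0
}

# Chosen once, in advance; not re-derived from the post-ultimatum payoffs.
committed_policy = max(PRIOR_EU_BY_POLICY, key=PRIOR_EU_BY_POLICY.get)
print(committed_policy)   # "refuse": captives tortured now, fewer expected ultimatums overall
```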
-
If all of that is correct, would an FAI self-modify to such a reflectively consistent decision procedure only on the condition of expecting to encounter such situations, or unconditionally?