Humans seem to have a built-in solution to this dilemma. If I were presented with this situation against another human, with a payoff of something like minus ten cents, nothing, or plus ten cents for me versus instant death, nothing, or ten billion dollars for the other person, I would voluntarily let the other person win, and I would expect them to do the same for me if our situations were reversed. This means humans playing against other humans will all do exceptionally well in these sorts of dilemmas.
So this seems like an intelligent decision-theoretic design choice, along the lines of: “Precommit to maximizing the gains of whichever agent has the higher stakes now, in the hope of acausally influencing the other agent to do the same, thus making us both better off if we ever end up in a true prisoner’s dilemma with a skewed payoff matrix.”
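As a minimal sketch of why that precommitment pays, here is a toy “grab or yield” game standing in for the dilemma, with made-up payoff numbers loosely based on the cents-versus-billions example above. Everything here (the game rules, the function names, the figures) is an illustrative assumption, not part of the original problem; it just compares the expected value of always playing selfishly against deferring to the higher-stakes agent when your role is randomized.

```python
import random

# Hypothetical payoff scales (illustrative only): the "small stakes" player
# risks cents, the "big stakes" player risks something enormous.
SMALL = {"win": 0.10, "neutral": 0.0, "lose": -0.10}
BIG = {"win": 10_000_000_000.0, "neutral": 0.0, "lose": -10_000_000_000.0}


def play(policy_a, policy_b, stakes_a, stakes_b):
    """One round: each agent either grabs the prize or yields.
    A lone grabber wins; two grabbers both get their 'lose' outcome;
    two yielders both get 'neutral'."""
    a = policy_a(stakes_a, stakes_b)
    b = policy_b(stakes_b, stakes_a)
    if a == "grab" and b == "grab":
        return stakes_a["lose"], stakes_b["lose"]
    if a == "grab":
        return stakes_a["win"], stakes_b["neutral"]
    if b == "grab":
        return stakes_a["neutral"], stakes_b["win"]
    return stakes_a["neutral"], stakes_b["neutral"]


def selfish(me, them):
    # Always go for your own prize, however small.
    return "grab"


def defer_to_high_stakes(me, them):
    # The precommitment from the comment: let the higher-stakes agent win.
    return "grab" if me["win"] >= them["win"] else "yield"


def expected(policy, rounds=100_000):
    """Average payoff for one agent when both sides use the same policy
    and the big-stakes role is assigned at random each round."""
    total = 0.0
    for _ in range(rounds):
        mine, theirs = random.sample([SMALL, BIG], 2)
        my_payoff, _ = play(policy, policy, mine, theirs)
        total += my_payoff
    return total / rounds


if __name__ == "__main__":
    print("both selfish:        ", expected(selfish))
    print("both defer to stakes:", expected(defer_to_high_stakes))
```

Under these toy numbers, two selfish agents both grab and each expects a payoff around minus five billion (averaged over which role they land in), while two agents who defer to the higher-stakes player each expect around plus five billion, which is the sense in which the precommitment makes both better off.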
If I believe the alien to be sufficiently intelligent/well-programmed, and if I expect the alien to believe me to be sufficiently intelligent/well-programmed as well, I would at least consider the alien graciously letting me win the first option in exchange for my letting it win the second, even if only one of the two options is ever actually presented and the other remains the same sort of relevant hypothetical as in a Counterfactual Mugging.
Yes, humans performing outstandingly well on this sort of problem was my inspiration for this. I am not sure how far this sort of winning can be generalized. Humans themselves are kinda complex machines, so if we start with a perfectly rational LW reader and a paperclip maximizer in a one-shot PD with a randomized payoff matrix, what is the least amount of handicap we need to give them to reach this super-optimal solution? At first I thought we could remove the randomization altogether, but I think that makes the whole problem more ambiguous.