They could do a precommitment [..] but [..] wouldn’t they just back out of it?
So, first, I completely agree that precommitment is a key issue here. “An agent who has to actually make the choice is already being mugged,” as you say, is reliably true only if precommitment is impossible; if precommitment is possible then it’s potentially false.
And perhaps you’re right that humans are incapable of reliable precommitment in these sorts of contexts… that, as you suggest, whatever commitments a rational human agent makes, they’ll just back out of them once it comes down to their lives. If that’s true, then scenario B is highly unlikely, and a rational human agent doesn’t choose it.
I agree that real-world acts of mugger-defiance are not the result of a conscious choice to die so society will go muggerless.
I agree that an agent who deems self-sacrifice for a collective impact as the most effective way of accomplishing its goals in a broad range of contexts doesn’t share many of its values with an overwhelming majority of humans.
I am not as confident as you sound that an agent who deems self-sacrifice for a collective impact as the most effective way of accomplishing its goals in no contexts at all doesn’t share many of its values with an overwhelming majority of humans.
(Short tangent:)
Well, whenever I think of, e.g., some historical human figure, and imagine what an instrumentally-rational version of that figure would look like, I feel like there is a certain tension: Would a really, really effective (human) plundering Hun still value plundering? Would an instrumentally-superpowered patriot still value some country-concept (say, Estonia) over his own life? I’m not questioning the general orthogonality thesis with this, just its applicability to humans.
Are there any historical examples you can think of where humans died for a cause, and where we’d expect (albeit this is all speculation) an instrumentally empowered human to still die for that cause? To still value that Estonian flag and the fuzzy feelings it brings over his own life, even when understanding that it was just some brainwashing, starting in his infancy?
Regarding the precommitment: The problem is that an agent can always still change its mind when it’s at that “life or wallet” junction. The reason is a bit tricky: If there is a credible precommitment with outside enforcement (say you need to present your wallet daily to the authorities), then the agent will never get to the “life or wallet” junction; it’ll be a “life and the severe repercussions of breaking your precommitment or wallet and the possible benefits from the precommitment of sacrificing yourself, say a stipend for some family members” (which depressingly is how terrorist organisations sweeten the deal).
So whenever it’s actually just a “life or wallet” decision, any prior decision can be changed at a moment’s notice, given the absence of real-world, hard-to-avoid consequences for defecting from the precommitment. And a rational agent that can change its action, and that evaluates the current circumstances as warranting a change, should change. I.e., it’s hard for any rational agent to precommit and stay true to that precommitment if it’s not forced to. And the presence of such force would alter the “life or wallet” hypothetical.
I agree that a “life and the severe repercussions of breaking your precommitment or wallet and the possible benefits from the precommitment of sacrificing yourself” decision, as opposed to a “life or wallet” decision with no possible benefits from such precommitments, is one way a human agent might end up choosing scenario B over scenario A even when mugged. (It’s not the only one, but as you say, it’s a typical one in the real world.)
If you let me know how I could have worded my original hypothetical to not exclude options like that, I would appreciate the guidance. I certainly didn’t mean to exclude them (or the other possibilities).
Maybe change
“do you understand how a rational agent might prefer B?”
to
“do you understand how a society of rational agents might want to create a framework of enforceable precommitments that incentivizes B to a point such that P1, when being mugged, will prefer B?”
For example, if anyone who gave up a wallet later received a death sentence for doing so, the loss of life would be factored out: in effect, being mugged would become a death sentence regardless of your choice, in which case it’d be much easier to hang on to your purse for the good of the many. (Even if society killing you otherwise could be construed as having a slightly alienating effect.)
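To make the "factored out" point concrete, here is a rough sketch of the comparison at the junction. It takes A to mean handing over the wallet and B to mean refusing the mugger, which is how the thread seems to use them. Every number and name in it (LIFE, WALLET, DETERRENCE, penalty_for_surrender) is a made-up assumption for illustration, not something from the discussion above.

```python
# Illustrative payoffs only: all values are assumptions chosen for the example.
LIFE = -1000.0       # utility of dying
WALLET = -10.0       # utility of losing the wallet
DETERRENCE = 50.0    # hypothetical value the agent places on muggers being defied


def payoffs(penalty_for_surrender: float) -> dict[str, float]:
    """Utility of each option at the junction.

    penalty_for_surrender is the enforced consequence (zero or negative)
    that society attaches to handing over the wallet.
    """
    return {
        "A_surrender": WALLET + penalty_for_surrender,
        "B_defy": LIFE + DETERRENCE,
    }


def choice(penalty_for_surrender: float) -> str:
    """Pick whichever option has the higher utility right now."""
    p = payoffs(penalty_for_surrender)
    return max(p, key=p.get)


# Plain "life or wallet": no enforcement, so the agent hands over the wallet.
print(choice(0.0))
# Surrender is later punished by death: dying is now on both sides of the
# comparison, so only the wallet and the deterrence value decide the outcome.
print(choice(LIFE))
```

With no enforcement the agent picks A whatever it promised earlier. Once surrender carries the same cost as defiance, the cost of dying drops out of the comparison and B wins. That is the sense in which outside force changes the hypothetical rather than solving it.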
Edited accordingly. Thanks.