It’s a matter of whose perspective you take:
P1: That of the whole system, which, if seen as a distributed agent, may indeed sacrifice a few of its sub-agents to get rid of mugging
or
P2: That of the individual agent getting mugged, who has to make a choice: give up my wallet (including the impact that action will have on society as a whole) or give up my life.
The problem with how you’d like the probabilities to be presented is that you get “preferring a small chance of dying and a large chance of keeping my wallet over a large chance of losing my wallet” only when taking perspective P1.
Reason: An agent who actually has to make the choice is already being mugged and doesn’t get to say “a small chance of getting mugged”; he is already getting mugged, so there is no need for a counterfactual. So each agent who’s actually faced with the choice of whether to make the ultimate sacrifice only has a binary choice to make, with no probabilities other than 1 and 0 attached to it:
P(agent lives | gives up wallet) = 1. P(agent lives | doesn’t give up wallet) = 0.
I.e. no individual agent who has to immediately make that choice ever gets to include the “low probability of getting mugged” part. If he has to make the choice, then that case has already occurred, and it will always be his own life in exchange for saving the wallets of others.
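To put rough numbers on the two perspectives, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the utilities, the two mugging probabilities, and the labels "comply" and "resist" are not part of the original scenario. It only shows that the before-the-fact comparison (P1) and the already-being-mugged comparison (P2) can point in opposite directions.

```python
# Hypothetical numbers, chosen only to illustrate the argument.
U_LIFE = 1000.0    # assumed utility of staying alive
U_WALLET = 1.0     # assumed utility of keeping the wallet

P_MUGGED_IF_ALL_COMPLY = 0.5     # assumed: mugging pays, so it is common
P_MUGGED_IF_ALL_RESIST = 0.0001  # assumed: mugging never pays, so it is rare


def ex_ante_utility(p_mugged: float, resist: bool) -> float:
    """Expected utility before anyone knows whether they get mugged (P1)."""
    not_mugged = U_LIFE + U_WALLET
    # If mugged: resisting costs the life, complying costs only the wallet.
    mugged = 0.0 if resist else U_LIFE
    return (1 - p_mugged) * not_mugged + p_mugged * mugged


def ex_post_utility(resist: bool) -> float:
    """Utility once the mugging is already happening (P2), i.e. P(mugged) = 1."""
    return 0.0 if resist else U_LIFE


p1_comply = ex_ante_utility(P_MUGGED_IF_ALL_COMPLY, resist=False)
p1_resist = ex_ante_utility(P_MUGGED_IF_ALL_RESIST, resist=True)
p2_comply = ex_post_utility(resist=False)
p2_resist = ex_post_utility(resist=True)

print("P1 prefers resisting:", p1_resist > p1_comply)
print("P2 prefers complying:", p2_comply > p2_resist)
```

With these particular numbers both comparisons come out true. Different assumed values could flip the first one, but the second holds for any agent that values its life above zero.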
Only “the society”, taken as an agent, would in that situation want to give up its sub-part (much to gain, not much to lose), not individual agents who value their lives a lot. They could do a precommitment (“If any of us get mugged, we promise each other to die for the cause of a crimeless future society”), but once it comes down to their lives, unless those are quite un-human agents (un-human value-wise, that is; instrumental-rationality-wise we posited them to be rational), wouldn’t they just back out of it?
Compare it to defecting in a 1-iteration PD in which the payoff matrix is massively skewed in favor of defecting and you can control your opponent’s behavior.
(Most acts of standing up to a mugger and then getting shot probably have more to do with bravado and spur-of-the-moment fight-choosing in a fight-or-flight situation than with “I’ll die so that society may be muggerless”. Also, unlike in the scenario we’re discussing, those resisting the mugger in real-world scenarios have a significant chance of not dying to him, or even of defeating him. I’d reckon that also plays a major role in choosing when to fight; it’s not strictly a self-sacrifice. Not even with religious martyrs, since they have that imaginary heaven concept to weigh the scales. An agent who deems self-sacrifice for a potential positive impact on society the most effective way of accomplishing its goals (which would necessarily be the case for a rational agent to choose so) doesn’t share many of its values with an overwhelming majority of humans. Intuitions about “standing up to muggers” muddle the assessment. I guess if we transformed the situation into an equivalent formulation, with the mugger replaced by an all-powerful agent with a killing booth and a thing for wallets giving you a choice (with the same payoff matrix for the others in society), my estimation would be less controversial.)
They could do a precommitment [..] but [..] wouldn’t they just back out of it?
So, first, I completely agree that precommitment is a key issue here. “An agent who has to actually make the choice is already being mugged,” as you say, is reliably true only if precommitment is impossible; if precommitment is possible then it’s potentially false.
And perhaps you’re right that humans are incapable of reliable precommitment in these sorts of contexts… that, as you suggest, whatever commitments a rational human agent makes, they’ll just back out of it once it comes down to their lives. If that’s true, then scenario B is highly unlikely, and a rational human agent doesn’t choose it.
I agree that real-world acts of mugger-defiance are not the result of a conscious choice to die so society will go muggerless.
I agree that an agent who deems self-sacrifice for a collective impact as the most effective way of accomplishing its goals in a broad range of contexts doesn’t share many of its values with an overwhelming majority of humans.
I am not as confident as you sound that an agent who deems self-sacrifice for a collective impact as the most effective way of accomplishing its goals in no contexts at all doesn’t share many of its values with an overwhelming majority of humans.
(Short tangent:)
Well, whenever I think of, e.g., some historical human figure, and imagine what an instrumentally-rational version of that figure would look like, I feel like there is a certain tension: Would a really, really effective (human) plundering Hun still value plundering? Would an instrumentally-superpowered patriot still value some country-concept (say, Estonia) over his own life? I’m not questioning the general orthogonality thesis with this, just its applicability to humans.
Are there any historical examples you can think of where humans died for a cause, and where we’d expect (albeit this is all speculation) an instrumentally empowered human to still die for that cause? To still value that Estonian flag and the fuzzy feelings it brings over his own life, even when understanding that it was just some brainwashing, starting at his infant stage?
Regarding the precommitment: The problem is that an agent can always still change its mind when it’s at that “life or wallet” junction. The reason is a bit tricky: If there is a credible precommitment with outside enforcement (say you need to present your wallet daily to the authorities), then the agent will never get to the “life or wallet” junction; it’ll be a “life and the severe repercussions of breaking your precommitment or wallet and the possible benefits from the precommitment of sacrificing yourself, say a stipend for some family members” junction (which depressingly is how terrorist organisations sweeten the deal).
So whenever it’s actually just a “life or wallet” decision, any prior decision can be changed at a moment’s notice, since there are no real-world, hard-to-avoid consequences for defecting from the precommitment. And a rational agent which can change its action, and evaluates the current circumstances as warranting a change, should change. I.e. it’s hard for any rational agent to precommit and stay true to that precommitment if it’s not forced to. And the presence of such force would alter the “life or wallet” hypothetical.
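A minimal sketch of how outside enforcement changes the decision at the junction, in Python. The utility of staying alive, the size of the repercussions, and the size of the benefit are all made-up numbers, and `choice_at_junction` is a hypothetical helper introduced here, not anything from the thread. The wallet’s own value is left out because it is assumed to be negligible next to the other terms.

```python
# Hypothetical utility, for illustration only.
U_LIFE = 1000.0  # assumed value the agent places on staying alive


def choice_at_junction(repercussions: float = 0.0, benefit: float = 0.0) -> str:
    """Return what a utility-maximizing agent does once it is being mugged.

    repercussions: assumed cost imposed on an agent that breaks the precommitment
    benefit: assumed value (to the agent) of what the precommitment pays out
             if it sacrifices itself, e.g. a stipend for family members
    """
    give_up_wallet = U_LIFE - repercussions  # lives, but pays for backing out
    give_up_life = benefit                   # dies; only the payout counts
    return "give up life" if give_up_life > give_up_wallet else "give up wallet"


# Plain "life or wallet": nothing enforces the promise, so the agent backs out.
print(choice_at_junction())

# Enforced precommitment: the junction itself is a different decision.
print(choice_at_junction(repercussions=900.0, benefit=200.0))
```

Under these assumptions the agent only keeps the promise when the repercussions plus the benefit outweigh its life, which is the sense in which enforcement alters the hypothetical rather than answering it.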
I agree that a “life and the severe repercussions of breaking your precommitment or wallet and the possible benefits from the precommitment of sacrificing yourself” decision, as opposed to a “life or wallet” decision with no possible benefits from such precommitments, is one way a human agent might end up choosing scenario B over scenario A even when mugged. (It’s not the only one, but as you say, it’s a typical one in the real world.)
If you let me know how I could have worded my original hypothetical to not exclude options like that, I would appreciate the guidance. I certainly didn’t mean to exclude them (or the other possibilities).
Maybe change
“do you understand how a rational agent might prefer B?”
to
“do you understand how a society of rational agents might want to create a framework of enforceable precommitments that incentivizes B to a point such that P1, when being mugged, will prefer B?”
For example, if anyone who gave up a wallet later received a death sentence for doing so, the loss of life would be factored out—in effect, being mugged would become a death sentence regardless of your choice, in which case it’d be much easier hanging on to your purse for the good of the many. (Even if society killing you otherwise could be construed as having a slightly alienating effect.)
Edited accordingly. Thanks.