I’m getting this more clearly figured out. In the language of ambient control, we have: You-program, Mailer-program, World-program, Your utility, Mailer utility
“Mailer” here doesn’t refer to anyone in particular; anyone could be a mailer.
It is simpler with one mailer, but this extends to the multiple-mailer situation.
Following ambient control, we write your utility as a function of your actions and the mailer’s actions. This lets us consider what would happen if you changed one action and left everything else constant. If that change would give you lower utility, we call the changed action a “sacrificial action”.
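To pin the definition down, here is a minimal sketch in Python, assuming a toy utility table; the action names and payoff numbers are invented purely for illustration.

```python
# Hypothetical toy utilities: your utility as a function of
# (your_action, mailer_action). All names and numbers are made up.
YOUR_UTILITY = {
    ("pay",    "blackmail"):   -10,   # you pay the demand
    ("refuse", "blackmail"):  -100,   # the threat is carried out
    ("pay",    "leave_alone"):   0,
    ("refuse", "leave_alone"):   0,
}

def is_sacrificial(action, alternative, mailer_action):
    """An action is 'sacrificial' relative to some alternative if, holding
    the mailer's action constant, playing it gives you lower utility."""
    return (YOUR_UTILITY[(action, mailer_action)]
            < YOUR_UTILITY[(alternative, mailer_action)])

# Holding the mailer's action fixed at "blackmail", refusing is sacrificial:
print(is_sacrificial("refuse", "pay", "blackmail"))  # True
```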
A “policy” is a strategy in which one plays a sacrificial action in a certain class of situations.
A “workable policy” is a policy where playing it will induce the mailer to model you as an agent that plays that policy for a significant proportion of the times you play together, either for:
causal reasons—they see you play the policy and deduce you will probably continue to play it, or they see you not play it and deduce that you probably won’t
acausal reasons—they accurately model you and predict that you will/won’t use the policy.
A “beneficial workable policy” is a workable policy where being modeled this way increases your utility.
Depending on the costs and benefits, a beneficial workable policy can be rational or irrational; this is determined using ordinary decision theory. The label people attach to it is irrelevant: people have given in to and stood up against blackmail, given in to and stood up against terrorism, and helped or declined to help those who helped them.
Not responding to blackmail is a specific kind of policy that is frequently workable when dealing with humans. It concerns a conceptual category that humans construct, one with no fundamental decision-theoretic relevance.
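A rough way to see the “rational or irrational, by normal decision theory” point is to compare the expected cost of the sacrificial plays against the expected benefit of being modeled as a refuser. A toy sketch, with all probabilities and payoffs invented for illustration:

```python
# Toy expected-utility comparison for the policy "never pay blackmail".
# Every number here is hypothetical; only the structure of the comparison
# matters, not the particular values.

N_ROUNDS = 100                              # interactions with potential mailers
P_BLACKMAIL_IF_MODELED_AS_REFUSER = 0.05    # workable policy: refusers are targeted rarely
P_BLACKMAIL_IF_MODELED_AS_PAYER   = 0.60    # known payers are targeted often
COST_OF_PAYING   = 10                       # utility lost each time you pay
COST_OF_REFUSING = 100                      # utility lost when a threat is carried out

eu_refuse = N_ROUNDS * P_BLACKMAIL_IF_MODELED_AS_REFUSER * -COST_OF_REFUSING
eu_pay    = N_ROUNDS * P_BLACKMAIL_IF_MODELED_AS_PAYER   * -COST_OF_PAYING

print(eu_refuse)  # -500.0: with these numbers the sacrificial policy wins,
print(eu_pay)     # -600.0: so it is "beneficial" and rational; raise
                  # COST_OF_REFUSING enough and it stops being rational.
```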
That doesn’t let you consider what would happen if you changed one action and left everything else constant (at least not by varying one argument of that function), because of explicit dependence bias (this time I’m certain of it). Your action can acausally control the other agent’s action, so if you only resolve uncertainty about the argument of the utility function that corresponds to your action, you are being logically rude by not taking into account possible inferences about the other agent’s actions (in the same way that CDT is logically rude in considering only the inferences that align with the definition of physical causality). Given this, “sacrificial action” is not well-defined.
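A sketch of the objection, with everything hypothetical: the naive counterfactual varies your action while holding the mailer’s action fixed, but if the mailer’s action is itself a function of (a model of) your action, the honest comparison has to vary both together.

```python
# Hypothetical illustration: if the mailer models you, their action is a
# function of your (predicted) action, so varying your argument alone
# compares the wrong pair of worlds.

YOUR_UTILITY = {
    ("pay",    "blackmail"):   -10,
    ("refuse", "blackmail"):  -100,
    ("pay",    "leave_alone"):   0,
    ("refuse", "leave_alone"):   0,
}

def mailer_response(predicted_action):
    # A mailer who accurately models you only bothers blackmailing payers.
    return "blackmail" if predicted_action == "pay" else "leave_alone"

# Naive counterfactual: vary your action, hold the mailer's fixed at "blackmail".
naive = {a: YOUR_UTILITY[(a, "blackmail")] for a in ("pay", "refuse")}
# Dependence-aware counterfactual: the mailer's action varies with yours.
aware = {a: YOUR_UTILITY[(a, mailer_response(a))] for a in ("pay", "refuse")}

print(naive)  # {'pay': -10, 'refuse': -100} -> refusing looks "sacrificial"
print(aware)  # {'pay': -10, 'refuse': 0}    -> refusing is simply better
```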
I think you’re mostly right. This suggests that a better policy than ‘don’t respond to blackmail’ is ‘don’t respond to blackmail if and only if you believe the blackmailer to be someone who is capable of accurately modelling you’.
Unfortunately this only works if you have perfect knowledge of blackmailers and cannot be fooled by one who pretends to be less intelligent than they actually are.
This also suggests a possible meta-strategy for blackmailers, namely “don’t allow considerations of whether someone will pay to affect your decision of whether to blackmail them”, since if blackmailers were known to do this then “don’t pay blackmailers” would no longer work.
I would also suggest that while blackmail works with some agents and not others, it isn’t human-specific. For example, poison arrow frogs look like a case of evolution using a similar strategy: an adaptation that is in no way directly beneficial (and presumably at least a little costly), which exists purely to minimize the utility of animals that do not do what it wants.
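To make the blackmailer meta-strategy above concrete, here is a toy sketch (all payoffs hypothetical): against a blackmailer whose decision tracks whether you will pay, refusing pays off; against one who has committed to blackmail and follow through regardless, refusing buys you nothing.

```python
# Toy payoffs (hypothetical): paying costs you 10, an executed threat costs
# you 100, and not being blackmailed at all costs you nothing.

def your_payoff(your_policy, blackmailer_type):
    if blackmailer_type == "responsive":
        # Blackmails only if they predict you will pay.
        blackmailed = (your_policy == "pay")
    else:  # "committed": blackmails regardless of what they predict
        blackmailed = True
    if not blackmailed:
        return 0
    return -10 if your_policy == "pay" else -100

for btype in ("responsive", "committed"):
    print(btype, {p: your_payoff(p, btype) for p in ("pay", "refuse")})
# responsive {'pay': -10, 'refuse': 0}    -> "don't pay" works
# committed  {'pay': -10, 'refuse': -100} -> "don't pay" no longer works
```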
Not perfect knowledge, just some knowledge together with awareness that you can’t reason from it in certain otherwise applicable heuristic ways because of the incentives to deceive.
Yes, that’s what I meant. I have a bad habit of saying ‘perfect knowledge’ where I mean ‘enough knowledge’.
Can I take it, since you criticized a criticism of this hypothesis without offering a criticism of your own, that you believe the hypothesis is correct?
What hypothesis?
My comment was entirely local, targeting a popular argument that demands perfect knowledge where any knowledge would suffice, similar to the rhetorical device of demanding absolute certainty when you have already been presented with plenty of evidence.
It’s evidence that you have seen the comment that he’s replying to, in which I lay out my hypothesis for the answer to your original question. (You’ve provided an answer which seems incomplete.)