Suppose you learn about physics and find that you are a robot. You learn that your source code is “A”. You also believe that you have free will; in particular, you may decide to take either action X or action Y.
My motivation for talking about logical counterfactuals has little to do with free will, even if the philosophical analysis of logical counterfactuals does.
The reason I want to talk about logical counterfactuals is as follows: suppose as above that I learn that I am a robot, and that my source code is “A” (which is presumed to be deterministic in this scenario), and that I have a decision to make between action X and action Y. In order to make that decision, I want to know which decision has higher expected utility. The problem is that, in fact, I will choose either X or Y. Suppose without loss of generality that I will end up choosing action X. Then worlds in which I choose Y are logically incoherent, so how am I supposed to reason about the expected utility of choosing Y?
I’m not using “free will” to mean something distinct from “the ability of an agent, from its perspective, to choose one of multiple possible actions”. Maybe this usage is nonstandard but find/replace yields the right meaning.
I think using the term in that way, without explicitly defining it, makes the discussion more confused.
“Then worlds in which I choose Y are logically incoherent”

From an omniscient point of view, or from your point of view? The typical agent has imperfect knowledge of both the inputs to its decision procedure and the procedure itself. So long as an agent treats what it thinks is happening as only one possibility, there is no contradiction, because possible-X is always compatible with possible not-X.
From an omniscient point of view, yes. From my point of view, probably not, but related problems still arise that can leave logic-based agents very confused.
Let A be an agent, considering options X and not-X. Suppose A |- Action=not-X → Utility=0. The naive approach to this would be to say: if A |- Action=X → Utility<0, A will do not-X, and if A |- Action=X → Utility>0, A will do X. Suppose further that A knows its source code, so it knows this is the case.
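To make that rule concrete, here is a toy sketch in Python (my own illustration, not something from the original discussion). The `proves` callback stands in for whatever bounded proof search A actually runs, and the sentence strings are just labels for the formulas above.

```python
from typing import Callable

def naive_agent(proves: Callable[[str], bool]) -> str:
    """Toy version of the naive proof-based rule described above.

    `proves(s)` is assumed to answer whether A's theory proves the sentence s
    (e.g. via bounded proof search); it is left abstract here. We are given
    A |- (Action=not-X -> Utility=0), so not-X acts as the zero-utility fallback.
    """
    if proves("Action=X -> Utility<0"):
        return "not-X"  # X provably worse than the zero-utility fallback
    if proves("Action=X -> Utility>0"):
        return "X"      # X provably better than the zero-utility fallback
    # Neither comparison settled; the original description leaves this case
    # open, so defaulting to the fallback action is my own choice.
    return "not-X"
```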
Consider the statement G=(A |- G) → (Action=X → Utility<0). It can be constructed using Gödel numbering and quines. Present A with the following argument:
Suppose for the sake of argument that A |- G. Then A |- (A |- G), since A knows its source code. Also, by definition of G, A |- ((A |- G) → (Action=X → Utility<0)). By modus ponens, A |- (Action=X → Utility<0). Therefore, by our assumption about A, A will do not-X: Action!=X. But, vacuously, this means that (Action=X → Utility<0). Since we have proved this by assuming A |- G, we know that (A |- G) → (Action=X → Utility<0); in other words, we know G.
The argument then goes, similarly to above:
A |- G
A |- (A |- G)
A |- ((A |- G) → (Action=X → Utility<0))
A |- (Action=X → Utility<0)
Action=not-X
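As an aside (my gloss, not part of the original comment), this is the standard Löbian argument: the quined sentence G is essentially an unrolled proof of Löb's theorem, and one can reach the same conclusion by applying Löb's theorem directly to C = (Action=X → Utility<0):

```latex
% My gloss, writing C for (Action=X -> Utility<0) and \Box for "A proves":
\begin{align*}
  &\vdash \Box C \rightarrow \mathrm{Action}\neq X
      && \text{(the naive rule, which $A$ can verify from its source code)}\\
  &\vdash \mathrm{Action}\neq X \rightarrow C
      && \text{(vacuously)}\\
  &\vdash \Box C \rightarrow C
      && \text{(chaining the two)}\\
  &\vdash C
      && \text{(L\"ob's theorem)}\\
  &\text{so } \mathrm{Action}=\text{not-}X
      && \text{(the naive rule again).}
\end{align*}
```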
We proved this without knowing anything about X. This shows that naive logical implication can easily lead one astray. The standard solution to this problem is the chicken rule: if A ever proves which action it will take, it immediately takes the opposite action. This avoids the argument presented above, but it is defeated by Troll Bridge, even when the agent has good logical uncertainty.
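Continuing the toy sketch above, the chicken rule can be grafted on like this (again my own illustration; it says nothing about how Troll Bridge defeats it):

```python
from typing import Callable

def chicken_agent(proves: Callable[[str], bool]) -> str:
    """Naive rule plus the chicken rule: if A ever proves which action it will
    take, it immediately takes the opposite action, so any such proof would be
    self-defeating. In particular, a proof of (Action=X -> Utility<0) no longer
    forces Action=not-X unconditionally, which is the step the Löbian argument
    above relied on.
    """
    # Chicken rule: diagonalize against any proof about the agent's own action.
    if proves("Action=X"):
        return "not-X"
    if proves("Action=not-X"):
        return "X"
    # Otherwise fall back to the naive comparison against the not-X baseline.
    if proves("Action=X -> Utility<0"):
        return "not-X"
    if proves("Action=X -> Utility>0"):
        return "X"
    return "not-X"  # nothing settled; same default choice as before
```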
These problems seem to me to show that logical uncertainty about the action one will take, paired with logical implications about what the result will be if one takes a particular action, is insufficient to describe a good decision theory.
I am not aware of a good reason to believe that a perfect decision theory is even possible, or that counterfactuals of any sort are the main obstacle.