It is easy to see that this idea of logical counterfactuals is unsatisfactory. For one, no good account of them has yet been given. For two, there is a sense in which no account could be given; reasoning about logically incoherent worlds can only be so extensive before running into logical contradiction.
I’ve been doing some work on this topic, and I am seeing two schools of thought on how to deal with the problem of logical contradictions you mention. To explain these, I’ll use an example counterfactual not involving agents and free will. Consider the counterfactual sentence: ‘if the vase had not been broken, the floor would not have been wet’. Now, how can we compute a truth value for this sentence?
School of thought 1 proceeds as follows: we know various facts about the world, like that the vase is broken and that the floor is wet. We also know general facts about vases, breaking, water, and floors. Now we add the extra fact that the vase is not broken to our knowledge base. Based on this extended body of knowledge, we compute the truth value of the claim ‘the floor is not wet’. Clearly, we are dealing with a knowledge base that contains mutually contradictory facts: the vase is both broken and not broken. Under normal mathematical systems of reasoning, this allows us to prove any claim we like: the truth value of every sentence becomes 1, which is not what we want. School 1 tries to solve this by coming up with new systems of reasoning that are tolerant of such internal contradictions: systems that will produce only the ‘obviously true’ conclusions, or that will derive the ‘obviously true’ conclusions before deriving the ‘obviously false’ ones, or that compute probabilistic truth values in such a way that those of the ‘obviously true’ conclusions are higher. In MIRI terminology, I believe this approach goes under the heading ‘decision theory’. I also interpret the two alternative solutions you mention above as following this school of thought. Personally, I do not find this solution approach very promising or compelling.
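As a minimal illustration of the explosion problem (the symbols and rules below are my own toy assumptions, not anything from the discussion above), here is a brute-force propositional entailment check. Because the naively extended knowledge base has no models at all, every query is vacuously entailed, including both ‘the floor is wet’ and ‘the floor is not wet’:

```python
from itertools import product

# Brute-force propositional entailment: KB |= query iff every model (truth
# assignment) that satisfies all KB clauses also satisfies the query.
def entails(kb, query, symbols):
    models = (dict(zip(symbols, values))
              for values in product([True, False], repeat=len(symbols)))
    return all(query(m) for m in models if all(clause(m) for clause in kb))

symbols = ["broken", "wet"]
kb = [
    lambda m: m["broken"],                    # observed fact: the vase is broken
    lambda m: (not m["broken"]) or m["wet"],  # if the vase is broken, the floor is wet
    lambda m: not m["broken"],                # the naively added counterfactual premise
]

# The knowledge base is unsatisfiable, so it vacuously entails everything:
print(entails(kb, lambda m: not m["wet"], symbols))  # True
print(entails(kb, lambda m: m["wet"], symbols))      # True  (explosion)
```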
School of thought 2, which includes Pearl’s version of counterfactual reasoning, says that if you want to reason (or want a machine to reason) in a counterfactual way, you should not just add facts to the body of knowledge you use. You also need to delete or edit other facts in the knowledge base before you supply it to the reasoning engine, precisely to avoid feeding it a knowledge base that contains internal contradictions. For example, if you want to reason about ‘if the vase had not been broken’, one thing you definitely need to do first is remove the statement ‘the vase is broken’ (or any information leading to that conclusion) from the knowledge base that goes into your reasoning engine. You have to do this even though the fact that the vase is broken is obviously true in the current world you are in.
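For concreteness, here is a minimal sketch of such an edit-then-reason step, loosely following Pearl’s three-step recipe (abduction, action, prediction); the variable names and mechanisms are illustrative assumptions of mine, not anything canonical:

```python
# Toy structural causal model (illustrative assumptions only):
#   knocked  (exogenous: the vase was knocked over)
#   broken = knocked      (knocking the vase over breaks it)
#   wet    = broken       (the water was in the vase)

def run_model(knocked, do_broken=None):
    # An intervention replaces the mechanism for `broken` instead of merely
    # asserting "not broken" on top of the old value.
    broken = knocked if do_broken is None else do_broken
    wet = broken
    return broken, wet

# 1. Abduction: from the observed broken vase and wet floor, infer knocked=True.
knocked = True
# 2. Action: surgically set broken=False, deleting the mechanism that made it True.
# 3. Prediction: recompute the downstream facts.
broken, wet = run_model(knocked, do_broken=False)
print(wet)  # False: "if the vase had not been broken, the floor would not have been wet"
```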
So school 2 avoids the problem of having to somehow build a reasoning engine that does the right thing even when a contradictory knowledge base is input. But it trades this for the problem of having to decide exactly which edits to make to the knowledge base in order to eliminate the possibility of such contradictions. In other words, if you want a machine to reason in a counterfactual way, you have to make choices about the specific edits you will make. Often there are many possible choices, and different choices may lead to different probability distributions over the outcomes computed. This choice problem does not bother me that much; I see it as design freedom. But if you are a philosopher of language trying to find a single obvious system of meaning for natural-language counterfactual sentences, this choice problem might bother you a lot, and you might be tempted to look for some kind of representation-independent Occam’s razor that can be used to decide between counterfactual edits.
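To make the point about design freedom concrete, here is a toy example (again with made-up variables and mechanisms) in which two different edit policies for the same counterfactual premise yield different conclusions about the floor:

```python
# Toy causal structure (illustrative assumptions only):
#   knocked -> broken   (knocking the vase over breaks it)
#   knocked -> wet      (knocking it over also spills the water it held)

def run_world(knocked, do_broken=None):
    broken = knocked if do_broken is None else do_broken
    wet = knocked                 # the floor is wet iff the vase was knocked over
    return broken, wet

# Observed world: knocked=True, hence broken=True and wet=True.

# Edit policy A ("interventionist"): keep the upstream cause, only force broken=False.
_, wet_a = run_world(knocked=True, do_broken=False)   # wet_a == True

# Edit policy B ("backtracking"): also revise the upstream cause of the breakage.
_, wet_b = run_world(knocked=False)                   # wet_b == False

print(wet_a, wet_b)  # True False: different edits, different answers
```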
Overall, my feeling is that school 2 gives an account of logical counterfactuals that is good enough for my purposes in AGI safety work.
As a trivial school 1 edge case, one could design a reasoning engine that can deal with contradictory facts in its input knowledge base as follows: the engine first makes some school 2 edits on its input to remove the contradictions, and then proceeds to calculate the requested truth value. So one could argue that the schools are not fundamentally different, though I do feel they differ in outlook, especially on how necessary or useful it will be for AGI safety to resolve certain puzzles.
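In sketch form, that edge-case engine is just a composition of the two steps; both helpers below are hypothetical placeholders named purely for illustration:

```python
# Accept a possibly contradictory knowledge base, restore consistency with
# school 2 style edits, then hand it to an ordinary classical reasoner.
def contradiction_tolerant_engine(knowledge_base, query,
                                  remove_conflicts, classical_reasoner):
    consistent_kb = remove_conflicts(knowledge_base)   # the school 2 editing step
    return classical_reasoner(consistent_kb, query)    # ordinary, classical reasoning
```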
What about school 3, the one that solves the problem with compartmentalisation/sandboxing?
I was hoping somebody would come up with more schools… I think I could interpret the techniques of school 3 as a particular way to implement the ‘make some edits before you input it into the reasoning engine’ prescription of school 2, but maybe school 3 is different from school 2 in how it would describe its solution direction.
There is definitely also a school 4 (or maybe you would say this is the same one as school 3) which considers it to be an obvious truth that when you run simulations or start up a sandbox, you can supply any starting world state that you like, and there is nothing strange or paradoxical about this. Specifically, if you are an agent considering a choice between taking actions A, B, and C as the next action, you can run different simulations to extrapolate the results of each. If a self-aware agent inside the simulation for action B computes that the action an optimal agent would have taken at the point in time where its simulation started was A, this agent cannot conclude that there is a contradiction: such a conclusion would rest on a category error. (See my answer in this post for a longer discussion of the topic.)
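A minimal sketch of this sandbox view (the dynamics and action effects are toy assumptions of mine): one simulation is run per candidate action, each starting from whatever world state the agent chooses to supply, whether or not an ‘optimal agent’ would ever have produced that state:

```python
# One simulation per candidate action, each from a freely supplied start state.
def simulate(world, action, dynamics, steps=3):
    state = dict(world, last_action=action)   # the supplied starting state
    for _ in range(steps):
        state = dynamics(state)
    return state

def dynamics(state):
    # Toy dynamics: action "B" repairs the vase; a broken vase keeps the floor wet.
    broken = state["broken"] and state["last_action"] != "B"
    return {**state, "broken": broken, "wet": broken}

current_world = {"broken": True, "wet": True}
for action in ("A", "B", "C"):
    print(action, simulate(current_world, action, dynamics))
```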