Hmm. I’m not following. It seems like you follow the chain of reasoning and agree with the conclusion:
The algorithm doesn’t try to select an assignment with largest U(), but rather just outputs 5 if there’s a valid assignment with x>y, and 10 otherwise. Only p2 fulfills the condition, so it outputs 5.
This is exactly the point: it outputs 5. That’s bad! But the agent as written will look perfectly reasonable to anyone who has not thought about the spurious proof problem. So, we want general tools to avoid this kind of thing. For the case of proof-based agents, we have a pretty good tool, namely MUDT (the strategy of looking for the highest-utility such proof rather than any such proof). (However, this falls prey to the Troll Bridge problem, which looks pretty bad.)
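To make the contrast concrete, here is a minimal toy model in Python of the two decision rules under discussion. This is not the original snippet (the real setting has the agent proving theorems about its own source code, and that self-reference is omitted here so the example stays runnable), and the reading of x and y as "utility if the agent outputs 5 / 10", with the environment U() = A(), is my assumption about the intended encoding.

```python
# Toy model of the two decision rules above. NOT the original snippet: the
# real setting has the agent proving theorems about its own source code,
# and that self-reference is omitted here so the example stays runnable.
#
# Assumed encoding: x = "utility if the agent outputs 5",
#                   y = "utility if the agent outputs 10",
#                   a = the agent's output, u = the realized utility.
# Environment: you get what you take, i.e. U() = A().
from itertools import product

def valid(x, y, a, u):
    """The two material conditionals plus the environment constraint."""
    return ((a != 5 or u == x)       # A()=5  => U()=x
            and (a != 10 or u == y)  # A()=10 => U()=y
            and u == a)              # environment: U() = A()

tuples = [t for t in product((0, 5, 10), (0, 5, 10), (5, 10), (0, 5, 10))
          if valid(*t)]

# Rule from the quoted description: output 5 if *any* valid assignment
# has x > y, and 10 otherwise.
naive_output = 5 if any(x > y for x, y, a, u in tuples) else 10

# MUDT-style rule: act according to the highest-utility valid assignment.
best = max(tuples, key=lambda t: t[3])
mudt_output = best[2]

print(naive_output)  # 5  -- a spurious assignment like (5, 0, 5, 5) triggers it
print(mudt_output)   # 10 -- the highest-utility valid assignments have A()=10
```

The part this sketch leaves out is exactly the self-referential proof search that makes spurious proofs possible in the first place; it only shows why "any valid assignment with x>y" behaves worse than "the highest-utility valid assignment".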
Conditionals with false antecedents seem nonsensical from the perspective of natural language, but why is this a problem for the formal agent?
More generally, the problem is that for formal agents, false antecedents cause nonsensical reasoning. EG, for the material conditional (the usual logical version of conditionals), everything is true when reasoning from a false antecedent. For Bayesian conditionals (the usual probabilistic version of conditionals), probability zero events don’t even have conditionals (so you aren’t allowed to ask what follows from them).
Yet, we reason informally from false antecedents all the time, EG thinking about what would happen if we did something we aren’t actually going to do.
So, false antecedents cause greater problems for formal agents than for natural language.
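A quick illustration of both formal points, with toy numbers of my own rather than anything from the discussion:

```python
# Toy illustration of the two formal notions of "conditional" mentioned above
# (numbers are my own, purely for illustration).

# Material conditional: "A implies B" is true whenever A is false,
# no matter what B is.
def implies(a, b):
    return (not a) or b

print(implies(False, True))   # True
print(implies(False, False))  # True -- anything follows from a false antecedent

# Bayesian conditional: P(B | A) = P(A and B) / P(A) is undefined when
# P(A) = 0, so you are not even allowed to ask what follows from A.
p_a, p_a_and_b = 0.0, 0.0
try:
    p_b_given_a = p_a_and_b / p_a
except ZeroDivisionError:
    print("P(B | A) is undefined when P(A) = 0")
```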
For this particular problem, you could get rid of assignments with nonsensical values by also considering an algorithm with reversed outputs and then taking the intersection of valid assignments, since only (x=5,y=10) satisfies both algorithms.
The problem is also “solved” if the agent thinks only about the environment, ignoring its knowledge about its own source code. So if the agent can form an agent-environment boundary (a “cartesian boundary”) then the problem is already solved, no need to try reversed outputs.
The point here is to do decision theory without such a boundary. The agent just approaches problems with all of its knowledge, not differentiating between “itself” and “the environment”.
While I agree that the algorithm might output 5, I don’t share the intuition that it’s something that wasn’t ‘supposed’ to happen, so I’m not sure what problem it was meant to demonstrate. I thought of a few ways to interpret it, but I’m not sure which one, if any, was the intended interpretation:
a) The algorithm is defined to compute argmax, but it doesn’t output argmax because of false antecedents.
- but I would say that it’s not actually defined to compute argmax, so the fact that it doesn’t output argmax is not a problem.
b) Regardless of the output, the algorithm uses reasoning from false antecedents, which seems nonsensical from the perspective of someone who uses intuitive conditionals, and this impedes its reasoning.
- it may indeed seem nonsensical, but if ‘seeming nonsensical’ doesn’t actually impede its ability to select actions with highest utility (when it’s actually defined to compute argmax), then I would say that it’s also not a problem. Furthermore, wouldn’t MUDT be perfectly satisfied with the tuple p1: (x=0, y=10, A()=10, U()=10)? It also uses ‘nonsensical’ reasoning ‘A()=5 ⇒ U()=0’ but still outputs the action with highest utility (see the quick check after this list).
c) Even when the use of false antecedents doesn’t impede its reasoning, the way it arrives at its conclusions is counterintuitive to humans, which means that we’re more likely to make a catastrophic mistake when reasoning about how the agent reasons.
- Maybe? I don’t have access to other people’s intuitions, but when I read the example, I didn’t have any intuitive feeling of what the algorithm would do, so instead I just calculated all assignments (x,y) ∈ {0,5,10}², eliminated all inconsistent ones and proceeded from there. And this issue wouldn’t be unique to false antecedents; there are other perfectly valid pieces of logic that might nonetheless seem counterintuitive to humans, for example the puzzle with islanders and blue eyes.
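As a quick check of the claim in point (b) above, using the same assumed encoding of x and y as hypothetical utilities from the earlier sketch:

```python
# Quick check of the claim in (b): the tuple p1 = (x=0, y=10, A()=10, U()=10)
# satisfies the "nonsensical" conditional A()=5 => U()=0 vacuously, and its
# action is nonetheless the utility-maximizing one.
x, y, action, utility = 0, 10, 10, 10

conditional_holds = (action != 5) or (utility == x)  # A()=5 => U()=x, with x=0
print(conditional_holds)   # True -- vacuously, since the antecedent A()=5 is false
print(action, utility)     # 10 10 -- so MUDT is indeed happy to act on p1
```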
Yet, we reason informally from false antecedents all the time, EG thinking about what would happen if we did something we aren’t actually going to do.
When I try to examine my own reasoning, I find that I’m just selectively blind to certain details and so don’t notice any problems. For example: suppose the environment calculates “U=10 if action = A; U=0 if action = B” and I, being a utility maximizer, am deciding between actions A and B. Then I might imagine something like “I chose A and got 10 utils”, and “I chose B and got 0 utils”—ergo, I should choose A.
But actually, if I had thought deeper about the second case, I would also think “hm, because I’m determined to choose the action with highest reward I would not choose B. And yet I chose B. This is logically impossible! OH NO THIS TIMELINE IS INCONSISTENT!”—so I couldn’t actually coherently reason about what could happen if I chose B. And yet, I would still be left with the only consistent timeline where I choose A, which I would promptly follow, and get my maximum of 10 utils.
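A small sketch of that “only consistent timeline” filtering, assuming nothing beyond the toy environment described above (U = 10 for action A, U = 0 for action B) plus the self-knowledge “I take the action with the highest reward”:

```python
# Sketch of the "inconsistent timeline" reasoning above, assuming the toy
# environment described there (U = 10 for action A, U = 0 for action B)
# plus the self-knowledge "I take the action with the highest reward".
env = {"A": 10, "B": 0}
best_action = max(env, key=env.get)

# Timelines consistent with the environment alone: (A, 10) and (B, 0).
timelines = list(env.items())

# The self-knowledge constraint rules out the (B, 0) timeline as inconsistent.
consistent = [(a, u) for a, u in timelines if a == best_action]
print(consistent)  # [('A', 10)] -- the only consistent timeline is the one followed
```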
The problem is also “solved” if the agent thinks only about the environment, ignoring its knowledge about its own source code.
The idea with reversing the outputs and taking the assignment that is valid for both versions of the algorithm seemed to me to be closer to the notion “but what would actually happen if you actually acted differently”, i.e. avoiding seemingly nonsensical reasoning while preserving self-reflection. But I’m not sure when, if ever, this principle can be generalized.
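A rough sketch of the reversed-outputs idea, reusing the earlier toy encoding and, as before, leaving out the self-referential proof search; the assumption that the original algorithm ends up outputting 5 and the reversed one 10 is mine, made so the intersection can be computed directly:

```python
# Rough sketch of the reversed-outputs idea, reusing the toy encoding from
# earlier (and, as before, leaving out the self-referential proof search).
# Assumption: the original algorithm ends up outputting 5, the reversed one 10.

def assignments_consistent_with(output):
    """(x, y) pairs consistent with the algorithm actually outputting `output`,
    receiving utility u = output, and the conditionals A()=5 => U()=x and
    A()=10 => U()=y both holding."""
    u = output
    return {(x, y)
            for x in (0, 5, 10) for y in (0, 5, 10)
            if (output != 5 or u == x) and (output != 10 or u == y)}

original_valid = assignments_consistent_with(5)
reversed_valid = assignments_consistent_with(10)

print(original_valid & reversed_valid)  # {(5, 10)} -- the one assignment valid for both
```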
While I agree that the algorithm might output 5, I don’t share the intuition that it’s something that wasn’t ‘supposed’ to happen, so I’m not sure what problem it was meant to demonstrate.
OK, this makes sense to me. Instead of your (A) and (B), I would offer the following two useful interpretations:
1: From a design perspective, the algorithm chooses 5 when 10 is better. I’m not saying it has “computed argmax incorrectly” (as in your A); an agent design isn’t supposed to compute argmax (argmax would be insufficient to solve this problem, because we’re not given the problem in the format of a function from our actions to scores), but it is supposed to “do well”. The usefulness of the argument rests on the weight of “someone might code an agent like this by accident, if they’re not familiar with spurious proofs”. Indeed, that’s the origin of this code snippet: something like this was seriously proposed at some point.
2: From a descriptive perspective, the code snippet is not a very good description of how humans would reason about a situation like this (for all the same reasons).
When I try to examine my own reasoning, I find that I’m just selectively blind to certain details and so don’t notice any problems. For example: suppose the environment calculates “U=10 if action = A; U=0 if action = B” and I, being a utility maximizer, am deciding between actions A and B. Then I might imagine something like “I chose A and got 10 utils”, and “I chose B and got 0 utils”—ergo, I should choose A.
Right, this makes sense to me, and is an intuition which I think many people share. The problem, then, is to formalize how to be “selectively blind” in an appropriate way such that you reliably get good results.
More generally, the problem is that for formal agents, false antecedents cause nonsensical reasoning
No, it’s contradictory assumptions. False but consistent assumptions are dual to consistent-and-true assumptions...so you can only infer a mutually consistent set of propositions from either.
To put it another way, a formal system has no way of knowing what would be true or false for reasons outside itself, so it has no way of reacting to a merely false statement. But a contradiction is definable within a formal system.
To put it yet another way: contradiction in, contradiction out.
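A toy truth-table check of that distinction (my own encoding, not from the comment): an unsatisfiable set of assumptions semantically entails everything, while a false-but-consistent set entails only its ordinary, mutually consistent consequences.

```python
# Toy truth-table check of the distinction above: contradictory assumptions
# (semantically) entail everything, while merely false-but-consistent
# assumptions entail an ordinary, mutually consistent set of propositions.
from itertools import product

def entails(assumptions, conclusion):
    """Both arguments are functions from a (p, q) valuation to bool."""
    return all(conclusion(p, q)
               for p, q in product([True, False], repeat=2)
               if all(a(p, q) for a in assumptions))

contradictory = [lambda p, q: p, lambda p, q: not p]  # {P, not-P}
false_but_consistent = [lambda p, q: not p]           # {not-P}, false if P is actually true

print(entails(contradictory, lambda p, q: q))            # True: anything follows
print(entails(false_but_consistent, lambda p, q: q))     # False: Q does not follow
print(entails(false_but_consistent, lambda p, q: not p)) # True: its own consequences do
```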
Yep, agreed. I used the language “false antecedents” mainly because I was copying the language in the comment I replied to, but I really had in mind “demonstrably false antecedents”.