While I agree that the algorithm might output 5, I don’t share the intuition that it’s something that wasn’t ‘supposed’ to happen, so I’m not sure what problem it was meant to demonstrate.
OK, this makes sense to me. Instead of your (A) and (B), I would offer the following two useful interpretations:
1: From a design perspective, the algorithm chooses 5 when 10 is better. I’m not saying it has “computed argmax incorrectly” (as in your A); an agent design isn’t supposed to compute argmax (argmax would be insufficient to solve this problem, because we’re not given the problem in the format of a function from our actions to scores), but it is supposed to “do well”. The usefulness of the argument rests on the weight of “someone might code an agent like this by accident, if they’re not familiar with spurious proofs”. Indeed, that’s the origin of this code snippet: something like this was seriously proposed at some point (see the sketch after this list).
2: From a descriptive perspective, the code snippet is not a very good description of how humans would reason about a situation like this (for all the same reasons).
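For concreteness, here is a minimal Python sketch of the kind of agent design being discussed, assuming the standard 5-and-10 setup in which the agent searches for proofs of sentences of the form “A() = a → U() = u” and takes the action with the highest proven utility. It is not the original snippet, and the provability check is stubbed out (a real version would run a bounded proof search over the agent’s own source code); the hard-coded “proofs” only illustrate how a spurious proof can make taking 5 look justified.

```python
# Illustrative only: the "proofs" below are hard-coded stand-ins for what a
# bounded proof search over the agent's own source code might (spuriously) find.

ACTIONS = [5, 10]

def provable(sentence: str) -> bool:
    # Stub for the proof search. In the spurious-proof story, the searcher
    # finds "A() = 5 -> U() = 5" and, vacuously, "A() = 10 -> U() = 0"
    # (the latter because "A() = 10" ends up being false).
    spuriously_proven = {
        "A() = 5 -> U() = 5",
        "A() = 10 -> U() = 0",
    }
    return sentence in spuriously_proven

def A() -> int:
    # Take the action with the highest utility it can "prove" it would get.
    best_action, best_utility = None, float("-inf")
    for a in ACTIONS:
        for u in range(11):
            if provable(f"A() = {a} -> U() = {u}") and u > best_utility:
                best_action, best_utility = a, u
    return best_action

print(A())  # prints 5, even though taking 10 would yield utility 10
```

With those sentences “proven”, the loop never sees a better score for 10, so the agent returns 5; that is the sense in which the design chooses 5 when 10 is better.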
When I try to examine my own reasoning, I find that I’m just selectively blind to certain details and so don’t notice any problems. For example: suppose the environment calculates “U=10 if action = A; U=0 if action = B” and I, being a utility maximizer, am deciding between actions A and B. Then I might imagine something like “I chose A and got 10 utils” and “I chose B and got 0 utils”; ergo, I should choose A.
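In code, that reasoning might look like the following sketch, assuming the environment really is handed to us as a function from actions to utilities (the names are just for illustration):

```python
def environment(action: str) -> int:
    # U = 10 if action = A; U = 0 if action = B.
    return 10 if action == "A" else 0

def choose(actions):
    # Imagine each action in turn ("I chose A and got 10 utils",
    # "I chose B and got 0 utils"), staying blind to the fact that only
    # one of them will actually be taken, and pick the best imagined outcome.
    return max(actions, key=environment)

print(choose(["A", "B"]))  # prints A
```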
Right, this makes sense to me, and is an intuition which I think many people share. The problem, then, is to formalize how to be “selectively blind” in an appropriate way such that you reliably get good results.