I think it’s worth examining more closely what it means to be “not a pure optimizer”. Formally, a VNM utility function is a rationalization of a coherent policy. Say that you have some idea about what your utility function is, U. Suppose you then decide to follow a policy that does not maximize U. Logically, it follows that U is not really your utility function; either your policy doesn’t coherently maximize any utility function, or it maximizes some other utility function. (This is because the utility function is, by definition, a rationalization of the policy.)
Failing to disambiguate these two notions of “the agent’s utility function” is a map-territory error.
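To make the “rationalization” point concrete, here is a rough Python (3.9+) sketch. It rests on an ordinal simplification I'm assuming for illustration: no lotteries, just a deterministic policy that picks one option from each finite menu, so it captures the revealed-preference core of the VNM picture rather than the full theorem. The helper names `rationalizes` and `find_rationalization` are hypothetical. Given an agent's working idea U and its actual policy, the policy can fail to maximize U while still maximizing some other utility function, or it can be cyclic and maximize none, which are the two options above.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, FrozenSet, Optional

# Ordinal simplification of VNM (no lotteries): a deterministic policy maps each
# menu of options to the single option the agent picks from it.
Policy = Dict[FrozenSet[str], str]


def rationalizes(utility: Dict[str, float], policy: Policy) -> bool:
    """True if every choice in `policy` strictly maximizes `utility` on its menu."""
    return all(
        utility[choice] > utility[other]
        for menu, choice in policy.items()
        for other in menu
        if other != choice
    )


def find_rationalization(policy: Policy) -> Optional[Dict[str, float]]:
    """Return some utility function that rationalizes `policy`, or None.

    Each chosen option is revealed-preferred to the other options on its menu.
    A single-valued choice rule is rationalizable by a strict ranking exactly
    when this revealed-preference relation has no cycle; a topological order
    of the relation then supplies the ranking.
    """
    worse_than: Dict[str, set] = {}
    for menu, choice in policy.items():
        for option in menu:
            worse_than.setdefault(option, set())
        worse_than[choice].update(o for o in menu if o != choice)
    try:
        # static_order() lists every option after all the options it beats,
        # so an option's position in the order can serve as its utility.
        order = list(TopologicalSorter(worse_than).static_order())
    except CycleError:
        return None  # incoherent choices: no utility function fits
    return {option: float(rank) for rank, option in enumerate(order)}


# The agent's working idea U ranks a above b above c.
U = {"a": 2.0, "b": 1.0, "c": 0.0}
policy = {frozenset({"a", "b"}): "a", frozenset({"b", "c"}): "c"}
print(rationalizes(U, policy))  # False: picking c over b does not maximize U
V = find_rationalization(policy)
print(rationalizes(V, policy))  # True: some other utility function fits

cyclic = {
    frozenset({"a", "b"}): "a",
    frozenset({"b", "c"}): "b",
    frozenset({"a", "c"}): "c",
}
print(find_rationalization(cyclic))  # None: no utility function fits at all
```

The topological sort is just one convenient way to turn an acyclic revealed-preference relation into numbers; any linear extension of the relation would work equally well.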
Decision theories require, as input, a utility function to maximize, and output a policy. If a decision theory is adopted by an agent who is using it to determine their policy (rather than already knowing their policy), then they are operating on some preliminary idea about what their utility function is. Their “actual” utility function is dependent on their policy; it need not match up with their idea.
So it is very much possible for an agent who is operating on an idea U of their utility function to evaluate counterfactuals in which their true behavioral utility function is not U. Indeed, this is implied by the fact that utility functions are rationalizations of policies.
Let’s look at the “turn left/right” example. The agent is operating on a utility function idea U, which is higher the more the agent turns left. When they evaluate the policy of turning “right” on the 10th time, they must conclude that, in this hypothetical, either (a) “right” maximizes U, (b) they are maximizing some utility function other than U, or (c) they aren’t a maximizer at all.
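A toy version of this three-way split, again as a hypothetical Python sketch: I'm assuming the agent's behavior can be summarized as a single string of ten turns and that U simply counts left turns. `u_exception` is an invented alternative utility function, not part of the original example. The check shows that the right-on-the-10th-turn history is not a U-maximizer in the ordinary sense, so (a) requires supposing that the fixed computation of U-maximization outputs something it does not, while (b) only requires a slightly more complicated utility function.

```python
from itertools import product

N_TURNS = 10


def u_idea(history: str) -> int:
    """The agent's working idea U: one point per left turn."""
    return history.count("L")


def u_exception(history: str) -> int:
    """Hypothetical alternative U': left is rewarded on every turn except
    the 10th, where right is rewarded instead."""
    return history[:-1].count("L") + int(history[-1] == "R")


def maximizers(utility) -> set:
    """All length-10 turn histories that achieve the maximum of `utility`."""
    histories = ["".join(h) for h in product("LR", repeat=N_TURNS)]
    best = max(utility(h) for h in histories)
    return {h for h in histories if utility(h) == best}


policy_history = "L" * 9 + "R"  # turn right on the 10th turn
print(policy_history in maximizers(u_idea))       # False: not a U-maximizer
print(policy_history in maximizers(u_exception))  # True: maximizes the alternative U'
```

`u_exception` takes a little more to describe than `u_idea`, which is the kind of extra complexity a (b) world has to pay for.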
The logical counterfactual framework says the answer is (a): that the fixed computation of U-maximization results in turning right, not left. But this is actually the weirdest of the three worlds. It is hard to imagine ways that “right” maximizes U, whereas it is easy to imagine that the agent is maximizing a utility function other than U, or is not a maximizer at all.
Yes, the (b) and (c) worlds may be weird in a problematic way. However, it is hard to imagine these being nearly as weird as (a).
One way they could be weird is that an agent with a complex utility function is likely to have been produced by a different process than an agent with a simple utility function. So the more weird, exceptional decisions you make, the stronger the evidence that you were produced by the sort of process that produces complex utility functions.
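The evidential point can be put as a simple Bayesian update. This is a hypothetical sketch: the two “processes”, the per-decision rates of exceptional decisions, and the prior are made-up numbers chosen only for illustration, and decisions are assumed independent given the process. Under those assumptions, whenever `q_complex` exceeds `q_simple`, each further exceptional decision multiplies the odds on the complex-utility process by the same likelihood ratio, which is greater than one.

```python
def posterior_complex(k: int, n: int, prior_c: float = 0.1,
                      q_simple: float = 0.01, q_complex: float = 0.2) -> float:
    """P(produced by the complex-utility process | k exceptional decisions out of n).

    Hypothetical model: each decision is independently "exceptional" with
    probability q_simple or q_complex, depending on which process produced
    the agent. The binomial coefficient is the same under both hypotheses,
    so it cancels and is omitted.
    """
    like_c = q_complex**k * (1 - q_complex) ** (n - k)
    like_s = q_simple**k * (1 - q_simple) ** (n - k)
    return prior_c * like_c / (prior_c * like_c + (1 - prior_c) * like_s)


# More exceptional decisions among 10 -> stronger evidence for the complex process.
for k in range(4):
    print(k, round(posterior_complex(k, n=10), 3))
```

With these illustrative numbers, zero exceptions out of ten actually lowers the probability of the complex process below its prior, while even one exception raises it above the prior.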
This is pretty similar to the smoking lesion problem, then. I expect that policy-dependent source code will have a lot in common with EDT, as they both consider “what sort of agent I am” to be a consequence of one’s policy. (However, as you’ve pointed out, there are important complications with the framing of the smoking lesion problem.)
I think further disambiguation on this could benefit from re-analyzing the smoking lesion problem (or a similar problem), but I’m not sure if I have the right set of concepts for this yet.
OK, all of that made sense to me. I find the direction more plausible than when I first read your post, although it still seems like it’ll fall to the problem I sketched.
I both like and hate that it treats logical uncertainty in a radically different way from empirical uncertainty: like, because we have so far failed to find any way to treat the two uniformly (besides being entirely updateful, that is); and hate, because it still feels so wrong for the two to be very different.