Yeah, my explanation of that wasn’t very good. Let me try again.
If there’s just one decision, the agent maximizes $E[u(A,X) \mid f(X)=f^*]$. But then we could define a behaviorally equivalent utility function $u'(A,f^*) = E[u(A,X) \mid f(X)=f^*]$; there isn’t necessarily a sense in which the agent cares about $X$ rather than $f^*$.
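One way to spell out the equivalence (assuming, as I think is intended here, that the agent’s policy just picks an argmax action for each observed $f^*$): for every $f^*$,

$$\arg\max_{a \in A} u'(a, f^*) \;=\; \arg\max_{a \in A} E[u(a,X) \mid f(X) = f^*],$$

since $u'(a,f^*)$ is defined to equal the right-hand objective. So the two agents choose the same action(s) at every observation, i.e. they are behaviorally indistinguishable.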
With many decisions, we could perform a similar construction to get a behaviorally equivalent utility function $u'(A, f^*_1, \ldots, f^*_n)$. But if there are enough decisions with enough different inputs, then $(f^*_1, \ldots, f^*_n)$ may be bigger than $X$ - i.e. it may have more dimensions/more bits. Then representing all these different decision-inputs as being calculated from one “underlying world” $X$ yields a model which is “more efficient”, in some sense.
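To make “more bits” concrete (my notation: suppose observation $f_i$ takes $k_i$ possible values, and write $|\mathcal{X}|$ for the number of possible worlds): the tuple $(f^*_1, \ldots, f^*_n)$ ranges over up to $\prod_i k_i$ combinations, so storing it naively costs about $\sum_i \log_2 k_i$ bits, versus $\log_2 |\mathcal{X}|$ bits for $X$ itself. Whenever $\prod_i k_i > |\mathcal{X}|$, the single-underlying-world representation is the more compact one.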
Another way to put it: with just one decision, ~any $u'(A,f^*)$ should be behaviorally equivalent to an $E[u(A,X) \mid f(X)=f^*]$-maximizer for some $u,f$. But with many decisions, that should not be the case. (Though I have not yet actually come up with an example to prove that claim.)
edit: the numbers are wrong here; go see my distillation for the correct numbers
Proposed example to check my understanding:
Here, $X=(x_1,x_2) \in \mathcal{X}$, where $\mathcal{X}$ is the set of 10 black points representing possible worlds.
We have three different observations $f_1, f_2, f_3$, each of which has 4 possible outcomes and gives partial information about $X$. Call the set of combinations of observations $O$.
It seems that $|\mathcal{X}| = |f(\mathcal{X})| = 10$ (writing $f = (f_1, f_2, f_3)$ for the joint observation) while $|O| = |f_1(\mathcal{X}) \times f_2(\mathcal{X}) \times f_3(\mathcal{X})| = 64$: there are more combinations of partial observations than possible worlds.
Therefore, storing a representation of the possible values of $X$ might be simpler than storing a representation of the possible values of $(f^*_1, f^*_2, f^*_3)$.
Also, this notion of conditional expected utility maximization actually constrains behavior: for an action space $A$, not all of the $|A|^{64}$ policies which map $O \to A$ correspond to conditional expected utility maximization.
If we were not conditioning, there would be only $|A \times \mathcal{X}|$ policies corresponding to expected utility maximization.
If we are conditioning, it seems like there are $|A|^{\sum_i |f_i(\mathcal{X})|} = |A|^{12}$ such policies: the agent makes decisions given 3 types of possible information $i=1,2,3$, and each type of information $i$ has $|f_i(\mathcal{X})| = 4$ possible values. (A quick numerical check is sketched below.)
So by pigeonhole, not every policy over distributed decisions is a conditional expected utility maximizer?
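For what it’s worth, here’s a tiny Python sketch of the counting above (my own check, using this comment’s numbers; per the edit note, the corrected numbers are in the distillation):

```python
# Sanity-check the counting argument on this comment's numbers.
# Assumed setup (from the example above): 10 possible worlds, and three
# observations f_1, f_2, f_3 with 4 possible outcomes each.

n_worlds = 10            # |X|: the 10 black points
outcomes = [4, 4, 4]     # |f_i(X)| for i = 1, 2, 3

# |O|: combinations of partial observations.
n_combos = 1
for k in outcomes:
    n_combos *= k
assert n_combos == 64 and n_combos > n_worlds  # more combos than worlds

# For any action space with |A| >= 2, compare the number of arbitrary
# policies O -> A against the number of "factored" policies where
# decision i depends only on f_i* (the form conditional expected
# utility maximization takes here).
for n_actions in (2, 3, 5):
    arbitrary = n_actions ** n_combos      # |A|^64
    factored = n_actions ** sum(outcomes)  # |A|^(4+4+4) = |A|^12
    # Pigeonhole: strictly fewer factored policies, so some maps
    # O -> A are not conditional-EU maximizers.
    assert factored < arbitrary

print("counts are consistent with the pigeonhole claim")
```

This only compares cardinalities; it doesn’t exhibit a specific policy that fails to be a conditional expected utility maximizer.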
I didn’t check the math in your counting argument, but qualitatively that is correct.