I don’t understand why this setup needs multiple decisions (even after asking johnswentworth).
Thomas: Why doesn’t this setup work with a single decision (say, a poker player imagining her opponent raising, calling, or folding)?
John (as understood by me): If the agent only ever receives one piece of information, the sense in which it uses conditional probability is a bit trivial. Suppose the agent has an explicit world-model and X∈U, where U is its set of all possible worlds. If the agent only receives a single piece of information f(X), which constrains the set of worlds to S⊆U, then the agent can simply have U=S, being unable to imagine any world inconsistent with what it sees. For this agent, conditioning on f is vacuous. But if the agent is making multiple decisions based on different pieces of information fi that constrain the possible worlds to different sets Si, it must be able to reason about a set of worlds larger than any particular Si.
Thomas: But doesn’t the agent need to do this for a single decision, given that it could observe either f∗ or some other information f∗′?
Here I don’t know how to respond, and neither does my model of John. Maybe the answer is that it doesn’t have to construct a lookup table for A(f∗) and can just act “on the fly”? But that doesn’t make sense, because it could do the same thing across multiple decisions. Also, there’s something odd going on: the math in the post is a behavioral claim (“we can model the agent as using conditional expected value”), but the interpretation, including the second bullet point, references the agent’s possible internal structure.
Yeah, my explanation of that wasn’t very good. Let me try again.
If there’s just one decision, the agent maximizes E[u(A,X)|f(X)=f∗]. But then we could define a behaviorally-equivalent utility function u′(A,f∗)=E[u(A,X)|f(X)=f∗]; there isn’t necessarily a sense in which the agent cares about X rather than f∗.
With many decisions, we could perform a similar construction to get a behaviorally-equivalent utility function u′(A,f∗1,...,f∗n). But if there are enough decisions with enough different inputs, then (f∗1,...,f∗n) may be bigger than X, i.e. it may have more dimensions/more bits. Then representing all these different decision-inputs as being calculated from one “underlying world” X yields a model which is “more efficient”, in some sense.
Another way to put it: with just one decision, ~any u′(A,f∗) should be behaviorally equivalent to an E[u(A,X)|f(X)=f∗]-maximizer for some u,f. But with many decisions, that should not be the case. (Though I have not yet actually come up with an example to prove that claim.)
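As a sanity check on the single-decision direction, here's a minimal sketch with toy numbers of my own (the worlds, prior, and observation below are arbitrary choices, not anything from the post): take any u′ defined directly over observations, set u(A,X):=u′(A,f(X)), and the conditional-expected-utility maximizer reproduces the u′-maximizer's behavior.

```python
import itertools
import random

# Toy setup (nothing here is from the actual post): a few worlds, one coarse
# observation f, and an arbitrary utility u' defined directly over observation values.
worlds = list(itertools.product(range(4), range(4)))   # X = (x1, x2), 16 worlds
prior = {x: 1 / len(worlds) for x in worlds}           # uniform prior over worlds
actions = ["a", "b", "c"]

def f(x):
    """The single observation: only the first coordinate of the world is seen."""
    return x[0]

# An arbitrary observation-level utility u'(A, f*), chosen at random.
rng = random.Random(0)
u_prime = {(a, fstar): rng.random() for a in actions for fstar in range(4)}

# Define u(A, X) := u'(A, f(X)).  Then E[u(A, X) | f(X) = f*] = u'(A, f*), so the
# conditional-expected-utility maximizer behaves exactly like the u'-maximizer.
def u(a, x):
    return u_prime[(a, f(x))]

def conditional_eu(a, fstar):
    """E[u(a, X) | f(X) = f*] under the prior."""
    consistent = [x for x in worlds if f(x) == fstar]
    total = sum(prior[x] for x in consistent)
    return sum(prior[x] * u(a, x) for x in consistent) / total

for fstar in range(4):
    assert max(actions, key=lambda a: conditional_eu(a, fstar)) == \
           max(actions, key=lambda a: u_prime[(a, fstar)])
print("single decision: the u'-maximizer and the conditional-EU maximizer agree")
```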
edit: the numbers are wrong here; go see my distillation for the correct numbers
Proposed example to check my understanding:
Here, X=(x1,x2) ranges over the 10 black points representing possible worlds.
We have three different observations f1,f2,f3, each of which has 4 possible outcomes and gives partial information about X. Call the set of combinations of observations O.
It seems that
|X|=|f(X)|=10 while |O|=|f1(X)×f2(X)×f3(X)|=64: there are more combinations of partial observations than possible worlds.
Therefore, storing a representation of the possible values of X might be simpler than storing a representation of the possible values of (f∗1,f∗2,f∗3).
Also, this notion of conditional expected utility actually constrains behavior: for an action space A, not all of the |A|^64 policies which map O→A correspond to conditional expected utility maximization.
If we were not conditioning, there would be only |A×X| policies corresponding to expected utility maximization.
If we are conditioning, it seems like there are |A|^(∑i |fi(X)|) = |A|^12 such policies: the agent makes decisions given 3 types of possible information i=1,2,3, and each type of information i has |fi(X)|=4 possible values.
So by pigeonhole, not every policy over distributed decisions is a conditional expected utility maximizer?
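To make the counting concrete, here is a quick sketch. Since the figure isn't reproduced here, the specific 10 worlds and the observation functions f1, f2, f3 are stand-ins I picked to match the counts above; the script just tallies the quantities used in the argument as written.

```python
import itertools

# Stand-in worlds and observations (my own guesses matching the stated counts:
# 10 worlds, 3 observations with 4 possible outcomes each).
worlds = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0),
          (2, 2), (2, 3), (3, 0), (3, 1), (3, 3)]     # |X| = 10
actions = ["a", "b"]                                  # some action space A

def f1(x): return x[0]                 # 4 possible outcomes
def f2(x): return x[1]                 # 4 possible outcomes
def f3(x): return (x[0] + x[1]) % 4    # 4 possible outcomes

obs_ranges = [sorted({f(x) for x in worlds}) for f in (f1, f2, f3)]
assert [len(r) for r in obs_ranges] == [4, 4, 4]

num_obs_combinations = len(list(itertools.product(*obs_ranges)))  # 4^3 = 64
print("possible worlds |X|:          ", len(worlds))                                      # 10
print("realized joint observations:  ", len({(f1(x), f2(x), f3(x)) for x in worlds}))     # 10
print("observation combinations |O|: ", num_obs_combinations)                             # 64

# Policy counts from the argument above, with |A| = 2.
print("all maps O -> A:              ", len(actions) ** num_obs_combinations)             # |A|^64
print("distributed policies:         ", len(actions) ** sum(len(r) for r in obs_ranges))  # |A|^12
```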
I didn’t check the math in your counting argument, but qualitatively that is correct.