I don’t understand why this setup needs multiple decisions (even after asking johnswentworth).
Thomas: Why doesn’t this setup work with a single decision (say, a poker player imagining her opponent raising, calling, or folding)?
John (as understood by me): If the agent only ever receives one piece of information, the sense in which it uses conditional probability is a bit trivial. Suppose the agent has an explicit world-model and X∈U, where U is its set of all possible worlds. If the agent only receives a single piece of information f(X), which constrains the set of worlds to S⊆U, then the agent can simply have U=S, being unable to imagine any world inconsistent with what it sees. For this agent, conditioning on f is vacuous. But if the agent is making multiple decisions based on different pieces of information fi that constrain the possible worlds to different sets Si, it must be able to reason about a set of worlds larger than any particular Si.
Thomas: But doesn’t the agent need to do this for a single decision, given that it could observe either f∗ or some other information f∗′?
Here I don’t know how to respond, and neither does my model of John. Maybe the answer is that it doesn’t have to construct a lookup table for A(f∗) and can just act “on the fly”? But that doesn’t make sense, because it could do the same thing across multiple decisions. Also, there’s something odd going on: the math in the post is a behavioral claim (“we can model the agent as using conditional expected value”), but the interpretation, including the second bullet point, references the agent’s possible internal structure.
Yeah, my explanation of that wasn’t very good. Let me try again.
If there’s just one decision, the agent maximizes E[u(A,X)|f(X)=f∗]. But then we could define a behaviorally-equivalent utility function u′(A,f∗)=E[u(A,X)|f(X)=f∗]; there isn’t necessarily a sense in which the agent cares about X rather than f∗.
With many decisions, we could perform a similar construction to get a behaviorally-equivalent utility function u′(A,f∗1,...,f∗n). But if there are enough decisions with enough different inputs, then (f∗1,...,f∗n) may be bigger than X, i.e. it may have more dimensions/more bits. Then representing all these different decision-inputs as being calculated from one “underlying world” X yields a model which is “more efficient”, in some sense.
Another way to put it: with just one decision, ~any u′(A,f∗) should be behaviorally equivalent to an E[u(A,X)|f(X)=f∗]-maximizer for some u,f. But with many decisions, that should not be the case. (Though I have not yet actually come up with an example to prove that claim.)
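As a sanity check on the single-decision direction, here's a minimal sketch with toy numbers of my own (the worlds, prior, and observation below are arbitrary choices, not anything from the post): take any u′ defined directly over observations, set u(A,X):=u′(A,f(X)), and the conditional-expected-utility maximizer reproduces the u′-maximizer's behavior.

```python
import itertools
import random

# Toy setup (nothing here is from the actual post): a few worlds, one coarse
# observation f, and an arbitrary utility u' defined directly over observation values.
worlds = list(itertools.product(range(4), range(4)))   # X = (x1, x2), 16 worlds
prior = {x: 1 / len(worlds) for x in worlds}           # uniform prior over worlds
actions = ["a", "b", "c"]

def f(x):
    """The single observation: only the first coordinate of the world is seen."""
    return x[0]

# An arbitrary observation-level utility u'(A, f*), chosen at random.
rng = random.Random(0)
u_prime = {(a, fstar): rng.random() for a in actions for fstar in range(4)}

# Define u(A, X) := u'(A, f(X)).  Then E[u(A, X) | f(X) = f*] = u'(A, f*), so the
# conditional-expected-utility maximizer behaves exactly like the u'-maximizer.
def u(a, x):
    return u_prime[(a, f(x))]

def conditional_eu(a, fstar):
    """E[u(a, X) | f(X) = f*] under the prior."""
    consistent = [x for x in worlds if f(x) == fstar]
    total = sum(prior[x] for x in consistent)
    return sum(prior[x] * u(a, x) for x in consistent) / total

for fstar in range(4):
    assert max(actions, key=lambda a: conditional_eu(a, fstar)) == \
           max(actions, key=lambda a: u_prime[(a, fstar)])
print("single decision: the u'-maximizer and the conditional-EU maximizer agree")
```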
edit: the numbers are wrong here; go see my distillation for the correct numbers
Proposed example to check my understanding:
Here, X=(x1,x2) ranges over the 10 black points representing possible worlds.
We have three different observations f1,f2,f3, each of which has 4 possible outcomes and gives partial information about X. Call the set of combinations of observations O.
It seems that
|X|=|f(X)|=10 while |O|=|f1(X)×f2(X)×f3(X)|=64: there are more combinations of partial observations than possible worlds.
Therefore, storing a representation of the possible values of X might be simpler than storing a representation of the possible values of (f∗1,f∗2,f∗3).
Also, this notion of conditional expected utility actually constrains behavior: for an action space A, not all of the |A|^64 policies which map O→A correspond to conditional expected utility maximization.
If we were not conditioning, there would be only |A×X| policies corresponding to expected utility maximization.
If we are conditioning, it seems like there are |A|^(∑i |fi(X)|) = |A|^12 such policies: the agent makes decisions given 3 types of possible information i=1,2,3, and each type of information i has |fi(X)|=4 possible values.
So by pigeonhole, not every policy over distributed decisions is a conditional expected utility maximizer?
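To make the counting concrete, here is a quick sketch. Since the figure isn't reproduced here, the specific 10 worlds and the observation functions f1, f2, f3 are stand-ins I picked to match the counts above; the script just tallies the quantities used in the argument as written.

```python
import itertools

# Stand-in worlds and observations (my own guesses matching the stated counts:
# 10 worlds, 3 observations with 4 possible outcomes each).
worlds = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0),
          (2, 2), (2, 3), (3, 0), (3, 1), (3, 3)]     # |X| = 10
actions = ["a", "b"]                                  # some action space A

def f1(x): return x[0]                 # 4 possible outcomes
def f2(x): return x[1]                 # 4 possible outcomes
def f3(x): return (x[0] + x[1]) % 4    # 4 possible outcomes

obs_ranges = [sorted({f(x) for x in worlds}) for f in (f1, f2, f3)]
assert [len(r) for r in obs_ranges] == [4, 4, 4]

num_obs_combinations = len(list(itertools.product(*obs_ranges)))  # 4^3 = 64
print("possible worlds |X|:          ", len(worlds))                                      # 10
print("realized joint observations:  ", len({(f1(x), f2(x), f3(x)) for x in worlds}))     # 10
print("observation combinations |O|: ", num_obs_combinations)                             # 64

# Policy counts from the argument above, with |A| = 2.
print("all maps O -> A:              ", len(actions) ** num_obs_combinations)             # |A|^64
print("distributed policies:         ", len(actions) ** sum(len(r) for r in obs_ranges))  # |A|^12
```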
I didn’t check the math in your counting argument, but qualitatively that is correct.