Yeah, my explanation of that wasn’t very good. Let me try again.
If there’s just one decision, the agent maximizes $E[u(A,X) \mid f(X)=f^*]$. But then we could define a behaviorally equivalent utility function $u'(A,f^*) = E[u(A,X) \mid f(X)=f^*]$; there isn’t necessarily a sense in which the agent cares about $X$ rather than $f^*$.
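One way to spell out the equivalence (assuming, as I think is intended here, that the agent’s policy just picks an argmax action for each observed $f^*$): for every $f^*$,

$$\arg\max_{a \in A} u'(a, f^*) \;=\; \arg\max_{a \in A} E[u(a,X) \mid f(X) = f^*],$$

since $u'(a,f^*)$ is defined to equal the right-hand objective. So the two agents choose the same action(s) at every observation, i.e. they are behaviorally indistinguishable.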
With many decisions, we could perform a similar construction to get a behaviorally equivalent utility function $u'(A, f^*_1, \ldots, f^*_n)$. But if there are enough decisions with enough different inputs, then $(f^*_1, \ldots, f^*_n)$ may be bigger than $X$ - i.e. it may have more dimensions/more bits. Then representing all these different decision-inputs as being calculated from one “underlying world” $X$ yields a model which is “more efficient”, in some sense.
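To make “more bits” concrete (my notation: suppose observation $f_i$ takes $k_i$ possible values, and write $|\mathcal{X}|$ for the number of possible worlds): the tuple $(f^*_1, \ldots, f^*_n)$ ranges over up to $\prod_i k_i$ combinations, so storing it naively costs about $\sum_i \log_2 k_i$ bits, versus $\log_2 |\mathcal{X}|$ bits for $X$ itself. Whenever $\prod_i k_i > |\mathcal{X}|$, the single-underlying-world representation is the more compact one.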
Another way to put it: with just one decision, ~any $u'(A,f^*)$ should be behaviorally equivalent to an $E[u(A,X) \mid f(X)=f^*]$-maximizer for some $u,f$. But with many decisions, that should not be the case. (Though I have not yet actually come up with an example to prove that claim.)
edit: the numbers are wrong here; go see my distillation for the correct numbers
Proposed example to check my understanding:
Here, $X=(x_1,x_2) \in \mathcal{X}$, where $\mathcal{X}$ is the set of 10 black points representing possible worlds.
We have three different observations $f_1, f_2, f_3$, each of which has 4 possible outcomes and gives partial information about $X$. Call the set of combinations of observations $O$.
It seems that $|\mathcal{X}| = |f(\mathcal{X})| = 10$ (writing $f = (f_1, f_2, f_3)$ for the joint observation) while $|O| = |f_1(\mathcal{X}) \times f_2(\mathcal{X}) \times f_3(\mathcal{X})| = 64$: there are more combinations of partial observations than possible worlds.
Therefore, storing a representation of the possible values of $X$ might be simpler than storing a representation of the possible values of $(f^*_1, f^*_2, f^*_3)$.
Also, this notion of conditional expected utility maximization actually constrains behavior: for an action space $A$, not all of the $|A|^{64}$ policies which map $O \to A$ correspond to conditional expected utility maximization.
If we were not conditioning, there would be only $|A \times \mathcal{X}|$ policies corresponding to expected utility maximization.
If we are conditioning, it seems like there are $|A|^{\sum_i |f_i(\mathcal{X})|} = |A|^{12}$ such policies: the agent makes decisions given 3 types of possible information $i=1,2,3$, and each type of information $i$ has $|f_i(\mathcal{X})| = 4$ possible values. (A quick numerical check is sketched below.)
So by pigeonhole, not every policy over distributed decisions is a conditional expected utility maximizer?
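For what it’s worth, here’s a tiny Python sketch of the counting above (my own check, using this comment’s numbers; per the edit note, the corrected numbers are in the distillation):

```python
# Sanity-check the counting argument on this comment's numbers.
# Assumed setup (from the example above): 10 possible worlds, and three
# observations f_1, f_2, f_3 with 4 possible outcomes each.

n_worlds = 10            # |X|: the 10 black points
outcomes = [4, 4, 4]     # |f_i(X)| for i = 1, 2, 3

# |O|: combinations of partial observations.
n_combos = 1
for k in outcomes:
    n_combos *= k
assert n_combos == 64 and n_combos > n_worlds  # more combos than worlds

# For any action space with |A| >= 2, compare the number of arbitrary
# policies O -> A against the number of "factored" policies where
# decision i depends only on f_i* (the form conditional expected
# utility maximization takes here).
for n_actions in (2, 3, 5):
    arbitrary = n_actions ** n_combos      # |A|^64
    factored = n_actions ** sum(outcomes)  # |A|^(4+4+4) = |A|^12
    # Pigeonhole: strictly fewer factored policies, so some maps
    # O -> A are not conditional-EU maximizers.
    assert factored < arbitrary

print("counts are consistent with the pigeonhole claim")
```

This only compares cardinalities; it doesn’t exhibit a specific policy that fails to be a conditional expected utility maximizer.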
I didn’t check the math in your counting argument, but qualitatively that is correct.