The number of subagents required to represent a partial preference ordering is the order dimension of the poset. If it's not O(log n) in the number of states, this would be bad for the subagents hypothesis! There are exponentially many possible states of the world, so superlogarithmic order dimension would mean agents need a number of subagents superlinear in the number of atoms in the world. So what are the order dimensions of posets we care about? I found the following results with a brief search:
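For concreteness: "d subagents" here means d linear extensions whose intersection is the partial order, i.e. state A is preferred to B iff every subagent ranks A above B. A brute-force sketch of the definition (toy code of my own, workable only for tiny posets):

```python
from itertools import combinations_with_replacement, permutations

def intersection(orders):
    """Strict pairs (a, b) ranked a-before-b by every linear order."""
    elems = orders[0]
    return {(a, b) for a in elems for b in elems
            if a != b and all(o.index(a) < o.index(b) for o in orders)}

def order_dimension(elems, relation, max_d=3):
    """Smallest d such that `relation` is the intersection of d linear
    extensions (brute force over all extensions -- tiny posets only)."""
    exts = [p for p in permutations(elems)
            if all(p.index(a) < p.index(b) for a, b in relation)]
    for d in range(1, max_d + 1):
        for combo in combinations_with_replacement(exts, d):
            if intersection(combo) == relation:
                return d

# Diamond poset: bot < {x, y} < top, with x and y incomparable.
relation = {("bot", "x"), ("bot", "y"), ("bot", "top"),
            ("x", "top"), ("y", "top")}
print(order_dimension(["bot", "x", "y", "top"], relation))  # 2
```

The diamond needs two subagents: one ranking x over y, one ranking y over x; they agree on everything else. (The search is exponential in both the poset size and d; it's only meant to pin down the definition.)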
The order dimension of a poset is less than or equal to its width (the size of the largest set of pairwise incomparable elements). Source.
This doesn’t seem like a useful upper bound. If you have two sacred values, lives and beauty, then there are likely to be arbitrarily many incomparable states on the lives-beauty Pareto frontier, but the order dimension is two.
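The lives/beauty case can be checked directly: points on a Pareto frontier are pairwise incomparable (so the width grows with the frontier), yet the whole dominance order is the intersection of just the two single-value rankings. A small sketch with made-up scores:

```python
# Hypothetical scores: (lives, beauty), higher is better on both axes.
frontier = [(0, 9), (2, 7), (5, 5), (7, 2), (9, 0)]  # Pareto frontier
states = frontier + [(1, 1), (4, 3)]                 # plus dominated states

def dominates(a, b):
    """Strict Pareto dominance (all coordinates here are distinct)."""
    return a != b and a[0] >= b[0] and a[1] >= b[1]

# Width: the whole frontier is pairwise incomparable, so width >= 5,
# and it grows without bound as the frontier gets finer.
assert not any(dominates(a, b) for a in frontier for b in frontier)

# Dimension 2: dominance is exactly "both single-value rankings agree",
# i.e. the intersection of two linear orders -- one subagent per value.
print(all(dominates(a, b) == (a[0] > b[0] and a[1] > b[1])
          for a in states for b in states))  # True
```

So width can be arbitrarily large while two subagents (sort-by-lives and sort-by-beauty) suffice, which is why width is a loose upper bound here.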
This paper finds the following bounds on the order dimension of a random poset P_{n,p} (defined by taking all edges of a random graph on n vertices, where each edge appears with probability p, orienting them, then taking the transitive closure). If p log log n → ∞, the following holds almost surely:
(1−ϵ)·√(log n / log(1/q)) ≤ dim P_{n,p} ≤ (1+ϵ)·√(4 log n / (3 log(1/q))), where q = 1−p.
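The random-poset model is easy to simulate. A minimal sketch (my own toy code; the post doesn't specify the orientation rule, so I orient each kept edge from lower to higher vertex index, one natural choice):

```python
import random

def random_graph_order(n, p, seed=0):
    """P_{n,p}: keep each edge {i, j} with probability p, orient it
    low-to-high index, then take the transitive closure."""
    rng = random.Random(seed)
    reach = [[rng.random() < p if i < j else False for j in range(n)]
             for i in range(n)]
    for k in range(n):            # Floyd-Warshall-style closure
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

# Denser graphs leave fewer incomparable pairs (hence lower dimension).
n = 80
for p in (0.01, 0.5, 0.99):
    reach = random_graph_order(n, p)
    comparable = sum(reach[i][j] for i in range(n) for j in range(i + 1, n))
    print(p, round(comparable / (n * (n - 1) / 2), 3))
```

Running this shows the fraction of comparable pairs climbing toward 1 as p grows, which is the intuition behind the next point.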
The order dimension of a random poset decreases as p increases. We should expect agents in the real world to have reasonably high p, since refusing to make a large proportion of trades is probably bad for reward.
If p=0.99, then dim P_{n,p} ≤ (1+ϵ)·0.82·√(log n)
If p=0.5, then dim P_{n,p} ≤ (1+ϵ)·2.10·√(log n)
If p=0.01, we have 15.1·(1−ϵ)·√(log n) ≤ dim P_{n,p} ≤ 17.5·(1+ϵ)·√(log n) (coefficients computed with the remaining log taken base 10)
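Coefficients like these are easy to recompute. One subtlety (my assumption, not anything the paper states): the ratio inside the square root is base-independent, but the leftover √(log n) is not; taking it base 10 is what reproduces a ~15.1 lower coefficient at p=0.01:

```python
from math import log10, sqrt

def dim_bound_coeffs(p):
    """Coefficients (c_lo, c_hi) such that, almost surely,
    c_lo*(1-eps)*sqrt(log n) <= dim P_{n,p} <= c_hi*(1+eps)*sqrt(log n),
    from the bound sqrt(log n/log(1/q)) .. sqrt(4 log n/(3 log(1/q)))."""
    q = 1 - p
    return sqrt(1 / log10(1 / q)), sqrt(4 / (3 * log10(1 / q)))

for p in (0.99, 0.5, 0.01):
    lo, hi = dim_bound_coeffs(p)
    print(f"p={p}: {lo:.2f}*sqrt(log n) <= dim <= {hi:.2f}*sqrt(log n)")
```

Note the upper coefficient is always √(4/3) ≈ 1.15 times the lower one, since the 4/3 sits inside the radical.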
This is still way too many subagents (on the order of the square root of the number of atoms in the world) to make sense as e.g. a model of humans, but at least that many can physically fit in an agent.
Of course, this is just a heuristic argument, and if partial preference orderings in real life have some special structure, the conclusion might differ.
Hmm, I may be missing something here, but I suspect that "partial preference orderings in real life have some special structure," in the relevant sense, is very likely true. Human preferences don't appear to be a random sample from the set of all possible partial orders over "world states" (or more accurately, human models of worlds).
First of all, if you model human preferences as a vector-valued utility function (i.e. one element of the vector per subagent) it seems that it has to be continuous, and probably Lipschitz, in the sense that we’re limited in how much we can care about small changes in the world state. There’s probably some translation of this property into graph theory that I’m not aware of.
Also, it seems like there’s one or a handful of preferred factorizations of our world model into axes-of-value, and different subagents will care about different factors/axes. More specifically, it appears that human preferences have a strong tendency to track the same abstractions that we use for empirical prediction of the world; as John says, human values are a function of humans’ latent variables. If you stop believing that souls and afterlives exist as a matter of science, it’s hard to continue sincerely caring about what happens to your soul after you die. We also don’t tend to care about weird contrived properties with no explanatory/predictive power like “grue” (green before 1 January 2030 and blue afterward).
To the extent this is the case, it should dramatically (exponentially, I think) reduce the number of posets that are realistically possible, and therefore the number of subagents needed to describe them.