Some of these are very easy to prove; here’s my favorite example. An agent has a fixed utility function and performs Pareto-optimally on that utility function across multiple worlds (so “utility in each world” is the set of objectives). Then there’s a normal vector (or family of normal vectors) to the Pareto surface at whatever point the agent achieves. (You should draw a picture at this point in order for this to make sense.) That normal vector’s components will all be nonnegative (because Pareto surface), and the vector is defined only up to normalization, so we can interpret that normal vector as a probability distribution. That also makes sense intuitively: larger components of that vector (i.e. higher probabilities) indicate that the agent is “optimizing relatively harder” for utility in those worlds. This says nothing at all about how the agent will update, and we’d need another couple of sentences to argue that the agent maximizes expected utility under the distribution, but it does give the prototypical mental picture behind the “Pareto-optimal → probabilities” idea.
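The normal-vector picture can be sketched numerically. Here is a minimal illustration, assuming (purely for concreteness, not from the discussion above) that the achievable-utility set is the unit disk, so its Pareto surface is the quarter circle with nonnegative coordinates; the outward normal at the achieved point, rescaled to sum to 1, is the implied distribution:

```python
import math

# Toy stand-in for the picture: achievable utilities form the unit disk,
# so the Pareto surface is the quarter circle u1, u2 >= 0.  On a circle,
# the outward normal at a point is the point itself.
theta = math.radians(30)
achieved = (math.cos(theta), math.sin(theta))  # point the agent achieves
normal = achieved                              # normal vector at that point

# Components are nonnegative on the Pareto surface, and the normal is
# only defined up to scaling, so normalize it into a distribution.
total = sum(normal)
probs = tuple(c / total for c in normal)

assert all(p >= 0 for p in probs)
assert abs(sum(probs) - 1) < 1e-12
# Larger component = "optimizing relatively harder" for that world:
# here the agent weights world 1 more heavily than world 2.
assert probs[0] > probs[1]
```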
Here is an example (to point out a missing assumption): Let’s say you are offered a bet on the result of a coin flip, at a stake of 1 dollar. You get 3 dollars if you win, and your utility function is linear in dollars.
You have three actions: “Heads”, “Tails”, and “Pass”.
Then “Pass” performs Pareto-optimally across multiple worlds.
But “Pass” does not maximize expected utility under any distribution.
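This counterexample can be checked mechanically. A sketch (reading “get 3 dollars” as gross, so the net payoffs are +2 on a winning bet and −1 on a losing one; the conclusion is the same under the other reading):

```python
# Utility vectors (payoff in the heads-world, payoff in the tails-world).
actions = {
    "Heads": (2.0, -1.0),  # win 3 minus the 1-dollar stake, or lose the stake
    "Tails": (-1.0, 2.0),
    "Pass": (0.0, 0.0),
}

def pareto_optimal(name):
    """No other action is strictly better in both worlds."""
    uh, ut = actions[name]
    return not any(vh > uh and vt > ut for vh, vt in actions.values())

# "Pass" is Pareto-optimal: neither bet beats it in both worlds at once.
assert pareto_optimal("Pass")

def pass_is_eu_optimal(p):
    """Is "Pass" expected-utility-optimal when P(heads) = p?"""
    eu = {a: p * uh + (1 - p) * ut for a, (uh, ut) in actions.items()}
    return eu["Pass"] >= max(eu.values())

# E[Heads] = 3p - 1 and E[Tails] = 2 - 3p can't both be <= 0 (that would
# need p <= 1/3 and p >= 2/3), so no distribution makes "Pass" optimal.
assert not any(pass_is_eu_optimal(p / 1000) for p in range(1001))
```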
I think what is needed for the result is an additional convexity-like assumption about the utilities.
This could be “the set of achievable utility vectors is convex”, or even something weaker like “every convex combination of achievable utility vectors is dominated by an achievable utility vector” (here, by utility vector I mean $(u_w)_{w \in W}$, where $u_w$ is the utility of world $w$).
If you already accept the concept of expected utility maximization,
then you could also use mixed strategies to get the convexity-like assumption (but that is not useful if the point is to motivate using probabilities and expected utility maximization).
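To see how mixed strategies supply the convexity, note that in the coin-flip example the 50/50 mix of the two bets strictly dominates “Pass”, so the counterexample disappears once mixtures are allowed. A quick check (same net payoffs as above, which are my reading of the stakes):

```python
# With mixed strategies, the achievable utility set becomes the convex
# hull of the pure-action payoff vectors.  Mixing the two bets 50/50:
heads = (2.0, -1.0)
tails = (-1.0, 2.0)
mix = tuple(0.5 * h + 0.5 * t for h, t in zip(heads, tails))

# Expected utility is (0.5, 0.5) in the two worlds...
assert mix == (0.5, 0.5)
# ...which strictly dominates "Pass" at (0, 0) in both worlds,
# so "Pass" is no longer Pareto-optimal once mixtures are allowed.
assert all(m > 0.0 for m in mix)
```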
Or: even if you do expect powerful agents to be approximately Pareto-optimal, presumably they will be approximately Pareto-optimal, not exactly Pareto-optimal. What can we say about coherence then?
The underlying math statement behind some of these kinds of results about Pareto-optimality seems to be something like this:
If $\bar x$ is Pareto-optimal w.r.t. utilities $u_i$, $i = 1, \dots, n$, and a convexity assumption holds (e.g. the set $\{(u_i(x))_{i=1}^n : x\}$ is convex, or something with mixed strategies), then there is a probability distribution $\mu$ so that $\bar x$ is optimal for $U(x) = \mathbb{E}_{i\sim\mu}\, u_i(x)$.
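As a concrete instance of the statement (using the convexified coin-flip game as a stand-in; the specific point is my own choice): at the Pareto point $(0.5, 0.5)$, the distribution $\mu = (0.5, 0.5)$ is a supporting distribution, and by linearity it suffices to check expected utility at the vertices of the hull.

```python
# Achievable utility set (with mixed strategies) = convex hull of the
# pure-action payoff vectors from the coin-flip game.
vertices = [(2.0, -1.0), (-1.0, 2.0), (0.0, 0.0)]

# Pareto point achieved by the 50/50 mix of the two bets, and the
# supporting distribution mu at that point.
point = (0.5, 0.5)
mu = (0.5, 0.5)

def eu(y):
    """Expected utility of utility vector y under mu."""
    return mu[0] * y[0] + mu[1] * y[1]

# Expected utility is linear, so its maximum over the hull is attained
# at a vertex; the chosen Pareto point matches that maximum.
assert eu(point) >= max(eu(v) for v in vertices)
```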
I think there is a (relatively simple) approximate version of this, where we start out with approximate Pareto-optimality.
We say that $\bar x$ is Pareto-$\varepsilon$-optimal if there is no (strong) Pareto-improvement by more than $\varepsilon$ (that is, there is no $x$ with $u_i(x) > u_i(\bar x) + \varepsilon$ for all $i$).
Claim: If $\bar x$ is Pareto-$\varepsilon$-optimal and the convexity assumption holds, then there is a probability distribution $\mu$ so that $\bar x$ is $\varepsilon$-optimal for $U(x) = \mathbb{E}_{i\sim\mu}\, u_i(x)$.
Rough proof:
Define $Y := \{(u_i(x))_{i=1}^n : x\}$ and $\overline{Y}$ as the closure of $Y$.
Let $\tilde y \in \overline{Y}$ be of the form $\tilde y = (u_i(\bar x) + \delta)_{i=1}^n$ for the largest $\delta$ such that $\tilde y \in \overline{Y}$. We know that $\delta \le \varepsilon$.
Now $\tilde y$ is Pareto-optimal for $\overline{Y}$, and by the non-approximate version there exists
a probability distribution $\mu$ so that $\tilde y$ is optimal for $y \mapsto \mathbb{E}_{i\sim\mu}\, y_i$.
Then, for any $x$, we have $\mathbb{E}_{i\sim\mu}\, u_i(x) \le \mathbb{E}_{i\sim\mu}\, \tilde y_i = \mathbb{E}_{i\sim\mu}\, (u_i(\bar x) + \delta) \le \varepsilon + \mathbb{E}_{i\sim\mu}\, u_i(\bar x),$
that is, $\bar x$ is $\varepsilon$-optimal for $U$.
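A numerical sanity check of the claim in the convexified coin-flip game (the specific point and $\varepsilon$ are illustrative assumptions): the utility vector $(0.3, 0.3)$ is Pareto-$\varepsilon$-optimal for $\varepsilon = 0.2$, since the best simultaneous improvement inside the hull is $(0.5, 0.5)$; the proof's recipe then yields $\mu = (0.5, 0.5)$, under which $(0.3, 0.3)$ is indeed $\varepsilon$-optimal.

```python
# Convex hull of the pure-action payoff vectors from the coin-flip game.
vertices = [(2.0, -1.0), (-1.0, 2.0), (0.0, 0.0)]

# x_bar achieves (0.3, 0.3); the largest uniform shift delta that stays
# in the hull lands on y_tilde = (0.5, 0.5), so delta = 0.2 <= eps.
x_bar = (0.3, 0.3)
eps = 0.2
mu = (0.5, 0.5)  # supporting distribution at y_tilde = (0.5, 0.5)

def eu(y):
    """Expected utility of utility vector y under mu."""
    return mu[0] * y[0] + mu[1] * y[1]

# Linearity: the max of expected utility over the hull is at a vertex.
best = max(eu(v) for v in vertices)  # = 0.5

# The claim: x_bar is eps-optimal for U (tolerance for float rounding).
assert best <= eu(x_bar) + eps + 1e-12
```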
If you already accept the concept of expected utility maximization, then you could also use mixed strategies to get the convexity-like assumption (but that is not useful if the point is to motivate using probabilities and expected utility maximization).
That is indeed what I had in mind when I said we’d need another couple sentences to argue that the agent maximizes expected utility under the distribution. It is less circular than it might seem at first glance, because two importantly different kinds of probabilities are involved: uncertainty over the environment (which is what we’re deriving), and uncertainty over the agent’s own actions arising from mixed strategies.