Deriving the Geometric Utilitarian Weights

This is a supplemental post to Geometric Utilitarianism (And Why It Matters), in which I show how I derived the weights which make any Pareto optimal point optimal according to the geometric weighted average. This is a subproblem of the proof laid out in the first post of this sequence, and the main post describes why that problem is interesting.

Overview

So how are we going to calculate weights $\psi$ which make $p$ optimal among $F$?

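The idea here is to identify the Harsanyi hyperplane $H_p$, which contains all of the joint utilities $u \in \mathbb{R}^n$ which satisfy $H(u, \phi) = H(p, \phi)$. Here $H(u, \phi) = u \cdot \phi$, and $\phi$ are the weights which make our chosen point $p$ optimal with respect to $H(\cdot, \phi)$. We’re going to calculate new weights $\psi$ which make $p$ optimal with respect to the geometric weighted average $G(\cdot, \psi)$, where $G(u, \psi) = \prod_i u_i^{\psi_i}$. It turns out it’s sufficient to make $p$ optimal among $H_p$, and $p$ will also be optimal across our entire feasible set $F$.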

In terms of calculus, we’re going to be constructing a function $f(t, \psi)$, which tells us about how moving around on $H_p$ changes $G$. (Here $t$ denotes the first $n-1$ utilities, which we use to parameterize $H_p$.) And we’re going to choose weights $\psi$ which make the gradient $\nabla_t f$ equal 0 at $p$. This makes it a local optimum, and it will turn out to be a global maximum across $H_p$, which in turn will make it a global maximum across $F$.

Geometrically, we can think of that as the surface gradient of $G$ across $H_p$. And so in terms of the overall gradient $\nabla_u G$, we’re designing $\psi$ so that $\nabla_u G$ is perpendicular to $H_p$ at $p$.

Parameterizing the Harsanyi Hyperplane

When thinking about moving around on the Harsanyi hyperplane $H_p$, we have a linear constraint that says no matter which $u \in H_p$ we pick, we know that $H(u, \phi) = H(p, \phi)$. If we know $u$ lies on $H_p$, we can calculate the $n$-th agent’s utility from the first $n-1$ utilities. We’ll be referring to these first $n-1$ utilities a lot, so let’s call them $t \in \mathbb{R}^{n-1}$. So $t_i = u_i$ for all $i < n$.

$G$ and $H$ are both symmetrical with respect to shuffling the indices of agents around, so without loss of generality we’ll assume that the $n$-th agent is one we’re assigning positive Harsanyi weight to: $\phi_n > 0$. This is necessary for the reconstruction to work for all $t$.

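So we can think of $H_p$ as a function $H_p: \mathbb{R}^{n-1} \to \mathbb{R}^n$, where the $i$-th output is $(H_p)_i(t) = t_i$ for $i < n$. We can use the $n$-th output to reconstruct $u_n$ given $t$ like this: $(H_p)_n(t) = \frac{H(p, \phi) - \sum_{i=1}^{n-1} t_i \phi_i}{\phi_n}$. This lets us move $t$ around to pick $u$ however we want, and the function $H_p$ will map that to its image, helpfully also called $H_p$!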
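As a minimal sketch of this parameterization, the Python below rebuilds a full joint utility from the first n-1 utilities. The function name `harsanyi_point` and the example numbers are illustrative assumptions, not something from the original post; `phi` stands for the Harsanyi weights and `p` for the target point.

```python
def harsanyi_point(t, p, phi):
    """Map the first n-1 utilities t to the point u on the hyperplane u.phi = p.phi."""
    if phi[-1] <= 0:
        # The reconstruction divides by the last agent's Harsanyi weight.
        raise ValueError("the last agent needs positive Harsanyi weight")
    target = sum(pi * wi for pi, wi in zip(p, phi))    # H(p, phi)
    partial = sum(ti * wi for ti, wi in zip(t, phi))   # zip stops after n-1 terms
    return list(t) + [(target - partial) / phi[-1]]

# Hypothetical example values.
p = [2.0, 1.0, 3.0]
phi = [0.5, 0.25, 0.25]
u = harsanyi_point([1.0, 2.0], p, phi)
```

Feeding in the first n-1 coordinates of `p` itself should return `p`, since `p` lies on its own hyperplane.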

Alright, now we have $H_p(t)$ and we also have $G(u, \psi)$, which is the geometric weighted average whose gradient we’re trying to design through our choice of $\psi$. So let’s compose them together to form $f(t, \psi) = G(H_p(t), \psi)$. And since we want $p$ to be an optimum of $G$ across the hyperplane $H_p$, we can set the gradient $\nabla_t f = 0$ at $t = (p_1, \dots, p_{n-1})$, which are the first $n-1$ utilities of our target joint utility $p$. Solving this equation for $\psi$ will give us the weights we need!

This amounts to solving a family of $n-1$ equations $\frac{\partial f}{\partial t_j} = 0$. We’re holding the weights $\psi$ constant for the purposes of differentiation, but we’ll be solving for the weights that make the derivative 0 at $p$.

How Does G Change as We Change These Parameters?

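Ok, we’ve built up a few layers of abstraction, so let’s start unpacking. By the chain rule, and using the notation that $(H_p)_i$ is the $i$-th element of the output of $H_p$:

$$\frac{\partial f}{\partial t_j} = \sum_{i=1}^{n} \frac{\partial G}{\partial u_i}\,\frac{\partial (H_p)_i}{\partial t_j}$$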

How does our point on H change as we change these parameters?

Let’s start by computing $\frac{\partial (H_p)_i}{\partial t_j}$.

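For the first n-1 terms this is simple, because $(H_p)_i$ simply returns $t_i$. So $\frac{\partial (H_p)_i}{\partial t_j}$ is 1 when $i = j$, and 0 otherwise, which we can represent using the Kronecker delta $\delta_{ij}$. And $\frac{\partial (H_p)_n}{\partial t_j} = -\frac{\phi_j}{\phi_n}$.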

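Geometrically, this is telling us about the slope of $H_p$. Note that:

  • $\frac{\partial (H_p)_n}{\partial t_j} = -\frac{\phi_j}{\phi_n}$ is constant and doesn’t depend on our choice of $t$

  • $\frac{\partial (H_p)_n}{\partial t_j} \le 0$ (We can never increase agent $n$’s utility by increasing another agent’s utility. This is always true at the Pareto frontier.)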

How Does G Change as We Move Around on H?

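We can start solving for $\frac{\partial G}{\partial u_i}$ by substituting in the definition of G:

$$\frac{\partial G}{\partial u_i} = \frac{\partial}{\partial u_i} \prod_{k=1}^{n} u_k^{\psi_k}.$$

From here we can apply the n-factor product rule:

$$\frac{\partial G}{\partial u_i} = \sum_{k=1}^{n} \frac{\partial u_k^{\psi_k}}{\partial u_i} \prod_{m \neq k} u_m^{\psi_m}.$$

Thankfully, $\frac{\partial u_k^{\psi_k}}{\partial u_i} = 0$ whenever $k \neq i$, leaving just $\frac{\partial G}{\partial u_i} = \psi_i u_i^{\psi_i - 1} \prod_{m \neq i} u_m^{\psi_m}$. We can also notice $u_i^{\psi_i - 1} \prod_{m \neq i} u_m^{\psi_m} = \frac{G}{u_i}$, leaving us with the much nicer $\frac{\partial G}{\partial u_i} = \frac{\psi_i G}{u_i}$.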

It will be important later that this partial derivative is undefined when $u_i = 0$, aka wherever any agent is receiving their least feasible utility.
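One hedged way to sanity-check the partial derivative of the geometric weighted average, namely weight times G divided by that agent’s utility, is to compare it against a central finite difference. The helper names, step size and example values below are assumptions chosen for illustration.

```python
import math

def G(u, psi):
    """Weighted geometric average: product of u_i ** psi_i."""
    return math.prod(ui ** wi for ui, wi in zip(u, psi))

def dG_du(u, psi, i):
    """Closed form psi_i * G / u_i; only usable when u_i > 0."""
    return psi[i] * G(u, psi) / u[i]

def central_difference(u, psi, i, eps=1e-6):
    """Numerical estimate of the same partial derivative."""
    up = list(u)
    up[i] += eps
    down = list(u)
    down[i] -= eps
    return (G(up, psi) - G(down, psi)) / (2 * eps)

# Hypothetical example values with strictly positive utilities.
u = [2.0, 1.0, 3.0]
psi = [0.5, 0.125, 0.375]
errors = [abs(dG_du(u, psi, i) - central_difference(u, psi, i)) for i in range(3)]
```

The entries of `errors` should all be tiny. The closed form divides by `u[i]`, which is where the undefined case at zero utility shows up in code.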

Writing function arguments explicitly:

$$\frac{\partial G}{\partial u_i}(u, \psi) = \frac{\psi_i \, G(u, \psi)}{u_i}$$

Putting These Terms Together

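Let’s start putting these together. We can start by breaking apart the two cases of $i$, namely $i < n$ and $i = n$, like this:

$$\frac{\partial f}{\partial t_j} = \sum_{i=1}^{n-1} \frac{\partial G}{\partial u_i}\,\delta_{ij} + \frac{\partial G}{\partial u_n}\,\frac{\partial (H_p)_n}{\partial t_j}$$

Here’s one reason why it’s useful to know about the Kronecker delta $\delta_{ij}$: it filters out all but the $j$-th element of a sum: $\sum_i a_i \delta_{ij} = a_j$. When you’re working in Einstein notation (which is great by the way), you just write it as $a_i \delta_{ij} = a_j$ and you can think of the $i$’s as “cancelling”.

That leaves us with:

$$\frac{\partial f}{\partial t_j} = \frac{\partial G}{\partial u_j} + \frac{\partial G}{\partial u_n}\,\frac{\partial (H_p)_n}{\partial t_j}$$

And we know $\frac{\partial G}{\partial u_i} = \frac{\psi_i G}{u_i}$ and $\frac{\partial (H_p)_n}{\partial t_j} = -\frac{\phi_j}{\phi_n}$, so let’s plug that in:

$$\frac{\partial f}{\partial t_j} = \frac{\psi_j G}{u_j} - \frac{\psi_n G}{u_n}\,\frac{\phi_j}{\phi_n}$$

And that is the family of $n-1$ equations that we want to all be 0 when $u = p$. (This causes $\nabla_t f = 0$.) We’ll call this gradient $\nabla_t f$ to remind ourselves that this is the gradient of $f$ with respect to $t$, where we’re holding the weights $\psi$ constant.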

Solving for the Geometric Weights

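Ok, now we can set $\frac{\partial f}{\partial t_j} = 0$ at $u = p$, and solve for $\psi_j$, for $j < n$:

$$\frac{\psi_j G}{p_j} = \frac{\psi_n G}{p_n}\,\frac{\phi_j}{\phi_n} \quad\Longrightarrow\quad \psi_j = \psi_n\,\frac{p_j \phi_j}{p_n \phi_n}$$

This is still a system of linear equations we need to solve, since each $\psi_j$ for $j < n$ depends on $\psi_n$, which in turn satisfies $\psi_n = 1 - \sum_{j=1}^{n-1} \psi_j$. So let’s solve it for $\psi_n$!

$$\psi_n = 1 - \psi_n\,\frac{\sum_{j=1}^{n-1} p_j \phi_j}{p_n \phi_n}$$

Remembering that $H(p, \phi) = \sum_{i=1}^{n} p_i \phi_i$, we can notice that:

$$\sum_{j=1}^{n-1} p_j \phi_j = H(p, \phi) - p_n \phi_n$$

This lets us simplify down to

$$\psi_n = \frac{p_n \phi_n}{H(p, \phi)}$$

And now we can plug that back into the formula for all the other $\psi_j$!

$$\psi_j = \frac{p_n \phi_n}{H(p, \phi)} \cdot \frac{p_j \phi_j}{p_n \phi_n} = \frac{p_j \phi_j}{H(p, \phi)}$$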

Well isn’t that convenient! The formula for all $\psi_i$ has the same form, and we can think of it like starting with the Harsanyi weights $\phi$ (which make p optimal according to $H$, along with anything else with the same Harsanyi score $H(p, \phi)$), and then tweaking them to get $\psi$ to target $p$ in particular.

We can simplify our formula by noting that $H(p, \phi) = p \cdot \phi$:

$$\psi_i = \frac{p_i \phi_i}{p \cdot \phi}$$

To make the formula a little prettier, and to get some extra geometric insight, we can introduce the element-wise product $p \odot \phi$, where $(p \odot \phi)_i = p_i \phi_i$:

$$\psi = \frac{p \odot \phi}{p \cdot \phi}$$

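Here’s a good opportunity to make sure our weights sum up to 1:

$$\sum_{i=1}^{n} \psi_i = \frac{\sum_{i=1}^{n} p_i \phi_i}{p \cdot \phi} = \frac{p \cdot \phi}{p \cdot \phi} = 1$$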

Great! $p \cdot \phi$ is acting like a normalization term, and we can think of $p \odot \phi$ as telling us which direction $\psi$ points in. This vector of weights is then scaled to land on the hypersurface of weights that sum to 1, known as the standard simplex $\Delta^{n-1}$, which we’ll discuss more later.
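As a small worked example (the numbers are chosen purely for illustration), take two agents with target point $p = (1, 3)$ and equal Harsanyi weights $\phi = (\tfrac{1}{2}, \tfrac{1}{2})$. Taking the element-wise product and normalizing by the dot product gives:

$$p \odot \phi = \left(\tfrac{1}{2}, \tfrac{3}{2}\right), \qquad p \cdot \phi = \tfrac{1}{2} + \tfrac{3}{2} = 2, \qquad \psi = \frac{p \odot \phi}{p \cdot \phi} = \left(\tfrac{1}{4}, \tfrac{3}{4}\right)$$

The resulting weights sum to 1, and the agent who receives more utility at $p$ ends up with the larger geometric weight.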

We can also think of $\phi$ as a function denoted as $\phi(p, F)$ which returns the Harsanyi weights for $p$ in the context of a compact, convex subset $F \subset \mathbb{R}^n$. This is it, so let’s make a new heading to find it later!

How to Calculate Weights for p

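We now have a formula for $\psi$, which we can write as

$$\psi(p, F) = \frac{p \odot \phi(p, F)}{p \cdot \phi(p, F)}$$

Or we can suppress function arguments and simply write

$$\psi = \frac{p \odot \phi}{p \cdot \phi}$$

Where $p \odot \phi$ is the element-wise product of $p$ and $\phi$: $(p \odot \phi)_i = p_i \phi_i$, and $p \cdot \phi$ is the dot product $\sum_{i=1}^{n} p_i \phi_i$.

For a single component $\psi_i$, we have

$$\psi_i = \frac{p_i \phi_i}{\sum_{j=1}^{n} p_j \phi_j}$$

Note that $\psi$ isn’t defined when $p \cdot \phi = 0$.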

Is this a problem? Not really! $p \cdot \phi = H(p, \phi)$ is the Harsanyi aggregate utility of $p$ when $\phi$ has been chosen to make $p$ optimal under $H$. When this is 0, it means the individual utilities must all be 0 and the entire feasible set $F$ must be a single point at the origin. When that happens, any weights will make $p$ optimal according to $H$ or $G$. Feel free to use any convention that works for your application: if we’re in a context where $\phi$ is defined, we can inherit $\psi = \phi$. If $F$ is shrinking towards becoming a single point, we can use the limiting value of $\psi$.
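The weight formula (element-wise product of the target point and the Harsanyi weights, divided by their dot product) is short enough to sketch directly. The function name `geometric_weights` is an illustrative assumption, and raising an error in the degenerate zero case is just one possible convention among those mentioned above.

```python
def geometric_weights(p, phi):
    """Return the weights psi_i = p_i * phi_i / (p . phi)."""
    products = [pi * wi for pi, wi in zip(p, phi)]   # element-wise product
    total = sum(products)                            # dot product, i.e. H(p, phi)
    if total == 0:
        # Degenerate case: the caller has to pick a convention for psi.
        raise ValueError("p . phi is 0; choose a convention for psi")
    return [x / total for x in products]

# Hypothetical example values.
psi = geometric_weights([2.0, 1.0, 3.0], [0.5, 0.25, 0.25])
# psi == [0.5, 0.125, 0.375]
```

A caller who prefers to inherit the Harsanyi weights in the degenerate case could catch the error and return `phi` instead.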

Checking Our Solution

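Assuming we calculated $\psi$ correctly, we can verify that these weights lead to $\nabla_t f = 0$ at $p$. This requires $\frac{\partial f}{\partial t_j} = 0$ for the first $n-1$ utilities, so let’s check that:

$$\frac{\partial f}{\partial t_j} = \frac{\psi_j G}{p_j} - \frac{\psi_n G}{p_n}\,\frac{\phi_j}{\phi_n} = \frac{\phi_j G}{p \cdot \phi} - \frac{\phi_n G}{p \cdot \phi}\,\frac{\phi_j}{\phi_n} = 0$$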

Success! $p$ is an optimum of $G$ among $H_p$. But is it unique?
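The same check can be run numerically. This is a sketch under assumed example values: `df_dt` is a hypothetical helper that evaluates the derivative of G along the hyperplane with respect to the j-th free utility, using the Harsanyi weights `phi` and the geometric weights `psi` built from them.

```python
import math

def G(u, psi):
    """Weighted geometric average: product of u_i ** psi_i."""
    return math.prod(ui ** wi for ui, wi in zip(u, psi))

def df_dt(u, psi, phi, j):
    """psi_j*G/u_j - (psi_n*G/u_n) * (phi_j/phi_n), for a point u on the hyperplane."""
    g = G(u, psi)
    return psi[j] * g / u[j] - (psi[-1] * g / u[-1]) * (phi[j] / phi[-1])

# Hypothetical example values.
p = [2.0, 1.0, 3.0]
phi = [0.5, 0.25, 0.25]
total = sum(pi * wi for pi, wi in zip(p, phi))
psi = [pi * wi / total for pi, wi in zip(p, phi)]

gradient_at_p = [df_dt(p, psi, phi, j) for j in range(len(p) - 1)]
# Another point with the same Harsanyi score, for comparison.
gradient_elsewhere = [df_dt([1.0, 2.0, 4.0], psi, phi, j) for j in range(len(p) - 1)]
```

`gradient_at_p` should vanish up to floating-point error, while `gradient_elsewhere` should not.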

P Is the Unique Optimum of G When Weights Are Positive

Let’s see how $\phi$ and $\psi$ influenced the outcome here, and keep track of the critical points which can make $\nabla_t f$ 0 or undefined. These are the only points which can be extrema, and for each we need to check if it is a minimum or maximum among $H_p$. ($G$ doesn’t have any saddle points, and $H_p$ doesn’t have any boundaries of its own to worry about. Where $H_p$ meets the boundary of $G$’s domain, the axes, $G = 0$ there.)

For example, whenever any individual utility $u_i = 0$, $\frac{\partial G}{\partial u_i}$ is undefined, which causes $\nabla_t f$ to be undefined. But note that these will be minimal points of $G$, unless $\psi_i = 0$. To find maximal points of $G$ across $H_p$ we need

$$\frac{\psi_j G}{u_j} = \frac{\psi_n G}{u_n}\,\frac{\phi_j}{\phi_n}$$

If $u_n$ or $\phi_n$ are 0, then $\frac{\partial f}{\partial t_j}$ is undefined, and we’ll check later if these can still be optimal. We assumed that the index $n$ refers to an agent with $\phi_n > 0$, in order to prevent that exact case from breaking our entire solution.

If we were handed $\psi$ from some external source, we could solve this equation to see which $u$ happened to be optimal. But we designed $\psi$, so let’s see what we caused to be optimal.

$$\frac{p_j \phi_j}{p \cdot \phi}\,\frac{G}{u_j} = \frac{p_n \phi_n}{p \cdot \phi}\,\frac{G}{u_n}\,\frac{\phi_j}{\phi_n}$$

If $p \cdot \phi = 0$ then $\psi$ is undefined. This only happens when $F$ is a single point, in which case $p$ is indeed the unique optimum of $G$.

Here we’re going to be careful about which weights can be 0. We’ll again use the fact that $\phi_n > 0$ to safely divide it from both sides.

$$\frac{p_j \phi_j}{u_j} = \frac{p_n \phi_j}{u_n}$$

Here again we can see that $u = p$ solves this family of n-1 equations. And this is very exciting because this is our first maximum of $G$! Are there any other solutions?

Each of these equations is satisfied when one of the following is true:

  • $\phi_j = 0$

  • $\frac{p_j}{u_j} = \frac{p_n}{u_n}$

In other words, assigning an agent 0 Harsanyi weight $\phi_j = 0$ (and thus geometric weight $\psi_j = 0$) can allow $G$ to have multiple optima among $H_p$, which can give it multiple optima among $F$.

What about when all geometric weights are positive? Are there any other solutions to that second family of n-1 equations?

Having all positive geometric weights $\psi_i > 0$ implies having all positive Harsanyi weights $\phi_i > 0$, and all positive individual utilities $p_i > 0$. It also implies that any optimum of $G$ will have all positive individual utilities $u_i > 0$. This lets us freely divide by any of these terms, without needing to worry that we might be dividing by 0.

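Since $u_n$ and $p_n$ are both positive, we can think of $u_n$ as a scaled version of $p_n$:

$$u_n = s\,p_n, \qquad s > 0$$

How does this scalar $s$ influence the other terms in these equations?

$$\frac{p_j}{u_j} = \frac{p_n}{u_n} = \frac{1}{s} \quad\Longrightarrow\quad u_j = s\,p_j$$

This forms a line $u = s\,p$ from the origin through $p$, which only intersects $H_p$ at $p$. (Since scaling $p$ up or down changes $H(p, \phi)$.) So when all geometric weights are positive, $p$ is the unique optimum of $G$ among $H_p$!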
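A brute-force scan offers an informal cross-check of uniqueness. The sketch below, with assumed example values and an arbitrary grid resolution, evaluates G at grid points on the positive part of the hyperplane through `p` and records the best one. The grid is chosen so that it contains `p` exactly.

```python
import math

def G(u, psi):
    """Weighted geometric average: product of u_i ** psi_i."""
    return math.prod(ui ** wi for ui, wi in zip(u, psi))

# Hypothetical example values with all weights positive.
p = [2.0, 1.0, 3.0]
phi = [0.5, 0.25, 0.25]
total = sum(pi * wi for pi, wi in zip(p, phi))       # Harsanyi score of p
psi = [pi * wi / total for pi, wi in zip(p, phi)]

best_u, best_g = None, -1.0
steps = 80
for a in range(1, steps):
    for b in range(1, steps):
        t1, t2 = 4.0 * a / steps, 8.0 * b / steps    # the first two utilities
        u3 = (total - t1 * phi[0] - t2 * phi[1]) / phi[2]
        if u3 <= 0:
            continue                                 # outside the positive orthant
        g = G([t1, t2, u3], psi)
        if g > best_g:
            best_u, best_g = [t1, t2, u3], g
```

On this grid the best point found should be `p` itself. A scan like this is only evidence, not a proof; the scaling argument above is what establishes uniqueness.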

When $\phi_j = 0$, $\psi_j$ is also 0, so $u_j$ doesn’t affect $G(u, \psi)$ or $H(u, \phi)$. We can start with $u = p$, and then freely vary the utilities of any agent with 0 weight and remain optimal.
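To illustrate the zero-weight case with assumed numbers: below, the first agent gets Harsanyi weight 0, so its geometric weight is 0 as well, and moving only that agent’s utility leaves both the Harsanyi score and G unchanged.

```python
import math

def G(u, psi):
    """Weighted geometric average: product of u_i ** psi_i."""
    return math.prod(ui ** wi for ui, wi in zip(u, psi))

# Hypothetical example: the first agent gets zero Harsanyi weight.
p = [2.0, 1.0, 3.0]
phi = [0.0, 0.5, 0.5]
total = sum(pi * wi for pi, wi in zip(p, phi))
psi = [pi * wi / total for pi, wi in zip(p, phi)]    # [0.0, 0.25, 0.75]

shifted = [5.0, 1.0, 3.0]    # only the zero-weight agent's utility changed
same_H = sum(ui * wi for ui, wi in zip(shifted, phi)) == total
same_G = G(shifted, psi) == G(p, psi)
```

Both `same_H` and `same_G` come out `True` here, because a utility raised to the power 0 contributes a factor of exactly 1.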

Interactive Implementations

We can also check our entire calculation, including those pages of calculus, by actually implementing our solution and seeing if it works! A graphing calculator is sufficient to check this in 2 and 3 dimensions. We can show all the points $u$ which satisfy $G(u, \psi) = G(p, \psi)$ and they should trace out the contours of $G$, showing all the joint utilities which have the same $G$ score as $p$.

In 2 dimensions, the graph looks like this:

[Figure: Geometric Weight Calculation 2D (an interactive version is linked in the original post)]

The Harsanyi hyperplane is a line, and the contour curves are skewed hyperbolas.
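Without a graphing calculator, the 2D picture can be approximated in a few lines. This is a sketch with assumed example values: `contour_u2` solves for the second utility on the contour of G through `p`, and `line_u2` does the same for the Harsanyi line through `p`. Both helper names are hypothetical.

```python
import math

# Hypothetical two-agent example.
p = [1.0, 3.0]
phi = [0.5, 0.5]
total = sum(pi * wi for pi, wi in zip(p, phi))           # Harsanyi score of p
psi = [pi * wi / total for pi, wi in zip(p, phi)]        # [0.25, 0.75]
level = math.prod(pi ** wi for pi, wi in zip(p, psi))    # value of G at p

def contour_u2(u1):
    """Second utility on the contour of G that passes through p."""
    return (level / u1 ** psi[0]) ** (1.0 / psi[1])

def line_u2(u1):
    """Second utility on the Harsanyi line through p."""
    return (total - phi[0] * u1) / phi[1]

# Vertical gap between the contour and the line at a few sample points.
gaps = {u1: contour_u2(u1) - line_u2(u1) for u1 in (0.25, 0.5, 1.0, 2.0, 3.0)}
```

The gap should be essentially zero at the first coordinate of `p` and positive at the other sampled points, which matches the picture of the hyperbola touching the line only at `p`.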

As expected, taking $p$ out of the positive quadrant violates our assumption that utilities are non-negative, leading to invalid settings for $\psi$. Similarly, if $H_p$ has a positive slope, this violates our assumption that $p$ is on the Pareto frontier. (A positive slope implies that we can make both players better off simultaneously. If we calculate $\phi$ anyway, a positive slope implies that $\phi_i$ is negative for the agent on the x axis.) This allows $H_p$ to pass up through the hyperbola at another point other than $p$, but this never happens when $\phi_i > 0$ for both agents.

With 3 agents, the graph looks like this:

[Figure: Geometric Weight Calculation 3D (an interactive version is linked in the original post)]

In 3 dimensions, the Harsanyi hyperplane is a plane, and the contour surfaces are skewed hyperboloids.

We can move $p$ around on the hyperplane, and this changes $\psi$, which changes where the contour touches $H_p$. We can see that $p$ always lies at the intersection of this contour curve and $H_p$, and this is a visual proof that $p$ maximizes $G$ among $H_p$. And when $p$ corresponds to all agents having positive Harsanyi weight $\phi_i > 0$, this intersection only happens at $p$!