This is $\max_{w \in W} u(w)$, right? And then you might just constrain the subset of $W$ which the agent can search over?
Exactly.
One toy model to conceptualize what a “compact criterion” might look like: imagine we take a second-order expansion of $u$ around some $u$-maximal world-state $w^*$. Then the eigendecomposition of the Hessian of $u$ at $w^*$ tells us which directions-of-change in the world state $u$ cares about a little or a lot. If the constraints lock the accessible world-states into the directions $u$ doesn’t care much about (i.e. eigenvalues near 0), then any accessible world-state near $w^*$ compatible with the constraints will have near-maximal $u$. On the other hand, if the constraints allow variation along directions $u$ cares about a lot (i.e. large-magnitude eigenvalues), then $u$ will be fragile to perturbations of $u$ (to some $u'$) which move the $u'$-optimal world-state along those directions.
That toy model has a very long list of problems with it, but I think it conveys roughly what kind of things are involved in modelling value fragility.
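To make the toy model concrete, here is a minimal numerical sketch. It assumes a purely quadratic $u$ and constraints that restrict movement to a linear subspace; the specific eigenvalues, the `worst_case_loss` helper, and everything else below are illustrative choices, not anything from the discussion above.

```python
import numpy as np

# Toy quadratic model: u(w) ~= u(w*) + 0.5 * (w - w*)^T H (w - w*),
# with H the (negative-definite) Hessian of u at the u-maximal state w*.

rng = np.random.default_rng(0)
dim = 5

# Made-up spectrum: large-magnitude eigenvalues are directions u cares about
# a lot; near-zero eigenvalues are directions it barely cares about.
eigvals = -np.array([10.0, 5.0, 1.0, 0.01, 0.001])
Q = np.linalg.qr(rng.normal(size=(dim, dim)))[0]   # random orthonormal eigenbasis
H = Q @ np.diag(eigvals) @ Q.T
w_star = np.zeros(dim)

def u(w):
    """Second-order approximation of utility around w* (u(w*) normalized to 0)."""
    d = w - w_star
    return 0.5 * d @ H @ d

def worst_case_loss(directions, radius=1.0):
    """Worst u-loss over perturbations of the given size confined to span(directions)."""
    B, _ = np.linalg.qr(directions.T)        # orthonormal basis of the allowed subspace
    H_restricted = B.T @ H @ B               # Hessian restricted to that subspace
    worst_curvature = np.linalg.eigvalsh(H_restricted).min()
    return -0.5 * worst_curvature * radius**2

# Constraints that only allow movement along directions u barely cares about
# (near-zero-eigenvalue eigenvectors): u stays near-maximal.
benign = Q[:, 3:].T
print("loss, low-curvature directions:", worst_case_loss(benign))

# Constraints that allow movement along a direction u cares about a lot:
# u is fragile to perturbations that push the optimum along it.
fragile = Q[:, :1].T
print("loss, high-curvature direction:", worst_case_loss(fragile))
```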