Great post!
I especially like “try to maximize values according to models which, according to human beliefs, track the things we care about well”. I ended up at a similar point when thinking about the problem. It seems like we ultimately have to use this approach, at some level, in order for all the type signatures to line up. (Though this doesn’t rule out entirely different approaches at other levels, as long as we expect those approaches to track the things we care about well.)
On amplified values, I think there’s a significant piece absent from the discussion here (possibly intentionally). It’s not just about precision of values, it’s about evaluating the value function at all.
Model/example: a Bayesian utility maximizer does not need to be able to evaluate its utility function; it only needs to be able to decide which of two options has higher utility. If e.g. the utility function is ∑_i f(X_i), and a decision only affects X_3, then the agent doesn’t need to evaluate the sum at all; it only needs to calculate f(X_3) for each option. This is especially relevant in a world where most actions don’t affect most of the world (or if they do, the effects are drowned out by noise) - which is exactly the sort of world we live in. Most of my actions do not affect a random person in Mumbai (and to the extent there is an effect, it’s drowned out by noise). Even if I value the happiness of that random person in Mumbai, I never need to think about them, because my actions don’t significantly impact them in any way I can predict.
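To make the point concrete, here’s a toy sketch (all names - f, state, better_option - are made up for illustration): the agent picks between options by evaluating f only at the variable the decision touches, never summing over the whole world.

```python
# Toy model: utility is sum_i f(X_i) over a huge number of state variables,
# but the two options under comparison differ only in X_3. The terms for
# every unchanged variable cancel out of the utility difference, so the
# agent only needs f(X_3) for each option -- never the full sum.
# All names here (f, state, better_option) are illustrative, not from any
# real system.

def f(x):
    # Arbitrary per-variable value function (illustrative): peaks at x = 2.
    return -(x - 2) ** 2

state = [0.0] * 1_000_000  # a big world; summing f over all of it is costly
state[3] = 1.0             # current value of X_3

def better_option(candidates_for_x3):
    # Compare options by f(X_3) alone; the rest of the sum is identical
    # across options and so is irrelevant to the comparison.
    return max(candidates_for_x3, key=f)

print(better_option([1.0, 2.5]))  # 2.5 wins: it's closer to f's peak at 2
```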
As you say, the issue isn’t just “we can’t evaluate our values precisely”. The issue is that we probably do not and cannot evaluate our values at all. We only ever evaluate comparisons, and only between actions with a relatively simple diff.
Applying this to amplification: amplification is not about evaluating our values more precisely, it’s about comparing actions with more complicated diffs, or actions where more complicated information is relevant to the diff. The things you say in the post are still basically correct, but this gives a more accurate mental picture of what amplification needs to achieve.
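Continuing the toy model above (again, all names are invented for illustration): “handling a more complicated diff” corresponds to computing the utility difference over a larger set of touched variables, while everything untouched still cancels.

```python
# Toy generalization: compare two actions by the diff of the variables they
# touch. On this picture, amplification buys the ability to handle a larger
# or more complicated touched-variable set -- not the ability to evaluate
# the full utility sum. All names are illustrative.

def f(x):
    # Same illustrative per-variable value function: peaks at x = 2.
    return -(x - 2) ** 2

def utility_diff(state, action_a, action_b):
    # Each action is a dict mapping variable index -> new value.
    # Only indices touched by either action contribute to the difference;
    # every other term of sum_i f(X_i) cancels between the two actions.
    touched = set(action_a) | set(action_b)
    diff = 0.0
    for i in touched:
        diff += f(action_a.get(i, state[i])) - f(action_b.get(i, state[i]))
    return diff

state = [0.0] * 1_000_000
a = {3: 2.0, 7: 1.0}  # an action with a two-variable diff
b = {3: 1.0}          # an action with a one-variable diff
print(utility_diff(state, a, b) > 0)  # a comes out ahead under this toy f
```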