One other issue, which I’m not sure you’ve touched on, is the fact that variables in the real world are rarely completely independent. That is to say, increasing a variable, say V(1), may in fact lower other variables V(2)...V(N), including one of the 5 “important” variables that are highly weighted. For example, if I value both a clean environment and maximizing my civilization’s energy production, I have to balance the fact that maximizing energy production might involve strip-mining a forest or two, lowering the amount of clean environment available to the people.
Secondly, how does this model deal with adversarial agents? One of the reasons that Goodhart’s Law is so pervasive in the real world is that the systems it applies to often have an adversarial component. That is to say, there are agents who notice that you are pouring energy into V. In the past, all of this energy would have gone straight into U, but now that agents realize that there is a surplus of energy, they divert some of it to their own ends, reducing or even eliminating the total surplus that goes into U.
Finally, how well does this model deal with the fact that human values might change over time? If the set of 100 things the humans care about changes over time, how does that affect the expectation calculation?
>variables in the real world are rarely completely independent
To some extent, the diminishing returns on investing the agent’s “budget” capture this non-independence dynamic (increasing one variable must reduce some other, because there is less budget to go around). More complicated trade-offs seem to be modellable in a similar way.
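As a toy illustration of that point (my own sketch, not anything from the post): the 100 variables, the 5 heavy weights, the square-root returns, and the budget figure below are all made-up parameters; the only thing the sketch shows is that a fixed budget makes the variables compete, so pouring everything into one proxy variable necessarily starves the others.

```python
import numpy as np

N = 100          # number of variables the humans care about (assumed)
BUDGET = 10.0    # total effort the agent can spend (arbitrary)

# Hypothetical weights: 5 "important" variables are heavily weighted,
# the remaining 95 matter only a little.
weights = np.concatenate([np.full(5, 10.0), np.full(95, 0.1)])

def outcomes(allocation):
    """Concave (diminishing) returns: each extra unit of budget helps less."""
    return np.sqrt(allocation)

def true_utility(allocation):
    return float(weights @ outcomes(allocation))

# One simple heuristic: spread the budget in proportion to the weights.
balanced = BUDGET * weights / weights.sum()

# Goodhart-style allocation: pour the whole budget into the proxy V(1).
proxy_only = np.zeros(N)
proxy_only[0] = BUDGET

print("balanced allocation, U =", round(true_utility(balanced), 2))
print("all-in on V(1),      U =", round(true_utility(proxy_only), 2))
# Because the budget is fixed, raising V(1) means every other V(i) gets
# nothing -- the non-independence enters through the shared budget
# constraint rather than through direct interactions between variables.
```

Under these made-up numbers the balanced allocation scores far higher than the all-in-on-V(1) allocation, which is just the budget constraint doing the work the comment above describes.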
>Secondly, how does this model deal with adversarial agents?
It doesn’t, not really.
>Finally, how well does this model deal with the fact that human values might change over time?
It doesn’t; those are more advanced considerations; see e.g. https://www.lesswrong.com/posts/Y2LhX3925RodndwpC/resolving-human-values-completely-and-adequately