It seems inelegant to me that utility functions are created for specific situations, when these clearly aren’t the same as the agent’s overall utility function across all of their decisions. For instance, a model may estimate an agent’s expected utility from the result of a specific intervention, but this clearly isn’t quite right; the agent has a much more complicated utility function outside this intervention. According to a specific model, “Not having an intervention” could set “Utility = 0”; but for any real agent, it’s quite likely their life wouldn’t actually have 0 utility without the intervention.
It seems important to recognize that a utility score in a model is particular to the scenario that model covers, and does not represent a universal utility function for the agents in question.
Let U be an agent’s true utility function across a very wide assortment of possible states, and ^U be the utility function used for the sake of the model. I believe that ^U is supposed to approximate U in some way; perhaps they should be related by an affine transformation.
The important thing for a utility function, as it is typically used (in decision models), is probably not that ^U=U, but rather, that decisions made within the specific context of ^U approximate those made using U.
Here, I use angle brackets to denote “the expected value, according to the utility function in the subscript”, and D to denote the set of decisions made conditional on a specific utility function being used for decision making.
Then, we can represent this intended approximation with:
$\langle D(\hat{U}) \rangle_U \sim \langle D(U) \rangle_U$
Related to this, one common argument against utility maximization is that “we still cannot precisely measure utility”. But here, it’s perhaps clearer that we don’t need to. What’s important for decision making is that we have models we can expect to help us maximize our true utility functions, even if we don’t know much about what those functions really are.
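To make this concrete, here is a minimal sketch in Python, with made-up utility numbers. A scenario-specific ^U pins “no intervention” to 0 and only scores the interventions, yet the decision it produces scores just as well under the true U as the decision U itself would have made, which is the sense in which the approximation above holds:

```python
# A toy illustration (hypothetical numbers): the decision made with the
# scenario-specific utility U_hat is evaluated under the true utility U,
# and compared with the decision U itself would have produced.

OPTIONS = ["no_intervention", "intervention_a", "intervention_b"]

def true_U(state):
    # The agent's "real" utility over outcomes; note "no intervention" is far from 0.
    return {"no_intervention": 7.0, "intervention_a": 9.0, "intervention_b": 8.5}[state]

def model_U_hat(state):
    # The model's utility: "no intervention" is pinned to 0, and only the
    # value added by each intervention is scored.
    return {"no_intervention": 0.0, "intervention_a": 2.2, "intervention_b": 1.0}[state]

def decide(utility, options):
    # D(utility): the choice made when this utility function drives the decision.
    return max(options, key=utility)

d_hat = decide(model_U_hat, OPTIONS)   # decision made using U_hat
d_true = decide(true_U, OPTIONS)       # decision made using U

# Both decisions are scored under the *true* utility. Outcomes are deterministic
# in this toy case, so the expectations collapse to point values.
print(true_U(d_hat), true_U(d_true))   # 9.0 9.0 -> U_hat was "good enough" here
```

With uncertainty in outcomes, you would average the true utility over the results of each decision instead of comparing point values, but the criterion is the same.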
I delve into that here: https://www.lesswrong.com/posts/Lb3xCRW9usoXJy9M2/platonic-rewards-reward-features-and-rewards-as-information#Extending_the_problem
Oh fantastic, thanks for the reference!
^U and ^U look to be the same.
Thanks! Fixed.
I’m sure the bottom notation could be improved, but I’m not sure of the best way. In general I’m trying to get better at this kind of mathematics.
You got the basic idea across, which is a big deal.
Though whether it’s A or B isn’t clear:
A) “this isn’t all of the utility function, but it’s everything that’s relevant to making decisions about this right now”. ^U doesn’t have to be U, or even a good approximation in every situation—just (good enough) in the situations where we use it.
Building a building? A desire for things to not fall on people’s heads becomes relevant (and knowledge of how to do that).
Writing a program that writes programs? It’d be nice if it didn’t produce malware.
Both desires usually exist—and usually aren’t relevant. Models of utility for most situations won’t include them.
B) The cost of computing the utility function more exactly in a given case exceeds the (expected) gains.
I think I agree with you. There’s a lot of messiness with using ^U, and I’m sure this approximation leads to decision errors in many real cases. I’d also agree that making ^U a better approximation would be costly and is often not worth the effort.
Similar to how there’s a term for “Expected value of perfect information”, there could be an equivalent for the expected value of a utility function, even outside of uncertainty over the parameters that were thought to be included. Really, there could be calculations for the “expected benefit from improvements to a model”, though of course this would be difficult to parameterize (how would you declare that a model has been changed a lot vs. a little? If I introduce 2 new parameters, but these parameters aren’t that important, then how big a deal should this be considered in expectation?)
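Here is one rough way this could be cashed out, as a sketch with made-up numbers: treat the “value of a better utility model” as the extra true utility you’d expect from the decisions the refined model makes across the cases you care about, weighed against the cost of the refinement.

```python
# A rough sketch (all numbers hypothetical) of an "expected value of a better
# utility model": the extra true utility we'd expect from the decisions a
# refined model makes, compared against the cost of building the refinement.

def avg_true_value(model, true_U, scenarios):
    """Average true utility of the options the given model would pick."""
    picks = [max(options, key=model) for options in scenarios]
    return sum(true_U[p] for p in picks) / len(picks)

# True utilities of outcomes (unknown to either model; assumed for illustration).
true_U = {"a": 5.0, "b": 6.0, "c": 1.0}

# A coarse model, and a refined one that scores outcome "b" more accurately.
coarse_model = {"a": 2.0, "b": 1.0, "c": 0.0}.get
refined_model = {"a": 2.0, "b": 3.0, "c": 0.0}.get

# The decision situations we expect to face.
scenarios = [["a", "b"], ["a", "c"], ["b", "c"]]

gain = (avg_true_value(refined_model, true_U, scenarios)
        - avg_true_value(coarse_model, true_U, scenarios))
cost_of_refinement = 0.2   # effort of improving the model, in the same units

print(round(gain, 3), gain > cost_of_refinement)
# 0.333 True -> refining is worth it only if the expected gain covers the cost
```

The obvious catch, which matches the parenthetical above, is that you only know the gain exactly if you already know the true utilities; in practice you’d be estimating it.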
The model has changed when the decisions it is used to make change. If the model ‘reverses’ and suggests doing the opposite/something different in every case from what it previously recommended, then it has ‘completely changed’.
(This might be roughly the McNamara fallacy, of declaring that things that ‘can’t be measured’ aren’t important.)
EDIT: Also, suppose there’s a set of information consisting of pieces A, B, and C, and incorporating all but one of them doesn’t have a big impact on the model, but the last piece does, whichever piece that is. Then ‘this metric’ could overestimate the importance of whichever piece happened to be incorporated last, when it’s really A, B, and C together that made the impact. It ‘has this issue’ because the metric by itself is only meant to notice ‘changes in the model over time’, not to explain why they happened or solve attribution.
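If it helps, here is a minimal sketch of the change metric described above (models and scenarios are made up): the model has ‘changed’ to the extent that its recommendations flip across the cases we care about, and, as the EDIT notes, this only measures how much behavior changed, not which piece of information deserves the credit.

```python
# A sketch of the decision-change metric: a model has "changed a lot" to the
# extent that the decisions it recommends flip across a set of scenarios.
# (This measures behavior change only; it does not attribute the change to
# any particular piece of incorporated information.)

def decision_change(old_model, new_model, scenarios):
    """Fraction of scenarios where the recommended option differs between models."""
    flips = sum(
        1 for options in scenarios
        if max(options, key=old_model) != max(options, key=new_model)
    )
    return flips / len(scenarios)

old_model = {"a": 2.0, "b": 1.0, "c": 0.5}.get
new_model = {"a": 2.0, "b": 3.0, "c": 0.5}.get   # e.g. after incorporating A, B, and C

scenarios = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]

print(decision_change(old_model, new_model, scenarios))
# 0.5 -> half the recommendations flipped; 1.0 would mean a 'complete' reversal
```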