Remember that not only are the scales freely varying, but so are the zero-points. Any normalization scheme that doesn’t take this into account won’t work.
What you want is not just to average two utility functions, you want a way of averaging two utility functions that doesn’t care about affine transformations. One way to do this is to use the difference between two special states as “currency” to normalize the utility functions against both shifts and scale changes. But then they’re not normalized against the vagaries of those special states, which can mean trouble.
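Concretely, here's a minimal sketch of that "currency" normalization in Python, assuming dict-valued utility functions; the function name and the choice of reference pair are illustrative, not anything from the post:

```python
# Pick two reference outcomes (lo, hi) and rescale each utility function
# so the gap between them is exactly 1, with lo pinned at 0. Subtracting
# u[lo] removes the arbitrary zero-point; dividing by the gap removes the
# arbitrary scale, so the output is invariant under positive affine
# transformations of the input.

def currency_normalize(u, lo, hi):
    gap = u[hi] - u[lo]
    if gap <= 0:
        # The "vagaries of those special states": a function that ranks
        # the reference pair the other way (or is indifferent between
        # them) breaks the scheme.
        raise ValueError("function does not rank hi above lo")
    return {o: (x - u[lo]) / gap for o, x in u.items()}

u1 = {"dollar": 1, "apple": 2, "hamburger": 3}
u2 = {"dollar": 0, "apple": 500, "hamburger": -1000}

print(currency_normalize(u1, "dollar", "apple"))
# {'dollar': 0.0, 'apple': 1.0, 'hamburger': 2.0}
print(currency_normalize(u2, "dollar", "apple"))
# {'dollar': 0.0, 'apple': 1.0, 'hamburger': -2.0}
```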
I think that the post proves that the offsets can be ignored, at the beginning of the “Offsets and Scales” section.
The offsets can be ignored only so long as your method of averaging two utility functions is to just add them together. The instant you try to normalize them, the offsets become important.
Ah, I see what you’re saying. Any proposed normalisation scheme should not depend on the offsets.
There is good reason to pay attention to scale but not to zero-points: any normalization scheme that you come up with will be invariant with respect to adding constants to the utility functions unless you intentionally contrive one not to be. Normalization schemes that are invariant with respect to multiplying the utility functions by positive constants are harder.
Making such modifications to the utilities of outcomes will result in distorted expected utilities and different behaviour.
I meant invariant with respect to adding the same constant to the utility of every outcome in some utility function, not invariant with respect to adding some constant to one outcome (that would be silly).
Hm, good point. We can just use the relative utilities. Or, equivalently, we just have to restrict ourselves to the class of normalizations that are only a function of the relative utilities. These may not be “any” normalization scheme, but they’re pretty easy to use.
E.g. for the utility function (dollar, apple, hamburger) → (1010,1005,1000), instead we could write it as (D-A, A-H) → (5, 5). Then if we wanted to average it with the function (1,2,3), which could also be written (-1, −1), we’d get (2,2). So on average you’d still prefer the dollar.
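A quick sketch of that computation, representing each function by its consecutive differences (exactly the (D-A, A-H) encoding above):

```python
# Represent each utility function by consecutive differences; these are
# unchanged by adding a constant to every outcome, so the offsets drop
# out before we average.

def relative(u):
    return [a - b for a, b in zip(u, u[1:])]

u_big = [1010, 1005, 1000]   # relative form: [5, 5]
u_small = [1, 2, 3]          # relative form: [-1, -1]

avg = [(x + y) / 2 for x, y in zip(relative(u_big), relative(u_small))]
print(avg)  # [2.0, 2.0] -> dollar preferred to apple, apple to hamburger
```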
I don’t understand. What do you mean by averaging two utility functions?
Can you show how the offsets cause trouble when you try to do normalization?
Can you show why we should investigate normalization at all? The question is always “What preference does this scheme correspond to, and why would I want that?”
Sure.
Suppose I have three hypotheses for the Correct Utility (whatever that means) over three choices (e.g. choice one is a dollar, choice two is an apple, choice three is a hamburger): (1, 2, 3), (0, 500, −1000), and (1010, 1005, 1000). Except, of course, they all have some unknown offset 'c' and some unknown scale factor 'k'.
Suppose I take the numbers at face value and just average them, weighted by some probabilities (that's the answer to your first question). If I think they're all about equally plausible, the composite utility function is roughly (337, 502, 1). So I like the apple best, then the dollar, then the hamburger.
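In code, that face-value average looks like this (equal weights assumed):

```python
# Average the three hypothesized utility functions at face value,
# weighting each by its probability (taken here to be 1/3 each).

us = [(1, 2, 3), (0, 500, -1000), (1010, 1005, 1000)]
probs = [1/3, 1/3, 1/3]

composite = [sum(p * u[i] for p, u in zip(probs, us)) for i in range(3)]
print(composite)  # roughly [337.0, 502.3, 1.0] -> apple > dollar > hamburger
```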
But what if these utility functions were written down by me while I was in 3 different moods, and I don’t want to just take the numbers at face value? What if I look back and think “man, I really liked using big numbers when I wrote #2, but that’s just an artifact, I didn’t really hate hamburgers 1000 times as much as I liked a dollar when I wrote #1. And I really liked everything when I wrote #3 - but I didn’t actually like a dollar 1000 times more than when I wrote #1, I just gave everything a bonus because I liked everything. Time to normalize!”
First, I try to normalize without taking offsets into account (now we're starting the answer to your second question). I say “Let's take function 1 as our scale, and just divide everything down until the biggest absolute value is 3.” Okay then, the functions become (1, 2, 3), (0, 1.5, −3), (3, 2.985, 2.97). If I then take the weighted average, the composite utility function is roughly (1.3, 2.2, 1.0). Now I still like the apple best, then the dollar, then the hamburger, but this time the hamburger comes out almost as good as the dollar, so I will make different (more sensible, perhaps) bets than before. There are a variety of possible normalizations (normalizing the average, normalizing the sum of absolute values, etc.), but someone had a post exploring this (was it Stuart Armstrong? I can't find the post, sorry) and didn't really find a best choice.
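A sketch of that scale-only normalization:

```python
# Divide each function by a positive constant so its biggest absolute
# value matches function 1's (namely 3); offsets are left untouched.

us = [(1, 2, 3), (0, 500, -1000), (1010, 1005, 1000)]

def scale_to(u, target=3):
    k = target / max(abs(x) for x in u)
    return [k * x for x in u]

scaled = [scale_to(u) for u in us]
# [[1, 2, 3], [0, 1.5, -3], [3, 2.985.., 2.970..]]

composite = [sum(u[i] for u in scaled) / 3 for i in range(3)]
print(composite)  # roughly [1.33, 2.16, 0.99] -> apple > dollar > hamburger
```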
However, there's a drawback to just scaling everything down: it totally washes out utility function #3's impact on the final answer. Imagine that I dismissed function #2 (probability = 0). Now whether I like the dollar more than the hamburger depends entirely on whether or not I scale down function #3.
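To see the flip explicitly (same scale_to as in the previous sketch):

```python
# Drop function #2 entirely and compare dollar vs hamburger with and
# without rescaling function #3.

u1 = (1, 2, 3)
u3 = (1010, 1005, 1000)

def scale_to(u, target=3):
    k = target / max(abs(x) for x in u)
    return [k * x for x in u]

raw = [(a + b) / 2 for a, b in zip(u1, u3)]
print(raw)     # [505.5, 503.5, 501.5] -> dollar beats hamburger

scaled = [(a + b) / 2 for a, b in zip(u1, scale_to(u3))]
print(scaled)  # roughly [2.0, 2.49, 2.99] -> hamburger beats dollar
```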
So I decide to shift everything and then scale it, so I don't wash out the effect of function #3. I make the dollar the zero-point for all the functions, then rescale everything so the biggest absolute value is 2. The functions become (0, 1, 2), (0, 1, −2), (0, −1, −2). Averaging them gives (0, 1/3, −2/3). Yet again I like the apple best, but again the ratios are different, so I'll make different bets than before.
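And the shift-then-scale version:

```python
# Pin the dollar (index 0) at utility 0 in every function, then rescale
# so the biggest absolute value is 2, and average.

us = [(1, 2, 3), (0, 500, -1000), (1010, 1005, 1000)]

def shift_then_scale(u, zero_index=0, target=2):
    shifted = [x - u[zero_index] for x in u]
    k = target / max(abs(x) for x in shifted)
    return [k * x for x in shifted]

normalized = [shift_then_scale(u) for u in us]
# [[0, 1, 2], [0, 1, -2], [0, -1, -2]]

composite = [sum(u[i] for u in normalized) / 3 for i in range(3)]
print(composite)  # [0.0, 0.333.., -0.666..] -> apple > dollar > hamburger
```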
Hm, I could have chosen better examples. But I’m too lazy to redo the math for better clarity—if you want something more dramatic, imagine function #3 had a larger apparent scale than #2, and so the composite choice shifted from looking like #3 to #2 as you normalized.
I am in total agreement with whatever point it seems like you just made, which seems to be that normalization schemes are madness.
What you “did” there is full of type errors: it treats the scales and offsets as significant, and whatnot. That is not allowed, and you yourself seemed to be saying that it is not allowed.
I guess it must be unclear what the point of OP was, though, because I was assuming that such things were not allowed as well.
What I did in the OP was completely decouple things from the arbitrary scale and offset that the utility functions come with, by saying we have a utility function U′ such that U′ agrees with moral theory m on object-level preferences, conditional on moral theory m being correct. This gives us an unknown scale and offset for each utility function that masks out the arbitrariness of each utility function's native scale and shift. That scale and shift are then to be adjusted so that the relative utilities we get at the end are consistent with whatever preferences we want to have.
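In symbols, a hedged sketch of that construction (the constants k_m and c_m are my notation, not necessarily the OP's):

```latex
% Conditional on moral theory m being correct, U' agrees with m's native
% utility function u_m up to an unknown positive affine transformation:
\[
  U'(o \mid m \text{ correct}) \;=\; k_m \, u_m(o) + c_m ,
  \qquad k_m > 0 ,
\]
% so only relative utilities u_m(o_1) - u_m(o_2) constrain U'; the k_m
% and c_m are then adjusted to match whatever preferences we want.
```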
I hope that clarifies things? But it probably doesn’t.
Hm. You definitely did communicate that, but I guess maybe I’m pointing out a math mistake—it seems to me that you called the problem of arbitrary offsets solved too early. Though in your example it wasn’t a problem because you only had two outcomes and one outcome was always the zero point.
As I realized later because of Alex, the upshot is that to really deal with the problem of offsets you have to (at least de facto) normalize the relative utilities, not the utilities themselves. (On pain of stupidity)
I think my procedure does not run into trouble even with three options and other offsets. I don’t feel like trying it just now, but if you want to demonstrate how it goes wrong, please do.
I don’t understand what you are saying here.