n=1 to maximize the probability I find out how utils are measured.
The problem here is that “learning how utils are measured” may be worth more than one util! So the material part of your reward would have to be worth less than one util, because the total has to account for the utils you gain from learning about utils.
On the other hand, if we do not know the value of learning about utils (we could get anywhere between 0 and 1 utils of information about utils), we end up with a variable number of extra utils between 0 and 1. But if we don’t know how many utils the information is worth, we will get very little information out of it, so the rest of the reward is likely to be worth nearly one util, unless learning how utils work is worth so much that even that tiny fraction of knowledge is worth almost one util.
Okay, so let’s make this more concrete: say you opt for n=1, and Omega gives you $1. How much would you pay to know that $1 = 1 util? I might pay $20 for that information. So if $1 = 1 util, Omega has actually given me 21 utils: the dollar plus 20 utils’ worth of information. That is 20 more than he promised, a contradiction!
It might be possible to describe this sort of system with differential equations, find some equilibria, and work out what the utility is at those points, but if what you receive ends up being something like “you decide to buy a pet dog,” this really isn’t that useful.
One non-contradictory way this could happen is that I pick n=1, and then Omega says: “The mere knowledge this outcome carries with it is worth 10 utils to you. I will therefore subject you to five seconds of torture to bring your total utility gained down to 1 util.”
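To spell out the accounting in the last two paragraphs, here is a minimal Python sketch; the util figures are the illustrative numbers from the comments above, not anything Omega actually states. If the promise is that the whole package nets exactly one util, then any informational value bundled with the outcome has to be offset somewhere else.

```python
PROMISED = 1  # Omega's promise: the whole outcome is worth exactly one util

def required_offset(material, info):
    """Offset Omega must bundle in (usually something unpleasant) so that
    material payout + information value + offset == PROMISED."""
    return PROMISED - material - info

# Naive accounting behind the apparent contradiction: the $1 (taken as 1 util)
# plus information I'd pay $20 for nets 21 utils if nothing offsets it.
print(1 + 20)                                # 21, i.e. 20 more than promised

# The non-contradictory version: the bundled knowledge is worth 10 utils, so
# Omega owes an offset of -9 utils -- the five seconds of torture.
print(required_offset(material=0, info=10))  # -9
```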
If the game only happens once, we might not want to do this. However, if this game is repeated, or if other people are likely to be faced with this decision, then it makes sense to do this the first time. Then we could try to figure out what the optimal value of n is.
To continue with the same example: suppose I found out that this knowledge is worth 10 utils to me. Then I get a second chance to bet. Since I’ll never meet Omega again (and presumably never again need to use these units), this knowledge must boost my expected outcome from the bet by 10 utils. We already know that my action in a state of ignorance is to pick n=1, which has an expected value of 1 util. So my optimal action ought to be such that my expected outcome is 11 utils, which happens at approximately n=6 (if we can pick non-integer values for n, we can get this result more exactly).
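The step from “this knowledge is worth 10 utils” to “aim for an expected outcome of 11 utils” can be written as a search over n. The game’s payoff schedule isn’t restated in this thread, so the `expected_utils` argument below is a placeholder the reader would have to supply; this only sketches the shape of the calculation, not the actual numbers.

```python
def smallest_n_reaching(target, expected_utils, n_max=1000):
    """First integer n whose expected payoff meets the target, or None.
    `expected_utils` is a placeholder for the game's actual payoff schedule,
    which is not restated in this thread (it satisfies expected_utils(1) == 1)."""
    for n in range(1, n_max + 1):
        if expected_utils(n) >= target:
            return n
    return None

baseline = 1            # expected value of the ignorant choice, n = 1
knowledge_value = 10    # the assumed value of learning how utils are measured
target = baseline + knowledge_value   # 11 utils, as in the paragraph above
# smallest_n_reaching(target, expected_utils) comes out near 6 in the example above.
```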
I’m not really sure what’s being calculated in that last paragraph there. Knowing the measurement of a single util seems to be valuable OUTSIDE of this problem. Inside the problem, the optimal actions (which is to say the actions with the highest expected value) continue to be writing the busy beaver function as fast as possible, &c.
Also, if Omega balances out utils with positive and negative utils, why is he more likely to torture you for five seconds and tell you “this is −9 utils” than to, say, torture you for 300 years and then grant you an additional 300 years of life in which you have a safe nanofactory and an Iron Man suit?
It seems to me that the vast majority of actions Omega could take would be completely inscrutable, and give us very little knowledge about the actual value of utils.
A better example might be the case in which not having to wait one second at a traffic light is worth one util, and after your encounter Omega disappears without a word. Omega then begins circulating a picture of a kitten on the internet. Three years later, a friend of yours links you the picture just before you leave for work. Having to tell them to stop sending you adorable pictures when you’re about to leave exactly cancels the value of seeing the adorable picture, and the one second later that you get out the door turns out to be a second you do not have to spend waiting at a traffic light.
If this is how utils work, then I begin to understand why we have to break out the busy beaver function… in order to get an outcome akin to $1000 out of this game, you would need to win around 2^20 utils (by my rough and highly subjective estimate). A 5% chance of $1000 is MUCH MUCH better than a guaranteed one second less of waiting at a traffic light.
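For scale, here is the arithmetic behind that rough estimate as a quick Python sketch, with every figure being the subjective guess from the paragraph above rather than anything the game specifies.

```python
# All figures are the subjective guesses from the paragraph above:
# 1 util = one second not spent waiting at a traffic light, and roughly
# 2^20 such seconds are taken to be about as good as $1000.
utils_per_1000_dollars = 2 ** 20                      # ~1,048,576 seconds saved
days_of_waiting_avoided = utils_per_1000_dollars / (60 * 60 * 24)
print(days_of_waiting_avoided)                        # ~12.1 days

# Expected dollar value of the 5% gamble versus one guaranteed util on this scale.
print(0.05 * 1000)                                    # $50
print(1000 / utils_per_1000_dollars)                  # ~$0.00095 per util
```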
I seem to have digressed.
You can’t outthink a tautology. By definition, you don’t care about maximising the probability of finding out how utils are measured more than you care about utils themselves.
True, I am being somewhat flippant. However, I’m being upvoted more than I am when I take my time thinking about a comment, so I must be doing something right.
Unless you meant that comma to be a period or perhaps a semicolon, you miss the point. (To agree with something that is different from what was said is to make an alliance straw man.)
Flippant or not, you are signalling (and worse, perpetuating) a common confusion about decision theory.
“I may be wrong but I am approved of!” A troubling sentiment, albeit practical.
I probably meant that comma to be a period, then. I agree with what you said; I think that I was being flippant in ignoring that point in my first comment.
I do think that I am approved of because I’m not entirely wrong. Measuring utility is complicated. I think it’s possible that a half-serious comment that touches on the issue actually contributes more than a comment that worked out all the complications in depth. Maybe it starts more conversation and makes people think more.
Thank you. I value being comprehended. :)