Utility functions and probabilities are entangled
Originally posted as an EA forum comment.
Suppose that there are two effective altruist billionaires, April and Autumn. Originally they were funding AMF because they thought funding AI alignment would be 0.001% likely to work and solving alignment would be as good as saving 10 billion lives, which is an expected value of 100,000 lives, lower than they could get by funding AMF.
After being in the EA community a while, they switched to funding alignment research for different reasons.
April updated upwards on tractability. She thinks research on AI alignment is 10% likely to work, and solving alignment is as good as saving 10 billion lives.
Autumn now buys longtermist moral arguments. She thinks research on AI alignment is 0.001% likely to work, and solving alignment is as good as saving 100 trillion lives.
Both of them assign the same expected utility to alignment: 1 billion lives. As such, they will make the same funding decisions. So even though April made an epistemic update and Autumn made a moral update, we cannot distinguish between them from behavior alone.
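The arithmetic behind these figures can be checked with a short sketch. The function name `expected_lives` is just an illustrative label, and exact fractions are used only to avoid floating-point noise; the probabilities and payoffs are the ones stated above.

```python
from fractions import Fraction

def expected_lives(p_success, lives_if_success):
    """Expected value in lives: probability of success times lives saved on success."""
    return p_success * lives_if_success

BILLION = 10**9
TRILLION = 10**12

# Original shared view: 0.001% chance, worth 10 billion lives.
original = expected_lives(Fraction(1, 100_000), 10 * BILLION)    # 100000

# April: updated the probability to 10%, kept the value at 10 billion lives.
april = expected_lives(Fraction(1, 10), 10 * BILLION)            # 1000000000

# Autumn: kept the probability at 0.001%, updated the value to 100 trillion lives.
autumn = expected_lives(Fraction(1, 100_000), 100 * TRILLION)    # 1000000000

print(original, april, autumn)
```

The two products are identical even though one factor changed in April's case and the other in Autumn's, which is why their funding decisions look the same from outside.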
This extends to a general principle: actions are driven by a combination of your values and subjective probabilities, and any given action is consistent with many different combinations of utility function and probability distribution.
As a second example, suppose Bart is an investor who makes risk-averse decisions (say, invests in bonds rather than stocks). He might do this for two reasons:
1. He would get a lot of disutility from losing money (maybe it’s his retirement fund).
2. He irrationally believes the probability of losing money is higher than it actually is (maybe he is biased because he grew up during a financial crash).
These different combinations of probability and utility produce the same risk-averse behavior. In fact, probability and utility are so interchangeable that professional traders (just about the most calibrated, rational people with regard to the probability of losing money, and who are risk-averse only for reason (1)) often model financial products as if losing money is more likely than it actually is, because it makes the math easier.[1]
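Here is a toy sketch of Bart's two possible explanations. Every number in it is a hypothetical chosen for illustration (the payoffs, the 10% true crash probability, the 3x penalty on losses, and the inflated 25% belief); none comes from the post.

```python
def best_action(p_crash, payoffs, utility):
    """Return the action with the highest expected utility in a two-state world."""
    def expected_utility(action):
        payoff_fine, payoff_crash = payoffs[action]
        return (1 - p_crash) * utility(payoff_fine) + p_crash * utility(payoff_crash)
    return max(payoffs, key=expected_utility)

# Hypothetical payoffs as (no crash, crash), in thousands of dollars.
PAYOFFS = {"stocks": (10.0, -30.0), "bonds": (2.0, 2.0)}
TRUE_P_CRASH = 0.10  # assumed "real" probability of a crash

def linear(x):
    """Risk-neutral utility: a dollar lost hurts as much as a dollar gained helps."""
    return x

def loss_averse(x):
    """Assumed utility for reason (1): losses hurt three times as much as gains help."""
    return x if x >= 0 else 3 * x

baseline = best_action(TRUE_P_CRASH, PAYOFFS, linear)       # correct beliefs, linear utility
reason_1 = best_action(TRUE_P_CRASH, PAYOFFS, loss_averse)  # correct beliefs, high disutility of loss
reason_2 = best_action(0.25, PAYOFFS, linear)               # inflated crash belief, linear utility

print(baseline, reason_1, reason_2)
```

Under these assumptions the baseline agent picks stocks, while both versions of Bart pick bonds. Someone who only sees the bond purchase cannot tell which version they are looking at.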
Implication
When observing that someone’s behavior has changed, it’s not obvious which changes are value drift and which are epistemic updates. You need some information besides behavior alone, or your interpretation will have this extra degree of freedom.
[1] Formally, this is using the risk-neutral probability measure rather than the real-world probability measure to price products; in finance circles it’s apparently said that such people are living in “Q world”.
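A minimal sketch of the risk-neutral measure, using a hypothetical one-period stock with two outcomes. The prices, the zero interest rate, and the 50% real-world probability are all made-up assumptions, and `risk_neutral_up_probability` is an illustrative name rather than a library function.

```python
def risk_neutral_up_probability(s0, s_up, s_down, rate=0.0):
    """Up-move probability q under which the stock's expected growth equals the risk-free rate.

    Solves q * s_up + (1 - q) * s_down = (1 + rate) * s0 for q.
    """
    return ((1 + rate) * s0 - s_down) / (s_up - s_down)

S0, S_UP, S_DOWN = 100.0, 120.0, 90.0  # hypothetical prices today, after a rise, after a fall
REAL_P_UP = 0.5                        # assumed real-world probability of a rise

q_up = risk_neutral_up_probability(S0, S_UP, S_DOWN)  # ≈ 0.333
q_down = 1 - q_up                                     # ≈ 0.667, above the real 0.5

# Under the real measure the stock is expected to gain (compensation for risk).
real_expected_price = REAL_P_UP * S_UP + (1 - REAL_P_UP) * S_DOWN  # 105.0

# Under the risk-neutral measure the expected price is just today's price.
risk_neutral_expected_price = q_up * S_UP + q_down * S_DOWN        # ≈ 100.0

print(q_up, q_down, real_expected_price, risk_neutral_expected_price)
```

In this toy setup the risk-neutral measure puts about 67% on the loss, versus an assumed 50% in reality. The trader's risk aversion has been folded into the probabilities, so prices can be computed as plain expectations.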
This seems relevant.
Yup. Seems to be an instance of the general principle that black-box agents can be assigned any values whatsoever.
Why can’t we use wrong probabilities in real life?
There are various circumstances where I want to “rotate” between probabilities and utilities in real life, in ways that still prescribe the correct decisions. For example, if I have a startup idea and want to maximize my expected profit, I’d be much more emotionally comfortable with thinking it has a 90% chance of making $1 billion than a 10% chance of making $9 billion. So why can’t we use wrong probabilities in real life?
I think there are three major reasons why this doesn’t always work.
1. Idea: with a bounded world-model, messing with your probabilities changes the probabilities of other things.
2. Idea: it’s just easier to have the correct policy if your probabilities are aligned with frequentist probabilities. Despite the extra degree of freedom, we don’t know of any better algorithm that lets us update towards the correct combination of probability and utility, other than getting each right individually.
3. You can no longer use evidence the same way.
Suppose you believe your startup is 90% to succeed, rather than the true probability of 10%.
A naive Bayesian update of 2:1 towards the startup working now brings you to ~95%, not ~18%. Your model gives a smaller change in expected utility than reality: it moves from $0.9 billion to about $0.95 billion, while the true expected value moves from $0.9 billion to about $1.64 billion.
To get the correct expected utility, the new probability of your startup succeeding has to be 164%. Something has gone wrong when you estimate a probability of 164%. I think you basically have to say that this probability is upper bounded at 900% (a certain $9 billion, measured in units of $1 billion), which makes sense because it’s not magic, it’s just shuffling probabilities around.
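The numbers in this example can be reproduced with a short sketch of a Bayesian update in odds form. The function name `bayes_update` and the variable names are illustrative; the probabilities, payoffs, and 2:1 likelihood ratio are the ones used above.

```python
def bayes_update(prob, likelihood_ratio):
    """Update a probability by multiplying its odds by a likelihood ratio."""
    odds = prob / (1 - prob)
    posterior_odds = odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

TRUE_P, TRUE_PAYOFF = 0.10, 9e9        # true view: 10% chance of $9 billion
ROTATED_P, ROTATED_PAYOFF = 0.90, 1e9  # "rotated" view: 90% chance of $1 billion

true_posterior = bayes_update(TRUE_P, 2)        # ≈ 0.18
rotated_posterior = bayes_update(ROTATED_P, 2)  # ≈ 0.95

true_eu = true_posterior * TRUE_PAYOFF           # ≈ $1.64 billion
rotated_eu = rotated_posterior * ROTATED_PAYOFF  # ≈ $0.95 billion

# "Probability" the rotated view would need in order to match the true expected value.
required_rotated_p = true_eu / ROTATED_PAYOFF    # ≈ 1.64, i.e. 164%

# Largest value that quantity can take: certain success under the true view.
max_rotated_p = 1.0 * TRUE_PAYOFF / ROTATED_PAYOFF  # 9.0, i.e. 900%

print(true_posterior, rotated_posterior, required_rotated_p, max_rotated_p)
```

Before the evidence arrives, both views give the same expected value of $0.9 billion. They come apart only after updating, which is the sense in which evidence can no longer be used the same way.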
It also seems like you are not actually utility-indifferent between a 90% chance of $1b and a 10% chance of $9b. The former seems far more valuable to me, because once you have about $10m, the rest is just points on a scoreboard. So to the extent that you are more emotionally comfortable with 90%/$1b, I think it’s actually because the expected utility of 90%/$1b is about 9 times higher than the expected utility of 10%/$9b. And so to the extent you are setting your motivation based on these two things, you are importantly fooling yourself here.
TBH, for the equation “Util(90% chance of $1b) = Util(10% chance of $X)”, I don’t think there is any finite X that can solve that equation.
This is true, but it gets even worse:
Every policy is consistent with every belief-state, given convoluted enough rewards/values.
Every posterior distribution is consistent with any evidence, given a convoluted enough prior distribution.
So to make sense of an agent, or even to recognize one, you must have pretty strong priors or extra information about it.