I think me using the term “valid” was a very poor choice and saying “worth considering” was confusing. I agree that how you act on your beliefs/evidence should be down to the maximum expected utility and I think this is where the problems lie.
Definition below taken from Artificial Intelligence: A Modern Approach by Russell and Norvig.
P(Result(a)=s′ | a, e) is the probability of outcome s′ given evidence observations e, where a stands for the event that action a is executed. The agent’s preferences are captured by a utility function, U(s), which assigns a single number to express the desirability of a state. The expected utility of an action given the evidence is written EU(a|e):
EU(a|e) = Σ_{s′} P(Result(a) = s′ | a, e) U(s′)
The principle of maximum expected utility (MEU) says that a rational agent should choose the action that maximizes the agent’s expected utility: action = argmax_a EU(a|e).
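For concreteness, the MEU rule above can be sketched in a few lines of Python. The actions, probabilities, and utilities here are made up purely for illustration, not taken from the discussion:

```python
def expected_utility(action, outcomes):
    """EU(a|e) = sum over outcomes s' of P(Result(a)=s' | a, e) * U(s')."""
    return sum(p * u for p, u in outcomes[action])

# outcomes[a] is a list of (probability, utility) pairs for action a,
# with the probabilities already conditioned on the evidence e
outcomes = {
    "do_nothing": [(0.9, 2.0), (0.1, 0.0)],   # usually fine, small chance of disaster
    "mitigate":   [(0.99, 1.5), (0.01, 0.0)], # costly, but disaster is rarer
}

# action = argmax_a EU(a|e)
best_action = max(outcomes, key=lambda a: expected_utility(a, outcomes))
```

With these particular made-up numbers, EU(do_nothing) = 1.8 beats EU(mitigate) = 1.485, so the argmax picks "do_nothing"; change the probabilities or utilities and the argmax changes with them.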
If we use this definition, what would we fill in as the utility of the outcome of going extinct? Probably something like U(extinct) = 0; the associated action might be something like not doing anything about AI alignment. What would be enough (counter)evidence such that the action following from the principle of MEU would be to ‘risk’ extinction? Unless I overlooked something, I believe that e has to be 0, which is, as you said, not a probability in Bayesian probability theory. I hope this makes it clearer what I was trying to get at.
Your example of a disjunctive-style argument is very helpful. I guess you would state that none of them is 100% ‘proof’ of the earth being round, but that each adds (varying degrees of) probability to that hypothesis being true. That would mean there is some very small probability that it might be flat. So then, with the above expected utility function, we would never fly an airplane with associated actions for a flat earth, as we would deem it very likely to crash and burn.
I would add to your last creationist point: the low quality of each individual argument, given the extreme burden of proof associated with it.
I think that the first paragraph after the block quote is highly confused.
Your actions depend on your utility function, the actions you have available, and the probabilities you assign to various outcomes, conditional on various actions. Let’s look at a few examples. (Numbers contrived and made up.)
These examples are deliberately constructed to show that expected utility theory doesn’t blindly output “Work on AI risk” regardless of input. Other assumptions would favour working on AI risk.
You are totally selfish, and are old. The field of AI is moving slowly enough that it looks like not much will happen in your lifetime. You have a strong dislike of doing anything resembling AI safety work, and there isn’t much you could do. If you were utterly confident AI wouldn’t come in your lifetime, you would have no reason to care. But probabilities aren’t 0. So let’s say you think there is a 1% chance of AI in your lifetime, and a 1-in-a-million chance that your efforts will make the difference between aligned and unaligned AI. U(rest of life doing AI safety) = 1, U(wiped out by killer AI) = 0, U(rest of life having fun) = 2, and U(living in FAI utopia) = 10. Then the expected utility of having fun is 2*0.99 + 0.01*x*10 and the expected utility of AI safety work is 1*0.99 + 0.01*(x+0.000001)*10, where x is the chance of FAI. The latter expected utility is lower.
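Plugging this example’s (contrived) numbers into Python makes the comparison explicit; the two functions below are just the two expected-utility expressions from the text:

```python
# 1% chance of AI in your lifetime, a one-in-a-million chance your work
# flips the outcome, x = the baseline chance of FAI (a free parameter).
def eu_fun(x):
    return 2 * 0.99 + 0.01 * x * 10

def eu_safety(x):
    return 1 * 0.99 + 0.01 * (x + 0.000001) * 10

# eu_safety(x) - eu_fun(x) = -0.99 + 1e-7 for every x, so under these
# assumptions having fun always has the higher expected utility: the tiny
# 1e-7 bump from safety work never outweighs the 0.99 utility cost.
```

The point is that the “probabilities aren’t 0” observation doesn’t force the safety action; the utility difference in the 99% no-AI branch dominates.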
You are a perfect total utilitarian, and highly competent. You estimate that the difference between galactic utopia and extinction is so large that all other bits of utility are negligible in comparison. You estimate that if you work on biotech safety, there is a 6% chance of AI doom, a 5% chance of bioweapon doom, and the remaining 89% chance of galactic utopia. You also estimate that if you work on AI safety, there is a 5.9% chance of AI doom and a 20% chance of bioweapon doom, leaving only a 74.1% chance of galactic utopia. (You are really good at biosafety in particular.) You choose to work on biotech.
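Since only utopia vs. doom carries any weight here, comparing expected utilities reduces to comparing the two utopia probabilities (a quick sketch with the example’s numbers):

```python
# All utility mass is on galactic utopia, so EU(action) is proportional
# to P(utopia | action) and the argmax reduces to comparing probabilities.
p_utopia_if_biosafety = 1 - 0.06 - 0.05    # 6% AI doom, 5% bio doom -> 89%
p_utopia_if_ai_safety = 1 - 0.059 - 0.20   # 5.9% AI doom, 20% bio doom -> 74.1%

# Biosafety wins even though it slightly *raises* AI doom (6% vs 5.9%),
# because it cuts bioweapon doom far more.
```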
You are an average utilitarian, taking your utility function to be U = pleasure/(pleasure+suffering) over all minds you consider capable of such feelings. If a galactic utopia occurs, its size is huge enough to wash out everything that has happened on earth so far, leaving a utility of basically 1. You think there is a 0.1% chance of this happening. You think humans on average experience 2x as much pleasure as suffering, farm animals on average experience 2x as much suffering as pleasure, and there are an equal number of each. Hence in the 99.9% case where AI wipes us out, the utility is exactly 0.5. However, you have a chance to reduce the number of farm animals to ever exist by 10%, leaving a utility of (2+0.9)/(2+0.9+1+1.8) = 0.509. This increases your expected utility by 0.009. An opportunity to increase the chance of FAI galactic utopia from 0.1% to 1.1% is only worth 0.005 (a 1% chance of going from U=0.5 to U=1). Therefore reducing the number of farm animals to exist takes priority.
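The arithmetic in this example checks out, and can be reproduced in a few lines (same contrived figures; variable names are mine):

```python
def avg_utility(pleasure, suffering):
    """Average-utilitarian utility: pleasure / (pleasure + suffering)."""
    return pleasure / (pleasure + suffering)

# Humans: pleasure 2, suffering 1. Farm animals: pleasure 1, suffering 2.
u_base = avg_utility(2 + 1, 1 + 2)              # exactly 0.5
u_fewer_farm = avg_utility(2 + 0.9, 1 + 1.8)    # 2.9/5.7 ~ 0.509

gain_farm = 0.999 * (u_fewer_farm - u_base)     # ~0.009 in expectation
gain_utopia = 0.01 * (1.0 - u_base)             # +1% chance of U=1 -> 0.005

# gain_farm > gain_utopia, so reducing farm animals takes priority here.
```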
Thank you for those examples. I think this shows that the way I used a utility function, without placing it in a ‘real’ situation, i.e. one with viable alternative actions that each carry some utility rather than a locked-off situation, was a fallacy.
I suppose, then, that I conflated the “What can I know?” with the “What must I do?”; separating a belief from its associated action (I think) resolves most of the conflicts that I saw.
Utilities in decision theory are invariant under positive scaling and translation. It makes no sense to ask what the utility of going extinct “would be” in isolation from the utilities of every other outcome. All that matters are ratios of differences of utilities, since those are all that is relevant to finding the argmax of the probability-weighted combination of utilities.
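This invariance is easy to demonstrate: applying U → a·U + b with a > 0 never changes which action maximizes expected utility, since E[a·U + b] = a·E[U] + b is monotone in E[U]. A small sketch (outcomes and numbers invented for illustration):

```python
def best_action(outcomes, transform=lambda u: u):
    """Argmax of expected utility, with an optional rescaling of U."""
    return max(
        outcomes,
        key=lambda act: sum(p * transform(u) for p, u in outcomes[act]),
    )

outcomes = {
    "a1": [(0.5, 0.0), (0.5, 10.0)],  # EU = 5
    "a2": [(1.0, 4.0)],               # EU = 4
}

# The argmax is unchanged by the positive affine map U -> 3*U - 7:
# a1 goes from EU 5 to EU 8, a2 from EU 4 to EU 5, and a1 still wins.
assert best_action(outcomes) == best_action(outcomes, lambda u: 3 * u - 7)
```

This is exactly why U(extinct) = 0 by itself is meaningless: only the gaps between U(extinct) and the other outcomes’ utilities affect the choice.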
I’m not sure what you mean by “I believe that e has to be 0”, since e is a set of observations, not a number. Maybe you meant P(e) = 0? But this makes no sense either since then conditional probabilities are undefined.
I meant P(e) = 0 and the point was to show that that does not make sense. But I think Donald has shown me exactly where I went wrong. You cannot have a utility function and then not place it in a context within which you have other feasible actions. See my response to Hobson.