I have been waiting for someone to formalize this objection to Pascal’s mugging for a long time, and I’m very happy that, now that it has been done, it has been done so well.
What precisely is the objection to Pascal’s Mugging in the post? Just that the probability of the mugger being able to deliver goes down with N? This objection has been given thousands of times, and the counter-response is that the probability can’t go down fast enough to outweigh the increase in utility. This is formalised here.
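To spell out the “can’t go down fast enough” claim, here is a rough sketch of the standard argument under a complexity-based prior; this is my own paraphrase rather than anything taken from the linked formalisation, and K denotes prefix Kolmogorov complexity.

```latex
% Rough sketch (my paraphrase, not from the linked paper): if the prior weight of a
% claimed payoff of n people scales like 2^{-K(n)}, then along simple numbers such
% as n_k = 2^k the probability-times-payoff product is unbounded, because
% K(n_k) \le \log_2 k + O(\log\log k):
\[
  p(n_k)\,n_k \;\approx\; 2^{-K(n_k)}\cdot 2^{k}
  \;\ge\; \frac{c\,2^{k}}{k\,(\log_2 k)^{2}}
  \;\longrightarrow\; \infty \quad (k \to \infty).
\]
% So under such a prior the probability cannot fall fast enough to cancel the
% growth of the stated payoff.
```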
The paper is really useless. The entire methodology of requiring some non-zero computable bound on the probability that the function with a given Gödel number will turn out to be correct is deeply flawed. The failure is really about the inability of a computable function to check whether two Gödel numbers code for the same function, not about utilities and probability. Similarly, insisting that the utilities be bounded below by a computable function on the Gödel numbers of the computable functions (rather than on the functions themselves) is unrealistic.
Note that one implicitly expects that, if you consider longer and longer sequences of good events followed by nothing, the utility will continue to rise. They basically rule out all the reasonable unbounded utility functions by fiat, by requiring the infinite sequence of good events to have finite utility.
I mean, consider the following really simple model. At each time step I either receive a 1 or a 0 bit from the environment. The utility is the number of consecutive 1s that appear before the first 0. The probability measure is the standard coin-flip measure. Everything is nice and the expected utility over any Borel set of outcomes is well defined, but the utility function goes off to infinity and is indeed undefined on the infinite sequence of 1s.
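For concreteness, here is a minimal sketch (mine, not from the paper under discussion) that computes the expected utility of this coin-flip model both exactly and by simulation; the exact value is the sum over k of k · 2^-(k+1) = 1, even though the utility itself is unbounded.

```python
# Minimal sketch (not from the paper): expected utility of the coin-flip model
# described above. Utility = number of consecutive 1s before the first 0, under
# the fair-coin measure, so P(U = k) = 2^-(k+1) and E[U] = sum_k k * 2^-(k+1) = 1.
import random

def exact_expected_utility(terms: int = 200) -> float:
    """Truncated sum of k * 2^-(k+1); converges to 1 very quickly."""
    return sum(k * 2.0 ** -(k + 1) for k in range(terms))

def simulated_expected_utility(samples: int = 100_000) -> float:
    """Monte Carlo estimate: flip a fair coin until the first 0, count the 1s."""
    total = 0
    for _ in range(samples):
        run = 0
        while random.random() < 0.5:   # a '1' bit with probability 1/2
            run += 1
        total += run
    return total / samples

print(exact_expected_utility())        # ~1.0
print(simulated_expected_utility())    # ~1.0, despite the utility being unbounded
```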
An awful paper, but it’s hard for non-experts to see where it gets the model wrong.
The right analysis is simply that we want a utility function that is L1 integrable on the space of outcomes with respect to the probability measure. That is enough to get rid of Pascal’s mugging.
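Written out, the proposed condition is just that the utility function be integrable; this is my rendering of the comment’s suggestion, with U the utility, μ the probability measure, and Ω the space of outcomes.

```latex
% My rendering of the L^1 condition proposed above: the utility U is integrable
% with respect to the probability measure \mu on the outcome space \Omega,
\[
  \int_{\Omega} \lvert U(\omega)\rvert \, d\mu(\omega) \;<\; \infty ,
\]
% so expected utilities are finite even when U is unbounded, as in the coin-flip
% example above, where \int_{\Omega} U \, d\mu = \sum_{k \ge 0} k\,2^{-(k+1)} = 1.
```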
The post’s argument is more substantive than that the probability for the mugger to deliver goes down with N. Did you read the section of the post titled “Pascal’s Mugging”? I haven’t read the de Blanc paper that you link to, but I would guess that he doesn’t assume a (log-)normal prior for the effectiveness of actions and so doesn’t Bayesian-adjust the quantities downward as sharply as the present post suggests one should.
The argument is that simple numbers like 3^^^3 should be considered much more likely than random numbers of similar size, since they have short descriptions and so the mechanisms by which that many people (or whatever) hang in the balance are less complex. For instance, you’re more likely to win a prize of $1,000,000 than of $743,328, even though the former is larger. de Blanc considers priors of this form, of which the normal isn’t an example.
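As a toy illustration of the prize example (my own construction, not de Blanc’s actual prior), one can weight numbers by a crude description-length proxy instead of by magnitude:

```python
# Toy illustration (my own construction, not de Blanc's prior): weight numbers by a
# crude description-length proxy instead of by magnitude. 1,000,000 can be written
# as the 5-character expression "10**6", while 743,328 has no shorter form here than
# its 6 decimal digits, so the simpler number gets the larger weight despite being
# bigger. Real Kolmogorov complexity is uncomputable; this proxy only checks a few
# obvious representations.

def description_length(n: int) -> int:
    """Length of the shortest representation among a few candidates."""
    candidates = [str(n)]                      # plain decimal digits
    for base in range(2, 11):                  # exact power expressions like "10**6"
        value, k = 1, 0
        while value < n:
            value *= base
            k += 1
        if value == n:
            candidates.append(f"{base}**{k}")
    return min(len(c) for c in candidates)

def toy_weight(n: int) -> float:
    """Unnormalised 2^-(description length) weight."""
    return 2.0 ** -description_length(n)

for n in (1_000_000, 743_328):
    print(n, description_length(n), toy_weight(n))
# 1000000 -> length 5 ("10**6"),  weight 2^-5
# 743328  -> length 6 ("743328"), weight 2^-6
```

Under weights of this general shape, round numbers like $1,000,000 come out ahead of arbitrary numbers of comparable size, which is the intuition the comment above attributes to de Blanc’s choice of priors.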
Surely an action is more likely to have an expected value of saving 3.2 lives than pi lives; the distribution of values of actions is probably not literally log-normal, partially for the reason that you just gave, but I think that a log-normal distribution is much closer to the truth than a distribution which assigns probabilities strictly by Kolmogorov complexity. Here I’d refer back to my response to cousin_it’s comment.
Surely an action is more likely to have an expected value of saving 3.2 lives than pi lives
I’m not so sure. Do you mean (3.2 lives | pi lives) to log(3^^^3) digits of precision? If you don’t, I think it misleads intuition to compare the probability of an action saving 3.2 lives, to two decimal places, with that of saving pi lives, to indefinite precision.
I can’t think of any right now, but I feel like if I really put my creativity to work for long enough, I could think of more ways to save 3.14159265358979323846264 lives than 3.20000000000000000000000 lives.
I meant 3.2 lives to arbitrary precision vs. pi lives to arbitrary precision. Anyway, my point was that there’s going to be some deviation from a log-normal distribution on account of contingent features of the universe that we live in (mathematical, physical, biological, etc.), but that a log-normal distribution is probably a closer approximation to the truth than what one would hope to come up with via a systematic analysis of the complexity of the numbers involved.
The argument is that simple numbers like 3^^^3 should be considered much more likely than random numbers with a similar size, since they have short descriptions and so the mechanisms by which that many people (or whatever) hang in the balance are less complex.
Consider the options A = “a proposed action affects 3^^^3 people” and B = “the number 3^^^3 was made up to make a point”. Given my knowledge about the mechanisms that affect people in the real world and about the mechanisms people use to make points in arguments, I would say that the likelihood of A versus B is hugely in favor of B. This is because the relevant probabilities for calculating the likelihood scale (for large values, and to a first-order approximation) with the size of the number in question for option A and with the complexity of the number for option B. I didn’t read de Blanc’s paper further than the abstract, but from that and your description of the paper it seems that its setting is far more abstract and uninformative than the setting of Pascal’s mugging, in which we also have the background knowledge of our usual life experience.
The setting in my paper allows you to have any finite amount of background knowledge.
I mean that using a probability distribution rather than just saying numbers clearly dispels a naive Pascal’s mugging. I am open to the possibility that more heavily contrived Pascal’s muggings may exist that can still exploit an unbounded utility function, but I’ll read that paper and see what I think after that.
Edit: From the abstract:
The agent has a utility function on outputs from the environment. We show that if this utility function is bounded below in absolute value by an unbounded computable function, then the expected utility of any input is undefined. This implies that a computable utility function will have convergent expected utilities iff that function is bounded.
What this sounds like it is saying is that, under an unbounded utility function, literally any action has undefined expected utility. In that case it just says that unbounded utility functions are useless from the perspective of decision theory. I’m not sure how it constitutes evidence that the problem of Pascal’s Mugging is unresolved.
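In symbols, my paraphrase of the abstract quoted above (not the paper’s exact statement) is roughly the following:

```latex
% My paraphrase of the quoted abstract, not the paper's exact statement: if there is
% an unbounded computable f with |U(x)| >= f(x) for every output x, then for every
% input a the expectation
\[
  \mathbb{E}[\,U \mid a\,] \;=\; \sum_{x} P(x \mid a)\,U(x)
\]
% fails to converge; equivalently, a computable U has convergent expected utilities
% for all inputs iff U is bounded.
```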
(Yes, I know this is an old post.)
Suppose that the probability I assign to the mugger being able to deliver is equal to 1 / ((utility delivered if the mugger is telling the truth) ^ 2). Wouldn’t that be a probability that goes down fast enough to outweigh the increase in utility?
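Spelling out the arithmetic behind this suggestion (with U standing for the utility the mugger promises to deliver):

```latex
% The suggestion above, spelled out: with p(U) = 1/U^2, the naive expected payoff
% of taking the deal is
\[
  p(U)\cdot U \;=\; \frac{1}{U^{2}}\cdot U \;=\; \frac{1}{U} \;\longrightarrow\; 0
  \quad (U \to \infty),
\]
% so this particular assignment does fall fast enough; the question is whether such
% a prior is compatible with the constraints assumed in the linked paper (e.g. with
% treating the mugger's statement as evidence), which is what the reply below
% gestures at.
```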
I’m afraid that I don’t remember the details of the paper I linked to above; you’ll have to look at it to see why they don’t consider that a valid distribution (perhaps because the things that the mugger says have to be counted as evidence, and this can’t decrease that quickly for some reason? I’m afraid I don’t remember).