In the case of a literal mugging by someone who forgot their gun and decided to talk about the Matrix instead, the larger half of the issue is that if you pay, you have less money left for a potential mugger who, when asked for proof, says “OK” and makes a display appear in front of you showing something impressive.
I think there are three entirely different issues here:
1: Priors may be too high for some reason (e.g. 2^-(theory length) priors do not lead to a converging sum). It looks like legitimate probability assignments and a legitimate expected utility calculation producing invalid actions, but it really isn’t: the sum does not converge, and its apparent sign depends on the order of summation. It’s just a case of bad math being bad.
2: A low probability that comes from a huge number of alternative scenarios, or from the shakiness of the argument, also goes with an inability to evaluate other actions sufficiently: the expected utility estimate is invalid because it is only a partial sum. The total is unreasonably biased by the choice of which terms get summed in the expected utility calculation.
3: Ignoring the general tendency of actions conditional on evidence to have higher utility than actions not conditional on evidence (as is the case in the literal mugger example), and not considering alternatives that are conditional on evidence (e.g. “pay only a mugger who provides proof” is a valid action). The utility of assets (the money you have) is easy to underestimate because evaluating it requires modelling future situations and your responses to them.
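Item 3 can be made concrete by comparing policies rather than single actions. The sketch below is a toy model built on made-up assumptions, none of which come from the thread: a demanded wallet of 100, a hypothetical probability `P_GENUINE` that the mugger really has Matrix-Lord powers, a hypothetical payoff for paying a genuine one, and the further assumption that a genuine Matrix Lord can always produce proof while a fake one never can.

```python
# All numbers are hypothetical, chosen only for illustration.
WALLET = 100.0          # money the mugger demands
P_GENUINE = 1e-6        # assumed chance the mugger really has Matrix-Lord powers
PAYOFF_GENUINE = 1e9    # assumed utility of paying a genuine Matrix Lord
PROOF_IF_GENUINE = 1.0  # assumption: a genuine Matrix Lord can always show proof
PROOF_IF_FAKE = 0.0     # assumption: a fake one never can


def expected_utility(pay_if_proof: bool, pay_if_no_proof: bool) -> float:
    """Expected utility of a policy that may make payment conditional on seeing proof."""
    total = 0.0
    for genuine, p_world in ((True, P_GENUINE), (False, 1 - P_GENUINE)):
        p_proof = PROOF_IF_GENUINE if genuine else PROOF_IF_FAKE
        for proof_shown, p_obs in ((True, p_proof), (False, 1 - p_proof)):
            pays = pay_if_proof if proof_shown else pay_if_no_proof
            if pays:
                # Paying always costs the wallet; it only pays off if the mugger is genuine.
                gain = PAYOFF_GENUINE if genuine else 0.0
                u = gain - WALLET
            else:
                u = 0.0
            total += p_world * p_obs * u
    return total


policies = {
    "never pay": expected_utility(pay_if_proof=False, pay_if_no_proof=False),
    "always pay": expected_utility(pay_if_proof=True, pay_if_no_proof=True),
    "pay only with proof": expected_utility(pay_if_proof=True, pay_if_no_proof=False),
}
for name, eu in policies.items():
    print(f"{name:>20}: {eu:10.4f}")
```

With these made-up numbers “always pay” beats “never pay”, which is the mugging, but the evidence-conditional policy beats both, because it keeps the wallet in every world where no proof is shown.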
edit: also, it’s obviously wrong to just ignore low-probability scenarios that carry some cost. When you observe proper lab safety precautions, or look both ways when crossing the street, you are taking exactly such scenarios into account. Likewise for not playing other variations of Russian roulette. The issue tends to arise when scenarios are purely speculative, which makes me think that the speculations are to blame: you assign too high a probability to a speculation and then, when estimating the utility sum, you neglect to apply an appropriate adjustment for the incompleteness of your sum (regression to the mean, if the scenario was chosen at random).
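One way to read the “regression to the mean” adjustment is as ordinary Bayesian shrinkage. A minimal sketch, assuming (an illustrative assumption, not something the comment specifies) a normal prior over a scenario’s true value and a speculative argument that reports that value with normal noise: the posterior mean pulls the reported value back toward the prior mean by a factor that gets smaller as the speculation gets noisier.

```python
def shrink(estimate: float, prior_mean: float, prior_var: float, noise_var: float) -> float:
    """Posterior mean in a normal-normal model: regress a noisy estimate toward the prior mean."""
    # Weight on the estimate = signal variance / (signal variance + noise variance).
    k = prior_var / (prior_var + noise_var)
    return prior_mean + k * (estimate - prior_mean)


# Hypothetical numbers: a careful measurement and a wild speculation
# that both report the same large value.
careful = shrink(estimate=1000.0, prior_mean=0.0, prior_var=1.0, noise_var=0.01)
speculative = shrink(estimate=1000.0, prior_mean=0.0, prior_var=1.0, noise_var=1e6)
print(careful, speculative)
```

With these made-up numbers the careful estimate keeps about 99% of its reported value, while the wild speculation keeps about a millionth of it.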
In the case of a literal mugging by someone who forgot their gun and decided to talk about the Matrix instead, the larger half of the issue is that if you pay, you have less money left for a potential mugger who, when asked for proof, says “OK” and makes a display appear in front of you showing something impressive.
I remember this being one of the solutions people came up with in some of the very early discussions about Pascal’s mugging, but it is generally considered highly unsatisfactory. Refraining from an action that would have positive expected utility on its own because one is worried that “some Matrix Lord may appear with evidence in the future requiring my resources” only worsens the problem, transforming it into the muggerless and worse variety of Pascal’s mugging, which would prevent you from ever using any resources for any reason, even prudent ones.
E.g. “Should I spend 100 dollars on a fire alarm for early warning in case of a fire?” “No, I would then have fewer resources in case a Matrix Lord comes with evidence and requires them of me.” A mind that used such logic would no longer even need a mugger to fall into insanity...
Besides, even if we specified “require evidence before allocating resources”, what determines which sort of evidence is good enough?
You might die before you meet any Matrix Lord, you know. As far as the fire alarm goes, you’re in the clear. And if you have #1, it’s not a Pascal’s mugging situation, it’s a “your utility function does not work at all” situation: you need to either use a bounded utility function or use the speed prior (which makes the priors for such scenarios smaller).
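A bounded utility function makes the expected-utility sum converge however large the claimed payoffs grow. A minimal sketch, assuming a hypothetical family of scenarios n = 1, 2, ... with prior 2^-n and raw utility 3^n (so the unbounded sum diverges), and an arbitrarily chosen squashing function `U_MAX * tanh(u / U_MAX)`; any increasing function bounded by `U_MAX` would serve.

```python
import math
from typing import Callable

U_MAX = 1000.0  # hypothetical upper bound on utility


def bounded(u: float) -> float:
    """Squash a raw utility into (-U_MAX, U_MAX) while preserving order."""
    return U_MAX * math.tanh(u / U_MAX)


def partial_sums(n_terms: int, utility: Callable[[float], float]) -> list[float]:
    """Partial sums of sum_n 2^-n * utility(3^n) for n = 1..n_terms."""
    sums, total = [], 0.0
    for n in range(1, n_terms + 1):
        total += 2.0 ** -n * utility(3.0 ** n)
        sums.append(total)
    return sums


raw = partial_sums(60, lambda u: u)
capped = partial_sums(60, bounded)
print(f"raw: {raw[19]:.3g} -> {raw[-1]:.3g}")            # keeps growing like 1.5^n
print(f"bounded: {capped[19]:.6f} -> {capped[-1]:.6f}")  # settles below U_MAX
```

Each bounded term is at most 2^-n·U_MAX, so the total can never exceed U_MAX; the raw terms are 1.5^n and the raw sum has no limit.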
edit: and even if your priors are correct, you’re still facing the problem that your sums are not complete.
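The incompleteness problem from item 2 can be illustrated with a toy model. Assume, purely hypothetically, that every speculative scenario in which paying is rewarded has an equally probable mirror-image scenario in which paying is punished by the same amount. The full sum over both is then zero, but an agent that sums only the scenario someone chose to describe to it sees a spurious positive advantage for paying.

```python
import random

random.seed(0)

# Hypothetical scenario list: (probability, utility of paying relative to not paying).
# Each scenario is paired with a mirror image of equal probability and opposite sign.
scenarios = []
for _ in range(1000):
    p = random.uniform(1e-9, 1e-6)
    u = random.uniform(1e6, 1e9)
    scenarios.append((p, u))    # paying is rewarded
    scenarios.append((p, -u))   # paying is punished just as much

full = sum(p * u for p, u in scenarios)

# The mugger only describes one scenario, and it is one that favours paying.
mentioned = [s for s in scenarios if s[1] > 0][:1]
partial = sum(p * u for p, u in mentioned)

print(f"full sum: {full:.3g}, partial sum: {partial:.3g}")
```

The bias comes entirely from which terms were selected, not from any error in the individual probabilities or utilities.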
E.g. “Should I spend 100 dollars on a fire alarm for early warning in case of a fire?” “No, I would then have fewer resources in case a Matrix Lord comes with evidence and requires them of me.” A mind that used such logic would no longer even need a mugger to fall into insanity...
I am reminded of the Island Where Dreams Come True in The Voyage of the Dawn Treader, which is exactly what its name says. Not daydreams or longings, but all of your worst nightmares. Having once imagined a thing calls it into existence there.
The muggerless mugging follows from giving a hypothesis some credence just because you imagined it, otherwise called the Solomonoff prior. I recall Eliezer writing here some years ago that he did not have a solution. I don’t know if he has since found one.
Priors may be too high for some reason (e.g. 2^-(theory length) priors do not lead to a converging sum).
I’ve mentioned elsewhere that this is generally what causes it. The question is: is that really a good enough reason to use different priors? Consider the similar situation where someone rejects 2^-(theory length) priors on the grounds that they would say God doesn’t exist, and they don’t want to deal with that.
It’s just a case of bad math being bad.
Are you saying you can get around it just by using better math, instead of messing with priors?
The question is: is that really a good enough reason to use different priors?
A sum that does not converge is reason enough; it’s not that there’s a potential “Pascal’s mugging” problem, it’s that the utility is entirely undefined.
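The claim in item 1 that the apparent sign of a non-converging expected-utility sum depends on the order of summation can be checked numerically. A minimal sketch with a hypothetical family of scenarios: scenario n has prior 2^-n and utility (-3)^n, so each term is (-1.5)^n and grows instead of shrinking. In the natural order the partial sums alternate between ever-larger positive and negative values; taking two positive terms for every negative one sends them toward plus infinity. Neither ordering yields a well-defined expected utility.

```python
from itertools import islice


def term(n: int) -> float:
    """Contribution of hypothetical scenario n: prior 2^-n times utility (-3)^n, i.e. (-1.5)^n."""
    return 2.0 ** -n * (-3.0) ** n


def natural_order():
    """Terms for n = 1, 2, 3, ..."""
    n = 1
    while True:
        yield term(n)
        n += 1


def positives_first():
    """The same terms rearranged: two positive (even n) terms for every negative (odd n) one."""
    even, odd = 2, 1
    while True:
        yield term(even)
        yield term(even + 2)
        yield term(odd)
        even += 4
        odd += 2


def partial_sums(terms, count: int) -> list[float]:
    """Running totals of the first `count` terms of an iterator."""
    total, out = 0.0, []
    for t in islice(terms, count):
        total += t
        out.append(total)
    return out


natural = partial_sums(natural_order(), 40)
rearranged = partial_sums(positives_first(), 60)
print("natural order, last few:", [round(s) for s in natural[-4:]])
print("rearranged, last few:   ", [round(s) for s in rearranged[-4:]])
```

Both orderings use exactly the same terms; only the order differs, and so does the apparent verdict.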
Consider the similar situation where someone rejects 2^-(theory length) priors on the grounds that they would say God doesn’t exist, and they don’t want to deal with that.
That prior doesn’t say God doesn’t exist; some very incompetent people who explain that prior claim it does, but the fact is that we do not know and never will. At any rate, gods are not much longer to encode than universes in which intelligent life evolves (hence gods in the form of superintelligences, owners of our simulation, and so on).
Are you saying you can get around it just by using better math, instead of messing with priors?
What do you mean? The “bad math” is the idea that utility is well defined at all under a dubious prior for which it is not well defined. It’s not as if humans use a theory-length prior anyway.
What you can do is use the “speed prior” or a variation of it. It discounts (roughly) for the size of the universe, which makes the sum converge.
Note that this still leaves any practical agent with a potential problem: potentially hostile parties may bias its approximations of the utility by offering speculations that involve large utilities which are not physically impossible under known laws of physics. Because these scenarios are highly speculative, the approximate utility calculations do not adjust both sides of the utility comparison equally.
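One way to see how a runtime discount can restore convergence is a simplified Levin-style weight of 2^-length / runtime per program. This is an assumption standing in for the speed prior, not Schmidhuber’s exact definition. The other assumption doing the work is also hypothetical: a scenario can only deliver utility in proportion to the computation it performs (simulating N people takes at least on the order of N steps). Under that assumption each term is at most C·2^-length, so the sum stays bounded however large the claimed utility is, while under the pure length prior the same scenarios produce terms that blow up.

```python
C = 1.0  # hypothetical constant: utility obtainable per unit of runtime


def length_prior_term(length: int, runtime: float) -> float:
    """Term under a 2^-length prior when utility can be as large as C * runtime."""
    return 2.0 ** -length * C * runtime


def speed_prior_term(length: int, runtime: float) -> float:
    """Term under a simplified, unnormalized 2^-length / runtime weight with the same utility."""
    return (2.0 ** -length / runtime) * C * runtime


# Hypothetical family: a program of length L describes a scenario whose runtime
# (and hence achievable utility) is 2^(2^L), i.e. doubly exponential in L.
# L is capped at 9 so the floats stay finite.
lengths = range(1, 10)
length_sum = sum(length_prior_term(L, 2.0 ** (2 ** L)) for L in lengths)
speed_sum = sum(speed_prior_term(L, 2.0 ** (2 ** L)) for L in lengths)
print(f"length prior: {length_sum:.3g}, speed-style prior: {speed_sum:.6f}")
```

If the utility-per-runtime assumption fails, the discount no longer cancels the utility, which is one way to read the later question about a utility function that breaks the speed prior as well.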
A sum that does not converge is reason enough; it’s not that there’s a potential “Pascal’s mugging” problem, it’s that the utility is entirely undefined.
For any prior with infinitely many possibilities, you can come up with some non-converging utility function. Does that mean we can change how likely things are by changing what we want?
The other strategy is to change your utility function, but that doesn’t seem right either. Should I care less about 3^^^3 people just because it’s a situation that might actually come up?
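The construction behind the first question is short: given any prior that puts positive probability p_n on each of infinitely many hypotheses, set the utility of hypothesis n to 1/p_n. Every term p_n·u_n is then 1, and the expected-utility sum grows without bound. A minimal sketch, using a hypothetical prior p_n = 6/(π²n²), chosen only because it sums to 1:

```python
import math


def prior(n: int) -> float:
    """Hypothetical prior over hypotheses n = 1, 2, ...; sums to 1 since sum 1/n^2 = pi^2/6."""
    return 6.0 / (math.pi ** 2 * n ** 2)


def adversarial_utility(n: int) -> float:
    """Utility chosen to cancel the prior exactly, so each term contributes 1."""
    return 1.0 / prior(n)


N = 10_000
prior_mass = sum(prior(n) for n in range(1, N + 1))
expected = sum(prior(n) * adversarial_utility(n) for n in range(1, N + 1))
print(f"prior mass of first {N}: {prior_mass:.4f}; expected utility so far: {expected:.1f}")
```

The prior mass converges while the expected utility grows linearly with the number of hypotheses summed, whatever prior is plugged in.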
For any prior with infinitely many possibilities, you can come up with some non-converging utility function. Does that mean we can change how likely things are by changing what we want?
A prior is not how likely things are. It’s just a way to slice a total probability of 1 among the competing hypotheses. Allocate slices by length and you get the length-based prior; allocate them by runtime and length and you get the speed prior.
Ideally you’d want to quantify all the symmetries in the evidence and somehow use them, so that you immediately get a prior of 1⁄6 for each side of a symmetric die when you can’t otherwise make predictions. But the theory-length prior doesn’t do that either.
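The “slicing” view can be made concrete: assign each hypothesis a positive weight and normalize so the slices add up to 1. A minimal sketch with three hypothetical hypotheses, each described only by a program length in bits and a runtime in steps; the length-based weight is 2^-length, and the speed-style weight 2^-length / runtime is a simplification rather than the exact speed prior.

```python
# Hypothetical hypotheses: name -> (program length in bits, runtime in steps).
hypotheses = {
    "short but slow": (10, 2.0 ** 40),
    "medium": (14, 2.0 ** 20),
    "long but fast": (20, 2.0 ** 5),
}


def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Slice a total probability of 1 in proportion to the given weights."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


length_prior = normalize({h: 2.0 ** -l for h, (l, _) in hypotheses.items()})
speed_style = normalize({h: 2.0 ** -l / t for h, (l, t) in hypotheses.items()})

for h in hypotheses:
    print(f"{h:>15}: length {length_prior[h]:.4f}  speed-style {speed_style[h]:.4f}")
```

Both allocations are legitimate ways to divide a probability of 1; with these made-up numbers they simply rank the same hypotheses in opposite orders.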
The other strategy is to change your utility function, but that doesn’t seem right either. Should I care less about 3^^^3 people just because it’s a situation that might actually come up?
It seems to me that such a situation really should become unlikely faster than 2^-length becomes small.
A prior is not how likely things are. It’s just a way to slice a total probability of 1 among the competing hypotheses.
And I could allocate it so that there is almost certainly a god, or even so there is certainly a god. That wouldn’t be a good idea though, would it?
It seems to me that such a situation really should become unlikely faster than 2^-length becomes small.
What would you suggest to someone who had a different utility function, where you run into this problem when using the speed prior?
Also, the speed prior looks bad. It predicts the universe should be small and short-lived. This is not what we have observed.
Do you think there is a universe outside of our past light cone? It would increase the program length to limit it to that, but not nearly as much as it would decrease the run time.
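The trade-off in this question can be put in numbers using a Levin-style cost of length + log2(runtime) bits, a common stand-in for the speed prior (an assumption here, not something stated in the thread). If restricting the simulated universe to our past light cone costs Δl extra bits of program but divides the runtime by a factor F, the weight changes by a factor of F / 2^Δl. The sketch uses hypothetical values: Δl = 1000 bits and a runtime reduction of 2^10000.

```python
def levin_cost(length_bits: float, log2_runtime: float) -> float:
    """Levin-style cost in bits: program length plus log2 of runtime (lower means more probable)."""
    return length_bits + log2_runtime


# Hypothetical numbers, chosen only to show the shape of the trade-off.
full = levin_cost(length_bits=5_000, log2_runtime=20_000)
restricted = levin_cost(length_bits=5_000 + 1_000, log2_runtime=20_000 - 10_000)

# The weight ratio restricted/full is 2^(full - restricted); keep it in log2 to avoid overflow.
log2_ratio = full - restricted
print(f"restricted universe favoured by a factor of 2^{log2_ratio:.0f}")
```

Under these assumptions, any truncation whose extra description length is small compared with the log of the runtime it saves is favoured, which is the commenter’s objection.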
And I could allocate it so that there is almost certainly a god, or even so there is certainly a god. That wouldn’t be a good idea though, would it?
There isn’t a single “Solomonoff induction”: the choice of machine is arbitrary, and for some machines the simplest way to encode our universe is through some form of god (the creator/owner of a simulation, if you wish). In any case, the prior for a universe with a god is not much smaller than the prior for one without, because you can obtain a sentient being simply by picking data out of any universe in which one evolves. Note that these models with some god work just fine, and no, even though I am an atheist, I don’t see what the big deal is.
Also, the speed prior looks bad. It predicts the universe should be small and short-lived. This is not what we have observed.
The second source of problems is attributing reality to the internals of the prediction method. I’m not sure that is valid for either prior. The laws of the universe are most concisely expressed as properties that hold everywhere rather than as calculation rules of some kind; the rules are derived as alternative structures that share the same properties.