The real trick of Pascal’s wager is the idea that the hypotheses are generally no more likely than their opposites.
Technically true, but half of them are more likely than their opposite, and the other half are less likely. If the payoff is large enough, that difference will be sufficient to cause trouble.
Probability is in the mind. You always know which is more likely. It’s the one you think is more likely.
People are sort of built to set probabilities to Schelling points, which would make this difficult, but you’d still have some intuition or something pointing a little in one direction.
If you’re going to be betting based on what you think is less likely, then I would like to play with you too.
OK, telegraphic writing and reading is failing me. I’m looking for the meaning behind this, and I get stuck on the idea that P_MoreLikely = 1 - P_LessLikely, at least if there are only two choices, so I can’t figure out the important difference between betting on what is more likely and betting on what is less likely.
The point of my post is that probability is hardly just what you think it is, and that there are plenty of ways that people actually think about the probability of poker hands that turn out to be quite consistent with their losing money. Far be it from me to imply publicly that this means they were “wrong” about such a subjective thing as probability. But I am happy to collect their money.
Probability is in the mind. If you know a coin is biased, but you don’t know which way it’s biased, then the first flip is fair. If you suspect that it’s biased towards heads, then it’s biased towards heads.
You could also think of yourself as a coin. Nobody is stupid enough to be biased towards wrong. You’d have to be smart to manage that. You might have biases in each individual decision that make you consistently wrong, but if you have a bucket of coins, and you know that they all are biased but more are biased towards landing on heads than landing on tails, then if you take a coin out of the bucket and flip it, it’s biased towards heads.
If you know you’re not logically omniscient, the correct action isn’t to set all probabilities to 50%. It’s to try and find your biases and correct for them, but use whatever you have at your disposal until then.
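A quick numeric sketch of the bucket-of-coins point, with made-up numbers (assume 60% of the coins land heads with probability 0.6 and the other 40% land heads with probability 0.4):

```python
# Toy bucket of coins, all numbers assumed for illustration:
# 60% of coins are heads-biased (P(heads) = 0.6),
# 40% are tails-biased (P(heads) = 0.4).
frac_heads_biased = 0.6
frac_tails_biased = 1 - frac_heads_biased

p_heads_if_heads_biased = 0.6
p_heads_if_tails_biased = 0.4

# Probability that a randomly drawn coin comes up heads on a single flip.
p_heads = (frac_heads_biased * p_heads_if_heads_biased
           + frac_tails_biased * p_heads_if_tails_biased)

print(p_heads)  # 0.52: better than even, so the drawn coin is "biased towards heads"
```

With symmetric bias sizes like these, any majority of heads-leaning coins pushes the answer above 1/2.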
I read the probability post you referenced. The question is WHAT is in your mind. If one person has a whole hell of a lot more correctly determined Bayesian conclusions about poker hands than another, and the two of them play poker, they will both bet based on what is in their heads. The one with the better-refined knowledge about poker hands will take money, on average, from the one with the weaker knowledge. If the game is fixed that might change things, but if the game is fixed and neither of them has prior knowledge of this, it is still more likely that the knowledgeable player will figure out how the game is fixed, and how to exploit that, than the less knowledgeable player.
So if we disagree about the probability of something, do you just agree that for you the probability is p and for me it is p’? I don’t. The frequentist interpretation of probability doesn’t exist because people are idiots; rather, it exists because for a very broad range of things it provides an excellent map of the world. If I think I am going to be just as good at poker because my opponent and I both have heads and probability is just in our heads, and my opponent simply knows more about the odds of poker, I will lose. We both just had probabilities in our heads, though. And if my opponent had known LESS about poker, it would have appeared that mine were at least as good as his. But someone who thinks probabilities are whatever he thinks they are is precisely the kind of person you want to bet against. Not being a frequentist does not excuse you from the very real distributions of outcomes the world will give you when dealing out cards from a shuffled deck.
If you know a coin is biased, but you don’t know which way it’s biased, then the first flip is fair.
By that you mean you would not expect to do better betting on heads vs tails. OK.
If you suspect that it’s biased towards heads, then it’s biased towards heads.
No, your suspicions cannot bend reality. If it comes up heads first, then you would think it more probable that it is biased towards heads than that it is biased towards tails. You can’t even assign a numerical probability other than “>50%” to it coming up heads a second time without knowing more about how it might be biased. Is it biased in a way which gives it runs (more likely to hit heads a second time after hitting it the first)? Is it biased in a way that gives it at most a 5% deviation from fair? Even having access to a very long sequence of results from the biased coin doesn’t let you easily determine what the bias is. What if it is biased in a way such that every 67th flip is heads? How long before you notice that?
Yes, detecting bias is important, but so is figuring the odds when games are fair, when things are as they seem to be. There is a tremendous amount of money to be made and lost playing fair games.
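For what it’s worth, here is a minimal sketch of that update under one assumed bias model (the coin is either heads-biased at 0.5 + b or tails-biased at 0.5 - b, equal prior, independent flips). The probability of heads on the second flip always comes out above 50%, but the exact number depends entirely on the assumed model, which is the point above:

```python
# Assumed model (one of many possible): the coin is either heads-biased
# with P(heads) = 0.5 + b or tails-biased with P(heads) = 0.5 - b,
# each with prior 0.5, and flips are independent given the bias.
def p_second_heads_after_first_heads(b):
    prior_h = prior_t = 0.5
    lik_h, lik_t = 0.5 + b, 0.5 - b                 # P(first flip heads | bias)
    post_h = prior_h * lik_h / (prior_h * lik_h + prior_t * lik_t)
    post_t = 1 - post_h
    return post_h * (0.5 + b) + post_t * (0.5 - b)  # P(second flip heads)

for b in (0.05, 0.10, 0.30):
    print(b, p_second_heads_after_first_heads(b))
# Always above 0.5, but how far above depends on the assumed bias size b,
# and a runs-based or "every 67th flip" bias would need a different model.
```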
In the case of a literal mugging by someone who forgot their gun and decided to talk about the Matrix instead, the larger half is that if you pay, you have less money left for a potential mugger who, when asked for a proof, said “OK” and made a display appear in front of you, showing something impressive.
I’m thinking there are three entirely different issues here:
1: Priors may be too high for some reason (e.g. 2^-(theory length) priors do not lead to a converging sum). It looks like invalid actions resulting from a legitimate probability assignment and a legitimate expected utility calculation, but it really isn’t: the sum does not converge, and its apparent sign depends on the order of summation. It’s just a case of bad math being bad. (A toy numerical sketch follows at the end of this comment.)
2: Low probability that comes from a huge number of alternative scenarios, or from shakiness of the argument, also relates to an inability to evaluate other actions sufficiently: an invalid expected utility estimate due to a partial sum. The total is unreasonably biased by the choice of the terms which are summed (in the expected utility calculation).
3: Ignoring the general tendency of actions conditional on evidence to have higher utility than actions not conditional on evidence (as is the case for that literal mugger example), and not considering alternatives conditional on evidence (e.g. “decide to pay only to a mugger with a proof” is a valid action). The utility of assets (the money you have) is easy to under-evaluate because it requires modelling of future situations and your responses.
edit: also, it’s obviously wrong to just ignore low-probability scenarios that carry some cost. When you observe proper lab safety precautions, or look both ways when crossing the street, you’re taking exactly such scenarios into account. Likewise for not playing other variations of Russian roulette. The issue tends to arise when scenarios are purely speculative, which makes me think that it’s the speculations that are to blame: you assign too high a probability to a speculation, and then, when estimating the utility sum, you neglect to apply the appropriate adjustment (if the scenario was chosen at random, regression to the mean) for the incompleteness of your sum.
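A toy numerical sketch of point 1, with made-up numbers: give hypothesis n a prior of 2^-n but a payoff of (-3)^n, so the n-th expected-utility term is (-1.5)^n. The partial sums blow up and flip sign with every extra term, so the series defines no answer at all:

```python
# Toy divergent expected-utility sum (all numbers made up):
# hypothesis n has prior 2**-n and payoff (-3)**n, so each term is (-1.5)**n.
def partial_expected_utility(n_terms):
    return sum((2.0 ** -n) * ((-3.0) ** n) for n in range(1, n_terms + 1))

for n in (10, 11, 20, 21, 40, 41):
    print(n, partial_expected_utility(n))
# Stopping after an even number of terms looks hugely positive, after an odd
# number hugely negative; the "sign" is an artifact of where the sum was cut.
```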
In the case of a literal mugging by someone who forgot their gun and decided to talk about the Matrix instead, the larger half is that if you pay, you have less money left for a potential mugger who, when asked for a proof, said “OK” and made a display appear in front of you, showing something impressive.
I remember this being one of the solutions people came up with in some of the very early discussions about Pascal’s mugging, but it is generally considered highly unsatisfactory. Refraining from an action that would be seen as positive-expected-value by itself, because one is worried that “some Matrix Lord may appear with evidence in the future requiring my resources”, only worsens the problem, transforming it into the muggerless and worse variety of Pascal’s mugging, which would prevent you from ever using any resources for any reason, even ones considered prudent.
E.g. “Should I spend 100 dollars to install a fire alarm for the purpose of early warning in case of a fire?” “No, I will then have fewer resources in case a Matrix Lord comes with evidence and requires them of me.” A mind that utilized such logic would no longer even need a mugger in the first place to fall into insanity...
Besides, even if we specified “require evidence before allocating resources”, what is the limiting factor for what sort of evidence is to be considered good enough?
You might die before you meet some Matrix Lord, you know. Fire-alarm-wise, you’re in the clear. And if you have #1, it’s not a Pascal’s mugging situation, it’s a “your utility function does not work at all” situation; you need to either use bounded utility or use the speed prior (which makes the priors for that smaller).
edit: and even if your priors are correct, you’re still facing the problem that your sums are not complete.
E.g. “Should I spend 100 dollars to install a fire alarm for the purpose of early warning in case of a fire?” “No, I will then have fewer resources in case a Matrix Lord comes with evidence and requires them of me.” A mind that utilized such logic would no longer even need a mugger in the first place to fall into insanity...
I am reminded of the Island Where Dreams Come True in The Voyage of the Dawn Treader, which is exactly what its name says. Not daydreams or longings, but all of your worst nightmares. Having once imagined a thing calls it into existence there.
The muggerless mugging follows from giving a hypothesis some credence just because you imagined it, otherwise called the Solomonoff prior. I recall Eliezer writing here some years ago that he did not have a solution. I don’t know if he has since found one.
Priors may be too high for some reason (e.g. 2^-(theory length) priors do not lead to a converging sum).
I’ve mentioned elsewhere that this is generally what causes it. The problem is, is that really a good enough reason to use different priors? Consider the similar situation where someone rejects the 2^-(theory length) priors on the basis that it would say God doesn’t exist, and they don’t want to deal with that.
It’s just a case of bad math being bad.
Are you saying you can get around it just by using better math, instead of messing with priors?
The problem is, is that really a good enough reason to use different priors?
Sum not converging is reason enough; it’s not that there’s a potential “Pascal’s mugging” problem, it’s that the utility is undefined entirely.
Consider the similar situation where someone rejects the 2^-(theory length) priors on the basis that it would say God doesn’t exist, and they don’t want to deal with that.
That prior doesn’t say God doesn’t exist; some very incompetent people who explain said prior claim that it does, but the fact is that we do not know and will never know. At most, Gods are not much longer to encode than universes where intelligent life evolves, anyway (hence the Gods in the form of superintelligences, owners of our simulation, and so on).
Are you saying you can get around it just by using better math, instead of messing with priors?
What do you mean? The “bad math” is the idea that utility is even well defined given a dubious prior where it is not well defined. It’s not like humans use a theory-length prior, anyway.
What you can do is use the “speed prior” or a variation thereof. It discounts for the size of the universe (-ish), making the sum converge.
Note that it still leaves any practical agent with a potential problem, in that arguments by potentially hostile parties may bias its approximations of the utility, by providing speculations which involve large utilities that are not physically impossible under known laws of physics but are highly speculative, so that the approximate utility calculations do not equally adjust both sides of the utility comparisons.
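A crude sketch of the kind of runtime discount being gestured at here; this is not Schmidhuber’s actual speed prior, just an assumed weight of 2^-length / runtime, and the hypotheses and numbers are invented. If a hypothesis can’t affect more people than it has computation steps (payoff ≤ runtime), each discounted expected-utility term is bounded by 2^-length, so the sum over hypotheses stays finite:

```python
# Invented hypotheses: (description_length_bits, runtime_steps, claimed_payoff).
# 10**100 stands in for an astronomically expensive mugger-style claim.
hypotheses = [
    (20, 10 ** 6,   10 ** 3),
    (25, 10 ** 9,   10 ** 6),
    (30, 10 ** 100, 10 ** 100),   # "I will simulate a vast number of people"
]

length_only = [2.0 ** -l * u for (l, t, u) in hypotheses]
runtime_discounted = [2.0 ** -l * u / t for (l, t, u) in hypotheses]

print(length_only)         # the mugger-style term dwarfs everything else
print(runtime_discounted)  # every term is now at most 2**-length
```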
Sum not converging is reason enough; it’s not that there’s a potential “Pascal’s mugging” problem, it’s that the utility is undefined entirely.
For any prior with infinitely many possibilities, you can come up with some non-converging utility function. Does that mean we can change how likely things are by changing what we want?
The other strategy is to change your utility function, but that doesn’t seem right either. Should I care less about 3^^^3 people just because it’s a situation that might actually come up?
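A minimal construction behind the first claim here, assuming only a prior that gives nonzero weight to infinitely many hypotheses (the specific prior used below is just an example):

```python
# For any prior p with p(h_n) > 0 for infinitely many n, choose the utility
# to cancel the prior exactly, with alternating sign: U(h_n) = (-1)**n / p(h_n).
# Each expected-utility term is then (-1)**n, so the partial sums never settle.
def expected_utility_partial_sum(p, n_terms):
    return sum(p(n) * ((-1) ** n / p(n)) for n in range(1, n_terms + 1))

def p(n):
    return 2.0 ** -n   # example prior; any prior with infinite support works

print([expected_utility_partial_sum(p, k) for k in range(1, 9)])
# [-1.0, 0.0, -1.0, 0.0, -1.0, 0.0, -1.0, 0.0] -- bounces forever, no limit
```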
For any prior with infinitely many possibilities, you can come up with some non-converging utility function. Does that mean we can change how likely things are by changing what we want?
Prior is not how likely things are. It’s just a way to slice the probability of 1 among the competing hypotheses. Allocate slices by length and you get the length-based prior; allocate slices by runtime and length and you get the speed prior.
Ideally you’d want to quantify all symmetries in the evidence and somehow utilize those, so that you immediately get a prior of 1/6 for a side of a symmetric die when you can’t make predictions. But the theory-length prior doesn’t do that either.
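A toy sketch of that “slicing probability 1” picture over an invented finite hypothesis list (real length-based and speed priors range over all programs, so this is only an analogy). The same hypotheses get different slices depending on whether runtime is counted:

```python
# Invented hypotheses: name -> (description_length_bits, runtime_steps).
hypotheses = {
    "short_fast": (10, 10 ** 3),
    "short_slow": (10, 10 ** 12),
    "long_fast":  (30, 10 ** 3),
}

def normalize(weights):
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

length_prior   = normalize({n: 2.0 ** -l     for n, (l, t) in hypotheses.items()})
speedish_prior = normalize({n: 2.0 ** -l / t for n, (l, t) in hypotheses.items()})

print(length_prior)    # short_fast and short_slow split nearly all of the mass
print(speedish_prior)  # short_slow is heavily discounted for its runtime
```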
The other strategy is to change your utility function, but that doesn’t seem right either. Should I care less about 3^^^3 people just because it’s a situation that might actually come up?
It seems to me that such a situation really should get unlikely faster than 2^-length gets small.
Prior is not how likely things are. It’s just a way to slice the probability of 1 among the competing hypotheses.
And I could allocate it so that there is almost certainly a god, or even so that there is certainly a god. That wouldn’t be a good idea, though, would it?
It seems to me that such a situation really should get unlikely faster than 2^-length gets small.
What would you suggest to someone who had a different utility function, one where you run into this problem even when using the speed prior?
Also, the speed prior looks bad. It predicts the universe should be small and short-lived. This is not what we have observed.
Do you think there is a universe outside of our past light cone? It would increase the program length to limit it to that, but not nearly as much as it would decrease the run time.
And I could allocate it so that there is almost certainly a god, or even so that there is certainly a god. That wouldn’t be a good idea, though, would it?
There isn’t a single “Solomonoff induction”; the choice of the machine is arbitrary, and for some machines the simplest way to encode our universe is through some form of god (the creator/owner of a simulation, if you wish). In any case, the prior for a universe with a god is not that much smaller than the prior for a universe without one, because you can obtain a sentient being simply by picking data out of any universe where such a being evolves. Note that these models with some god work just fine, and no, even though I am an atheist, I don’t see what the big deal is.
Also, the speed prior looks bad. It predicts the universe should be small and short-lived. This is not what we have observed.
The second source of problems is the attribution of reality to the internals of the prediction method. I am not sure it is valid for either prior. The laws of the universe are most concisely expressed as properties which hold everywhere rather than as calculation rules of some kind; the rules are derived as alternate structures that share the same properties.
Technically true, but half of them are more likely than their opposite, and the other half are less likely. If the payoff is large enough, that difference will be sufficient to cause trouble.
This would matter if you KNEW which half was which. Which you generally don’t.
Probability is in the mind. You always know which is more likely. It’s the one you think is more likely.
People are sort of built to set probabilities to Schelling points, which would make this difficult, but you’d still have some intuition or something pointing a little in one direction.
I would very much enjoy playing poker with you for money.