What exactly do you mean by “guaranteed to be optimal”? If you assign probabilities correctly and you understand your situation->utility mapping, your optimal choice is the one that maximizes utility across the probability-weighted futures you will experience. That’s as optimal as it gets; the word “guarantee” is confusing here. It’s kind of implied by the name: if you make any other choice, you expect to get less utility.
I don’t think post-hoc modification of the probability is what I’m saying. I’m recommending sane priors that give you consistent beliefs that don’t include non-infinitesimal chances of insane-magnitude events.
In any finite case, probability is involved. You could, just by chance, lose every bet you take. So no method is guaranteed to be optimal. (Unless, of course, you specify a criterion for deciding which probability distribution over utilities counts as optimal: the one with the highest mean or median, or some other estimator.)
However, in the infinite case (which I admit is kind of hand-wavy, but it’s the simplest intuitive explanation), the probabilities cancel out. You will win exactly 50% of the bets you take at 50% odds. So on average you will win more than you lose on any bet with positive expected utility, and lose more than you win on any bet with negative expected utility. Therefore taking any bet with positive expected utility is optimal (not counting the complexities of losing money that could have been staked on other bets later).
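As a minimal sketch of that long-run claim (mine, not the original commenter’s; the 60% win probability and ±1 payoffs are invented for illustration), a quick simulation shows the average payoff approaching the expected utility only as the number of bets grows:

```python
import random

def average_payoff(n_bets, p_win=0.6, win=1.0, lose=-1.0, seed=0):
    """Average payoff over n_bets independent bets with the given win probability."""
    rng = random.Random(seed)
    total = sum(win if rng.random() < p_win else lose for _ in range(n_bets))
    return total / n_bets

# Expected utility per bet: 0.6 * 1.0 + 0.4 * (-1.0) = 0.2
for n in (10, 1_000, 1_000_000):
    print(n, average_payoff(n))
# With few bets the average can easily come out negative; it converges toward
# +0.2 only as n grows, which is the sense in which the probabilities
# "cancel out" solely in the (unreachable) infinite limit.
```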
Solomonoff induction is a perfectly sane prior. It will produce perfectly reasonable predictions, and I’d expect it to be right far more than any other method. But it doesn’t arbitrarily discount hypotheses just because they contain more magnitude.
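As a very rough illustration of that point (mine, not from the comment): Kolmogorov complexity is uncomputable, so the sketch below uses the literal character length of a hypothesis’s description as a crude stand-in, which is enough to show that a length-based prior penalizes description length rather than the magnitude of the numbers mentioned. The two example descriptions are made up.

```python
# Crude proxy: weight a hypothesis by 2^-(description length in characters),
# standing in for the 2^-(program length) weighting of a Solomonoff-style prior.
def length_prior_weight(description: str) -> float:
    return 2.0 ** (-len(description))

huge_but_simple = "3^^^3 people are affected"  # astronomical magnitude, short description
modest_but_messy = "exactly 8934271 people are affected, as listed in the attached census file"

print(length_prior_weight(huge_but_simple))   # gets the larger prior weight
print(length_prior_weight(modest_but_messy))  # smaller weight, despite the far smaller magnitude
```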
But it’s not specific to Solomonoff induction. I don’t believe that high-magnitude events are impossible. An example stolen from EY: what if we discover that the laws of physics allow unboundedly large amounts of computation? Or that the universe is infinite, or endless in time? And/or that entropy can be reversed, or energy created? Or any number of other possibilities?
If so, then we humans, alive today, have the possibility of creating and influencing an infinite number of other beings. Every decision you make today would be a Pascal’s wager situation, with a possibility of affecting an infinite number of beings for an endless amount of time.
We can’t argue about priors of course. If you really believe that, as a prior, that situation is impossible, no argument can convince you otherwise. But I don’t think you really do believe it’s impossible, or that it’s a good model of reality. It seems more like a convenient way of avoiding the mugging.
Why not? In any practical statistical problem, a Bayesian must choose a prior, therefore must think about what prior would be appropriate. To think is to have an argument with oneself, and when two statisticians discuss the matter, they are arguing with each other.
So people manifestly do argue about priors, which you say is impossible.
If two agents have different priors, then they will always come to different conclusions, even if they have the same evidence and arguments available to them. It’s the lowest level of debate. You can’t go any further.
Choosing a good prior for a statistical model is a very different thing from actually talking about your own prior. If the parent commenter really believes, a priori, that something has probability 1/3^^^3, then no argument or evidence could convince him otherwise.
Can you clarify what you mean by “your own prior”, contrasting it with “choosing a good prior for a statistical model”?
This is what I think you mean. A prior for a statistical model, in the practice of Bayesian statistics on practical problems, is a distribution over a class of hypotheses ℋ (often, a distribution for the parameters of a statistical model), which one confronts with the data D to compute a posterior distribution over ℋ by Bayes’ theorem. A good prior is a compromise between summarising existing knowledge on the subject and being open to substantial update away from that knowledge, so that the data have a chance to be heard, even if they contradict it. The data may show the original choice of prior to have been wrong (e.g. the prior specified parameters for a certain family of distributions, while the data clearly do not belong to that family for any value of the parameters). In that case, the prior must be changed. This is called model checking. It is a process that in a broad and intuitive sense can be considered to be in the spirit of Bayesian reasoning, but mathematically is not an application of Bayes’ theorem.
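For concreteness, here is a minimal sketch of that update step, with a made-up hypothesis class ℋ (three possible coin biases) and made-up data D; none of the numbers come from the discussion:

```python
# Posterior over a small hypothesis class H (possible coin biases),
# given data D (observed heads and tails), via Bayes' theorem.
prior = {0.3: 0.25, 0.5: 0.5, 0.7: 0.25}   # P(H) for each hypothesised bias
heads, tails = 8, 2                         # the data D

def likelihood(bias, heads, tails):
    return bias ** heads * (1 - bias) ** tails

unnormalised = {h: likelihood(h, heads, tails) * p for h, p in prior.items()}
evidence = sum(unnormalised.values())       # P(D) = sum over H of P(D|H) P(H)
posterior = {h: u / evidence for h, u in unnormalised.items()}
print(posterior)
# Model checking, in the informal sense above, asks whether even the best
# member of H predicts data like D at all; if not, H itself must be enlarged.
```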
The broad and intuitive sense is that the process of model checking can be imagined as involving a prior that is prior to the one expressed as a distribution over ℋ. The activity of updating a prior over ℋ by the data D was all carried out conditional upon the hypothesis that the truth lay within ℋ, but when that hypothesis proves untenable, one must enlarge ℋ to some larger class. And then, if the data remain obstinately poorly modelled, to a larger class still. But there cannot be an infinite regress: ultimately (according to the view I am extrapolating from your remarks so far) you must have some prior over all possible hypotheses, a universal prior, beyond which you cannot go. This is what you are referring to as “your own prior”. It is an unchangeable part of you: even if you can get to see what it is, you are powerless to change it, for to change it you would have to have some prior over an even larger class, but there is no larger class.
Is this what you mean?
ETA: See also Robin Hanson’s paper putting limitations on the possibility of rational agents having different priors. Irrational agents, of course, need not have priors at all, nor need they perform Bayesian reasoning.
That’s just plain false, and obviously so.
No it is not. If you put different numbers into the prior, then the probability produced by a Bayesian update will always be different. If the evidence is strong enough, it might not matter too much. But if one of the priors is off by many orders of magnitude, then it matters quite a lot.
If people have different priors both for the hypothesis and for the evidence, it is obvious, as Lumifer said, that those can combine to give the same posterior for the hypothesis, given the evidence, since I can make the posterior any value I like by setting the prior for the evidence appropriately.
You don’t get to set the prior for the evidence. Your prior distribution over the evidence is determined by your prior over the hypotheses: P(E) = Σ_H P(E|H) P(H), summing over all hypotheses H. For each H, the distribution P(E|H) over all E is what H is.
But the point holds: the same evidence can update different priors to identical posteriors. For example, one of the four aces from a deck is selected, not necessarily from a uniform distribution. Person A’s prior distribution over which ace it is: spades 0.4, clubs 0.4, hearts 0.1, diamonds 0.1. B’s prior: spades 0.2, clubs 0.2, hearts 0.3, diamonds 0.3. Evidence is given to them: the card is red. They reach the same posterior: spades 0, clubs 0, hearts 0.5, diamonds 0.5.
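For what it’s worth, coding up exactly the numbers above confirms the arithmetic:

```python
def bayes_update(prior, likelihood):
    """Posterior over aces, given a prior over aces and P(evidence | ace)."""
    unnormalised = {ace: likelihood[ace] * p for ace, p in prior.items()}
    evidence = sum(unnormalised.values())   # P(red), marginalised over the aces
    return {ace: u / evidence for ace, u in unnormalised.items()}

p_red_given_ace = {"spades": 0.0, "clubs": 0.0, "hearts": 1.0, "diamonds": 1.0}
prior_A = {"spades": 0.4, "clubs": 0.4, "hearts": 0.1, "diamonds": 0.1}
prior_B = {"spades": 0.2, "clubs": 0.2, "hearts": 0.3, "diamonds": 0.3}

print(bayes_update(prior_A, p_red_given_ace))  # hearts 0.5, diamonds 0.5
print(bayes_update(prior_B, p_red_given_ace))  # the same posterior from a different prior
```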
Some might quibble over the use of probabilities of zero, and there may well be a theorem saying that if all the distributions involved are everywhere nonzero, the mapping of priors to posteriors is one-to-one; but the underlying point will remain in a different form: for any separation between the posteriors, however small, and any separation between the priors, however large, some observation is strong enough evidence to transform those priors into posteriors at least that close. (I have not actually proved this as a theorem, but something along those lines should be true.)