The problem I was specifically asking to solve is “what if Bayesian updating is flawed”, which I thought was an appropriate discussion on an article about not putting all your trust in any one system.
Bayes’ theorem looks solid, but I’ve been wrong about theorems before. So has the mathematical community (although not very often and not for this long, but it could happen and should not be assigned 0 probability). I’m slightly sceptical of the uniqueness claim, given that I’ve often seen similar proofs which are mathematically sound, but which make certain assumptions about what is allowed, and are thus vulnerable to out-of-the-box solutions (Arrow’s impossibility theorem is a good example of this). In fact, given that a significant proportion of statisticians are not Bayesians, I really don’t think this is a good time for absolute faith.
To give another example, suppose tomorrow’s main page article on LW is about an interesting theorem in Bayesian probability, one which would affect the way you update in certain situations. You can’t quite understand the proof yourself, but the article’s writer is someone whose mathematical ability you respect. In the comments, some other people express concern about certain parts of the proof, but you still can’t quite see for yourself whether it’s right or wrong. Do you apply it?
Assign a probability 1-epsilon to your belief that Bayesian updating works. Your belief in “Bayesian updating works” is determined by Bayesian updating; you therefore believe with 1-epsilon probability that “Bayesian updating works with probability 1-epsilon”. The base level belief is then held with probability less than 1-epsilon.
As the recursion of holding Bayesian beliefs about Bayesian beliefs lets the chain grow arbitrarily long, the probability of the base-level belief tends towards zero.
There is a flaw with Bayesian updating.
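In symbols, the regress amounts to something like the following (a rough sketch, which simply assumes that the 1-epsilon discount is applied independently at every meta-level rather than arguing for it): writing \(B\) for the proposition that Bayesian updating works and stacking \(n\) levels of meta-uncertainty,

\[
P(B) \;\le\; (1-\epsilon)^n \;\longrightarrow\; 0 \quad \text{as } n \to \infty .
\]

Whether those discounts really multiply through independently is exactly the step the replies below take issue with.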
I think this is just a semi-formal version of the problem of induction in Bayesian terms, though. Unfortunately the answer to the problem of induction was “pretend it doesn’t exist and things work better”, or something like that.
I think this is a form of double-counting the same evidence. You can only perform Bayesian updating on information that is new; if you try to update on information that you’ve already incorporated, your probability estimate shouldn’t move. But if you take information you’ve already incorporated, shuffle the terms around, and pretend it’s new, then you’re introducing fake evidence and will get an incorrect result. You can add a term for “Bayesian updating might not work” to any model except one that already accounts for that, as models of the probability that Bayesian updating works surely do. That’s what’s happening here: you’re adding “there is an epsilon probability that Bayesian updating doesn’t work” as evidence to a model that already uses and contains that information, and counting it twice (and then counting it n times).
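A toy illustration of the double-counting point, in the odds form of Bayes’ theorem and with made-up numbers (the 50% prior and likelihood ratio of 4 are purely hypothetical):

```python
# Double-counting the same evidence: a toy sketch with made-up numbers.
# In odds form, a Bayesian update multiplies the prior odds by a likelihood ratio.

def update_odds(odds, likelihood_ratio):
    """One Bayesian update in odds form."""
    return odds * likelihood_ratio

def to_probability(odds):
    return odds / (1.0 + odds)

prior_odds = 1.0   # a 50% prior
lr = 4.0           # the evidence is four times likelier if the hypothesis is true

once = update_odds(prior_odds, lr)    # odds 4,  p = 0.8
twice = update_odds(once, lr)         # odds 16, p = 0.94

print(to_probability(once), to_probability(twice))
# Re-using the same likelihood ratio moves the estimate again even though
# no new information has arrived, which is the double-counting being described.
```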
You can also fashion a similar problem regarding priors.
Determine what method you should use to assign a prior in a certain situation.
Then determine what method you should use to assign a prior to “I picked the wrong method to assign a prior in that situation”.
Then determine what method you should use to assign a prior to “I picked the wrong method to assign a prior to ‘I picked the wrong method to assign a prior in that situation’ ”.
This doesn’t seem like double-counting of anything to me; at no point can you assume you have picked the right method for any prior-assigning with probability 1.
This one is different, in that the evidence you’re introducing is new. However, the magnitude of the effect of each new piece of evidence on your original probability falls off exponentially, such that the original probability converges.
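A sketch of how that convergence can go, under the assumption (mine, purely for illustration) that the k-th meta-level of prior-picking doubt only rescales the estimate by a factor of about \(1-\epsilon^k\): the total discount is then bounded by the convergent infinite product

\[
\prod_{k=1}^{\infty} \bigl(1 - \epsilon^{k}\bigr) \;>\; 0 \qquad \text{for } 0 < \epsilon < 1,
\]

so the original probability is damped but does not collapse to zero, unlike the fixed \(1-\epsilon\) per level in the earlier regress.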
I’m pretty sure there is an error in your reasoning. And I’m pretty sure the source of the error is an unwarranted assumption of independence between propositions which are actually entangled—in fact, logically equivalent.
But I can’t be sure there is an error unless you make your argument more formal (i.e. symbol intensive).
I think it would take the form of X being an outcome, p(X) being the probability of the outcome as determined by Bayesian updating, “p(X) is correct” being the outcome Y, p(Y) being the probability of the outcome as determined by Bayesian updating, “p(Y) is correct” being the outcome Z, and so forth.
If you have any particular style or method of formalising you’d like me to use, mention it, and I’ll see if I can rephrase it in that way.
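For what it’s worth, one possible way to write that hierarchy down (the notation is mine, offered only as a sketch):

\[
X_0 := X, \qquad X_{n+1} := \text{the outcome that } p(X_n) \text{ was determined correctly},
\]

with each \(p(X_{n+1})\) again produced by Bayesian updating; the question in dispute is then how, if at all, the higher-level values \(p(X_n)\) should feed back into \(p(X_0)\).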
I don’t understand the phrase “p(X) is correct”.
Also I need a sketch of the argument that went from the probability of one proposition being 1-epsilon to the probability of a different proposition being smaller than 1-epsilon.
p(X) is a measure of my uncertainty about outcome X—“p(X) is correct” is the outcome where I determined my uncertainty about X correctly. There are also outcomes where I incorrectly determined my uncertainty about X. I therefore need to have a measure of my uncertainty about outcome “I determined my uncertainty correctly”.
The argument went from the initial probability of one proposition being 1-epsilon to the updated probability of the same proposition being less than 1-epsilon, because there was higher-order uncertainty which multiplies through.
A toy example: We are 90% certain that this object is a blegg. Then, we receive evidence that our method for determining 90% certainty gives the wrong answer one case in ten. We are 90% certain that we are 90% certain, or in other words—we are 81% certain that the object in question is a blegg.
Now that we’re 81% certain, we receive evidence that our method is flawed one case in ten—we are now 90% certain that we are 81% certain. Or, we’re 72.9% certain. Etc. Obviously epsilon degrades much slower, but we don’t have any reason to stop applying it to itself.
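The arithmetic of that toy example, spelled out (a minimal sketch; the 0.9 factor is just the one-in-ten error rate from the example):

```python
# Repeatedly discounting a belief by a 90%-reliable meta-level (toy numbers).
certainty = 0.9           # initial certainty that the object is a blegg
for level in range(1, 4):
    certainty *= 0.9      # each meta-level is itself only 90% reliable
    print(f"after meta-level {level}: {certainty:.4f}")
# after meta-level 1: 0.8100
# after meta-level 2: 0.7290
# after meta-level 3: 0.6561
```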
Thank you for expressing my worry in much better terms than I managed to. If you like, I’ll link to your comment in my top-level comment.
I still don’t know why everyone thinks this is the problem of induction. You can certainly have an agent which is Bayesian but doesn’t use induction (the prior which assigns equal probability to all possible sequences of observation is non-inductive). I’m not sure if you can have a non-Bayesian that uses induction, because I’m very confused about the whole subject of ideal non-Bayesian agents, but it seems like you probably could.
Interesting that Bayesian updating seems to be flawed if and only if you assign non-zero probability to the claim that it is flawed. If I were feeling mischievous I would compare it to a religion: it works so long as you have absolute faith, but if you doubt even for a moment it doesn’t.
It’s similar to Hume’s philosophical problem of induction (here and here specifically). Induction in this sense is contrasted with deduction—you could certainly have a Bayesian agent which doesn’t use induction (never draws a generalisation from specific observations) but I think it would necessarily be less efficient and less effective than a Bayesian agent that did.
Feel free! I am all for increasing the number of minds churning away at this problem—the more Bayesians that are trying to find a way to justify Bayesian methods, the higher the probability that a correct justification will occur. Assuming we can weed out the motivated or biased justifications.
I’d love to see someone like EY tackle the above comment.
On a side note, why do I get an error if I click on the username of the parent’s author?
I’m actually planning on tackling it myself in the next two weeks or so. I think there might be a solution that has a deductive justification for inductive reasoning. EY has already tackled problems like this but his post seems to be a much stronger variant on Hume’s “it is custom, and it works”—plus a distinction between self-reflective loops and circular loops. That distinction is how I currently rationalise ignoring the problem of induction in everyday life.
Also—I too do not know why I don’t have an overview page.
You have piqued my curiosity. A trick to get around Arrow’s theorem? Do you have a link?
Regarding your main point: sure, if you want some members of your army of mutant rational agents to be so mutated that they are no longer even Bayesians, well … go ahead. I suppose I have more faith in the rough validity of trial-and-error empiricism than I do in Bayes’ theorem. But not much more faith.
I’m afraid I don’t know how to post links.
I think there is already a main-page article on this subject, but the general idea is that Arrow’s theorem assumes the voting system is preferential (you vote by ranking the candidates), so you can get around it with a non-preferential system.
Range voting (each voter gives each candidate a score out of ten, and the candidate with the highest total wins) is the one that springs most easily to mind, but it has problems of its own, so somebody who knows more about the subject can probably give you a better example.
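A minimal sketch of range voting with invented ballots (the candidate names and scores are hypothetical, purely for illustration):

```python
# Range voting sketch: each voter scores every candidate from 0 to 10;
# the candidate with the highest total wins. Ballots are invented.
ballots = [
    {"Alice": 9, "Bob": 4, "Carol": 7},
    {"Alice": 2, "Bob": 10, "Carol": 6},
    {"Alice": 8, "Bob": 3, "Carol": 5},
]

totals = {}
for ballot in ballots:
    for candidate, score in ballot.items():
        totals[candidate] = totals.get(candidate, 0) + score

winner = max(totals, key=totals.get)
print(totals, winner)   # Alice wins with 19 points in this made-up election
```

Because the ballots are scores rather than rankings, a system like this sits outside the preferential systems Arrow’s theorem is stated for, which is the work-around being described above.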
As for the main point, I doubt you actually put 100% confidence in either idea. In the unlikely event that either approach led you to a contradiction, would you just curl up in a ball and go insane, or abandon it?
Ah. You mean this posting. It is a good article, and it supports your point about not trusting proofs until you read all of the fine print (with the warning that there is always some fine print that you miss reading).
But it doesn’t really overthrow Arrow. The “workaround” can be “gamed” by the players if they exaggerate the differences between their choices so as to skew the final solution in their own favor.
All deterministic non-dictatorial systems can be gamed to some extent (the Gibbard-Satterthwaite theorem; I’m reasonably confident that this one doesn’t have a work-around), although range voting is worse than most. That doesn’t change the fact that it is a counter-example to Arrow.
A better one might be approval voting, where you can cast as many votes as you want but you can’t vote for the same candidate more than once (equivalent to the degenerate case of range voting where there are only two scores you can give).
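Approval voting in the same sketch, treated as range voting with only the scores 0 and 1 (again with invented ballots):

```python
# Approval voting sketch: each voter approves any subset of candidates;
# the candidate approved by the most voters wins. Ballots are invented.
ballots = [
    {"Alice", "Carol"},
    {"Bob"},
    {"Alice", "Bob", "Carol"},
    {"Carol"},
]

approvals = {}
for ballot in ballots:
    for candidate in ballot:
        approvals[candidate] = approvals.get(candidate, 0) + 1

winner = max(approvals, key=approvals.get)
print(approvals, winner)   # Carol wins with 3 approvals in this made-up election
```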
Thanks for the help with the links.
Next time you comment, click on the Help link to the lower right of the comment editing box.