Assign a probability of 1-epsilon to your belief that Bayesian updating works. Your belief in “Bayesian updating works” is itself determined by Bayesian updating; you therefore believe with probability 1-epsilon that “Bayesian updating works with probability 1-epsilon”. The base-level belief is then held with probability less than 1-epsilon.
Because holding Bayesian beliefs about believing Bayesianly is recursive, this chain of meta-beliefs can be extended indefinitely, and the probability of the base-level belief tends towards zero.
Therefore, there is a flaw in Bayesian updating.
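A minimal numerical sketch of the regress, assuming (as the argument does) that each level of meta-doubt simply multiplies through; this assumption is exactly what the replies below dispute:

```python
# Sketch: if every meta-level belief "my updating works" is held with
# probability 1 - epsilon, and the levels are treated as independent
# multiplicative discounts, the base-level belief decays geometrically.
epsilon = 1e-6  # illustrative per-level doubt
for n in (1, 10, 10**3, 10**6, 10**7):
    print(n, (1 - epsilon) ** n)
# As n grows without bound, (1 - epsilon) ** n tends to 0, which is the
# claimed collapse of the base-level belief.
```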
I think this is just a semi-formal version of the problem of induction in Bayesian terms, though. Unfortunately the answer to the problem of induction was “pretend it doesn’t exist and things work better”, or something like that.
I think this is a form of double-counting the same evidence. You can only perform Bayesian updating on information that is new; if you try to update on information that you’ve already incorporated, your probability estimate shouldn’t move. But if you take information you’ve already incorporated, shuffle the terms around, and pretend it’s new, then you’re introducing fake evidence and getting an incorrect result. You can add a term for “Bayesian updating might not work” to any model, except to a model that already accounts for that, as models of the probability that Bayesian updating works surely do. That’s what’s happening here: you’re adding “there is an epsilon probability that Bayesian updating doesn’t work” as evidence to a model that already uses and contains that information, counting it twice (and then counting it n times).
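A toy sketch of the double-counting point; the prior, the likelihood ratio, and the update helper below are illustrative, not taken from the comment:

```python
def update(prob, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    odds = prob / (1 - prob) * likelihood_ratio
    return odds / (1 + odds)

prior = 0.5
lr = 4.0                  # evidence favouring the hypothesis 4:1
once = update(prior, lr)  # 0.8   -- the correct posterior
twice = update(once, lr)  # ~0.94 -- the same evidence shuffled around and counted again
print(once, twice)
```

Re-running the update inflates the estimate precisely because the evidence was already incorporated the first time.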
You can also fashion a similar problem regarding priors.
Determine what method you should use to assign a prior in a certain situation.
Then determine what method you should use to assign a prior to “I picked the wrong method to assign a prior in that situation”.
Then determine what method you should use to assign a prior to “I picked the wrong method to assign a prior to ‘I picked the wrong method to assign a prior in that situation’ ”.
This doesn’t seem like double-counting of anything to me; at no point can you assume you have picked the right method for any prior-assigning with probability 1.
This one is different, in that the evidence you’re introducing is new. However, the magnitude of the effect of each new piece of evidence on your original probability falls off exponentially, such that the original probability converges.
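A rough sketch of what that convergence could look like; the starting probability, the size of the first doubt, and the fall-off rate are all illustrative assumptions:

```python
# If the k-th level of doubt about the prior-assigning method discounts the
# estimate by a factor (1 - d * r**k) with r < 1, the corrections shrink
# geometrically and the running product settles at a positive value instead
# of driving the probability to zero.
p = 0.9  # original probability (illustrative)
d = 0.1  # first-level doubt (illustrative)
r = 0.5  # geometric fall-off of each further level (illustrative)
for k in range(60):
    p *= 1 - d * r**k
print(p)  # settles near 0.73 rather than tending to 0
```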
I’m pretty sure there is an error in your reasoning. And I’m pretty sure the source of the error is an unwarranted assumption of independence between propositions which are actually entangled—in fact, logically equivalent.
But I can’t be sure there is an error unless you make your argument more formal (i.e. symbol intensive).
I think it would take the form of X being an outcome, p(X) being the probability of that outcome as determined by Bayesian updating, “p(X) is correct” being the outcome Y, p(Y) being the probability of Y as determined by Bayesian updating, “p(Y) is correct” being the outcome Z, and so forth.
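Written out, it might look something like this, with the notation chosen purely for illustration:

$$A_0 = X, \qquad A_{n+1} = \text{“}p(A_n)\text{ is correct”}, \qquad n = 0, 1, 2, \dots$$

The worry is then that the effective confidence in $X$ gets treated as $p(A_0) \prod_{n \ge 1} p(A_n)$, which tends to zero if every factor sits a fixed epsilon below 1, and converges to something positive if the factors approach 1 quickly enough.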
If you have any particular style or method of formalising you’d like me to use, mention it, and I’ll see if I can rephrase it in that way.
I don’t understand the phrase “p(X) is correct”.
Also I need a sketch of the argument that went from the probability of one proposition being 1-epsilon to the probability of a different proposition being smaller than 1-epsilon.
p(X) is a measure of my uncertainty about outcome X—“p(X) is correct” is the outcome where I determined my uncertainty about X correctly. There are also outcomes where I incorrectly determined my uncertainty about X. I therefore need to have a measure of my uncertainty about outcome “I determined my uncertainty correctly”.
The argument went from the initial probability of one proposition being 1-epsilon to the updated probability of the same proposition being less than 1-epsilon, because there was higher-order uncertainty which multiplies through.
A toy example: We are 90% certain that this object is a blegg. Then, we receive evidence that our method for determining 90% certainty gives the wrong answer one case in ten. We are 90% certain that we are 90% certain, or in other words—we are 81% certain that the object in question is a blegg.
Now that we’re 81% certain, we receive evidence that our method is flawed one case in ten; we are now 90% certain that we are 81% certain, or in other words, 72.9% certain. And so on. Obviously an epsilon-sized doubt degrades the probability much more slowly, but we don’t have any reason to stop applying it to itself.
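The same arithmetic spelled out; 0.9 and one-case-in-ten are just the toy numbers from the example:

```python
# Each round of "our method gives the wrong answer one case in ten"
# multiplies the current confidence by 0.9.
confidence = 0.9
for round_number in range(1, 6):
    confidence *= 0.9
    print(round_number, round(confidence, 4))
# 0.81, 0.729, 0.6561, 0.5905, 0.5314, ... -> 0 if the regress never stops.
# With a small epsilon in place of one-in-ten, the decay is (1 - epsilon) ** n:
# much slower, but still heading to zero.
```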
Thank you for expressing my worry in much better terms than I managed to. If you like, I’ll link to your comment in my top-level comment.
I still don’t know why everyone thinks this is the problem of induction. You can certainly have an agent which is Bayesian but doesn’t use induction (the prior which assigns equal probability to all possible sequences of observation is non-inductive). I’m not sure if you can have a non-Bayesian that uses induction, because I’m very confused about the whole subject of ideal non-Bayesian agents, but it seems like you probably could.
Interesting that Bayesian updating seems to be flawed if and only if you assign non-zero probability to the claim that it is flawed. If I were feeling mischievous, I would compare it to a religion: it works so long as you have absolute faith, but if you doubt even for a moment, it doesn’t.
It’s similar to Hume’s philosophical problem of induction (here and here specifically). Induction in this sense is contrasted with deduction—you could certainly have a Bayesian agent which doesn’t use induction (never draws a generalisation from specific observations) but I think it would necessarily be less efficient and less effective than a Bayesian agent that did.
Feel free! I am all for increasing the number of minds churning away at this problem: the more Bayesians there are trying to find a way to justify Bayesian methods, the higher the probability that a correct justification will be found. Assuming we can weed out the motivated or biased justifications.
I’d love to see someone like EY tackle the above comment.
On a side note, why do I get an error if I click on the username of the parent’s author?
I’m actually planning on tackling it myself in the next two weeks or so. I think there might be a solution that has a deductive justification for inductive reasoning. EY has already tackled problems like this but his post seems to be a much stronger variant on Hume’s “it is custom, and it works”—plus a distinction between self-reflective loops and circular loops. That distinction is how I currently rationalise ignoring the problem of induction in everyday life.
Also—I too do not know why I don’t have an overview page.