There is no hope for LessWrong as long as people keep conflating the Perfect Bayesian and the Subjective Bayesian.
Let’s take the Subjective Bayesian first. The problem is that the Subjective Bayesian breaks basic laws of probability as a routine matter.
Take the simplest possible law of probability: P(X) >= P(X and Y).
Now let X be any mathematical theorem you’re not sure about, so that 1 > P(X) > 0.
Let Y be a statement of the form “the following proof of X is correct”.
Verifying proofs is usually very simple, so once you’re asked about P(X and Y), you can confidently reply P(X and Y) = 1. Y is not new information about the world; it is usually a conjunction of trivial statements to which you have already assigned probability 1.
That is, there is an infinite number of statements for which the Subjective Bayesian will reply P(X) < P(X and Y).
For the Subjective Bayesian, X doesn’t even have to involve any infinities: just ask a simple question about cryptography that is pretty much guaranteed to be unsolvable before the heat death of the universe, and you’re done.
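A minimal sketch of the shape of the problem (the statement labels and the 50% figure are stand-ins for any hard theorem with an easily checked proof):

```python
# Sketch: the answers of a resource-bounded "subjective Bayesian".
# X is a theorem the agent cannot prove; Y says a supplied proof of X checks out.

def p_subjective(statement):
    """Hypothetical belief function of a bounded agent."""
    if statement == "X":
        return 0.5  # can't prove or refute X, so the agent hedges
    if statement == "X and Y":
        return 1.0  # the supplied proof is easy to verify, so certainty
    raise ValueError("unknown statement")

# The law P(X) >= P(X and Y) fails for this agent:
assert p_subjective("X") >= p_subjective("X and Y")  # raises AssertionError
```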
At this point people far too often try to swap the Perfect Bayesian in for the Subjective Bayesian.
And it’s true that the Perfect Bayesian wouldn’t make this particular mistake: every probability he assigned to a mathematical theorem would be 0 or 1, no exceptions. The problem is that Perfect Bayesians are impossible, due to uncomputability.
If your version of the Perfect Bayesian is computable, a straightforward application of Rice’s Theorem shows he won’t be able to answer every question consistently.
If you posit some super-computable oracle version of the Perfect Bayesian, then first of all that’s already metaphysics, not mathematics; and in the end, this way of working around uncomputability does not work either.
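To make the computable case concrete, here is a hedged sketch via the standard reduction to the halting problem (perfect_p is hypothetical; Rice’s Theorem generalizes the same idea):

```python
# Hypothetical: suppose perfect_p were a total computable function returning
# probability 1 for every true claim of the form "program p halts on input i"
# and probability 0 for every false one.

def decides_halting(perfect_p, program_source: str, program_input: str) -> bool:
    """If perfect_p existed, this function would decide the halting problem."""
    claim = f"program {program_source!r} halts on input {program_input!r}"
    return perfect_p(claim) == 1.0  # True exactly when the program halts

# Turing (1936): no total computable decider for halting exists,
# so no computable perfect_p can exist either.
```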
At any mention of uncomputability, people far too often try to swap the Subjective Bayesian back in for the Perfect Bayesian (see Eliezer’s comment).
Excellent—now that you’ve explained what you mean by “Perfect Bayesian”, it all makes sense! (Though I can’t help thinking it would have saved time if you’d said this earlier.)
Still, I’m not keen on this redefinition of the word ‘metaphysics’, as though your philosophy of mathematics were ‘obviously correct’ ‘received wisdom’, when actually it’s highly contentious.
Anyway, I think this is a successful attack on a kind of “Bayesian absolutism” which claims that beings who (explicitly or implicitly) assign consistent probabilities to all expressible events, and update their beliefs in the Bayesian manner, can actually exist. That may be a straw man, though.
The obvious solution:
Y is information. It is not information about the world, but it is information—information about math.
I don’t think that works.
Imagine taking the proofs of all provable propositions that can be expressed in less than N characters (for some very large number N), writing them as conjunctions of trivial statements, then randomly arranging all of those trivial statements in one extremely long sequence.
Let Z be the conjunction of all these statements (randomly ordered as above).
Then Z is logically stronger than Y. But our subjective Bayesian cannot use Z to prove X (Finding Moby Dick in the Library of Babel is no easier than just writing Moby Dick oneself, and we’ve already assumed that our subject is unable to prove X under his own steam) whereas he can use Y to prove X.
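A sketch of that construction (the function name is mine; random shuffling stands in for the arbitrary rearrangement):

```python
import random

def build_z(proofs, seed=0):
    """Chop every proof into its 'trivial statement' lines and shuffle them
    all together; Z is the conjunction of the result, in shuffled order."""
    statements = [line for proof in proofs for line in proof]
    random.Random(seed).shuffle(statements)
    return statements

# Recovering any one proof from Z means searching the shuffled pile for the
# right subset in the right order -- no easier than finding the proof from scratch.
```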
The Bayesian doesn’t know Z is stronger than Y. He can’t even read all of Z. Or if you compress it, he can’t decompress it.
P(Y|Z)<1.
If you say that then you’re conceding the point, because Y is nothing other than the conjunction of a carefully chosen subset of the trivial statements comprising Z, re-ordered so as to give a proof that can easily be followed.
Figuring out how to reorder them requires mathematical knowledge, a special kind of knowledge that can be generated, not just through contact with the external world, but through spending computer cycles on it.
Yes. It’s therefore important to quantify how many computer cycles and other resources are involved in computing a prior. There is a souped-up version of taw’s argument along those lines: either P = NP, or else every prior that is computable in polynomial time will fall for the conjunction fallacy.
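One way to flesh that claim out (my gloss, a sketch rather than a proof): let X be “this CNF formula is satisfiable” and Y be “this particular assignment satisfies it”. The witness check is polynomial-time:

```python
# Polynomial-time witness check for SAT.
# A formula is a list of clauses; a clause is a list of ints (negative = negated).

def satisfies(cnf, assignment):
    """Return True iff the assignment makes every clause true."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )

# (x1 or x2) and (not x1 or x2): satisfied by x2 = True
print(satisfies([[1, 2], [-1, 2]], {1: False, 2: True}))  # True
```

Once a witness has been checked, P(X and Y) = 1, so avoiding the conjunction fallacy forces P(X) = 1 for every satisfiable formula. A prior computable in polynomial time that got that right across the board would in effect be deciding SAT, which is where the P = NP horn of the dilemma comes from.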
Imagine he has read and memorized all of Z.
If you want to make it a bit less unrealistic, imagine there are only, say, 1000 difficult proofs randomly chopped and spliced rather than a gazillion—but still too many for the subject to make head or tail of. Perhaps imagine them adding up to a book about the size of the Bible, which a person can memorize in its entirety given sufficient determination.
Oh I see. Chopped and spliced. That makes more sense. I missed that in your previous comment.
The Bayesian still does not know that Z implies Y, because he has not found Y in Z, so there still isn’t a problem.
In which sense is Y information?
It’s not even guaranteed to be true (though you can verify it yourself much more easily than you can verify X directly).
Compare this with the classic conjunction fallacy result. In experiments, people routinely believed that:
P(next year the Soviet Union will invade Poland, and the United States will break off diplomatic relations with the Soviet Union) > P(next year the United States will break off diplomatic relations with the Soviet Union).
Here X = the United States will break off diplomatic relations with the Soviet Union, and Y = the Soviet Union will invade Poland.
Wouldn’t your reasoning pretty much endorse what people were doing? (with Y—one possible scenario leading to X—being new information)
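For reference, the arithmetic that makes this a fallacy no matter how plausible the scenario Y sounds:

$$P(X \wedge Y) = P(X)\,P(Y \mid X) \le P(X)$$

since P(Y|X) <= 1. A vivid Y can legitimately raise your estimate of P(Y|X), but it can never lift the conjunction above P(X) itself.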
Hmmm, I now think the existence of Y is actually a distraction. The underlying process is:
produce estimate for P(X) ⇒ find proof of X ⇒ P(X) increases
If estimates are allowed to change in this manner, then of course they are allowed to change when someone else shows you a proof of X (since P(X) = P(X) is also a law of probability). If they are not allowed to change in this manner, then subjective Bayesianism applied to mathematics collapses anyway.
From a purely human psychological perspective: When someone tells me a proof of a theorem, it feels like I’ve learned something. When I figure one out myself, it feels like I figured something out, as if I’d learned information through interacting with the natural world.
Are you meaning to tell me that no one learns anything in math class? Or they learn something, but the thing they learn isn’t information?
Caveat: Formalizing the concepts requires us to deviate from human experience sometimes. I don’t think the concept of information has been formalized, by Bayesians or Frequentists, in a way that deals with the problem of acting with limited computing time, aka the problem of logical uncertainty.
I think we almost agree already ;-)
Would you agree that the “P(X)” you’re describing is more like “some person’s answer when asked question X” than “the probability of X”?
The main difference between the two is that if “X” and “Y” are the same logical outcome, then their probabilities are necessarily the same, but an actual person can reply differently depending on how the question was formulated.
If you’re interested in this subject, I recommend reading about epistemic modal logic: not necessarily for its solutions, but because the people working on it are clearly aware of this problem and can describe it better than I can.
Ok, I understood that, but I still don’t see what it has to do with Dutch books.
P(X) < P(X and Y) gives you a Dutch book.
OH I SEE. Revelation.
You can get pwn’d if the person offering you the bets knows more than you. The only defense, when you’re uncertain, is to set your odds so that you will refuse the bet on X or refuse the bet on -X (EDIT: Because you update on the information that they’re offering you a bet. I forgot that. Thanks JG.). This can be conceptualized as saying “I don’t know”.
So yeah. If you suspect that someone may know more math than you, don’t take their bets about math.
Now, it’s possible to have someone pre-commit to not knowing stuff about the world. But they can’t pre-commit to not knowing stuff about math, or they can’t as easily.
Another defense is updating on the information that this person who knows more than you is offering this bet before you decide if you will accept it.
That’s not the Dutch book I was talking about.
Let’s say you assign “X” a probability of 50%, so you gladly stake $0.40 against “X” at odds that imply P(X) = 60% (the bet returns $1 if X is false).
But you assign “X and Y” a probability of 90%, so you just as gladly stake $0.80 on “X and Y” at an implied probability of 80% (the bet returns $1 if both are true).
You just paid $1.20 for a combination of bets that can return at most $1 (or $0 if X turns out to be true but Y turns out to be false).
This is exactly a Dutch Book.
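A quick check of the arithmetic (a sketch; the stake sizes follow the odds quoted above):

```python
# Dutch book payoff check: stake $0.40 against X (at implied P(X) = 60%)
# and $0.80 on "X and Y" (at implied probability 80%); winning bets return $1.

stakes = 0.40 + 0.80  # total paid: $1.20

for x in (False, True):
    for y in (False, True):
        payout = (0.0 if x else 1.0) + (1.0 if (x and y) else 0.0)
        print(f"X={x!s:5} Y={y!s:5} payout=${payout:.2f} net=${payout - stakes:+.2f}")

# Best case nets $1.00 - $1.20 = -$0.20; if X is true and Y false, net is -$1.20.
# A guaranteed loss whatever happens: exactly a Dutch book.
```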
Given that they are presented at the same time (such as X is a conjecture, Y is a proof of the conjecture), yes, accepting these bets is being Dutch Booked. But upon seeing “X and Y” you would update “X” to something like 95%.
Given that they are presented in order (What bet do you take against X? Now that’s locked in, here is a proof Y. What bet do you take for “X and Y”?) this is a malady that all reasoners without complete information suffer from. I am not sure if that counts as a Dutch Book.
It is trivial to reformulate this problem so that X and X’ are logically equivalent but not immediately recognizable as such, and the person is asked about X’ and about (X and Y), or something like that.
Yes, but that sounds like “If you don’t take the time to check your logical equivalencies, you will take Dutch Books”. This is that same malady: it’s called being wrong. That is not a case of Bayesianism being open to Dutch Books: it is a case of wrong people being open to Dutch Books.
You’re very wrong here.
By Goedel’s Incompleteness Theorem, there is no way to “take the time to check your logical equivalencies”. There are always pairs of statements that are logically equivalent but whose equivalence your particular proof method, no matter how sophisticated, will never find, in any amount of time.
This is somewhat specific to Bayesianism, as Bayesianism insists that you always give a definite numerical answer.
Not being able to refuse answering (by Bayesianism) + no guarantee of self-consistency (by Incompleteness) ⇒ Dutch booking
I admit defeat. When I am presented with enough unrefusable bets, bets that incompleteness prevents me from realising are actually Dutch Books, such that my utility falls consistently below some other method’s, I will swap to that method.