If before you open the book, you believe that the book will provide incredibly compelling evidence of Zoroastrianism whether or not Zoroastrianism is true, and upon opening the book you find incredibly compelling evidence of Zoroastrianism, your probability of Zoroastrianism should not change, since you didn’t observe any evidence which is more likely to exist if Zoroastrianism were true than if it were not true.
It may be that you are underestimating the AI’s cleverness, so that you expect to see decent evidence of Zoroastrianism, but in fact you found incredible evidence of Zoroastrianism, and so you become convinced. In this case your false belief about the AI not being too convincing is doing the philosophical work of deceiving you, and it’s no longer really deceiving yourself. Deceiving yourself seems to be more about starting with all correct beliefs, but talking yourself into an incorrect belief.
If you happen to luck out into having a false belief about the AI being unconvincing, and if this situation with the library of theology just falls out of the sky without your arranging it, you got lucky—but that’s being deceived by others. If you try to set up the situation, you can’t deliberately underestimate the AI because you’ll know you’re doing it. And you can’t set up the theological library situation until you’re confident you’ve deliberately underestimated the AI.
If before you open the book, you believe that the book will provide incredibly compelling evidence of Zoroastrianism whether or not Zoroastrianism is true, and upon opening the book you find incredibly compelling evidence of Zoroastrianism, your probability of Zoroastrianism should not change, since you didn’t observe any evidence which is more likely to exist if Zoroastrianism were true than if it were not true.
This presumes that your mind can continue to obey the rules of Bayesian updating in the face of an optimization process that’s deliberately trying to make it break those rules. We can’t do that very well.
OP argued that self-deception occurs even if your brain remains unbroken. I would characterize “not breaking my brain” as allowing my prior belief about the book’s biasedness to make a difference in my posterior confidence of the book’s thesis. In that case the book might be arbitrarily convincing; but I might start with an arbitrarily high confidence that the book is biased, and then it boils down to an ordinary Bayesian tug o’ war, and Yvain’s comment applies.
On the other hand, I’d view a brain-breaking book as a “press X to self-modify to devout Y-believer” button. If I know the book is such, I decide not to read it. If I’m ignorant of the book’s nature, and I read it, then I’m screwed.
True. So in the process of deceiving yourself, you must first become irrational. The problem is then protesting that you are “still a reasonable person.”
True. So in the process of deceiving yourself, you must first become irrational. The problem is then protesting that you are “still a reasonable person.”
Not quite; you might choose to deceive yourself for decision-theoretic reasons. For example, the Zoroastrian Inquisition might be going around with very good lie detectors and punishing anyone who doesn’t believe. We usually equate rationality with true beliefs, but this is only an approximation; decision theory is more fundamental than truth.
You may want to look at Branden Fitelson’s short paper Evidence of evidence is not (necessarily) evidence. You seem to be arguing that, since we have strong evidence that the book has strong evidence for Zoroastrianism before we read it, it follows that we already have (the most important part of) our evidence for Zoroastrianism. But it turns out that it’s extremely tricky to make this sort of reasoning work. To use the most primitive example from the paper, discovering that a playing card C is black is evidence that C is the ace of spades. Furthermore, that C is the ace of spades is excellent evidence that it’s an ace. But discovering that C is black does not give you any evidence whatsoever that C is an ace.
The problem here—at least one of them—is that discovering C is black is just as much evidence for C being the x of spades for any other card-value x. Similarly, before opening the book on Zoroastrianism, we have just as much evidence for the existence of strong evidence for Christianity/atheism/etc, so our credences shouldn’t suddenly start favoring any one of these. But once we learn the evidence for Zoroastrianism, we’ve acquired new information, in just the same way that learning that the card is an ace of spades provides us new information if we previously just knew it was black.
I do suspect that there are relevant disanalogies here, but don’t have a very detailed understanding of them.
You seem to be arguing that, since we have strong evidence that the book has strong evidence for Zoroastrianism before we read it, it follows that we already have (the most important part of) our evidence for Zoroastrianism.
Not exactly. I do think this would be a true statement, if the book was a genuine book on Zoroastrianism and not a book which we know was designed to deceive us. But as far as I know it’s only tangentially connected to the argument I’m making.
You may want to look at Branden Fitelson’s short paper Evidence of evidence is not (necessarily) evidence.
Thanks for summarizing the paper; I tried to read it but it was written in a way that seemed designed to be as obscure as possible. Your explanation makes more sense.
But I still don’t see the problem. Learning a card is black increases the chance it’s the ace of spades or clubs, but decreases the chance it’s the ace of hearts or diamonds. The chance that it’s the ace of spades becomes greater, but the net chance that it’s an ace remains exactly the same. Evidence of evidence is still evidence, but evidence of evidence plus evidence of evidence that goes the opposite direction cancel out and make zero evidence.
Again, I’m not sure about the relevance here. It’s not the case that, merely by knowing the book exists without reading it, we have new evidence for the existence of some evidence which both supports and, in a different way, opposes Zoroastrianism.
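For anyone who wants to check the card numbers quoted in this exchange, here is a minimal sketch that enumerates a standard 52-card deck. The helper names (`prob`, `is_black`, and so on) are made up for illustration and are not from Fitelson's paper.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "clubs", "hearts", "diamonds"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) pairs


def prob(event, given=lambda card: True):
    """Exact P(event | given) for a uniformly random card (hypothetical helper)."""
    pool = [card for card in DECK if given(card)]
    return Fraction(sum(1 for card in pool if event(card)), len(pool))


def is_black(card):
    return card[1] in ("spades", "clubs")


def is_ace(card):
    return card[0] == "A"


def is_ace_of_spades(card):
    return card == ("A", "spades")


# Black is evidence for "ace of spades": 1/52 rises to 1/26.
print(prob(is_ace_of_spades), prob(is_ace_of_spades, is_black))
# "Ace of spades" is conclusive evidence for "ace".
print(prob(is_ace, is_ace_of_spades))
# Yet black leaves "ace" untouched: 1/13 before and after.
print(prob(is_ace), prob(is_ace, is_black))
```

The last line is the point both sides agree on: the rise for black aces is exactly offset by the drop to zero for red aces.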
Thanks for summarizing the paper; I tried to read it but it was written in a way that seemed designed to be as obscure as possible. Your explanation makes more sense.
(I guess I’d say it was written in a way designed to be precise. But I agree that the author isn’t the best writer.)
Evidence of evidence is still evidence, but evidence of evidence plus evidence of evidence that goes the opposite direction cancel out and make zero evidence.
I find this sentence hard to make sense of. Based on the first part of the sentence, you seem to be suggesting that the problem in the card scenario is that our evidence^2 (= the card is black) is both evidence^1 for the card being an ace and evidence^1 against the card being an ace, and the two pieces of evidence^1 balance out to yield the same total probability of the card being an ace as before. But clearly no single piece of information, such as the card being black, can provide evidence^1 both for and against a given hypothesis. It either yields evidence^1 or it doesn’t. And if it doesn’t, then evidence^2 is not always evidence^1.
Anyway, the relevance is this: When we learn the card is black, we acquire evidence for a bunch of different pieces of information which, taken on their own, have varying probabilistic effects on the hypothesis that the card is an ace. These effects add up in such a way as to leave the posterior probability of the hypothesis untouched. But once we actually learn one of these individual pieces of information, suddenly the posterior shoots way up.
Similarly, before we read the book, we have evidence for a bunch of different pieces of information which, taken on their own, have varying probabilistic effects on the truth of Zoroastrianism. These effects add up in such a way as to leave our posterior in Zoroastrianism untouched (assuming we don’t consider non-book-possessing religions). So why is it that when we learn one of these pieces of information by reading the book, our posterior shouldn’t change, unlike in the card case?
O.K., here’s a disanalogy that may be important. In the card case, learning that C is the ace of spades should drastically lower our credence of the card being another x of spades. On the other hand, after reading the Zoroastrianism book, we shouldn’t significantly doubt that the other books contain strong evidence, as well, given the known capabilities of the AI.
Suppose I am going to read a book by a top Catholic theologian. I know he is probably smarter than me: given the number of priests in the world and their average IQ and intellectual abilities, I figure the smartest of them is probably really, really smart, better read than I am, and has the very best arguments the Church has found in 2000 years. If I read his book, should I take it into account and discount his evidence because of this meta information? Or should I evaluate the evidence?
It’s the very fallacy Eliezer argues against where people know about clever arguers and use this fact against everyone else.
If I read his book, should I take it into account and discount his evidence because of this meta information? Or should I evaluate the evidence?
You should take the meta-information into account, because what you’re getting is filtered evidence. See What Evidence Filtered Evidence. If the book only contained very weak arguments, this would suggest that no strong arguments could be found, and would therefore be evidence against what the book was arguing for.
Fair enough. But the arguments themselves must also update my belief. It should not ever be the case that this meta stuff completely cancels out an argument that I think is valid. That is irrational, just like not listening to someone who belongs to the enemy.
If you were already completely certain that you were about to read a valid argument, and then you read that argument, then the meta stuff would completely cancel it out. If you were almost completely certain that you were about to read a valid argument, and then you read it, then the meta stuff would almost (but not completely) cancel it out. This is why reading the same argument twice in a row does not affect your confidence much more than reading it once does. But the less certain you were about the argument’s validity the first time, the more of an effect going over it again should have.
How can this be true when different arguments have different strength and you don’t know what the statement is? Here, suppose you believe that you are about to read a completely valid argument in support of conventional arithmetic. Please update your belief now.
Here is the statement: “2+2=4”.
What if it instead was Russell’s Principia Mathematica?
But you had assumed that the book would contain extremely strong arguments in favour of Zoroastrianism. Here strong means that P(Zoroastrianism is correct | argument is valid) is big, not that P(argument is valid) is big after reading the argument. (At least this is how I interpret your setting.) Both “all arguments in Principia Mathematica are correct” and “2+2=4” have high probabilities of being true, but P(arithmetic is correct | all arguments in Principia are correct) is much higher than P(arithmetic is correct | 2+2=4).
We are running into meta issues that are really hard to wrap your head around. You believe that the book is likely to convince you, but it’s not absolutely guaranteed to. Whether it will do so surely depends on the actual arguments used. You’d expect, a priori, that if it argues for X which is more likely, its arguments would also be more convincing. But until you actually see the arguments, you don’t know that they will convince you. It depends on what they actually are. In your formulation, what happens if you read the book and the arguments do not convince you? Also, what if the arguments fail to convince you only because you expected the book to be extremely convincing? Is this different from the case where the arguments, taken without this meta-knowledge, fail to convince you?
I think I address some of these questions in another reply, but anyway, I will try a detailed description:
Let’s denote the following propositions:
Z = “Zoroastrianism is true.”
B = Some particular, previously unknown, statement included in the book. It is supposed to be evidence for Z. Let this be in form of propositions so that I am able to assign it a probability (e.g. B shouldn’t be a Pascal-wagerish extortion).
C(r) = “B is compelling to such extent that it shifts odds for Z by ratio r”. That is, C(r) = “P(B|Z) = r*P(B|not Z)”.
F = Unknown evidence against Z.
D(r) = “F shifts odds against Z by ratio r.”
Before reading the book
p(Z) is low
I may have a probability distribution for “B = S” (that is, “the convincing argument contained in the book is S”) over set of all possible S; but if I have it, it is implicit, in sense I have an algorithm which assigns p(B = S) for any given S, but haven’t gone through the whole huge set of all possible S—else the evidence in the book wouldn’t be new to me in any meaningful sense
I have p(S|Z) and p(S|not Z) for all S, implicitly like in the previous case
I can’t calculate the distribution p(C(r)) from p(B = S), p(S|Z) and p(S|not Z), since that would require calculating explicitly p(B = S) for every S, which is out of reach; however
I have obtained p(C(r)) by another means—knowledge about how the book is constructed—and p(C(r)) has most of its mass at pretty high values of r
by the same means I have obtained p(D(r)), which is distributed at as high or even higher values of r
Can I update the prior p(Z)? If I knew for certain that C(1,000) is true, I should take it into account and multiply the odds for Z by 1,000. If I knew that D(10,000) is true, I should analogously divide the odds by 10,000. Having probability distributions instead of certainty changes little: calculate the expected value* E(r) for both C and D and use that. If the values for C and D are similar or differ only a little (which is probably what we assume), then the updates approximately cancel out.
Now I read the book and find out what B is. This necessarily replaces my prior p(C(r)) by δ(r − R), where R = P(B|Z)/P(B|not Z). R may be higher than the expected value E(r) calculated from p(C(r)), or it may be lower. If it is higher, that is, if the evidence in the book is more convincing than expected, I will update upwards. If it is lower, I will update downwards. The odds would change by R/E(r). If my expectations about the convincingness of the arguments were correct, that is R = E(r), actually learning what B is does nothing to my probability distribution.
A potentially confounding factor is that our prior E(r) is usually very close to 1; if somebody tells me that he has a very convincing argument, I don’t really expect a convincing argument, because convincing arguments are rare and people regularly overestimate the size of the audience who would find their favourite argument compelling. Therefore I normally need to hear the argument before updating significantly. But in this scenario it is stipulated that we already know in advance that the argument is convincing.
*) I suppose that E(r) here should be geometric mean of p(C(r)), but it can be another functional; it’s too late here to think about it.
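The bookkeeping described in this comment can be sketched in a few lines. This is only an illustration under stated assumptions: the prior of 0.001, the three-point distribution over r, and all function names are invented for the example, and the geometric mean is used for E(r) only because the footnote tentatively suggests it.

```python
import math


def geometric_mean_ratio(ratios, weights):
    """Weighted geometric mean of odds ratios (the footnote's tentative E(r))."""
    total = sum(weights)
    return math.exp(sum(w * math.log(r) for r, w in zip(ratios, weights)) / total)


def to_odds(p):
    return p / (1.0 - p)


def to_prob(odds):
    return odds / (1.0 + odds)


# Assumed numbers, purely for illustration.
prior_odds = to_odds(0.001)                      # p(Z) is low
ratios, weights = [100, 1_000, 10_000], [0.25, 0.5, 0.25]
expected_for = geometric_mean_ratio(ratios, weights)      # from p(C(r))
expected_against = geometric_mean_ratio(ratios, weights)  # from p(D(r)), assumed equal

# Before reading: the anticipated shifts for and against Z cancel.
odds_before_reading = prior_odds * expected_for / expected_against


def odds_after_reading(odds_before, actual_ratio, expected_ratio):
    """Learning B replaces the expected ratio E(r) with the actual ratio R."""
    return odds_before * actual_ratio / expected_ratio


for actual in (200, 1_000, 5_000):
    odds = odds_after_reading(odds_before_reading, actual, expected_for)
    print(actual, round(to_prob(odds), 6))
```

With these numbers only the middle case (R equal to the expected ratio) leaves the probability where it started; a book that is more or less convincing than anticipated moves it by the factor R/E(r).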
I am not sure I completely follow, but I think the point is that you will in fact update the probability up if a new argument is more convincing than you expect. Since the AI can estimate what you expect it to do better than you can estimate how convincing the AI will make it, it will be able to make all arguments more convincing than you expect.
I think you are adding further specifications to the original setting. Your original description assumed that the AI is a very clever arguer who constructs very persuasive deceptive arguments. Now you assume that the AI actively tries to make the arguments more persuasive than you expect. You can stipulate for argument’s sake that the AI can always make a more convincing argument than you expect, but 1) it’s not clear whether that is even possible in realistic circumstances, and 2) it obscures the (interesting and novel) original problem (“is evidence of evidence as valuable as the evidence itself?”) with a rather standard Newcomb-like mind-reading paradox.
There is some degree to which you should expect to be swayed by empty arguments, and yes, you should subtract that out if you anticipate it. But if the book is a lot more compelling than that, then the book is probably above average both in arguing skill and in actual evidence. You cannot discount it solely as empty anymore, but neither should you assume that all of the “excess” convincing came from evidence—the book could just be unusually well written. You have to balance the improbabilities of evidence vs. writing, and update on the evidence found in that way.
Usually, the uncertainty grows with the size of the thing you’re trying to measure. This means that when thinking about super-duper-well-written books, the uncertainty in the writing skill gets really big. And so when balancing the improbabilities of evidence vs. writing, the evidence barely has to do any balancing at all—the writing skill just washes it out.
If the amount of evidence presented is the same, it’s better to hear about the truth from a child than from an orator, because the child doesn’t have all those orating skills mucking up your signal-to-noise.
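One way to see the "washing out" is a toy linear-Gaussian model. This is my own assumption for illustration, not something stated in the comment: treat observed convincingness as evidence plus writing skill, give evidence a zero-mean normal prior and writing skill an independent normal prior, and ask how much of what you observed should be attributed to evidence. The function name and parameters are hypothetical.

```python
def expected_evidence(observed, writing_mean, evidence_sd, writing_sd):
    """Posterior mean of evidence when observed = evidence + writing skill.

    Toy model (an assumption): evidence ~ Normal(0, evidence_sd**2) and
    writing ~ Normal(writing_mean, writing_sd**2), independent of each other.
    """
    weight = evidence_sd**2 / (evidence_sd**2 + writing_sd**2)
    return weight * (observed - writing_mean)


# A child: almost no spread in rhetorical skill, so nearly everything
# that convinces you is attributed to evidence.
print(expected_evidence(observed=10.0, writing_mean=0.0, evidence_sd=1.0, writing_sd=0.1))

# A super-orator: huge uncertainty in skill, so the same observed
# convincingness says very little about the evidence.
print(expected_evidence(observed=10.0, writing_mean=0.0, evidence_sd=1.0, writing_sd=10.0))
```

The larger the uncertainty in writing skill relative to the uncertainty in evidence, the smaller the share of the book's convincingness that gets credited to evidence.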
There is some degree to which you should expect to be swayed by empty arguments, and yes, you should subtract that out if you anticipate it.
Right. I think my argument hinges on the fact that the AI knows how much you intend to subtract before you read the book, and can make the book more convincing than that amount.
I don’t think it’s okay to have the AI’s convincingness be truly infinite, in the full inf − inf = undefined sense. Your math will break down. Safer just to represent “suppose there’s a super-good arguer” by having the convincingness be finite, but larger than every other scale in the problem.
You have a very compelling point and I have to think about it. But there is meta-reasoning involved which is really tricky.
As I start to read the book, I have some P(Zoroastrianism is true). It’s non-zero. Now I read the first chapter, and it has some positive evidence for Z in it. I expected to see some evidence, but it is actual evidence which I have not previously considered. Should I adjust my P(Z is true) up? I think I must. So, if the book has many chapters, I must either get close to 1, or else start converging to some p < 1. Are you arguing for the latter?
Consider the case where a friend says he saw a UFO. There are two possibilities: either the friend is lying/insane/gullible, or UFOs are real (there are probably some other possibilities, but for the sake of argument let’s focus on these).
Your friend’s statement can have different effects depending on what you already believe. If either probability is already at ~100%, you have no more work to do. IE, if you’re already sure your friend is a liar, you dismiss this as yet another lie and don’t start believing in UFOs; if you’re already sure UFOs exist, you dismiss this as yet another UFO and don’t start doubting your friend.
If you’re not ~100% sure of either statement, then your observation will increase both the probability that your friend is a liar, and that aliens exist, but in different amounts. If you think your friend usually tells the truth, but you’re not sure, it will increase your probability of UFOs quite a bit (your friend wouldn’t lie to you!) but as long as you’re not going to be sure of UFOs, you also have to leave some room for the case where UFOs aren’t real, in which case the statement increases your probability that your friend is a liar.
When you hear a great argument for P, your pre-existing beliefs determine what you do in the same way as in the UFO example. It could mean that your interlocutor is a rhetorical genius so brilliant they can think up great arguments even for false positions. Or it could mean P is true. In real life, the probability of the interlocutor being such a rhetorical genius is always less than ~100%, meaning that it has to increase your probability of P at least a little.
In your example, we already know that the AI is a rhetorical genius who can create an arbitrarily good argument for anything. That totally explains away the brilliant arguments, leaving nothing left to be explained by Zoroastrianism actually being true. It’s like when your friend who is a known insane liar says he saw a UFO: the insane liar part already explains away the evidence, so even though you’re hearing words that sound like evidence, no probabilities are actually being shifted.
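The explaining-away step can be written as a one-line Bayes update. In this sketch I assume, purely for illustration, that a great argument always shows up when P is true, and that when P is false one shows up only if the arguer is a rhetorical genius (probability `p_genius`); the function name is hypothetical.

```python
def posterior_given_great_argument(prior, p_genius):
    """P(P is true | you heard a great argument for P), under the toy assumptions.

    Assumptions (not from the thread): P(great argument | P true) = 1 and
    P(great argument | P false) = p_genius.
    """
    p_arg_if_true = 1.0
    p_arg_if_false = p_genius
    joint_true = prior * p_arg_if_true
    joint_false = (1.0 - prior) * p_arg_if_false
    return joint_true / (joint_true + joint_false)


prior = 0.01
for p_genius in (0.1, 0.5, 0.99, 1.0):
    print(p_genius, round(posterior_given_great_argument(prior, p_genius), 4))
```

As `p_genius` approaches 1, the posterior falls back to the prior: once the arguer's skill fully explains the argument, nothing is left over to be explained by P being true. For any `p_genius` below 1 the posterior is at least a little above the prior, which matches the "real life" case described above.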
I understand the principle, yes. But it means if your friend is a liar, no argument he gives needs to be examined on its own merits. But what if he is a liar and he saw a UFO? What if P(he is a liar) and P(there’s a UFO) are not independent? I think if they are independent, your argument works. If they are not, it doesn’t. If UFOs appear mostly to liars, you can’t ignore his evidence. Do you agree? In my case, they are not independent: it’s easier to argue for a true proposition, even for a very intelligent AI. Here I assume that P must be strictly less than 1 always.
Does the chapter really count as evidence? Normally, X is evidence for Z if P(X|Z) > P(X|not Z). In this case, X = “there are compelling arguments for Z” and you already suppose that X is true whether or not Z. X is therefore not evidence for Z. Of course, after reading the chapter you learn the particular compelling arguments A(1), A(2), … But those arguments support Z only through X and since you know X is not evidence, A(n) are screened off. Put another way, you know that for each A(n) there is an equally compelling argument B(n) that cancels it out. Knowing what the argument actually says is important only if you want to independently determine its compellingness. But you have assumed that you already know this.
Consider a more realistic scenario: you have a coin and two hypotheses:
F: the coin is fair
H: the coin is biased towards heads and it comes up heads twice as frequently as tails
Now you tell your servant: “Toss the coin a million times and write the results down. Then, from the record, select a subsequence S which has P(S|H) = 1.58 P(S|F) and tell me.” The servant follows your instruction and says “HHTH”. Now you have learned something new (the servant could, for example, have said “THHH” instead), but you don’t update your odds in favour of H by a factor of 1.58, because it was almost certain in advance that the servant would be able to locate a subsequence with the desired property whether or not H holds.
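The numbers in the coin example are easy to check. The sketch below computes the likelihood ratio of "HHTH" exactly, then the probability that "HHTH" appears somewhere as a contiguous run in a record of tosses under each hypothesis. Reading "subsequence" as a contiguous run, using 1,000 tosses instead of a million, and the function names are all my own simplifications.

```python
from fractions import Fraction


def likelihood_ratio(seq, p_heads_biased=Fraction(2, 3)):
    """P(seq | H) / P(seq | F) for a fixed toss sequence such as 'HHTH'."""
    ratio = Fraction(1)
    for toss in seq:
        p_biased = p_heads_biased if toss == "H" else 1 - p_heads_biased
        ratio *= p_biased / Fraction(1, 2)
    return ratio


def prob_contains(pattern, n_tosses, p_heads):
    """Probability that `pattern` occurs as a contiguous run in n_tosses tosses."""
    m = len(pattern)

    def step(state, toss):
        # Longest prefix of the pattern that is a suffix of what we have matched.
        text = pattern[:state] + toss
        for k in range(min(m, len(text)), -1, -1):
            if text.endswith(pattern[:k]):
                return k

    dist = [0.0] * (m + 1)  # dist[s] = P(currently matched s characters)
    dist[0] = 1.0
    for _ in range(n_tosses):
        new = [0.0] * (m + 1)
        new[m] = dist[m]  # once found, the pattern stays found
        for s in range(m):
            new[step(s, "H")] += dist[s] * p_heads
            new[step(s, "T")] += dist[s] * (1 - p_heads)
        dist = new
    return dist[m]


print(float(likelihood_ratio("HHTH")))       # about 1.58
print(prob_contains("HHTH", 1000, 0.5))      # fair coin
print(prob_contains("HHTH", 1000, 2 / 3))    # biased coin
```

Both containment probabilities are essentially 1, so the report "I found HHTH" is about equally likely under F and H, even though the string itself, taken as four fresh tosses, would favour H by 128/81.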
It’s not obvious to me that X screens off the individual arguments. In particular, X only asserts the existence of at least one compelling argument. If there are multiple, independent compelling arguments in the book, then this should presumably increase our confidence in Z beyond just knowing X. Or am I confused about something?
Also, the individual pieces of evidence could undercut my confidence in the strength of the other books’ arguments conditional on (knowledge of) the evidence I just learned. For example, suppose we expect all arguments from the other books to proceed from some initially plausible premise P. But the Zoroastrian book produces strong evidence against P. Then our evidence goes quite beyond X.
Then substitute “there are many deceptive compelling arguments for Z, many more than any book can contain” for the definition of X. The point stands.
I believe your second point is only a specific case of arguments more compelling than expected (here due to their ability to undermine the counterarguments). This is fine—if the arguments are unexpectedly compelling, you update.
On the other hand, the problem was pretty symmetric at the beginning: all books were considered equivalent in their persuasive strength. If book Z argues well against P, and all other books took P for granted without similarly undermining the leading premise of book Z, the system of books would be unbalanced (reading all books would cause you to believe Z independently of reading order), which would violate the assumed symmetry. So, if you find good anti-P arguments in book Z, the odds are that your assumption about the contents of other books being based on P is incorrect, or that the other books contain a good counterargument which cancels this out. You should be very certain that the system of books is balanced, else the problem doesn’t work.
I think it’s easy to make my second point without the asymmetry. Let’s re-pose the problem so that we expect in advance not only that each book will produce strong evidence in favor of the religion it advocates, but also strong evidence that none of the other books contain strong counter-evidence or similarly undermining evidence. When you read book Z, you learn individual pieces of evidence z1, z2, …, zn. But z1, …, zn undermine your confidence that the other books contain strong arguments, thus disconfirming your belief that you’d likely find convincing evidence for Zoroastrianism in the book whether or not the religion is true. But then it starts looking like we have evidence for Zoroastrianism. However, if, as you argue, z1, …, zn only support Zoroastrianism through things we expected to see in advance of reading the book, then we shouldn’t have any evidence. So either I’m confused or we still have a problem.
The scenario, as I understand it, is based on assumption that the confidence about y = “all books contain equally strong evidence for their respective religion” is high. If y is absolutely certain, p(y) = 1, the confidence cannot be shaken by whatever is found in book Z. If, on the other hand, p(y) is not certain, then what happens depends a lot on relative strength of various pieces of evidence. But this is another (more complex and fuzzier) problem—now you expect that Z not only contains evidence for Zoroastrianism, but also evidence against the very statement of the thought experiment. Doubting y is not included in the original post, where the newly converted Zoroastrian admits that reading book A would deconvert him to atheism; he refrains from doing that only because he fears Ahura Mazda’s wrath.
I think your conclusion there trades on an ambiguity of what “evidence” refers to in your y (= “all books contain equally strong evidence for their respective religion”). The assumption y could mean either:
For each book x, x contains really compelling evidence that we’re sure would equally convince us if we were to encounter it in a normal situation (i.e., without knowing about the other books or the AI’s deviousness).
For each book x, x contains really compelling evidence even after considering and correctly reasoning about all the facts of the thought experiment.
Obviously the second interpretation is either incoherent or completely trivializes the thought experiment, since it’s an assumption about what the all-things-considered best thing to believe after reading a book is, when that’s precisely the question we’re being posed in the first place. On the other hand, the first interpretation, even if assumed with probability 1, is compatible with a given book lowering the posterior expected strength of evidence of the other books.
Fair point. The ambiguity is already included in the original formulation of the thought experiment. The first formulation is compatible with lowering the posterior expected strength of evidence of other books after reading one of them, but it is also compatible with being not convinced by the evidence at all. Assuming the first interpretation the problem is underspecified and no apparent paradox is present.
The second interpretation can have several subinterpretations:
2a) For each book x, reading x convinces an ordinary human of the particular proposition argued for in x (possibly exploiting biases and imperfections of the human mind).
2b) For each book x, reading x convinces an ideal Bayesian reasoner (IBR) of the particular proposition.
2a was probably closest to the meaning intended in the OP. It is a paradox only if we assume that ordinary human reasoning is consistent, which we don’t assume, so there is no problem. 2b depends on what IBR exactly means. If it has no limitations on processing speed and memory, the thought experiment becomes impossible, since the IBR has already considered all possible arguments and can’t be swayed by rhetorical trickery. If, on the other hand, the IBR has some physical limitations, 2b can be used to show that its thinking leads to inconsistencies, but that is not much more surprising than the same conclusion in case 2a.
True. So in the process of deceiving yourself, you must first become irrational. The problem is then protesting that you are “still a reasonable person.”
Not quite; you might choose to deceive yourself for decision-theoretic reasons. For example, the Zoroastrian Inquisition might be going around with very good lie detectors and punishing anyone who doesn’t believe. We usually equate rationality with true beliefs, but this is only an approximation; decision theory is more fundamental than truth.
That only seems to mean that you were a reasonable person.
You may want to look at Brandon Fitelson’s short paper Evidence of evidence is not (necessarily) evidence. You seem to be arguing that, since we have strong evidence that the book has strong evidence for Zoroastrianism before we read it, it follows that we already have (the most important part of) our evidence for Zoroastrianism. But it turns out that it’s extremely tricky to make this sort of reasoning work. To use the most primitive example from the paper, discovering that a playing card C is black is evidence that C is the ace of spades. Furthermore, that C is the ace of spades is excellent evidence that it’s an ace. But discovering that C is black does not give you any evidence whatsoever that C is an ace.
The problem here—at least one of them—is that discovering C is black is just as much evidence for C being the x of spades for any other card-value x. Similarly, before opening the book on Zoroastrianism, we have just as much evidence for the existence of strong evidence for Christianity/atheism/etc, so our credences shouldn’t suddenly start favoring any one of these. But once we learn the evidence for Zoroastrianism, we’ve acquired new information, in just the same way that learning that the card is an ace of spades provides us new information if we previously just knew it was black.
I do suspect that there are relevant disanalogies here, but don’t have a very detailed understanding of them.
Not exactly. I do think this would be a true statement, if the book was a genuine book on Zoroastrianism and not a book which we know was designed to deceive us. But as far as I know it’s only tangentially connected to the argument I’m making.
Thanks for summarizing the paper; I tried to read it but it was written in a way that seemed designed to be as obscure as possible. Your explanation makes more sense.
But I still don’t see the problem. Learning a card is black increases the chance it’s the ace of spades or clubs, but decreases the chance it’s the ace of hearts or diamonds. The chance that it’s the ace of spades becomes greater, but the net chance that it’s an ace remains exactly the same. Evidence of evidence is still evidence, but evidence of evidence plus evidence of evidence that goes the opposite direction cancel out and make zero evidence.
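The card arithmetic can be checked by brute force. This is a minimal sketch, assuming a standard 52-card deck with one card drawn uniformly at random; the helper names (`prob`, `is_black`, and so on) are illustrative and do not come from the paper.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUIT_COLOR = {"spades": "black", "clubs": "black",
              "hearts": "red", "diamonds": "red"}
DECK = list(product(RANKS, SUIT_COLOR))  # 52 (rank, suit) pairs


def always(card):
    return True


def is_black(card):
    return SUIT_COLOR[card[1]] == "black"


def is_ace(card):
    return card[0] == "A"


def is_ace_of_spades(card):
    return card == ("A", "spades")


def is_ace_of_hearts(card):
    return card == ("A", "hearts")


def prob(event, given=always):
    """P(event | given) for a card drawn uniformly from the deck."""
    pool = [card for card in DECK if given(card)]
    hits = [card for card in pool if event(card)]
    return Fraction(len(hits), len(pool))


# Learning "black" raises the ace of spades and rules out the ace of hearts...
p_spades_before = prob(is_ace_of_spades)
p_spades_after = prob(is_ace_of_spades, is_black)
p_hearts_after = prob(is_ace_of_hearts, is_black)

# ...but the probability of "some ace" is unchanged.
p_ace_before = prob(is_ace)
p_ace_after = prob(is_ace, is_black)
```

Under these assumptions the ace of spades goes from 1/52 to 1/26 and the ace of hearts drops to 0, while "some ace" stays at 1/13 before and after.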
Again, I’m not sure about the relevance here. It’s not the case that, merely by knowing the book exists without reading it, we have new evidence for the existence of some evidence which both supports and, in a different way, opposes Zoroastrianism.
(I guess I’d say it was written in a way designed to be precise. But I agree that the author isn’t the best writer.)
I find this sentence hard to make sense of. Based on the first part of the sentence, you seem to be suggesting that the problem in the card scenario is that our evidence^2 (= the card is black) is both evidence^1 for the card being an ace and evidence^1 against the card being an ace, and the two pieces of evidence^1 balance out to yield the same total probability of the card being an ace as before. But clearly no single piece of information, such as the card being black, can provide evidence^1 both for and against a given hypothesis. It either yields evidence^1 or it doesn’t. And if it doesn’t, then evidence^2 is not always evidence^1.
Anyway, the relevance is this: When we learn the card is black, we acquire evidence for a bunch of different pieces of information which, taken on their own, have varying probabilistic effects on the hypothesis that the card is an ace. These effects add up in such a way as to leave the posterior probability of the hypothesis untouched. But once we actually learn one of these individual pieces of information, suddenly the posterior shoots way up.
Similarly, before we read the book, we have evidence for a bunch of different pieces of information which, taken on their own, have varying probabilistic effects on the truth of Zoroastrianism. These effects add up in such a way as to leave our posterior in Zoroastrianism untouched (assuming we don’t consider non-book-possessing religions). So why is it that when we learn one of these pieces of information by reading the book, our posterior shouldn’t change, unlike in the card case?
Thank you, that’s what I was trying to get at, but didn’t know how.
O.K., here’s a disanalogy that may be important. In the card case, learning that C is the ace of spades should drastically lower our credence of the card being another x of spades. On the other hand, after reading the Zoroastrianism book, we shouldn’t significantly doubt that the other books contain strong evidence, as well, given the known capabilities of the AI.
This isn’t a very formal treatment, though.
“Evidence of evidence is more likely to be filtered evidence” is a more accurate phrasing.
I’m not exactly sure what the “more likely” here means. More likely than what?
The link keeps omitting the colon in the “http://.” I don’t know why it’s doing that.
Evidence of evidence is not (necessarily) evidence
Suppose I am going to read a book by a top Catholic theologian. I know he is probably smarter than me: given the number of priests in the world and their average IQ and intellectual abilities, I figure the smartest of them is probably really, really smart, better read than I am, and has the very best arguments the Church has found in 2000 years. If I read his book, should I take this into account and discount his evidence because of this meta-information? Or should I evaluate the evidence?
It’s the very fallacy Eliezer argues against where people know about clever arguers and use this fact against everyone else.
You should take the meta-information into account, because what you’re getting is filtered evidence. See “What Evidence Filtered Evidence?”. If the book only contained very weak arguments, this would suggest that no strong arguments could be found, and would therefore be evidence against what the book was arguing for.
Fair enough. But the arguments themselves must also update my belief. It should not ever be the case that this meta stuff completely cancels out an argument that I think is valid. That is irrational, just like not listening to someone who belongs to the enemy.
If you were already completely certain that you were about to read a valid argument, and then you read that argument, then the meta stuff would completely cancel it out. If you were almost completely certain that you were about to read a valid argument, and then you read it, then the meta stuff would almost (but not completely) cancel it out. This is why reading the same argument twice in a row does not affect your confidence much more than reading it once does. But the less certain you were about the argument’s validity the first time, the more of an effect going over it again should have.
How can this be true when different arguments have different strength and you don’t know what the statement is? Here, suppose you believe that you are about to read a completely valid argument in support of conventional arithmetic. Please update your belief now. Here is the statement: “2+2=4”. What if it instead was Russell’s Principia Mathematica?
But you had assumed that the book would contain extremely strong arguments in favour of Zoroastrianism. Here strong means that P(Zoroastrianism is correct | argument is valid) is big, not that P(argument is valid) is big after reading the argument. (At least this is how I interpret your setting.) Both “all arguments in Principia Mathematica are correct” and “2+2=4” have high probabilities of being true, but P(arithmetic is correct | all arguments in Principia are correct) is much higher than P(arithmetic is correct | 2+2=4).
We are running into meta issues that are really hard to wrap your head around. You believe that the book is likely to convince you, but it’s not absolutely guaranteed to. Whether it will do so surely depends on the actual arguments used. You’d expect, a priori, that if it argues for an X which is more likely, its arguments would also be more convincing. But until you actually see the arguments, you don’t know that they will convince you. It depends on what they actually are. In your formulation, what happens if you read the book and the arguments do not convince you? Also, what if the arguments do not convince you, but only because you expect the book to be extremely convincing? Is this different from the case where the arguments, taken without this meta-knowledge, fail to convince you?
I think I address some of these questions in another reply, but anyway, I will try a detailed description:
Let’s denote the following propositions:
Z = “Zoroastrianism is true.”
B = Some particular, previously unknown, statement included in the book. It is supposed to be evidence for Z. Let this be in form of propositions so that I am able to assign it a probability (e.g. B shouldn’t be a Pascal-wagerish extortion).
C(r) = “B is compelling to such extent that it shifts odds for Z by ratio r”. That is, C(r) = “P(B|Z) = r*P(B|not Z)”.
F = Unknown evidence against Z.
D(r) = “F shifts odds against Z by ratio r.”
Before reading the book
p(Z) is low
I may have a probability distribution for “B = S” (that is, “the convincing argument contained in the book is S”) over set of all possible S; but if I have it, it is implicit, in sense I have an algorithm which assigns p(B = S) for any given S, but haven’t gone through the whole huge set of all possible S—else the evidence in the book wouldn’t be new to me in any meaningful sense
I have p(S|Z) and p(S|not Z) for all S, implicitly like in the previous case
I can’t calculate the distribution p(C(r)) from p(B = S), p(S|Z) and p(S|not Z), since that would require calculating explicitly p(B = S) for every S, which is out of reach; however
I have obtained p(C(r)) by another means—knowledge about how the book is constructed—and p(C(r)) has most of its mass at pretty high values of r
by the same means I have obtained p(D(r)), which is distributed at as high or even higher values of r
Can I update the prior p(Z)? If I knew for certain that C(1,000) is true, I should take it into account and multiply the odds for Z by 1,000. If I knew that D(10,000) is true, I should analogously divide the odds by 10,000. Having probability distributions instead of certainty changes little: calculate the expected value* E(r) for both C and D and use that. If the values for C and D are similar or only differ a little (which is probably what we assume), then the updates approximately cancel out.
Now I read the book and find out what B is. This necessarily replaces my prior p(C(r)) by δ(r − R), where R = P(B|Z)/P(B|not Z). R may be higher than the expected value E(r) calculated from p(C(r)), or it may be lower. If it is higher (the evidence in the book is more convincing than expected), I will update upwards. If it is lower, I will update downwards. The odds would change by R/E(r). If my expectations of the convincingness of the arguments were correct, that is R = E(r), actually learning what B is does nothing to my probability distribution.
A potentially confounding factor is that our prior E(r) is usually very close to 1; if somebody tells me that he has a very convincing argument, I don’t really expect a convincing argument, because convincing arguments are rare and people regularly overestimate the size of the audience who would find their favourite argument compelling. Therefore I normally need to hear the argument before updating significantly. But in this scenario it is stipulated that we already know in advance that the argument is convincing.
*) I suppose that E(r) here should be geometric mean of p(C(r)), but it can be another functional; it’s too late here to think about it.
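The R/E(r) step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: p(C(r)) is represented as a small discrete distribution `{r: weight}`, and E(r) is taken to be the geometric mean, which the footnote only tentatively suggests. The function names and the example numbers are hypothetical.

```python
import math


def to_odds(p):
    return p / (1.0 - p)


def to_prob(odds):
    return odds / (1.0 + odds)


def expected_ratio(dist):
    """Geometric mean of r under a discrete distribution {r: weight}.

    Assumption: the footnote only guesses that the geometric mean is the
    right functional; another choice would slot in here.
    """
    total = sum(dist.values())
    return math.exp(sum(w * math.log(r) for r, w in dist.items()) / total)


def update_after_reading(p_before, dist, actual_ratio):
    """Shift the odds for Z by R / E(r) once the actual ratio R is known."""
    factor = actual_ratio / expected_ratio(dist)
    return to_prob(to_odds(p_before) * factor)


# Hypothetical numbers: I expect ratios of 100 or 10,000 with equal weight,
# so the geometric-mean expectation is 1,000.
belief_about_r = {100.0: 0.5, 10_000.0: 0.5}
p_z = 0.01

as_expected = update_after_reading(p_z, belief_about_r, 1_000.0)
stronger = update_after_reading(p_z, belief_about_r, 10_000.0)
weaker = update_after_reading(p_z, belief_about_r, 100.0)
```

When the argument is exactly as compelling as anticipated, `as_expected` stays at the prior; `stronger` and `weaker` move up and down respectively.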
I am not sure I completely follow, but I think the point is that you will in fact update the probability up if a new argument is more convincing than you expect. Since AI can better estimate what you expect it to do than you can estimate how convincing AI will make it, it will be able to make all arguments more convincing than you expect.
I think you are adding further specifications to the original setting. Your original description assumed that the AI is a very clever arguer who constructs very persuasive deceptive arguments. Now you assume that the AI actively tries to make the arguments more persuasive than you expect. You can stipulate for argument’s sake that the AI can always make a more convincing argument than you expect, but 1) it’s not clear whether that is even possible in realistic circumstances, and 2) it obscures the (interesting and novel) original problem (“is evidence of evidence as valuable as the evidence itself?”) with a rather standard Newcomb-like mind-reading paradox.
The evidence that this was the best book he could give you is evidence.
Maybe, but this meta stuff is giving me a headache. Should I update belief about belief, or just plain belief?:)
Which may very well be an adaptive fallacy that keeps you harder to manipulate by smarter people. It is possible that in the ancestral environment:
Cost of smart people manipulating you > Cost of being somewhat wrong in a hard to spot way
There is some degree to which you should expect to be swayed by empty arguments, and yes, you should subtract that out if you anticipate it. But if the book is a lot more compelling than that, then the book is probably above average both in arguing skill and in actual evidence. You cannot discount it solely as empty anymore, but neither should you assume that all of the “excess” convincing came from evidence—the book could just be unusually well written. You have to balance the improbabilities of evidence vs. writing, and update on the evidence found in that way.
Usually, the uncertainty grows with the size of the thing you’re trying to measure. This means that when thinking about super-duper-well-written books, the uncertainty in the writing skill gets really big. And so when balancing the improbabilities of evidence vs. writing, the evidence barely has to do any balancing at all—the writing skill just washes it out.
If the amount of evidence presented is the same, it’s better to hear about the truth from a child than from an orator, because the child doesn’t have all those orating skills mucking up your signal-to-noise.
Right. I think my argument hinges on the fact that AI knows how much you intend to subtract before you read the book, and can make it be more convincing than this amount.
I don’t think it’s okay to have the AI’s convincingness be truly infinite, in the full inf − inf = undefined sense. Your math will break down. Safer just to represent “suppose there’s a super-good arguer” by having the convincingness be finite, but larger than every other scale in the problem.
You have a very compelling point and I have to think about it. But there is meta-reasoning involved which is really tricky. As I start to read the book, I have some P(Zoroastrianism is true). It’s non-zero. Now I read the first chapter, and it has some positive evidence for Z in it. I expected to see some evidence, but it is actual evidence which I have not previously considered. Should I adjust my P(Z is true) up? I think I must. So, if the book has many chapters, I must either get close to 1, or else start converging to some p < 1. Are you arguing for the latter?
Consider the case where a friend says he saw a UFO. There are two possibilities: either the friend is lying/insane/gullible, or UFOs are real (there are probably some other possibilities, but for the sake of argument let’s focus on these).
Your friend’s statement can have different effects depending on what you already believe. If either probability is already at ~100%, you have no more work to do. IE, if you’re already sure your friend is a liar, you dismiss this as yet another lie and don’t start believing in UFOs; if you’re already sure UFOs exist, you dismiss this as yet another UFO and don’t start doubting your friend.
If you’re not ~100% sure of either statement, then your observation will increase both the probability that your friend is a liar, and that aliens exist, but in different amounts. If you think your friend usually tells the truth, but you’re not sure, it will increase your probability of UFOs quite a bit (your friend wouldn’t lie to you!) but as long as you’re not going to be sure of UFOs, you also have to leave some room for the case where UFOs aren’t real, in which case the statement increases your probability that your friend is a liar.
When you hear a great argument for P, your pre-existing beliefs determine what you do in the same way as in the UFO example. It could mean that your interlocutor is a rhetorical genius so brilliant they can think up great arguments even for false positions. Or it could mean P is true. In real life, the probability of the interlocutor being such a rhetorical genius is always less than ~100%, meaning that it has to increase your probability of P at least a little.
In your example, we already know that the AI is a rhetorical genius who can create an arbitrarily good argument for anything. That totally explains away the brilliant arguments, leaving nothing left to be explained by Zoroastrianism actually being true. It’s like when your friend who is a known insane liar says he saw a UFO: the insane liar part already explains away the evidence, so even though you’re hearing words that sound like evidence, no probabilities are actually being shifted.
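The "explaining away" in the UFO and rhetorical-genius examples can be made concrete with a small Bayes calculation. This is a rough sketch under assumptions that are not in the thread: the arguer's skill is independent of whether P is true, a genius always produces a great argument either way, and an ordinary arguer produces one more often for a true P than for a false one. All parameter names and numbers are hypothetical.

```python
import math


def posterior_given_great_argument(p_true, p_genius,
                                   ordinary_if_true=0.5,
                                   ordinary_if_false=0.05):
    """P(P is true | you heard a great argument for P).

    Assumptions (illustrative only):
      * a rhetorical genius produces a great argument with probability 1,
        whether or not P is true;
      * an ordinary arguer does so with probability ordinary_if_true when
        P is true and ordinary_if_false when it is false;
      * genius-ness is independent of the truth of P.
    """
    like_true = p_genius + (1.0 - p_genius) * ordinary_if_true
    like_false = p_genius + (1.0 - p_genius) * ordinary_if_false
    numerator = p_true * like_true
    return numerator / (numerator + (1.0 - p_true) * like_false)


prior = 0.01

# Known genius: the great argument is fully explained away.
known_genius = posterior_given_great_argument(prior, p_genius=1.0)

# Real life: never quite certain the arguer is a genius, so P gains a little.
probably_genius = posterior_given_great_argument(prior, p_genius=0.9)

# No reason to suspect rhetorical tricks: the argument counts for a lot.
ordinary_arguer = posterior_given_great_argument(prior, p_genius=0.0)
```

With `p_genius=1.0` both likelihoods equal 1, so the posterior equals the prior; any smaller value leaves some of the argument's force intact. Dropping the independence assumption would change these numbers.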
I understand the principle, yes. But it means if your friend is a liar, no argument he gives needs to be examined on its own merits. But what if he is a liar and he saw a UFO? What if P(he is a liar) and P(there’s a UFO) are not independent? I think if they are independent, your argument works. If they are not, it doesn’t. If UFOs appear mostly to liars, you can’t ignore his evidence. Do you agree? In my case, they are not independent: it’s easier to argue for a true proposition, even for a very intelligent AI. Here I assume that P must be strictly less than 1 always.
Does the chapter really count as evidence? Normally, X is evidence for Z if P(X|Z) > P(X|not Z). In this case, X = “there are compelling arguments for Z” and you already suppose that X is true whether or not Z. X is therefore not evidence for Z. Of course, after reading the chapter you learn the particular compelling arguments A(1), A(2), … But those arguments support Z only through X, and since you know X is not evidence, the A(n) are screened off. Put another way, you know that for each A(n) there is an equally compelling argument B(n) that cancels it out. Knowing what the argument actually says is important only if you want to independently determine its compellingness. But you have assumed that you already know this.
Consider a more realistic scenario: you have a coin and two hypotheses:
F: the coin is fair
H: the coin is biased towards heads and it comes up heads twice as frequently as tails
Now you tell your servant: “Toss the coin a million times and write the results down. Then, from the record, select a subsequence S which has P(S|H) = 1.58 P(S|F) and tell me.” The servant follows your instruction and says “HHTH”. Now you have learned something new (the servant could, for example, have said “THHH” instead), but you don’t update your odds in favour of H by a factor of 1.58, because it was almost certain in advance that the servant would be able to locate a subsequence with the desired property whether or not H holds.
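A short Python sketch of the coin example follows. It treats “subsequence” as a contiguous run, and uses 10,000 simulated tosses instead of a million to keep it quick; both are simplifying assumptions, and the function names are illustrative.

```python
import random
from fractions import Fraction

# P(heads) under each hypothesis: F = fair, H = heads twice as likely as tails.
P_HEADS = {"F": Fraction(1, 2), "H": Fraction(2, 3)}


def likelihood(seq, hypothesis):
    """Exact probability of a toss sequence like 'HHTH' under a hypothesis."""
    p = P_HEADS[hypothesis]
    result = Fraction(1)
    for toss in seq:
        result *= p if toss == "H" else 1 - p
    return result


def likelihood_ratio(seq):
    return likelihood(seq, "H") / likelihood(seq, "F")


def toss_record(hypothesis, n, rng):
    """Simulate n tosses under the given hypothesis."""
    p = float(P_HEADS[hypothesis])
    return "".join("H" if rng.random() < p else "T" for _ in range(n))


rng = random.Random(0)  # fixed seed so the run is repeatable

ratio = likelihood_ratio("HHTH")  # (8/81) / (1/16)

# The servant can find the reported run whichever hypothesis is true.
found_if_fair = "HHTH" in toss_record("F", 10_000, rng)
found_if_biased = "HHTH" in toss_record("H", 10_000, rng)
```

The ratio comes out as 128/81, roughly 1.58, matching the figure in the example. Since the run turns up under both hypotheses, hearing “HHTH” from the servant tells you almost nothing about which coin you have.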
It’s not obvious to me that X screens off the individual arguments. In particular, X only asserts the existence of at least one compelling argument. If there are multiple, independent compelling arguments in the book, then this should presumably increase our confidence in Z beyond just knowing X. Or am I confused about something?
Also, the individual pieces of evidence could undercut my confidence in the strength of the other books’ arguments conditional on (knowledge of) the evidence I just learned. For example, suppose we expect all arguments from the other books to proceed from some initially plausible premise P. But the Zoroastrian book produces strong evidence against P. Then our evidence goes quite beyond X.
Then substitute “there are many deceptive compelling arguments for Z, much more than any book can contain” for definition of X. The point stands.
I believe your second point is only a specific case of arguments more compelling than expected (here due to their ability to undermine the counterarguments). This is fine—if the arguments are unexpectedly compelling, you update.
On the other hand, the problem was pretty symmetric at the beginning: all books were considered equivalent in their persuasive strength. If book Z argues well against P, and all other books took P for granted without similarly undermining the leading premise of book Z, the system of books would be unbalanced (reading all books would cause you to believe Z independently of reading order), which would violate the assumed symmetry. So, if you find good anti-P arguments in book Z, the odds are that either your assumption about the contents of other books being based on P is incorrect, or the other books contain a good counterargument which cancels this out. You should be very certain that the system of books is balanced, else the problem doesn’t work.
I think it’s easy to make my second point without the asymmetry. Let’s re-pose the problem so that we expect in advance not only that each book will produce strong evidence in favor of the religion it advocates, but also strong evidence that none of the other books contain strong counter-evidence or similarly undermining evidence. When you read book Z, you learn individual pieces of evidence z1, z2, …, zn. But z1, …, zn undermine your confidence that the other books contain strong arguments, thus disconfirming your belief that you’d likely find convincing evidence for Zoroastrianism in the book whether or not the religion is true. But then it starts looking like we have evidence for Zoroastrianism. However, if, as you argue, z1, …, zn only support Zoroastrianism through things we expected to see in advance of reading the book, then we shouldn’t have any evidence. So either I’m confused or we still have a problem.
The scenario, as I understand it, is based on assumption that the confidence about y = “all books contain equally strong evidence for their respective religion” is high. If y is absolutely certain, p(y) = 1, the confidence cannot be shaken by whatever is found in book Z. If, on the other hand, p(y) is not certain, then what happens depends a lot on relative strength of various pieces of evidence. But this is another (more complex and fuzzier) problem—now you expect that Z not only contains evidence for Zoroastrianism, but also evidence against the very statement of the thought experiment. Doubting y is not included in the original post, where the newly converted Zoroastrian admits that reading book A would deconvert him to atheism; he refrains from doing that only because he fears Ahura Mazda’s wrath.
I think your conclusion there trades on an ambiguity of what “evidence” refers to in your y (= “all books contain equally strong evidence for their respective religion”). The assumption y could mean either:
For each book x, x contains really compelling evidence that we’re sure would equally convince us if we were to encounter it in a normal situation (i.e., without knowing about the other books or the AI’s deviousness).
For each book x, x contains really compelling evidence even after considering and correctly reasoning about all the facts of the thought experiment.
Obviously the second interpretation is either incoherent or completely trivializes the thought experiment, since it’s an assumption about what the all-things-considered best thing to believe after reading a book is, when that’s precisely the question we’re being posed in the first place. On the other hand, the first interpretation, even if assumed with probability 1, is compatible with a given book lowering the posterior expected strength of evidence of the other books.
Fair point. The ambiguity is already included in the original formulation of the thought experiment. The first formulation is compatible with lowering the posterior expected strength of evidence of other books after reading one of them, but it is also compatible with being not convinced by the evidence at all. Assuming the first interpretation the problem is underspecified and no apparent paradox is present.
The second interpretation can have several subinterpretations:
2a) For each book x, reading x convinces an ordinary human of the particular proposition argued for in x (possibly by exploiting the biases and imperfections of the human mind).
2b) For each book x, reading x convinces an ideal Bayesian reasoner (IBR) of the particular proposition.
2a was probably closest to the meaning intended in the OP. It is a paradox only if we assume that ordinary human reasoning is consistent, which we don’t assume, so there is no problem. 2b depends on what IBR exactly means. If it has no limitations on processing speed and memory, the thought experiment becomes impossible, since the IBR has already considered all possible arguments and can’t be swayed by rhetorical trickery. If, on the other hand, the IBR has some physical limitations, 2b can be used to show that its thinking leads to inconsistencies, but that is not much more surprising than the same conclusion in case 2a.