I think I want to split up ricraz’s examples in the post into two subclasses, defined by two questions.
The first asks: given that there are many different AGI architectures one could scale up into, are some better than others? (My intuition is both that some are better than others, and also that many sit on the Pareto frontier.) And are there any simple ways to determine why one is better than another? This question picks out the following examples from the OP:
There is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general; there is an “ideal” decision theory; the idea that AGI will very likely be an “agent”; the idea that Turing machines and Kolmogorov complexity are foundational for epistemology; the idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct.
The second asks: suppose that some architectures are better than others, and suppose there are simple explanations of why some are better than others. How practical is it to talk of me in this way today? Here are some concrete examples of this from the OP:
Given certain evidence for a proposition, there’s an “objective” level of subjective credence which you should assign to it, even under computational constraints; the idea that Aumann’s agreement theorem is relevant to humans; the idea that defining coherent extrapolated volition in terms of an idealised process of reflection roughly makes sense, and that it converges in a way which doesn’t depend very much on morally arbitrary factors; the idea that having contradictory preferences or beliefs is really bad, even when there’s no clear way that they’ll lead to bad consequences (and you’re very good at avoiding Dutch books and money pumps and so on).
If I am to point to two examples that feel very concrete to me, I might ask:
Is the reasoning that Harry is doing in Chapter 86: Multiple Hypothesis Testing useful or totally insane?
When one person says “I guess we’ll have to agree to disagree” and the second person says “Actually according to Aumann’s Agreement Theorem, we can’t” is the second person making a type error?
Certainly the first person is likely mistaken if they’re saying “In principle no exchange of evidence could cause us to agree”, but perhaps the second person is also mistaken, in implying that it makes any sense to model their disagreement in terms of idealised, scaled-up, rational agents rather than the weird bag of meat and neuroscience that we actually are—for which Aumann’s Agreement Theorem certainly has not been proven.
To be clear: the two classes of examples come from roughly the same generator, and advances in our understanding of one can lead to advances in the other. I just often draw from fairly different reference classes of evidence for updating on them (examples: For the former, Jaynes, Shannon, Feynman. For the latter, Kahneman & Tversky and Tooby & Cosmides).
When one person says “I guess we’ll have to agree to disagree” and the second person says “Actually according to Aumann’s Agreement Theorem, we can’t” is the second person making a type error?
Making a type error is not easy to distinguish from attempting to shift frame. (If it were, the frame control wouldn’t be very effective.) In the example Eliezer gave from the sequences, he was shifting frame from one that implicitly acknowledges interpretive labor as a cost, to one that demands unlimited amounts of interpretive labor by assuming that we’re all perfect Bayesians (and therefore have unlimited computational ability, memory, etc).
This is a big part of the dynamic underlying mistake vs conflict theory.
Eliezer’s behavior in the story you’re alluding to only seems “rational” insofar as we think the other side ends up with a better opinion—I can easily imagine a structurally identical interaction where the protagonist manipulates someone into giving up on a genuine but hard to articulate objection, or proceeding down a conversational path they’re ill-equipped to navigate, thus “closing the sale.”
It’s not at all clear that improving the other person’s opinion was really one of Eliezer’s goals on this occasion, as opposed to showing up the other person’s intellectual inferiority. He called the post “Bayesian Judo”, and highlighted how his showing-off impressed someone of the opposite sex.
He does also suggest that in the end he and the other person came to some sort of agreement—but it seems pretty clear that the thing they agreed on had little to do with the claim the other guy had originally been making, and that the other guy’s opinion on that didn’t actually change. So I think an accurate, though arguably unkind, summary of “Bayesian Judo” goes like this: “I was at a party, I got into an argument with a religious guy who didn’t believe AI was possible, I overwhelmed him with my superior knowledge and intelligence, he submitted to my manifest superiority, and the whole performance impressed a woman”. On this occasion, helping the other party to have better opinions doesn’t seem to have been a high priority.
When one person says “I guess we’ll have to agree to disagree” and the second person says “Actually according to Aumann’s Agreement Theorem, we can’t” is the second person making a type error?
Note: I confess to being a bit surprised that you picked this example. I’m not quite sure whether you picked a bad example for your point (possible) or whether I’m misunderstanding your point (also possible), but I do think that this question is interesting all on its own, so I’m going to try and answer it.
Here’s a joke that you’ve surely heard before—or have you?
Three mathematicians walk into a bar. The bartender asks them, “Do you all want a beer?”
The first mathematician says, “I don’t know.”
The second mathematician says, “I don’t know.”
The third mathematician says, “I don’t know.”
The lesson of this joke applies to the “according to Aumann’s Agreement Theorem …” case.
When someone says “I guess we’ll have to agree to disagree” and their interlocutor responds with “Actually according to Aumann’s Agreement Theorem, we can’t”, I don’t know if I’d call this a “type error”, precisely (maybe it is; I’d have to think about it more carefully); but the second person is certainly being ridiculous. And if I were the first person in such a situation, my response might be something along these lines:
“Really? We can’t? We can’t what, exactly? For example, I could turn around and walk away. Right? Surely, the AAT doesn’t say that I will be physically unable to do that? Or does it, perhaps, say that either you or I or both of us will be incapable of interacting amicably henceforth, and conversing about all sorts of topics other than this one? But if not, then what on Earth could you have meant by your comment?
“I mean… just what, exactly, did you think I meant, when I suggested that we agree to disagree? Did you take me to be claiming that (a) the both of us are ideal Bayesian reasoners, and (b) we have common knowledge of our posterior probabilities of the clearly expressible proposition the truth of which we are discussing, but (c) our posterior probabilities, after learning this, should nonetheless differ? Is that what you thought I was saying? Really? But why? Why in the world did you interpret my words in such a bizarrely technical way? What would you say is your estimate of the probability that I actually meant to make that specific, precisely technical statement?”
And so on. The questions are rhetorical, of course. Anyone with half an ounce of common sense (not to mention anyone with an actual understanding of the AAT!) understands perfectly well that the Theorem is totally inapplicable to such cases.
(Of course, in some sense this is all moot. The one who says “actually, according to the AAT…” doesn’t really think that his interlocutor meant all of that. He’s not really making any kind of error… except, possibly, a tactical one—but perhaps not even that.)
Firstly, I hadn’t heard the joke before, and it made me chuckle to myself.
Secondly, I loved this comment, for very accurately conveying the perspective I felt like ricraz was trying to defend wrt realism about rationality.
Let me say two (more) things in response:
Firstly, I was taking the example directly from Eliezer:
I said, “So if I make an Artificial Intelligence that, without being deliberately preprogrammed with any sort of script, starts talking about an emotional life that sounds like ours, that means your religion is wrong.”
He said, “Well, um, I guess we may have to agree to disagree on this.”
I said: “No, we can’t, actually. There’s a theorem of rationality called Aumann’s Agreement Theorem which shows that no two rationalists can agree to disagree. If two people disagree with each other, at least one of them must be doing something wrong.”
(Sidenote: I have not yet become sufficiently un-confused about AAT to have a definite opinion about whether EY was using it correctly there. I do expect after further reflection to object to most rationalist uses of the AAT but not this particular one.)
Secondly, and this is where I think the crux of this matter lies: I believe your (quite understandable!) objection applies to most attempts to use Bayesian reasoning in the real world.
Suppose one person is trying to ignore a small piece of evidence against a cherished position, and a second person says to them “I know you’ve ignored this piece of evidence, but you can’t do that, because it is Bayesian evidence—it is the case that you’re more likely to see this occur in worlds where your belief is false than in worlds where it’s true, so the correct epistemic move here is to slightly update against your current belief.”
If I may clumsily attempt to wrangle your example to my own ends, might they not then say:
“I mean… just what, exactly, did you think I meant, when I said this wasn’t any evidence at all? Did you take me to be claiming that (a) I am an ideal Bayesian reasoner, and (b) I have observed evidence that occurs in more worlds where my belief is true than if it is false, but (c) my posterior probability, after learning this, should still equal my prior probability? Is that what you thought I was saying? Really? But why? Why in the world did you interpret my words in such a bizarrely technical way? What would you say is your estimate that I actually meant to make that specific, precisely technical statement?”
and further
I am not a rational agent. I am a human, and my mind does not satisfy the axioms of probability theory; therefore it is nonsensical to attempt to have me conform my speech patterns and actions to these logical formalisms.
Bayes’ theorem applies if your beliefs update according to very strict axioms, but it’s not at all obvious to me that the weird fleshy thing in my head currently conforms to those axioms. Should I nonetheless try to? And if so, why shouldn’t I for AAT?
Aumann’s Agreement Theorem is true if we are rational (Bayesian) agents. There are a large number of other theorems that apply to rational agents too, and it seems that sometimes people want to use these abstract formalisms to guide behaviour and sometimes not; having a principled stance about when and when not to use them seems useful and important.
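(To put numbers on the “slightly update against your current belief” move described above, here is a minimal sketch in Python; the prior and the two likelihoods are invented for illustration and aren’t taken from anything in this thread.)

```python
# Minimal sketch of the "small piece of evidence" update described above.
# The prior and the two likelihoods are invented numbers, for illustration only.

def bayes_update(prior: float, p_obs_if_true: float, p_obs_if_false: float) -> float:
    """Posterior P(belief | observation) via Bayes' theorem."""
    numerator = p_obs_if_true * prior
    return numerator / (numerator + p_obs_if_false * (1 - prior))

prior = 0.90            # a cherished belief, held with 90% confidence
p_obs_if_true = 0.20    # the observation is somewhat unlikely if the belief is true...
p_obs_if_false = 0.30   # ...and somewhat more likely if the belief is false

print(f"{prior:.2f} -> {bayes_update(prior, p_obs_if_true, p_obs_if_false):.2f}")
# 0.90 -> 0.86: the evidence is weak, so the update is small, but it is not zero.
```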
Well, I guess you probably won’t be surprised to hear that I’m very familiar with that particular post of Eliezer’s, and instantly thought of it when I read your example. So, consider my commentary with that in mind!
(Sidenote: I have not yet become sufficiently un-confused about AAT to have a definite opinion about whether EY was using it correctly there. I do expect after further reflection to object to most rationalist uses of the AAT but not this particular one.)
Well, whether Eliezer was using the AAT correctly rather depends on what he meant by “rationalist”. Was he using it as a synonym for “perfect Bayesian reasoner”? (Not an implausible reading, given his insistence elsewhere on the term “aspiring rationalist” for mere mortals like us, and, indeed, like himself.) If so, then certainly what he said about the Theorem was true… but then, of course, it would be wholly inappropriate to apply it in the actual case at hand (especially since his interlocutor was, I surmise, some sort of religious person, and plausibly not even an aspiring rationalist).
If, instead, Eliezer was using “rationalist” to refer to mere actual humans of today, such as himself and the fellow he was conversing with, then his description of the AAT was simply inaccurate.
Secondly, and this is where I think the crux of this matter lies: I believe your (quite understandable!) objection applies to most attempts to use Bayesian reasoning in the real world.
Indeed not. The critical point is this: there is a difference between trying to use Bayesian reasoning and interpreting people’s comments to refer to Bayesian reasoning. Whether you do the former is between you and your intellectual conscience, so to speak. Whether you do the latter, on the other hand, is a matter of both pragmatics (is this any kind of a good idea?) and of factual accuracy (have you correctly understood what someone is saying?).
So the problem with your example, and with your point, is the equivocation between two questions:
“I’m not a perfect Bayesian reasoner, but shouldn’t I try to be?” (And the third-person variant, which is isomorphic to the first-person variant to whatever degree your goals and those of your advisee/victim are aligned.)
“My interlocutor is not speaking with the assumption that we’re perfect Bayesian reasoners, nor is he referring to agreement or belief or anything else in any kind of a strict, technical, Bayesian sense, but shouldn’t I assume that he is, thus ascribing meaning to his words that is totally different than his intended meaning?”
The answer to the first question is somewhere between “Uh, sure, why not, I guess? That’s your business, anyway” and “Yes, totally do that! Tsuyoku naritai, and all that!”.
The answer to the second question is “No, that is obviously a terrible idea. Never do that.”
Actually, there is a logical error in your mathematicians joke—at least compared to how this joke normally goes. When it’s their turn, the third mathematician knows that the first two wanted a beer (otherwise they would have said “no”), and so can answer “Yes” or “No”. https://www.beingamathematician.org/Jokes/445-three-logicians-walk-into-a-bar.png
You have entirely missed the point I was making in that comment.
Of course I am aware of the standard form of the joke. I presented my modified form of the joke in the linked comment, as a deliberate contrast with the standard form, to illustrate the point I was making.
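(For anyone who hasn’t met the standard form of the joke referenced above: here is a toy sketch of the inference it turns on. The preferences are invented, and the code only spells out the “otherwise they would have said no” reasoning described in the comment above.)

```python
# Toy sketch of the inference in the *standard* form of the three-logicians joke,
# the one the modified version above deliberately departs from.
# The bartender asks: "Do you all want a beer?"

wants_beer = [True, True, True]  # what each logician privately wants (invented)

answers = []
for i, wants in enumerate(wants_beer):
    if not wants or "No" in answers:
        answers.append("No")            # "all of us" is already ruled out
    elif i < len(wants_beer) - 1:
        answers.append("I don't know")  # I want one, but can't speak for those after me
    else:
        answers.append("Yes")           # everyone before me wanted one, and so do I

print(answers)  # ["I don't know", "I don't know", 'Yes']
```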
Aumann’s agreement theorem says that two people acting rationally (in a certain precise sense) and with common knowledge of each other’s beliefs cannot agree to disagree. More specifically, if two people are genuine Bayesian rationalists with common priors, and if they each have common knowledge of their individual posterior probabilities, then their posteriors must be equal.
With common priors.
This is what does all the work there! If the disagreeers have non-equal priors on one of the points, then of course they’ll have different posteriors.
Of course applying Bayes’ Theorem with the same inputs is going to give the same outputs, that’s not even a theorem, that’s an equals sign.
If the disagreeers find a different set of parameters to be relevant, and/or the parameters they both find relevant do not have the same values, the outputs will differ, and they will continue to disagree.
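(A minimal numeric sketch of the point this comment is making, with invented numbers. It illustrates the same-inputs/same-outputs observation, not the theorem’s actual common-knowledge machinery.)

```python
# Same inputs -> same outputs; different priors -> a persistent gap.
# All numbers below are invented for illustration.

def posterior(prior: float, lik_if_true: float, lik_if_false: float) -> float:
    """P(hypothesis | evidence) by Bayes' theorem."""
    num = lik_if_true * prior
    return num / (num + lik_if_false * (1 - prior))

lik_if_true, lik_if_false = 0.8, 0.4   # both parties agree on how strong the evidence is

# Common priors: identical inputs, identical posteriors.
print(f"{posterior(0.5, lik_if_true, lik_if_false):.3f}",
      f"{posterior(0.5, lik_if_true, lik_if_false):.3f}")   # 0.667 0.667

# Different priors: same evidence, same arithmetic, and the posteriors still differ.
print(f"{posterior(0.5, lik_if_true, lik_if_false):.3f}",
      f"{posterior(0.2, lik_if_true, lik_if_false):.3f}")   # 0.667 0.333
```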
Relevant: Why Common Priors