Let’s suppose that a super-intelligent AI has been built, and it knows plenty of tricks that no human ever thought of for presenting a false argument that is not easily detected as false. Whether it does that by presenting subtly wrong premises, by incorrect generalization, by word tricks, or by who knows what, is not important. It can, however, present an argument in a Socratic manner, and like Socrates’ interlocutors, you find yourself agreeing with things you don’t expect to agree with.
Firstly, upvoted for an excellent problem!

So the person in the thought experiment doesn’t expect to agree with a book’s conclusion, before reading it.
I now come to this AI, and request it to make a library of books for me (personally). Each is to be such that if I (specifically) were to read it, I would very likely come to believe a certain proposition.
I can interpret this in two ways. Either the person considers the AI capable of fulfilling his request perfectly, so that he is certain (purely as an unphysical premise of the thought experiment) that he is very likely (let’s say with 99% probability) to accept a book’s conclusion upon reading it. Or he has merely asked a superintelligent but fallible AI to do its best to ensure that, with 99% likelihood, he will accept the book’s conclusion.
It should take into account that initially I may be opposed to the proposition, and that I am aware that I am being manipulated.
He is aware that he is being manipulated. In other words, on my first interpretation he knows for certain (again, perfect certainty being allowed by definition in the thought experiment) that if he reads a given book, he will accept its conclusion afterwards with 99% probability. On my second interpretation, it depends on how strongly he expects the AI to be capable of manipulating him. But it seems fair to assume that, as you intend the thought experiment, he believes that if the AI aims at a 99% likelihood of persuading him of any given conclusion, it is capable of doing so with very high probability from his perspective.
I submit that you have merely defined this person as reflectively inconsistent, i.e. irrational. On both interpretations, because his prior belief in the proposition is low, he expects (as a consistent Bayesian must, since his expected posterior equals his current prior) to still disagree with a book’s conclusion after reading it; yet he also expects, with exactly or approximately 99% probability, to agree with it after reading it. This follows from the description of the thought experiment’s premises.
I submit that this thought experiment does not prove that “one can really intentionally deceive oneself and be in a mentally consistent state”, because the person in question is in a mentally inconsistent state, as you have defined the thought experiment.
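To make the inconsistency explicit, here is a minimal sketch using conservation of expected evidence; the specific numbers are my own illustrative choices, not part of the original problem. Writing $Z$ for the book’s conclusion and $e$ for the possible experiences of reading it, a consistent Bayesian must satisfy

$$P(Z) \;=\; \sum_{e} P(e)\, P(Z \mid e),$$

i.e. his current belief equals his expected belief after reading. But if his current $P(Z)$ is low (say $0.01$) while he also believes that with probability $0.99$ his post-reading belief will be around $0.99$, then his expected post-reading belief is at least $0.99 \times 0.99 \approx 0.98$, which contradicts the identity. A coherent reasoner cannot hold both expectations at once.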
If you were to adapt the thought experiment such that the person does not strongly expect to be persuaded by the AI, then he is not “intentionally deceiving himself” because he does not set out to read a book with the expectation of changing his mind, but he is merely being messed around with by the AI.
So the person in the thought experiment doesn’t expect to agree with a book’s conclusion, before reading it.
No, he expects that if he reads the book, his posterior belief in the proposition is likely to be high. But his current prior belief in the truth of the proposition is low.
Also, as I made clear in my update, the AI is not perfect, merely very good. I only need it to be good enough for the whole episode to go through, i.e. good enough that you can’t argue that a rational person would never believe Z after reading the book and that my story is therefore implausible.
So in other words, the person is expecting to be persuaded by something other than the truth, perhaps on the basis that the last N times he read one of these books, it changed his mind.
In that case, it is no different from the person stepping into a brain modification booth and having his mind altered directly, because a rational person would simply not be conned by this process. He would see that he currently believes in the existence of the flying spaghetti monster, that he has just read a book on the flying spaghetti monster prepared by a superintelligent AI which he had asked to prepare ultra-persuasive but entirely biased collections of evidence for him, and that he didn’t formerly believe in the flying spaghetti monster. He would conclude on this basis that his belief probably has no basis in reality, i.e. is inaccurate, and stop believing in it with such high probability.
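A rough sketch of the update such a rational person would perform, under my own illustrative assumptions that his prior in the proposition $Z$ was $0.01$ and that the book persuades him with probability $0.99$ whether or not $Z$ is true:

$$P(Z \mid \text{persuaded}) \;=\; \frac{P(\text{persuaded} \mid Z)\,P(Z)}{P(\text{persuaded} \mid Z)\,P(Z) + P(\text{persuaded} \mid \neg Z)\,P(\neg Z)} \;=\; \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.99 \times 0.99} \;=\; 0.01.$$

Because the book is optimized to persuade regardless of the truth, the likelihood ratio is $1$, so noticing “I have just been persuaded by such a book” is no evidence at all for $Z$, and on reflection he should drop his credence straight back to his old prior.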
If we are to accept that the AI is good enough to prevent this happening (a necessary premise of the thought experiment), then it must be preventing the person from being rational in this way, perhaps by including statements in the book that in some extraordinary way reprogram his mind via some backdoor vulnerability. Let’s say that perhaps the person is an android created by the AI for its own amusement, which responds to certain phrases with massive anomalous changes in its brain wiring. That is simply the only way I can accept the premises that:
a) the person applies Bayes’s theorem properly (if this is not true, then he is simply not “mentally consistent” as you said)
b) he is aware that the books are designed to persuade him with high probability
c) he believes that the propositions to be proven in the books are untrue in general
d) he believes with high probability that the books will persuade him
which, unless I am very much mistaken, are equivalent to your statements of the problem.
If reading a book is not basically equivalent to submitting knowingly to brain modification for belief in something, then one of the above is untrue, i.e. the premises are inconsistent and the thought experiment can tell us nothing.
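One way to see that (a)–(d) cannot all hold at once; the symbols and thresholds here are my own illustrative choices, not part of the original problem. Let $q$ denote the belief in the proposition $Z$ that he will hold after reading. Then (c) says $P(Z)$ is low, while (b) and (d) together say $P(q \ge 0.9) \ge 0.99$, so that

$$\mathbb{E}[q] \;\ge\; 0.99 \times 0.9 \;\approx\; 0.89,$$

whereas (a), together with his awareness of his own update process, forces $\mathbb{E}[q] = P(Z)$ by the conservation-of-expected-evidence identity sketched earlier. The three constraints are jointly contradictory, so at least one premise must give way.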
Remember that you are trying to prove that “one can really intentionally deceive oneself and be in a mentally consistent (although weird) state”. I accept that there is nothing mentally inconsistent about submitting to have one’s beliefs changed by brain surgery in one’s sleep. But given that “intentionally deceiving oneself” is just three words that could be attached to any referent, I don’t think that your apparent referent is what Eliezer was talking about in the post you linked to. So you haven’t refuted him.
“Intentionally deceiving oneself” in his discussion means “deciding that one should believe something other than what one currently believes, and then (using the mundane tools available to us now, like reading books, chanting mantras, going to church, meditating, etc.) forcing oneself to believe it”. This may be possible in the trivial sense that 0 is not a probability, but in a practical sense it is basically “impossible”, and that is all that Eliezer was arguing.
I’m sure Eliezer and anyone else would agree that it is possible to be an ideal Bayesian and step into a booth in order to have oneself modified to believe in the flying spaghetti monster. It does seem to me that in order for the booth to work, it is going to have to turn you into a non-Bayesian, irrational person, erase all of your memories of these booths or install false beliefs about them, and then implant barriers in your mind to prevent the rest of your brain from changing this belief. It seems like a very difficult problem to me – but then we are talking about a superintelligent AI! In fact I expect that you’d need to be altered so much that you couldn’t even expect to be approximately the same person after leaving the booth.
Incidentally, this reminds me of a concept discussed in Greg Egan’s book “Quarantine”, which you might find interesting.
EDIT:
On re-reading, I see that the modification process as I described it doesn’t actually uphold the premises of your thought experiment: only one iteration of book-reading could occur before the person is no longer “mentally consistent”, i.e. rational, and he can’t ever read more than one of the books anyway, since his beliefs about or knowledge of the books themselves have been changed (which is not what he asked of the AI). So in order for the premises to be consistent, the book-programming-brain-surgery would have to completely wipe his mind and build a set of experiences from scratch, so as to make the Universe seem consistent with evidence of the flying spaghetti monster without having to turn him into a non-Bayesian. The person would also have to retain evidence that the AI is clever enough that he should believe it can make books that will persuade him of anything. And the AI would probably reset his mind to a point at which he believes that he has not yet actually read any of the books.
What if the person realises that this exact scenario might already have happened? If he was aware of the existence of this AI, and knew that he was in the business of asking it to do things liable to change his mind, I don’t suppose that the line of reasoning I have outlined here would be hard for him to arrive at. This would be likely to undermine his belief in the reality of his entire life experience in general, lowering his degree of belief in any particular deity. I suppose the easiest way around this would be for the AI to make him sufficiently unintelligent that he doesn’t come to suspect this, but just barely capable of understanding the idea of a really smart being that can make “books” to persuade him of things (bearing in mind that, according to the premises of the thought experiment, he has to be mentally consistent, i.e. Bayesian, and cannot have arbitrary barriers erected inside his mind).
It seems that this thought experiment has turned out to be an example of the hidden complexity of wishes!