My question is: there seems to be a good deal of context missing. What was the motivation for this post? What conversation context was it taken from? It’s difficult to interpret it, without that information.

Some context from Eliezer’s Honesty: Beyond Internal Truth (in 2009):

[...] What I write is true to the best of my knowledge, because I can look it over and check before publishing. What I say aloud sometimes comes out false because my tongue moves faster than my deliberative intelligence can look it over and spot the distortion. Oh, we’re not talking about grotesque major falsehoods—but the first words off my tongue sometimes shade reality, twist events just a little toward the way they should have happened...
From the inside, it feels a lot like the experience of un-consciously-chosen, perceptual-speed, internal rationalization. I would even say that so far as I can tell, it’s the same brain hardware running in both cases—that it’s just a circuit for lying in general, both for lying to others and lying to ourselves, activated whenever reality begins to feel inconvenient.
There was a time—if I recall correctly—when I didn’t notice these little twists. And in fact it still feels embarrassing to confess them, because I worry that people will think: “Oh, no! Eliezer lies without even thinking! He’s a pathological liar!” For they have not yet noticed the phenomenon, and actually believe their own little improvements on reality—their own brain being twisted around the same way, remembering reality the way it should be (for the sake of the conversational convenience at hand).
[… I once asked someone] “Have you been hurt in the past by telling the truth?” “Yes”, he said, or “Of course”, or something like that -
(- and my brain just flashed up a small sign noting how convenient it would be if he’d said “Of course”—how much more smoothly that sentence would flow—but in fact I don’t remember exactly what he said; and if I’d been speaking out loud, I might have just said, “‘Of course’, he said” which flows well. This is the sort of thing I’m talking about, and if you don’t think it’s dangerous, you don’t understand at all how hard it is to find truth on real problems, where a single tiny shading can derail a human train of thought entirely -) [...]
And from Prices or Bindings? (2008):

[...] The German philosopher Fichte once said, “I would not break my word even to save humanity.”
Raymond Smullyan, in whose book I read this quote, seemed to laugh and not take Fichte seriously.
Abraham Heschel said of Fichte, “His salvation and righteousness were apparently so much more important to him than the fate of all men that he would have destroyed mankind to save himself.”
I don’t think they get it.
If a serial killer comes to a confessional, and confesses that he’s killed six people and plans to kill more, should the priest turn him in? I would answer, “No.” If not for the seal of the confessional, the serial killer would never have come to the priest in the first place. All else being equal, I would prefer the world in which the serial killer talks to the priest, and the priest gets a chance to try and talk the serial killer out of it. [...]
I approve of this custom and its absoluteness, and I wish we had a rationalist equivalent.
The trick would be establishing something of equivalent strength to a Catholic priest who believes God doesn’t want him to break the seal, rather than the lesser strength of a psychiatrist who outsources their tape transcriptions to Pakistan. Otherwise serial killers will, quite sensibly, use the Catholic priests instead, and get less rational advice.
Suppose someone comes to a rationalist Confessor and says: “You know, tomorrow I’m planning to wipe out the human species using this neat biotech concoction I cooked up in my lab.” What then? Should you break the seal of the confessional to save humanity?
It appears obvious to me that the issues here are just those of the one-shot Prisoner’s Dilemma, and I do not consider it obvious that you should defect on the one-shot PD if the other player cooperates in advance on the expectation that you will cooperate as well.
There are issues with trustworthiness and how the sinner can trust the rationalist’s commitment. It is not enough to be trustworthy; you must appear so. [...]
There’s a proverb I failed to Google, which runs something like, “Once someone is known to be a liar, you might as well listen to the whistling of the wind.” You wouldn’t want others to expect you to lie, if you have something important to say to them; and this issue cannot be wholly decoupled from the issue of whether you actually tell the truth. If you’ll lie when the fate of the world is at stake, and others can guess that fact about you, then, at the moment when the fate of the world is at stake, that’s the moment when your words become the whistling of the wind. [...]

I see, thanks.
For the benefit of others reading this, then, here’s what I consider to be the best presentation of the opposite view (the one Eliezer mentions, but rejects, in the first linked post): Paul Christiano’s “If we can’t lie to others, we’ll lie to ourselves”.
Seconded. A lot of recent postings have had this problem...they seem to start in the middle, or be reports of conversations where a lot of idiosyncratic vocabulary was developed.

I was going to make this same comment. Without context, seems like a lot of fixing something that ain’t broke.
Maybe what is going on here is that you are satisfied with your brain’s current ability to make ethical choices, but Eliezer isn’t, and his efforts to improve have yielded some thoughts worth putting on the public internet to try to help others who are also dissatisfied with their brain’s current ability to make ethical choices.

Maybe, or maybe there’s a different context entirely. As Said says, there really wasn’t much context to this at all.
The original FB comment version of this post came with the same (lack of) context as it did here, and my impression was that this was something close to rholerith’s take. (I also think that this is part of some ongoing thoughts that Local Validity was also exploring, but am not sure.)
There’s something I’ve seen some rationalists try for, which I think Eliezer might be aiming at here, which is to try and be a truly robust agent.
Be the sort of person that Omega (even a version of Omega who’s only 90% accurate) can clearly tell is going to one-box.
Be the sort of agent who cooperates when it is appropriate, defects when it is appropriate, and can realize that cooperating-in-this-particular-instance might look superficially like defecting, but avoid falling into a trap.
Be the sort of agent who, if some AI engineers were whiteboarding out the agent’s decision making, they would see that the agent makes robustly good choices, such that those engineers would choose to implement that agent as software and run it.
Not sure if that’s precisely what’s going on here but I think is at least somewhat related. If your day job is designing agents that could be provably friendly, it suggests the question of “how can I be provably friendly?”
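The Omega point above can be made concrete with a quick expected-value sketch. This is not from the thread; it uses the standard Newcomb figures ($1,000 in the transparent box, $1,000,000 in the opaque box iff the predictor expects one-boxing) and evaluates the choice evidentially, with the predictor right with probability p:

```python
def expected_value(choice: str, p: float) -> float:
    """Expected payoff in Newcomb's problem, given a predictor that is
    right with probability p, evaluated evidentially (standard figures:
    $1,000 transparent box, $1,000,000 opaque box iff 'one-box' predicted)."""
    if choice == "one-box":
        # With probability p the predictor foresaw this, so the opaque box is full.
        return p * 1_000_000
    elif choice == "two-box":
        # With probability p the predictor foresaw this, so the opaque box is
        # empty and you keep only the $1,000; with probability 1-p it is full too.
        return 1_000 + (1 - p) * 1_000_000
    raise ValueError(f"unknown choice: {choice}")

# Even a 90%-accurate Omega makes one-boxing the better policy:
# one-box ≈ $900,000 vs. two-box ≈ $101,000.
```

The point of the "even 90% accurate" qualifier: the argument for being legibly a one-boxer does not depend on a perfect predictor, only on the predictor being substantially better than chance.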
There’s something I’ve seen some rationalists try for, which I think Eliezer might be aiming at here, which is to try and be a truly robust agent.
This is very close to what I’ve always seen as the whole intent of the Sequences. I also feel like there’s a connection here to what I see as a bidirectional symmetry of the Sequences’ treatment of human rationality and Artificially Intelligent Agents. I still have trouble phrasing exactly what it is I feel like I notice here, but here’s an attempt:
As an introductory manual on improving the Art of Human Rationality, the hypothetical perfectly rational, internally consistent, computationally Bayes-complete superintelligence is used as the Platonic Ideal of a Rational Intelligence, and the Sequences ground many of Rationality’s tools, techniques, and heuristics as approximations of that fundamentally non-human ideal evidence processor.
or in the other direction:
As an introductory guide to building a Friendly Superintelligence, the Coherent Extrapolated Human Rationalist, a model developed from intuitively appealing rational virtues, is used as a guide for what we want optimal intelligent agents to look like, and the Sequences as a whole are about taking this more human grounding, and justifying it as the basis on which to guide the development of AI into something that works properly, and something that we see as Friendly.
Maybe that’s not the best description, but I think there’s something there and that it’s relevant to this idea of trying to use rationality to be a “truly robust agent”. In any case I’ve always felt there was an interesting parallel with how the Sequences can be seen as “A Manual For Building Friendly AI” based on rational Bayesian principles, or “A Manual For Teaching Humans Rational Principles” based on an idealized Bayesian AI.
There’s something I’ve seen some rationalists try for, which I think Eliezer might be aiming at here, which is to try and be a truly robust agent.
I really like this phrasing. Previously, I’d just had a vague sense that there are ways to have a lot more integrity such that many situations get a huge utility bump, which can only be achieved through very clear thinking. Your comment helped me make that a lot more concrete.
You must have missed the recent news. Short version is that Peter Thiel got angry at Eliezer’s post about Trump, and he decided to send no more money to MIRI. To secure money for further AI research a few rationalists from Berkeley attempted to rob a bank; things got messy, their attempt at “acausal negotiation” about the hostages failed (predictably, duh); the only good news is that no one got killed. Now Eliezer is trying to lay some groundwork to mitigate the PR damage, in my humble opinion not very convincingly.
(ROT13: abcr, whfg xvqqvat.)
xrx