Fighting a Rearguard Action Against the Truth
When we last left Eliezer2000, he was just beginning to investigate the question of how to inscribe a morality into an AI. His reasons for doing this don’t matter at all, except insofar as they happen to historically demonstrate the importance of perfectionism. If you practice something, you may get better at it; if you investigate something, you may find out about it; the only thing that matters is that Eliezer2000 is, in fact, focusing his full-time energies on thinking technically about AI morality, rather than, as previously, finding a justification for not spending his time this way. In the end, this is all that turns out to matter.
But as our story begins—as the sky lightens to gray and the tip of the sun peeks over the horizon—Eliezer2001 hasn’t yet admitted that Eliezer1997 was mistaken in any important sense. He’s just making Eliezer1997’s strategy even better by including a contingency plan for “the unlikely event that life turns out to be meaningless”...
...which means that Eliezer2001 now has a line of retreat away from his mistake.
I don’t just mean that Eliezer2001 can say “Friendly AI is a contingency plan”, rather than screaming “OOPS!”
I mean that Eliezer2001 now actually has a contingency plan. If Eliezer2001 starts to doubt his 1997 metaethics, the Singularity has a fallback strategy, namely Friendly AI. Eliezer2001 can question his metaethics without it signaling the end of the world.
And his gradient has been smoothed; he can admit a 10% chance of having previously been wrong, then a 20% chance. He doesn’t have to cough out his whole mistake in one huge lump.
If you think this sounds like Eliezer2001 is too slow, I quite agree.
Eliezer1996-2000’s strategies had been formed in the total absence of “Friendly AI” as a consideration. The whole idea was to get a superintelligence, any superintelligence, as fast as possible—codelet soup, ad-hoc heuristics, evolutionary programming, open-source, anything that looked like it might work—preferably all approaches simultaneously in a Manhattan Project. (“All parents did the things they tell their children not to do. That’s how they know to tell them not to do it.” John Moore, Slay and Rescue.) It’s not as if adding one more approach could hurt.
His attitudes toward technological progress have been formed—or more accurately, preserved from childhood-absorbed technophilia—around the assumption that any/all movement toward superintelligence is a pure good without a hint of danger.
Looking back, what Eliezer2001 needed to do at this point was declare an HMC event—Halt, Melt, and Catch Fire. One of the foundational assumptions on which everything else has been built has been revealed as flawed. This calls for a mental brake to a full stop: take your weight off all beliefs built on the wrong assumption, do your best to rethink everything from scratch. This is an art I need to write more about—it’s akin to the convulsive effort required to seriously clean house, after an adult religionist notices for the first time that God doesn’t exist.
But what Eliezer2001 actually did was rehearse his previous technophilic arguments for why it’s difficult to ban or governmentally control new technologies—the standard arguments against “relinquishment”.
It does seem, even to my modern self, that all those awful consequences which technophiles argue to follow from various kinds of government regulation are more or less correct—it’s much easier to say what someone is doing wrong than to say the way that is right. My modern viewpoint hasn’t shifted to think that technophiles are wrong about the downsides of technophobia; but I do tend to be a lot more sympathetic to what technophobes say about the downsides of technophilia. What previous Eliezers said about the difficulties of, e.g., the government doing anything sensible about Friendly AI, still seems pretty true. It’s just that a lot of his hopes for science, or private industry, etc., now seem equally wrongheaded.
Still, let’s not get into the details of the technovolatile viewpoint. Eliezer2001 has just tossed a major foundational assumption—that AI can’t be dangerous, unlike other technologies—out the window. You would intuitively suspect that this should have some kind of large effect on his strategy.
Well, Eliezer2001 did at least give up on his 1999 idea of an open-source AI Manhattan Project using self-modifying heuristic soup, but overall...
Overall, he’d previously wanted to charge in, guns blazing, immediately using his best idea at the time; and afterward he still wanted to charge in, guns blazing. He didn’t say, “I don’t know how to do this.” He didn’t say, “I need better knowledge.” He didn’t say, “This project is not yet ready to start coding.” It was still all, “The clock is ticking, gotta move now! The Singularity Institute will start coding as soon as it’s got enough money!”
Before, he’d wanted to focus as much scientific effort as possible with full information-sharing, and afterward he still thought in those terms. Scientific secrecy = bad guy, openness = good guy. (Eliezer2001 hadn’t read up on the Manhattan Project and wasn’t familiar with the similar argument that Leo Szilard had with Enrico Fermi.)
That’s the problem with converting one big “Oops!” into a gradient of shifting probability. It means there isn’t a single watershed moment—a visible huge impact—to hint that equally huge changes might be in order.
Instead, there are all these little opinion shifts… that give you a chance to repair the arguments for your strategies; to shift the justification a little, but keep the “basic idea” in place. Small shocks that the system can absorb without cracking, because each time, it gets a chance to go back and repair itself. It’s just that in the domain of rationality, cracking = good, repair = bad. In the art of rationality it’s far more efficient to admit one huge mistake, than to admit lots of little mistakes.
There’s some kind of instinct humans have, I think, to preserve their former strategies and plans, so that they aren’t constantly thrashing around and wasting resources; and of course an instinct to preserve any position that we have publicly argued for, so that we don’t suffer the humiliation of being wrong. And though the younger Eliezer has striven for rationality for many years, he is not immune to these impulses; they waft gentle influences on his thoughts, and this, unfortunately, is more than enough damage.
Even in 2002, the earlier Eliezer isn’t yet sure that Eliezer1997’s plan couldn’t possibly have worked. It might have gone right. You never know, right?
But there came a time when it all fell crashing down. To be continued.
Are there any sources of more information on this convulsive effort that adult religionists go through upon noticing the lack of God?
Maybe people have an instinct to preserve their former strategies, because doing so often works. If you find out a new fact, you don’t usually have to abandon your whole set of beliefs. Are view-shattering facts/arguments more common for abstract issues?
I find it funny that HMC just happens to be the acronym for my college.
On the other hand, I find this new style of writing intriguing. I think it will make a lot of sense in the book :)
I’m an RHIT alum. Boo!
Me too. I find it much more readable and enjoyable than the majority of previous posts.
Now I understand why you didn’t like my ideas back in 2002: they were somewhat like the ideas you had just rejected.
I’d just come at them for completely different reasons. I’ve stuck with them with a fair bit of refinement though, mainly because they are the only way I can see for a system to be able to try to win, under circumstances where the universe cares how and what you compute.
Post I’d like to read: Eliezer’s Chrono-Conference Call With His Various Previous Selves.
You could even have Eliezer-2018 make an appearance towards the end. Oh, and please write it in the style of the GEB dialogues.
He has discovered that there is no chocolate at the store, but since he is in the car already, why not still head there and check the baked goods?