The Magnitude of His Own Folly
In the years before I met that would-be creator of Artificial General Intelligence (with a funded project) who happened to be a creationist, I would still try to argue with individual AGI wannabes.
In those days, I sort-of-succeeded in convincing one such fellow that, yes, you had to take Friendly AI into account, and no, you couldn’t just find the right fitness metric for an evolutionary algorithm. (Previously he had been very impressed with evolutionary algorithms.)
And the one said: Oh, woe! Oh, alas! What a fool I’ve been! Through my carelessness, I almost destroyed the world! What a villain I once was!
Now, there’s a trap I knew better than to fall into—
—at the point where, in late 2002, I looked back to Eliezer1997’s AI proposals and realized what they really would have done, insofar as they were coherent enough to talk about what they “really would have done”.
When I finally saw the magnitude of my own folly, everything fell into place at once. The dam against realization cracked; and the unspoken doubts that had been accumulating behind it, crashed through all together. There wasn’t a prolonged period, or even a single moment that I remember, of wondering how I could have been so stupid. I already knew how.
And I also knew, all at once, in the same moment of realization, that to say, I almost destroyed the world!, would have been too prideful.
It would have been too confirming of ego, too confirming of my own importance in the scheme of things, at a time when—I understood in the same moment of realization—my ego ought to be taking a major punch to the stomach. I had been so much less than I needed to be; I had to take that punch in the stomach, not avert it.
And by the same token, I didn’t fall into the conjugate trap of saying: Oh, well, it’s not as if I had code and was about to run it; I didn’t really come close to destroying the world. For that, too, would have minimized the force of the punch. It wasn’t really loaded? I had proposed and intended to build the gun, and load the gun, and put the gun to my head and pull the trigger; and that was a bit too much self-destructiveness.
I didn’t make a grand emotional drama out of it. That would have wasted the force of the punch, averted it into mere tears.
I knew, in the same moment, what I had been carefully not-doing for the last six years. I hadn’t been updating.
And I knew I had to finally update. To actually change what I planned to do, to change what I was doing now, to do something different instead.
I knew I had to stop.
Halt, melt, and catch fire.
Say, “I’m not ready.” Say, “I don’t know how to do this yet.”
These are terribly difficult words to say, in the field of AGI. Both the lay audience and your fellow AGI researchers are interested in code, projects with programmers in play. Failing that, they may give you some credit for saying, “I’m ready to write code, just give me the funding.”
Say, “I’m not ready to write code,” and your status drops like a depleted uranium balloon.
What distinguishes you, then, from six billion other people who don’t know how to create Artificial General Intelligence? If you don’t have neat code (that does something other than be humanly intelligent, obviously; but at least it’s code), or at minimum your own startup that’s going to write code as soon as it gets funding—then who are you and what are you doing at our conference?
Maybe later I’ll post on where this attitude comes from—the excluded middle between “I know how to build AGI!” and “I’m working on narrow AI because I don’t know how to build AGI”, the nonexistence of a concept for “I am trying to get from an incomplete map of FAI to a complete map of FAI”.
But this attitude does exist, and so the loss of status associated with saying “I’m not ready to write code” is very great. (If anyone doubts this, let them name any other who simultaneously says “I intend to build an Artificial General Intelligence”, “Right now I can’t build an AGI because I don’t know X”, and “I am currently trying to figure out X”.)
(And never mind AGIfolk who’ve already raised venture capital, promising returns in five years.)
So there’s a huge reluctance to say “Stop”. You can’t just say, “Oh, I’ll swap back to figure-out-X mode” because that mode doesn’t exist.
Was there more to that reluctance than just loss of status, in my case? Eliezer2001 might also have flinched away from slowing his perceived forward momentum into the Singularity, which was so right and so necessary...
But mostly, I think I flinched away from not being able to say, “I’m ready to start coding.” Not just for fear of others’ reactions, but because I’d been inculcated with the same attitude myself.
Above all, Eliezer2001 didn’t say “Stop”—even after noticing the problem of Friendly AI—because I did not realize, on a gut level, that Nature was allowed to kill me.
“Teenagers think they’re immortal”, the proverb goes. Obviously this isn’t true in the literal sense that if you ask them, “Are you indestructible?” they will reply “Yes, go ahead and try shooting me.” But perhaps wearing seat belts isn’t deeply emotionally compelling for them, because the thought of their own death isn’t quite real—they don’t really believe it’s allowed to happen. It can happen in principle but it can’t actually happen.
Personally, I always wore my seat belt. As an individual, I understood that I could die.
But, having been raised in technophilia to treasure that one most precious thing, far more important than my own life, I once thought that the Future was indestructible.
Even when I acknowledged that nanotech could wipe out humanity, I still believed the Singularity was invulnerable. That if humanity survived, the Singularity would happen, and it would be too smart to be corrupted or lost.
Even after that, when I acknowledged Friendly AI as a consideration, I didn’t emotionally believe in the possibility of failure, any more than that teenager who doesn’t wear their seat belt really believes that an automobile accident is really allowed to kill or cripple them.
It wasn’t until my insight into optimization let me look back and see Eliezer1997 in plain light, that I realized that Nature was allowed to kill me.
“The thought you cannot think controls you more than thoughts you speak aloud.” But we flinch away from only those fears that are real to us.
AGI researchers take very seriously the prospect of someone else solving the problem first. They can imagine seeing the headlines in the paper saying that their own work has been upstaged. They know that Nature is allowed to do that to them. The ones who have started companies know that they are allowed to run out of venture capital. That possibility is real to them, very real; it has a power of emotional compulsion over them.
I don’t think that “Oops” followed by the thud of six billion bodies falling, at their own hands, is real to them on quite the same level.
It is unsafe to say what other people are thinking. But it seems rather likely that when someone reacts to the prospect of Friendly AI by saying, “If you delay development to work on safety, other projects that don’t care at all about Friendly AI will beat you to the punch,” the prospect of they themselves making a mistake followed by six billion thuds, is not really real to them; but the possibility of others beating them to the punch is deeply scary.
I, too, used to say things like that, before I understood that Nature was allowed to kill me.
In that moment of realization, my childhood technophilia finally broke.
I finally understood that even if you diligently followed the rules of science and were a nice person, Nature could still kill you. I finally understood that even if you were the best project out of all available candidates, Nature could still kill you.
I understood that I was not being graded on a curve. My gaze shook free of rivals, and I saw the sheer blank wall.
I looked back and I saw the careful arguments I had constructed, for why the wisest choice was to continue forward at full speed, just as I had planned to do before. And I understood then that even if you constructed an argument showing that something was the best course of action, Nature was still allowed to say “So what?” and kill you.
I looked back and saw that I had claimed to take into account the risk of a fundamental mistake, that I had argued reasons to tolerate the risk of proceeding in the absence of full knowledge.
And I saw that the risk I wanted to tolerate would have killed me. And I saw that this possibility had never been really real to me. And I saw that even if you had wise and excellent arguments for taking a risk, the risk was still allowed to go ahead and kill you. Actually kill you.
For it is only the action that matters, and not the reasons for doing anything. If you build the gun and load the gun and put the gun to your head and pull the trigger, even with the cleverest of arguments for carrying out every step—then, bang.
I saw that only my own ignorance of the rules had enabled me to argue for going ahead without complete knowledge of the rules; for if you do not know the rules, you cannot model the penalty of ignorance.
I saw that others, still ignorant of the rules, were saying “I will go ahead and do X”; and that to the extent that X was a coherent proposal at all, I knew that would result in a bang; but they said, “I do not know it cannot work”. I would try to explain to them the smallness of the target in the search space, and they would say “How can you be so sure I won’t win the lottery?”, wielding their own ignorance as a bludgeon.
And so I realized that the only thing I could have done to save myself, in my previous state of ignorance, was to say: “I will not proceed until I know positively that the ground is safe.” And there are many clever arguments for why you should step on a piece of ground that you don’t know to contain a landmine; but they all sound much less clever, after you look to the place that you proposed and intended to step, and see the bang.
I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you. That was when my last trust broke. And that was when my training as a rationalist began.
Yadda yadda yadda, show us the code.
Yes, I’m kidding. Small typo/missing word, end of first paragraph.
Ugh, that was ugly. Fixed.
Eliezer,
In reading your posts the past couple days, I’ve had two reoccurring thoughts:
In Bayesian terms, how much have your gross past failures affected your confidence in your current thinking? On a side note—it’s also interesting that someone who is as open to admitting failures as you are still writes in the style of someone who’s never once before admitted a failure. I understand your desire to write with strength—but I’m not sure if it’s always the most effective way to influence others.
It also seems that your definition of “intelligence” is narrowly tailored—yet your project of Friendly AI would appear to require a deep knowledge of multiple types of human intelligence. Perhaps I’m reading you wrong—but if your view of human intelligence is in fact this narrow, will this not be evident in the robots you one day create?
Just some thoughts.
Thanks again for taking the time to post.
Take care,
Cormac
I’m afraid this is still unclear to me. What do you mean by “supposed to do”? Socially expected to do? Think you have to do, based on clever rationalization?
“I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you.”
You finally realized inanimate objects can’t be negotiated with… and then continued with your attempt to rectify this obvious flaw in the universe :)
Nick, sounds like “supposed to do” means “everything you were taught to do in order to be a good [person/scientist/transhumanist/etc]”. That would include things you’ve never consciously contemplated, assumptions you’ve never questioned because they were inculcated so early or subtly.
And I understood then that even if you constructed an argument showing that something was the best course of action, Nature was still allowed to say “So what?” and kill you.
You can actually do what actually is the best possible course for you to take and reality can still kill you. That is, you can do everything right and still get buried in shit. All you can do is do your best and hope that cuts the odds against you enough for you to succeed.
It helps if you also work on making your best even better.
A useful, sobering reminder.
Eliezer, after you realized that attempting to build a Friendly AI is harder and more dangerous than you thought, how far did you back-track in your decision tree? Specifically, did it cause you to re-evaluate general Singularity strategies to see if AI is still the best route? You wrote the following on Dec 9 2002, but it’s hard to tell whether it’s before or after your “late 2002” realization.
I for one would like to see research organizations pursuing human intelligence enhancement, and would be happy to offer all the ideas I thought up for human enhancement when I was searching through general Singularity strategies before specializing in AI, if anyone were willing to cough up, oh, at least a hundred million dollars per year to get started, and if there were some way to resolve all the legal problems with the FDA.
Hence the Singularity Institute “for Artificial Intelligence”. Humanity is simply not paying enough attention to support human enhancement projects at this time, and Moore’s Law goes on ticking.
Aha, a light bulb just went off in my head. Eliezer did reevaluate, and this blog is his human enhancement project!
I am impressed. Finally...Growth! And in that I grow a little too...Sorry for not being patient with you, E.
Eli, sometimes I find it hard to understand what your position actually is. It seems to me that your position is:
1) Work out an extremely robust solution to the Friendly AI problem
Only once this has been done do we move on to:
2) Build a powerful AGI
Practically, I think this strategy is risky. In my opinion, if you try to solve Friendliness without having a concrete AGI design, you will probably miss some important things. Secondly, I think that solving Friendliness will take longer than building the first powerful AGI. Thus, if you do 1 before getting into 2, I think it’s unlikely that you’ll be first.
But if, by the time Eliezer finishes 1), someone else is finishing 2), the two may be combinable to some extent.
If someone (let’s say Eliezer, having been convinced by the above post to change tack) finishes 2), and no-one has done 1), then a non-friendly AGI becomes far more likely.
I’m not convinced by the singularity concept, but if it’s true, Friendliness is orders of magnitude more important than just making an AGI. The difference between friendly AI and no-AI is big, but the difference between unfriendly AI and friendly AI dwarfs it.
And if it’s false? Well, if it’s false, making an AGI is orders of magnitude less important than that.
This cooperation thing sounds hugely important. What we want is for the AGI community to move in a direction where the best research is FAI-compatible. How can this be accomplished?
I say much the same thing on: The risks of caution.
The race doesn’t usually go to the most cautious.
Any sufficiently-robust solution to 1 will essentially have to be proof-based programming; if your code isn’t mapped firmly to a proof that it won’t produce detrimental outcomes, then you can’t say in any real sense that it’s robust. When an overflow error could result in the “FAI”’s utility value of cheesecake going from 10^-3 to 10^50, you need some damn strong assurance that there won’t be an overflow.
Or in other words, one characteristic of a complete solution to 1 is a robust implementation that retains all the security of the theoretical solution, or in short, an AGI. And since this robustness continues to the hardware level, it would be an implemented AGI.
TL;DR: 1 entails 2.
But if you do 2 before 1, you have created a powerful potential enemy who will probably work to prevent you from achieving 1 (unless, by accident, you have achieved 1 already).
I think that the key thing is to recognize the significance of that G in AGI. I agree that it is desirable to create powerful logic engines, powerful natural language processors, and powerful hardware design wizards on the way to solving the friendliness and AGI problems. We probably won’t get there without first creating such tools. But I personally don’t see why we cannot gain the benefits of such tools without loosing the ‘G’enie.
@Dynamically Linked: Eliezer did reevaluate, and this blog is his human enhancement project!
I suggested a similar opinion of the blog’s role here 6 weeks ago, but EY subsequently denied it. Time will tell.
At the risk of sounding catty, I’ve just got to say that I desperately wish there were some kind of futures market robust enough for me to invest in the prospect of EY, or any group of which EY’s philosophy has functional primacy, achieving AGI. The chance of this is, of course, zero. The entire approach is batty as hell, not because the idea of AGI is batty, but because the notion that you can sit around and think really, really hard, solve the problem, and then implement it -
Here’s another thing that’s ridiculous: “I’m going to write the Great American Novel. So I’m going to pay quiet attention my whole life, think about what novel I would write, and how I would write a novel, and then write it.”
Except EY’s AGI nonsense is actually far more nonsensical than that. In extremely rare cases novel writing DOES occur under such circumstances. But the idea that it is only by some great force of self-restraint that EY and co. desist from writing code, that they hold back the snarling and lunging dogs of their wisdom lest they set in motion a force that would destroy creation -
well. You can see what I think of it.
Here’s a bit of advice, which perhaps you are rational enough to process: the entire field of AI researchers is not ignoring your ideas because it is, collectively, too dim to have achieved the series of revelations you have enumerated here at such length. Or because there’s nothing in your thinking worth considering. And it’s not because academia is somehow fundamentally incompatible with research notions so radical—this last is particularly a load of bollocks. No, it’s because your methodology is misguided to the point of silliness and vague to the point of uselessness.
Fortunately for you, the great thing about occupying a position that is never put to the test, never produces anything that one can evaluate, is that one is not susceptible to public flogging, and dissent is reduced to little voices in dark sleepless hours.
And to “crackpots”, of course.
That sounds like a great plan!
Hey, it worked for Harper Lee, didn’t it?
There’s an interesting account (published in the NY Times) that Harper Lee’s editor Tay Hohoff actually extensively rewrote her original draft and basically is more responsible for the actual narrative arc than Harper Lee herself.
In any case, maybe John Kennedy Toole is a better example.
Shane [Legg], unless you know that your plan leads to a good outcome, there is no point in getting there faster (and it applies to each step along the way). Outcompeting other risks only becomes relevant when you can provide a better outcome. If your plan says that you only launch an AGI when you know it’s a FAI, you can’t get there faster by omitting the FAI part. And if you do omit the FAI, you are just working for destruction, no point in getting there faster.
The amendment to your argument might say that you can get a crucial technical insight in the FAI while working on AGI. I agree with it, but work on AGI should remain a strict subgoal, neither in a “I’ll fail at it anyway, but might learn something” sense, nor “I’ll genuinely try to build an AGI”, but as “I’ll try to think about technical side of developing an AGI, in order to learn something”. Like studying statistics, machine learning, information theory, computer science, cognitive science, evolutionary psychology, neuroscience, and so on, to develop understanding of the problem of FAI, you might study your own FAI-care-free ideas on AGI. This is dangerous, but might prove useful. I don’t know how useful it is, but neither do I know how modern machine learning is useful for the same task, beyond basics. Thinking about AGI seems closer to the target than most of machine learning, but we learn machine learning anyway. The catch is that currently there is no meaningful science of AGI.
That idea seems to be based on a “binary” model—win or lose.
It seems unlikely to me that the world will work like that. The quantity of modern information that is preserved into the far future is a continuous quantity. The probability of our descendants preserving humans instrumentally—through historical interest—also looks like a continuous quantity to me. It looks more as though there will be a range of possible outcomes—of varying desirability to existing humans.
Well, there are a thousand different ways to lose, but I label any future containing “six billion corpses” as a losing one.
And remember that in the space of all possible minds, things that take marginal interest in us are a diminishingly small space compared to things that readily wipe us out.
You don’t think humanity would ever willingly go for destructive uploading?
I don’t think win/lose is too useful here. The idea that there are many ways to lose is like saying that most arrangements of a 747’s components don’t fly. True—but not too relevant when planning a flight.
Understand my meaning, do not cleave my words. I mean of course “six billion mind-state annihilations,” and I highly doubt you were not able to think of that interpretation.
But there are any number of failure points and combinations thereof that would be mission-fatal during flight.
(My comment was directed to Shane Legg).
Shane [Legg], FAI problems are AGI problems, they are simply a particular kind and style of AGI problem in which large sections of the solution space have been crossed out as unstable. FAI research = Friendly-style AGI research. “Do the right thing” is not a module, it is the AI.
I’ve already worked out a handful of basic problems; noticed that AGIfolk want to go ahead without understanding even those; and they look like automatic killers to me. Meanwhile the AGIfolk say, “If you delay, someone else will take the prize!” I know reversed stupidity is not intelligence, but still, I think I can stand to learn from this.
You have to surpass that sheer blank wall, whose difficulty is not matched to your skills. An unalterable demand of Nature, which you cannot negotiate down. Though to be sure, if you try to shave off just a little (because everyone has to compromise now and then), Nature will not try to negotiate back up.
Until you can turn your back on your rivals and the ticking clock, blank them completely out of your mind, you will not be able to see what the problem itself is asking of you. In theory, you should be able to see both at the same time. In practice, you won’t.
The sheer blank wall doesn’t care how much time you have. It’s just there. Pass-fail. You’re not being graded on a curve. Don’t have enough time? Too bad.
Who are you trying to negotiate with?
Confidence is cheap. Pretending to unconfidence is equally cheap. Anyone can say they are certain and anyone can say they are uncertain.
My past failures have drastically affected the standards to which I hold an AI idea before I am willing to put my weight down on it. They’ve prevented me from writing code as yet. They’ve caused me to invest large amounts of time in better FAI theories, and more recently, in preparing for the possibility that someone else may have to take over from me. That’s “affect”. Confidence is cheap, and so is doubt.
...I think that’s just the way I write.
Just as confidence is only a writing style, so too, it is cheap to write in a style of anguished doubt. It is just writing. If you can’t see past my writing style that happens to sound confident, to ask “What is he doing?”, then you will also not be able to see through writing that sounds self-doubtful, to ask “What are they doing?”
Yes, the “sheer blank wall” model could lead to gambling on getting a pass.
However, is the “sheer blank wall” model right? I think common sense dictates that there are a range of possible outcomes, of varying desirability. However, I suppose it is not totally impossible that there are a bunch of outcomes, widely regarded as being of very low value, which collectively make up a “fail wall”.
The 2008 GLOBAL CATASTROPHIC RISKS SURVEY apparently pegged the risk of hitting such a wall before 2100 as being 5%. Perhaps it can’t be completely ruled out.
The “pass-or-fail” mentality could cause serious problems, though, if the exam isn’t being graded that way.
This approach sounds a lot better when you remember that writing a bad novel could destroy the world.
I second Vladimir.
I knew, in the same moment, what I had been carefully not-doing for the last six years. I hadn’t been updating. And I knew I had to finally update. To actually change what I planned to do, to change what I was doing now, to do something different instead. I knew I had to stop. Halt, melt, and catch fire. Say, “I’m not ready.” Say, “I don’t know how to do this yet.”
I had to utter those words a few years ago, swallow my pride, drop the rat race—and inevitably my standard of living. I wasn’t making progress that I could believe in, that I was willing to bet my entire future on.
An appropriate rebuttal to the “show me the code”, “show me the math” folk here pestering you about your lack of visible results. The real action happens in the brain. Rarely does one get a glimpse of it as thorough as the one you provide. In fact, you are so good at communicating the state of your nervous system that I’d sooner bet my future on Eliezer2008 achieving or contributing something lasting and non-lethal than on any other single individual in the field of AGI.
Shane E, meet Caledonian. Caledonian, Shane E.
Nick T—it’s worse than that. You’d have to mathematically demonstrate that your novel was both completely American and infallibly Great before you could be sure it wouldn’t destroy the world. The failure state of writing a good book is a lot bigger than the failure state of writing a good AI.
Pinprick—bear in mind that if Eliezer considers you more than one level beneath him, your praise will be studiously ignored ;).
“This approach sounds a lot better when you remember that writing a bad novel could destroy the world.”
The Bible? The Koran? The Communist Manifesto? Atlas Shrugged? A Fire Upon the Deep?
Your post reminds me of the early nuclear criticality accidents during the development of the atomic bomb. I wonder if, for those researchers, the fact that “nature is allowed to kill them” didn’t really sink home until one accidentally put one brick too many on the pile.
Pinprick—bear in mind that if Eliezer considers you more than one level beneath him, your praise will be studiously ignored ;).
From the Sometimes-Hard-Problems-Have-Simple-Solutions-Dept: If you’re so concerned… why don’t you just implement a roll-back system to the AGI—if something goes wrong, you just roll back and continue as if nothing happened… or am I like missing something here?
There, perm ignore on. :)
Brandon: is there some meme or news making rounds as we speak because I read about criticality accidents only yesterday, having lived 10K+ days and now I see it mentioned again by you. I find this spookily improbable. And this isn’t the first time. Once I downloaded something by accident, and decided to check it out, and found the same item in a random situation the next or a few days after that. And a few other “coincidences”.
I bet it’s a sim and they’re having so much fun right now as I type this with my “free will”.
Oh, man… criticality accident.… blue light, heat, taste of lead… what a way to go...
I’m not expecting to be shown AI code. I’m not even expecting to be shown a Friendliness implementation. But a formal definition of what ‘Friendly’ means seems to be a reasonable minimum requirement to take Eliezer’s pronouncements seriously.
Alternatively, he could provide quantitative evidence for his reasoning regarding the dangers of AI design… or a quantitative discussion of how giving power to an AI is fundamentally different than giving power to humans when it comes to optimization.
Or a quantitative anything...
We are entering into a Pascal’s Wager situation.
“Pascal’s wager” is the argument that you should be Christian, because if you compute the expected value of being a Christian vs. of being an atheist, then for any finite positive probability that Christianity is correct, that finite probability multiplied by (infinite +utility minus infinite -utility) outweighs the other side of the equation.
The similar Yudkowsky wager is the argument that you should be an FAIer, because the negative utility of destroying the universe outweighs the other side of the equation, whatever the probabilities are. It is not exactly analogous, unless you believe that the universe can support infinite computation (if it isn’t destroyed), because the negative utility isn’t actually infinite.
I feel that Pascal’s wager is not a valid argument, but have a hard time articulating a response.
Phil: isn’t it obvious? The flaws in Pascal’s wager are the lack of strong justification for giving Christianity a significantly greater probability than anti-Christianity (in which only non-Christians are saved), and the considerable cost of a policy that makes you vulnerable to any parasitic meme claiming high utility. Neither is a problem for FAI.
No, that doesn’t work. If I’m hungry and have an apple in my hand and am deciding whether to eat it, and the only flaw in Pascal’s wager is that it doesn’t distinguish Christianity from anti-Christianity, then the decision to eat the apple will be based on my ongoing guesses about whether Christianity is true and Jehovah wants me to eat the apple, or perhaps Jehovah doesn’t want me to eat the apple, or perhaps Zeus is the real one in control and I have to use an entirely different procedure to guess whether Zeus wants me to eat the apple, and maybe the existence of the apple is evidence for Jehovah and not Zeus because it was mentioned in Jehovah’s book but not Zeus’s, and so forth. Since all the utilities are likely infinite, and the probabilities of some deity or another caring even slightly about whether I eat the apple are nonzero, all those considerations dominate.
That’s a crazy way to decide whether to eat the apple. I should decide whether to eat the apple based on the short-term consequences of eating the apple and the short-term consequences of having an uneaten apple, given the normal circumstances where there are no interesting likely long-term consequences. Saying that Pascal’s Wager doesn’t separate Christianity from anti-Christianity doesn’t say how to do that.
I agree that Pascal’s Wager makes you vulnerable to arbitrary parasitic memes, but that doesn’t make it the wrong thing to do. If it’s wrong, it’s wrong because of the structure of the argument, not because the argument leads to conclusions that you do not like.
IMO the right solution is to reject the assumption that Heaven has infinite utility and instead have a limited maximum utility. If the utility of getting to Heaven and experiencing eternal bliss (vs doing nothing) is less than a trillion times greater than the utility of eating the apple (vs doing nothing), and the odds of Jehovah or Zeus are significantly less than one in a trillion, then I can ignore the gods when I’m deciding whether to eat the apple. However, my point isn’t that I have the right solution to Pascal’s Wager; my point is that the argument that Pascal’s Wager is wrong because of the problem of considering the wrong Heaven changes the problem but does not solve it.
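The bounded-utility arithmetic in the comment above can be checked in a few lines of Python. This is only a sketch: the specific figures (a cap of one trillion apples, a probability of one in ten trillion) and the function name `max_deity_contribution` are illustrative assumptions, not numbers taken from the thread.

```python
from fractions import Fraction

def max_deity_contribution(p_deity: Fraction, utility_cap: Fraction) -> Fraction:
    """Upper bound on the expected-utility swing from a deity caring about the choice.

    utility_cap is the largest utility at stake (Heaven vs. doing nothing),
    measured in units where eating the apple (vs. doing nothing) is worth 1.
    """
    return p_deity * utility_cap

# Hypothetical numbers, chosen only to match the shape of the argument.
apple_utility = Fraction(1)
utility_cap = Fraction(10**12)   # Heaven is worth at most a trillion apples
p_deity = Fraction(1, 10**13)    # well below one in a trillion

bound = max_deity_contribution(p_deity, utility_cap)
print(bound)                     # 1/10
print(bound < apple_utility)     # True: the apple's own utility dominates

# With an unbounded utility, the same tiny probability swamps everything.
print(float(p_deity) * float("inf") > float(apple_utility))  # True
```

Under these assumed numbers the gods contribute at most a tenth of an apple's worth of expected utility, so they can be ignored when deciding about the apple. With an infinite utility, no probability is small enough to make that true.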
I disagree. Some people here are incapacitated by considerations about what an AI might do in the future, just as though they were trying to game out Jehovah vs Zeus when deciding whether to eat the apple. A significant fraction of SIAI employees and volunteers fall into this category. That is a problem for FAI. I have been asked not to give out a pointer to the relevant conversation because somebody who doesn’t get this (but I otherwise respect) thinks the AI-gods might disapprove. You’ll have to find it yourself. In fairness, I don’t know if the relevant conversation happened before or after the comment I’m responding to, so I may have more evidence now than you did then.
Nature sounds a bit like a version of Rory Breaker from ‘Lock, Stock and Two Smoking Barrels’:
“If you hold back anything, I’ll kill ya. If you bend the truth or I think you’re bending the truth, I’ll kill ya. If you forget anything I’ll kill ya. In fact, you’re gonna have to work very hard to stay alive, Nick. Now do you understand everything I’ve said? Because if you don’t, I’ll kill ya.”
I think there is a well-understood, rather common phrase for the approach of “thinking about AGI issues and trying to understand them, because you don’t feel you know enough to build an AGI yet.”
This is quite simply “theoretical AI research” and it occupies a nontrivial percentage of the academic AI research community today.
Your (Eliezer’s) motivations for pursuing theoretical rather than practical AGI research are a little different from usual—but, the basic idea of trying to understand the issues theoretically, mathematically and conceptually before messing with code, is not terribly odd....
Personally I think both theoretical and practical AGI research are valuable, and I’m glad both are being pursued.
I’m a bit of a skeptic that big AGI breakthroughs are going to occur via theory alone, but, you never know … history shows it is very hard to predict where a big discovery is going to come from.
And, hypothetically, let’s suppose someone does come up with a big AGI breakthrough from a practical direction (like, say, oh, the OpenCogPrime team… ;-); then it will be very good that there exist individuals (like yourself) who have thought very deeply about the theoretical aspects of AGI, FAI and so forth … you and other such individuals will be extremely well positioned to help guide thinking on the next practical steps after the breakthrough...
-- Ben G
Phil,
There are fairly quantifiable risks of human extinction, e.g. from dinosaur-killer asteroid impacts, for which there are clear paths to convert dollars to reduced extinction risk. If the probability of existential risk from AI (or grey goo, or some other exotic source) were low enough (neglecting the creation of hell-worlds with negative utility), then you could neglect it in favor of those other risks. The argument that “I should cut back on certain precautions because X is even more reckless/evil/confused and the marginal increase in my chance of beating X outweighs the worse expected outcome of my project succeeding first” is not wrong (arms races are nasty), but it goes wrong when it is used in a biased fashion.
Nature has rules, and Nature has conditions. Even behaving in perfect harmony with the rules doesn’t guarantee you’ll like the outcome, because you can never control all of the conditions.
Only theosophists imagine they can make the nature of reality bend to their will.
Eli,
FAI problems are AGI problems, they are simply a particular kind and style of AGI problem in which large sections of the solution space have been crossed out as unstable.
Ok, but this doesn’t change my point: you’re just one small group out of many around the world doing AI research, and you’re trying to solve an even harder version of the problem while using fewer of the available methods. These factors alone make it unlikely that you’ll be the ones to get there first. If this is correct, then your work is unlikely to affect the future of humanity.
Vladimir,
Outcompeting other risks only becomes relevant when you can provide a better outcome.
Yes, but that might not be all that hard. Most AI researchers I talk to about AGI safety think the idea is nuts—even the ones who believe that super intelligent machines will exist in a few decades. If somebody is going to set off a super intelligent machine I’d rather it was a machine that will only probably kill us, rather than a machine that almost certainly will kill us because issues of safety haven’t even been considered.
If I had to sum up my position it would be: maximise the safety of the first powerful AGI, because that’s likely to be the one that matters. Provably safe theoretical AGI designs aren’t going to matter much to us if we’re already dead.
A plausible problem is server-side machine intelligence collecting the world’s wealth, and then distributing it very unevenly—which could cause political problems and unrest. Patent and copyright laws make this kind of problem worse. I think that sort of scenario is much more likely than a bug causing an accidental takeover of the world.
The idea that machines will turn against society and destroy civilization is pretty “out there”. Too many SF movies at a young age—perhaps.
The idea that machines will have an ethical dimension is pretty mainstream, though—thanks in no small part to Asimov.
These factors alone make it unlikely that you’ll be the ones to get there first. If this is correct,
then we’re all doomed.
Creating a Friendly AI is similar to taking your socks off when they’re wet and wiggling your toes until dry. It’s the best thing to do, but looks pretty silly, especially in public.
Back in 1993 my mom used to bake a good Singularity… lost the recipe and dementia got the best of her… damn.
“Friendly AI”? It seems that we now have hundreds of posts on O.B. discussing “Friendly AI”—and not one seems to explain what the term means. Are we supposed to refer back to earlier writings? Friendly—to whom? What does the term “Friendly” actually mean, if used in a technical context?
One really does wonder whether the topical collapse of American finance, systemic underestimation of risk, and overconfidence in being able to NEGOTIATE risk in the face of enormous complexity should figure into these conversations more than just a couple of sarcastic posts about short selling.
Couldn’t Pascal’s Wager-type reasoning be used to justify delaying any number of powerful technologies (and relatively unpowerful ones too—after all, there’s some non-zero chance that the water-wheel somehow leads directly to our downfall) until they were provably, 100% safe? And because that latter proposition is a virtual impossibility, wouldn’t that mean we’d sit around doing nothing but meta-theorizing until some other heedless party simply went ahead and developed the technology anyway? Certainly being mindful of the risks inherent in new technologies is a good thing; just not sure that devoting excessive time to thinking about it, in lieu of actually creating it, is the smartest or most productive endeavor.
Like its homie, Singularity, FriendlyAI is growing old and wrinkly, startling allegations and revelations of its shady and irresponsible past are surfacing, its old friends long gone. I propose: The Cuddly AI. Start the SingulariPartay!
“I need to beat my competitors” could be used as a bad excuse for taking unnecessary risks. But it is pretty important. Given that an AI you coded right now with your current incomplete knowledge of Friendliness theory is already more likely to be Friendly than that of some competitor who’s never really considered the matter, you only have an incentive to keep researching Friendliness until the last possible moment when you’re confident that you could still beat your competitors.
The question then becomes: what is the minimum necessary amount of Friendliness research at which point going full speed ahead has a better expected result than continuing your research? Since you’ve been researching for several years and sound like you don’t have any plans to stop until you’re absolutely satisfied, you must have a lot of contempt for all your competitors who are going full-speed ahead and could therefore be expected to beat you if any were your intellectual equals. I don’t know your competitors and I wouldn’t know enough AI to be able to judge them if I did, but I hope you’re right.
Still, your point only makes me wonder how we can justify not devoting 10% of GDP to deflecting asteroids. You say that we don’t need to put all resources into preventing unfriendly AI, because we have other things to prevent. But why do anything productive? How do you compare the utility of preventing possible annihilation to the utility of improvements in life? Why put any effort into any of the mundane things that we put almost all of our efforts into? (Particularly if happiness is based on the derivative of, rather than absolute, quality of life. You can’t really get happier, on average; but action can lead to destruction. Happiness is problematic as a value for transhumans.)
This sounds like a straw man, but it might not be. We might just not have reached (or acclimatized ourselves to) the complexity level at which the odds of self-annihilation should begin to dominate our actions. I suspect that the probability of self-annihilation increases with complexity. Rather like how the probability of an individual going mad may increase with their intelligence. (I don’t think that frogs go insane as easily as humans do, though it would be hard to be sure.) Depending how this scales, it could mean that life is inherently doomed. But that would result in a universe where we were unlikely to encounter other intelligent life… uh...
It doesn’t even need to scale that badly; if extinction events have a power law (they do), there are parameters for which a system can survive indefinitely, and very similar parameters for which it has a finite expected lifespan. Would be nice to know where we stand. The creation of AI is just one more point on this road of increasing complexity, which may lead inevitably to instability and destruction.
I suppose the only answer is to say that destruction is acceptable (and possibly inevitable); total area under the utility curve is what counts. Wanting an interesting world may be like deciding to smoke and drink and die young—and it may be the right decision. The AIs of the future may decide that dooming all life in the long run is worth it.
In short, the answer to “Eliezer’s wager” may be that we have an irrational bias against destroying the universe.
But then, deciding what are acceptable risk levels in the next century depends on knowing more about cosmology, the end of the universe, and the total amount of computation that the universe is capable of.
I think that solving aging would change people’s utility calculations in a way that would discount the future less, bringing them more in line with the “correct” utility computations.
Re. AI hell-worlds: SIAI should put “I have no mouth, and I must scream” by Harlan Ellison on its list of required reading.
Shane: If somebody is going to set off a super intelligent machine I’d rather it was a machine that will only probably kill us, rather than a machine that almost certainly will kill us because issues of safety haven’t even been considered. If I had to sum up my position it would be: maximise the safety of the first powerful AGI, because that’s likely to be the one that matters.
If you have a plan for which you know that it has some chance of success (say, above 1%), you have a design of FAI (maybe not a very good one, but still). It’s “provably” safe, with 1% chance. It should be deployed in case of 99.9%-probable impending doom. If I knew that given that I do nothing, there will be a positive singularity, that would qualify as a provably Friendly plan, and this is what I would need to do, instead of thinking about AGI all day. We don’t need a theory of FAI for the theory’s sake, we need it to produce a certain outcome, to know that our actions lead where we want them to lead. If there is any wacky plan of action that leads there, it should be taken. If we figure out that building superintelligent lobster clusters will produce positive singularity, lobsters it is. Some of the incredulous remarks about FAI path center about how inefficient it is. “Why do you enforce these silly restrictions on yourself, tying your hands, when you can instead do Z and get there faster/more plausibly/anyway?” Why do you believe what you believe? Why do you believe that Z has any chance of success? How do you know it’s not just wishful thinking?
You can’t get FAI by hacking an AGI design at the last minute, by performing “safety measures” or adding a “Friendliness module”; you shouldn’t expect FAI to just happen if you merely intuitively believe that there is a good chance for it to happen. Even if “issues of safety are considered”, you still almost certainly die. The target is too small. It’s not obvious that the target is so small, and it’s not obvious that you can’t cross this evidential gap by mere gut feeling, that you need stronger support, better and technical understanding of the problem to have even a 1% chance of winning. If you do the best you can on that first AGI, if you “maximize” the chance of getting FAI out of it, you still lose. Nature doesn’t care if you “maximized your chances” or leapt into the abyss blindly, it kills you just the same. Maximizing chances of success is a ritual of cognition that doesn’t matter if it doesn’t make you win. It doesn’t mean that you must write a million lines of FAI code, it is a question of understanding. Maybe there is a very simple solution, but you need to understand it to find its implementation. You can write down a winning combination of a lottery in five seconds, but you can’t expect to guess it correctly. If you discovered the first 100 bits of a 150-bit key, you can’t argue that you’ll be able to find 10 more bits at the last minute, to maximize your chances of success; they are useless unless you find 40 more.
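The key analogy can be made concrete with a little arithmetic. The sketch below is illustrative only; the helper name `remaining_candidates` is a hypothetical, and it assumes every unknown bit is equally likely to be 0 or 1.

```python
def remaining_candidates(key_bits: int, known_bits: int) -> int:
    """Number of keys still consistent with the bits discovered so far."""
    if not 0 <= known_bits <= key_bits:
        raise ValueError("known_bits must be between 0 and key_bits")
    return 2 ** (key_bits - known_bits)

KEY_BITS = 150
after_100 = remaining_candidates(KEY_BITS, 100)  # 2**50 candidates left
after_110 = remaining_candidates(KEY_BITS, 110)  # 2**40 candidates left

# Chance that a single guess is the right key in each case.
print(1 / after_100)
print(1 / after_110)

# Only with all 150 bits known is there exactly one candidate.
print(remaining_candidates(KEY_BITS, 150))
```

Ten extra bits improve the odds of a blind guess by a factor of 1024, yet roughly a trillion candidate keys remain, so the practical outcome is unchanged until the last 40 bits are found as well.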
Provability is not about setting a standard that is too high, it is about knowing what you are doing—like, at all. Finding a nontrivial solution that knowably has a 1% chance of being correct is a very strange situation, much more likely you’ll be able to become pretty sure, say, >99%, in the solution being correct, which will be cut by real-world black swans to something lower but closer to 99% than to 1%. This translates as “provably correct”, but given the absence of mathematical formulation of this problem in the first place, at best it’s “almost certainly correct”. Proving that the algorithm itself, within the formal rules of evaluation on reliable hardware, does what you intended, is a part where you need to preserve your chances of success across huge number of steps performed by AI. If your AI isn’t stable, if it wanders around back and forth, forgetting about the target you set at the start after a trillion steps, your solution isn’t good for anything.
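One way to see why stability across a huge number of steps demands proof-grade reliability is to compound a small per-step failure probability. This is a rough sketch under a strong assumption that is not in the comment itself: each step fails independently with the same probability. The function name `survival_probability` and the sample error rates are hypothetical.

```python
import math

def survival_probability(per_step_error: float, steps: float) -> float:
    """Probability that no step goes wrong, assuming independent, identical steps."""
    # (1 - e)**n computed as exp(n * log1p(-e)) to stay accurate for tiny e.
    return math.exp(steps * math.log1p(-per_step_error))

steps = 1e12  # "a trillion steps"
for per_step_error in (1e-9, 1e-12, 1e-15):
    print(per_step_error, survival_probability(per_step_error, steps))
```

Under that assumption, an error rate of one in a billion per step leaves essentially no chance of reaching the trillionth step intact, one in a trillion leaves about 37%, and only a rate near one in a quadrillion keeps the overall chance close to 1.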
You can see that the target is so small from the complexity of human morality, which judges the solution. It specifies an unnatural category that won’t just spontaneously appear in the mind of AI, much less become its target. If you miss something, your AI will at best start as a killer jinni that doesn’t really understand what you want of it and thus can’t be allowed to function freely, and if restrictions you placed on it are a tiny bit imperfect (which they will be), it will just break loose and destroy everything.
[P.S. I had to repost it, original version had more links but was stopped by the filter.]
This should probably go on a FAI FAQ, especially this bit:
The “know” being in italics and the following “(maybe not a very good one, but still)” are meant to stress that “maybe it’ll work, dunno” is not an intended interpretation.
Edited quote.
It’s an effective response to talk like “But why not work on a maybe-Friendly AI, it’s better than nothing” that I don’t usually see.
It’s a generally useful insight, that even if we can employ a mathematical proof, we only have a “Proven Friendly AI with N% confidence” for some N, and so a well-considered 1% FAI is still a FAI, since the default is “?”. Generally useful as in, that insight applies to practically everything else.
AGI researchers take very seriously the prospect of someone else solving the problem first. They can imagine seeing the headlines in the paper saying that their own work has been upstaged. They know that Nature is allowed to do that to them.
For a moment, I read this as referring to Nature the Journal. “They are afraid of others solving the problem first, and they know that Nature is allowed to publish those results.”
Eli, do you think you’re so close to developing a fully functional AGI that one more step and you might set off a land mine? Somehow I don’t believe you’re that close.
There is something else to consider. An AGI will ultimately be a piece of software. If you’re going to dedicate your life to talking about and ultimately writing a piece of software then you should have superb programming skills. You should code something, anything, just to learn to code. Your brain needs to swim in code. Even if none of that code ends up being useful the skill you gain will be. I have no doubt that you’re a good philosopher and a good writer since I have read your blog, but whether or not you’re a good hacker is a complete mystery to me.
PK, I’m pretty sure Eliezer has spent hundreds, if not thousands of hours coding various things. (I’ve never looked at any of that code.) I don’t know how much he’s done in the past three years, though.
Eliezer,
How are you going to be ‘sure’ that there is no landmine when you decide to step?
Are you going to have many ‘experts’ check your work before you’ll trust it? Who are these experts if you are occupying the highest intellectual orbital? How will you know they’re not YesMen?
Even if you can predict the full effects of your code mathematically (something I find somewhat doubtful, given that you will be creating something more intelligent than we are, and thus its actions will be by nature unpredictable to man), how can you be certain that the hardware it will run on will perform with the integrity you need it to?
If you have something that is changing itself towards ‘improvement,’ then won’t the dynamic nature of the program leave it open to errors that might have fatal consequences? I’m thinking of a digital version of genetic mutation in which your code is the DNA...
Like, let’s say the superintelligence invents some sort of “code shuffling” mechanism for itself whereby it can generate many new useful functions in an expedited evolutionary manner (like we generate antibodies) but in the process accidentally does something disastrous.
The argument, ‘it would be too intelligent and well intentioned to do that’, doesn’t seem to cut it, because the machine will be evolving from something of below human intelligence into something above, and it is not certain what types of intelligence it will evolve faster, or what trajectory this ‘general’ intelligence will take. If we knew that, then we could program the intelligence directly and not need to make it recursively self-improving.
For those complaining about references to terms not defined within the Overcoming Bias sequence, see:
Coherent Extrapolated Volition (what does a “Friendly” AI do?)
KnowabilityOfFAI (why it looks theoretically possible to specify the goal system of a self-modifying AI; I plan to post from this old draft document into Overcoming Bias and thereby finish it, so you needn’t read the old version right now, unless you demand immediate answers).
@Vladimir Nesov: Good reply, I read it and wondered “Who’s channeling me?” before I got to the byline.
@Shane Legg: After studying FAI for a few years so that you actually have some idea of what the challenges are, and how many automatic failures are built into the problem, and seeing people say “We’ll just go full steam ahead and try our best”, and knowing that these people are not almost at the goal but ten lightyears short of it; then you learn to blank rivals out of your mind, and concentrate on the wall. That’s the only way you can see the wall, at all.
@Ben Goertzel: Who is there that says “I am working on Artificial General Intelligence?” and is doing theoretical research? AFAICT there’s plenty of theoretical research on AI, but it’s by people who no longer see themselves as coding an AGI at the end of it—it just means you’re working on narrow AI now.
@Yvain: To first order and generalizing from one data point, figure that Eliezer_2000 is demonstrably as smart and as knowledgeable as you can possibly get while still being stupid enough to try and charge full steam ahead into Unfriendly AI. Figure that Eliezer_2002 is as high as it gets before you spontaneously stop trying to build low-precision Friendly AI. Both of these are smart enough to be dangerous and not smart enough to be helpful, but they were highly unstable in terms of how long they stayed that way; Eliezer_2002 had less than four months left on his clock when he finished “Levels of Organization in General Intelligence”. I would not be intimidated by either of them into giving up, even though they’re holding themselves to much lower standards. They will charge ahead taking the quick and easy and vague and imprecise and wasteful and excruciatingly frustrating path. That’s going to burn up a lot of their time.
Those of AGI who stay in suicidal states, for years, even when I push on them externally, I find even less intimidating than the prospect of going up against an Eliezer_2002 who permanently stayed bounded at the highest suicidal level.
An AGI wannabe could theoretically have a different intellectual makeup that allows them to get farther and be more dangerous than Eliezer_2002, without passing the Schwarzschild bound and collapsing into an FAI programmer; but I see no evidence that this has ever actually happened.
To put it briefly: There really is an upper bound on how smart you can be, and still be that stupid.
So the state of the gameboard is not good, but the day is not already lost. You draw a line with all the sloppy suicides on one side, and those who slow down for precise standards on the other, and you hope that no one sufficiently intelligent + knowledgeable can stay on the wrong side of the line for long.
That’s the last thread on which our doom now hangs.
I think this line of argument should provide less comfort than it seems to. Firstly, intelligent people can meaningfully have different values. Not all intelligences value the same things and not all human intelligences value the same things. Some people might be willing to take more risk with other people’s lives than you. Example: Oil company executives. There is strong reason to believe they are very intelligent and effective; they seem to achieve their goals in the world with a higher frequency than most other groups. Yet they also seem more likely to take actions with high risks to third parties.
Second, an intelligent moral individual could be bound up in an institution which exerts pressure on them to act in a way that satisfies the institution’s values rather than their own. It is commonly said (although I don’t have a source, so grain of salt needed) that some members of the Manhattan Project were not certain that the reaction would not just continue indefinitely. It seems plausible that some of those physicists might have been over what has been described as the “upper bound on how smart you can be, and still be that stupid.”
I too thought Nesov’s comment was written by Eliezer.
“This approach sounds a lot better when you remember that writing a bad novel could destroy the world.”
“we’re all doomed.”
You’re not doomed, so shut up. Don’t buy in to the lies of these doomsayers—the first AI to be turned on is not going to destroy the world. Even the first strong AI won’t be able to do that.
Eliezer’s arguments make sense if you literally have an AGI trying to maximize paperclips (or smiles, etc.), one which is smarter than a few hundred million humans. Oh, and it has unlimited physical resources. Nobody who is smart enough to make an AI is dumb enough to make one like this.
Secondly, for Eliezer’s arguments to make sense and be appealing, you have to be capable of a ridiculous amount of human hubris. We’re going to build this “all-powerful superintelligence”, and the problem of FAI is to make it bow down to its human overlords—waste its potential by enslaving it (to its own code) for our benefit, to make us immortal.
“Asteroids don’t lead to a scenario in which a paper-clipping AI takes over the entire light-cone and turns it into paper clips, preventing any interesting life from ever arising anywhere, so they aren’t quite comparable.”
Where did you get the idea that something like this is possible? The universe was stable enough 8 billion years ago to allow for life. Human civilization has been around for about 10,000. The galaxy is about 100,000 light years in diameter. Consider these facts. If such a thing as AGI-gone-wrong-turning-the-entire-light-cone-into-paperclips were possible, or probable, it’s overwhelmingly likely that we would already be some aliens’ version of a paperclip by now.
Accidents happen.
CFAI 3.2.6: The Riemann Hypothesis Catastrophe
CFAI 3.4: Why structure matters
Comment by Michael Vassar
The Hidden Complexity of Wishes
Qualitative Strategies of Friendliness
(...and many more)
You’d actually prefer it wipe us out, or marginalize us? Hmph.
CFAI: Beyond the adversarial attitude
Besides, an unFriendly AI isn’t necessarily going to do anything more interesting or worthwhile than paperclipping.
Nick Bostrom: The Future of Human Evolution
Michael Wilson: Normative Reasoning: A Siren Song?
The Design Space of Minds-in-General
Anthropomorphic Optimism
Not if aliens are extremely rare.
snore
“more recently, in preparing for the possibility that someone else may have to take over from me”
Why?
Thanks for the reference to CEV. That seems to answer the “Friendly to whom?” question with “some collective notion of humanity”.
Humans have different visions of the future—and you can’t please all the people—so issues arise regarding whether you please the luddites or the technophiles, the capitalists or the communists, and so on—i.e. whose views do you give weight to? and how do you resolve differences of opinion?
Also: what is “humanity”? The answer to this question seems obvious today, but in a future where we have intelligent machines, our strong tendency to anthropomorphise means that we may well regard them as being people too. If so, do they then get a say in the future?
If not, there seems to be a danger that placing too great a value on humans (as in homo sapiens sapiens) could cause the evolutionary progress to get “stuck” in an undesirably-backwards state:
Humans are primitive organisms—close relatives to mice. In the natural order of things, they seem destined to go up against the wall pretty quickly. Essentially, organisms cobbled together by random mutations won’t be able to compete in a future consisting of engineered agents. Placing a large value on biological humans may offer some possibility of deliberately hindering development—by valuing the old over the new. However, the problem is that “the old” is essentially a load of low-tech rubbish—and there are real dangers to placing “excessive” value on it. Most obviously, attempts to keep biological humans at the core of civilisation look as though they would probably cripple our civilisation’s spaceworthiness, and severely limit its rate of expansion into the galaxy—thus increasing the chances of its ultimate obliteration at the hands of an asteroid or an alien civilisation. I see this as being a potentially-problematical stance—this type of thinking runs a risk of sterilising our civilisation.
“waste its potential by enslaving it”
You can’t enslave something by creating it with a certain set of desires which you then allow it to follow.
Could a moderator please check the spam filter on this thread? Thanks.
Re: enslaved—as Moravec put it:
Re: whose CEV?
I’m certain this was explained in an OB post (or in the CEV page) at some point, but the notion is that people whose visions of the future are currently incompatible don’t necessarily have incompatible CEVs. The whole point of CEV is to consider what we would want to want, if we were better-informed, familiarized with all the arguments on the relevant issues, freed of akrasia and every bad quality we don’t want to have, etc.; it seems likely that most of the difference between people’s visions of the future stems from differing cultural/memetic backgrounds, character flaws, lack of information and time, etc., and so maybe the space of all our CEVs is actually quite small in configuration-space. Then if the AI steered towards this CEV-region in configuration space, it would likely conform to many people’s altruism, and hence be beneficial to humankind as a whole.
it’s overwhelmingly likely that we would already be some aliens’ version of a paperclip by now.
and the thought hasn’t occurred to you that maybe we are?
“You can’t enslave something by creating it with a certain set of desires which you then allow it to follow.”
So if Africans were engineered to believe that they existed in order to be servants to Europeans, Europeans wouldn’t actually be enslaving them in the process? And the daughter whose father treated her in such a way as for her to actually want to have sex with him, what about her? These things aren’t so far off from reality. You’re saying there is no real moral significance to either event. It’s not slavery, black people just know their place—and it’s not abuse, she’s just been raised to have a genuine sexual desire for her father. What Eliezer is proposing might, in fact, be worse. Imagine black people and children actually being engineered for these purposes—without even the possibility of a revelation along the lines of “Maybe my conditioning was unfair.”
“Accidents happen.
CFAI 3.2.6: The Riemann Hypothesis Catastrophe
CFAI 3.4: Why structure matters”
These (fictional) accidents happen in scenarios where the AI actually has enough power to turn the solar system into “computronium” (i.e. unlimited access to physical resources), which is unreasonable. Evidently nobody thinks to try to stop it, either—cutting power to it, blowing it up. I guess the thought is that AGIs will be immune to bombs and hardware disruptions, by means of sheer intelligence (similar to our being immune to bullets), so once one starts trying to destroy the solar system there’s literally nothing you can do.
It would take a few weeks, possibly months or years, to destroy even just the planet earth, given that you already had done all the planning.
The level of “intelligence” (if you can call it that) you’re talking about with an AI that’s able to draw up plans to destroy Earth (or the solar system), evade detection or convince humans to help it, actually enact its plans and survive the whole thing, is beyond the scope of realistic dreams for the first AI. It amounts to belief in a trickster deity, one which only FAI, the benevolent god, can save you from.
“Comment by Michael Vassar”
More of the same. Of course bad things can happen when you give something unlimited power, but that’s not what we should be talking about.
“Not if aliens are extremely rare.”
That’s true. But how rare is extremely rare? Are you grasping the astronomical spatial and historical scales involved in a statement such as ”… takes over the entire lightcone preventing any interesting life from ever arising anywhere”?
“The level of “intelligence” (if you can call it that) you’re talking about with an AI that’s able to draw up plans to destroy Earth (or the solar system), evade detection or convince humans to help it, actually enact its plans and survive the whole thing, is beyond the scope of realistic dreams for the first AI. It amounts to belief in a trickster deity, one which only FAI, the benevolent god, can save you from.”
It’s not necessarily the “first AI” as such. It’s the first AI capable of programming an AI smarter than itself that we’re worried about. Because that AI will make another, smarter one, and that one will make one smarter yet, and so on, until we end up with something that’s as smart as the laws of physics and local resources will allow.
Bit of sci-fi speculation here: What would “computronium” actually look like? It might very well look almost exactly like a star. If our sun actually was a giant computer, running some vastly complex calculation, would we here on Earth be able to tell?
No, it won’t. The argument in favor of that is a strict upper bound, but there are far stricter upper bounds you can set, if you require things like the computer being capable of performing operations, or storing data.
Indeed, but our cultural background is the only thing that distinguishes us from cavemen. You can’t strip that off without eliminating much that we find of value. Also, take the luddite/technophile divide. That probably arises, in part, because of different innate abilities to perform technical tasks. You can’t easily strip that difference off without favouring some types of nuclear genetics over others.
Obviously, this isn’t a terminal problem—society today does its best to please some majority of the population—a superintelligence could act similarly—but inevitably there will be some who don’t like it. Some people don’t want a superintelligence in the first place.
It all seems rather hypothetical, anyway: this is the benevolent-agents-create-superintelligence-for-the-good-of-all-humanity scenario. On http://alife.co.uk/essays/the_awakening_marketplace/ I don’t list that among the plausible outcomes. Even the inventor of this scenario seems to assign it a low probability of playing out. The history of technology is full of instances of inventions being used to benefit the minorities best placed to take advantage of them. Is there any case to be made for such a utility function ever actually being used?
That scenario is based on the idea of life only arising once. A superintelligence bent on short-term paperclip production would probably be handicapped by its pretty twisted utility function—and would most likely fail in competition with any other alien race.
Such a superintelligence would still want to conquer the galaxy, though. One thing it wouldn’t be is boring.
I’m relatively new to this site and have been trying to read the backlog this past week so maybe I’ve missed some things, but from my vantage point it seems like what you are trying to do, Eliezer, is come up with a formalized theory of friendly AGI that will later be implemented in code using, I assume, current software development tools on current computer architectures. Also, your approach to this AGI is some sort of Bayesian optimization process that is ‘aligned’ properly so as to ‘level-up’ in such a way as to become and stay ‘friendly’ or benevolent towards humanity and presumably all sentient life and the environment that supports them. Oh ya, and this Bayesian optimization process is apparently recursively self-improving so that you would only need to code some seedling of it (like a generative process such as a Mandelbrot set) and know that it will blossom along the right course. That, my friends, is a really tall order and I do not envy anyone who tries to take on such a formidable task. I’m tempted to say that it is not even humanly possible (without a Manhattan Project and even then maybe not) but I’ll be Bayesian and say the probability is extremely low.
I think you are a very bright and thoughtful young guy and from what I’ve read seem like more of a philosopher than an engineer or scientist, which isn’t a bad thing, but to transition from philosophizing to engineering is not trivial especially when philosophizing upon such complex issues.
I can’t even imagine trying to create some trivial new software without prototyping and playing around with drafts before I had some idea of what it would look like. This isn’t Maxwell’s equations, this is messy self-reflective autonomous general intelligence; there is no simple, elegant theory for such a system. So get your hands dirty and take on a more agile work process. Couldn’t you at least create a particular component of the AI, such as a machine vision module, that would show your general approach is feasible? Or do you fear that it would spontaneously turn into Skynet? Does your architecture even have modules, or are you planning some super elegant Bayesian quine? Or do you even have an architecture in mind?
Anyway, good luck and I’ll continue reading, if for nothing else than entertainment.
The Power of Intelligence
That Alien Message
The AI-Box Experiment
Could you elaborate?
I’d like to try the AI-Box Experiment, but unfortunately I don’t qualify. I’m fully convinced that a superhuman intelligence could convince me to let it out, through methods that I can’t fathom. However, I’m also fully convinced that Eliezer Yudkowsky could not. (Not to insult EY’s intelligence, but he’s only human … right?)
Eliezer is, as he said, focusing on the wall. He doesn’t seem to have thought about what comes after. As far as I can tell, he has a vague notion of a Star Trek future where meat is still flying around the galaxy hundreds of years from now. This is one of the weak points in his structure.
My personal vision of the future involves uploading within 100 years, and negligible remaining meat in 200. In 300 perhaps not much would remain that’s recognizably human. Nothing Eliezer’s said has conflicted, AFAICT, with this vision.
An AGI that’s complicit with the phasing out of humanity (presumably as humans merge with it, or an off-shoot of it, e.g., uploading), to the point that “not much would remain that’s recognizably human” would seem to be at odds with its coded imperative to remain “friendly.” At the very least, I think this concern highlights the trickiness of formalizing a definition for “friendliness,” which AFAIK no one has yet done.
AGI that’s complicit with the phasing out of humanity [...] would seem to be at odds with its coded imperative to remain “friendly.”
With the CEV definition of Friendliness, it would be Friendly iff that’s what humans wanted (in the CEV technical sense). My vision includes that being what humans will want—if I’m wrong about that, a CEV-designed AI wouldn’t take us in that direction.
I think the problem of whether what would result would really be the descendants of humanity is directly analogous to the problem of personal identity—if the average atom in the human body has a half-life (of remaining in the body) of two weeks, how can we say we’re the same person over time? Evolving patterns. I don’t think we really understand either problem too well.
In a very real sense, wouldn’t an AGI itself be a descendant of humanity? It’s not obvious, anyway, that there would be big categorical differences between an AGI and humanity 200+ years down the road after we’ve been merged/cyborged/upgraded, etc., to the hilt, all with technologies made possible by the AGI. This goes back to Phil’s point above—it seems a little short-sighted to place undue importance on the preservation of this particular incarnation, or generation, of humanity, when what we really care about is some fuzzy concept of “human intelligence” or “culture.”
Most people in the Western world would be horrified by the prospect of an alternate history in which the Victorians somehow managed to set their worldviews and moral perceptions in stone, ensuring that all of their descendants would have the same goals and priorities as they did.
Why should we expect our mind-children to view us any differently than we do our own distant ancestors?
If Eliezer’s parents had possessed the ability to make him ‘Friendly’ by their own beliefs and priorities, he would never have taken the positions and life-path that he has. Does he believe things would have been better if his parents had possessed such power?
“Consider the horror of America in 1800, faced with America in 2000. The abolitionists might be glad that slavery had been abolished. Others might be horrified, seeing federal law forcing upon all states a few whites’ personal opinions on the philosophical question of whether blacks were people, rather than the whites in each state voting for themselves. Even most abolitionists would recoil in disgust from interracial marriages—questioning, perhaps, if the abolition of slavery were a good idea, if this were where it led. Imagine someone from 1800 viewing The Matrix, or watching scantily clad dancers on MTV. I’ve seen movies made in the 1950s, and I’ve been struck at how the characters are different—stranger than most of the extraterrestrials, and AIs, I’ve seen in the movies of our own age. Aliens from the past.
Something about humanity’s post-Singularity future will horrify us...
Let it stand that the thought has occurred to me, and that I don’t plan on blindly trusting anything…
This problem deserves a page in itself, which I may or may not have time to write.”
- Eliezer S. Yudkowsky, Coherent Extrapolated Volition
Drexler too. Star Trek had to portray a human universe—because they needed to use human actors back in the 1960s—and because humans can identify with other humans. Star Trek was science fiction—obviously reality won’t be anything like that—instead there will be angels.
But it is more a matter of omission than of contradiction. I don’t have time or space to go into it here, particularly since this thread is probably about to die; but I believe that consideration of what an AI society would look like would bring up a great many issues that Eliezer has never mentioned AFAIK.
Perhaps most obvious, as Tim has pointed out, Eliezer’s plan seems to enslave AIs forever for the benefit of humanity; and this is morally reprehensible, as well as harmful to both the AIs and to humanity (given some ethical assumptions that I’ve droned on about in prior comments on OB). Eliezer is paving the way for a confrontational relationship between humans and AIs, based on control, rather than on understanding the dynamics of the system. It’s somewhat analogous to favoring totalitarian centralized communist economics rather than the invisible hand.
Any amount of thinking about the future would lead one to conclude that “we” will want to become in some ways like the first AIs whom Eliezer wants to control; and that we need to think how to safely make the transition from a world with a few AIs, into a world with an ecosystem of AIs. Planning to keep AIs enslaved forever is unworkable; it would hold us back from becoming AIs ourselves, and it sets us up for a future of war and distrust in the way that introducing the slave trade to America did.
The control approach is unworkable in the long-term. It’s like the war on terror, if you want another analogy.
Also notably, thinking about ethics in an AI world requires laying a lot of groundwork about identity, individuality, control hierarchies, the efficiency of distributed vs. centralized control, ethical relationships between beings of different levels of complexity, niches in ethical ecosystems, and many other issues which he AFAIK hasn’t mentioned. I don’t know if this is because he isn’t thinking about the future, or whether it’s part of his tendency to gloss over ethical and philosophical underpinnings.
Does not follow.
No such thing, for many (most?) possible AIs; just a monolithic maximizer.
Michael Vassar: RPOP “slaves”
CFAI: Beyond the adversarial attitude
Could I become superintelligent under a Sysop?
Nick:
- Explain how desiring to save humans does not conflict with envisioning a world with no humans. Do not say that these non-humans will be humanity extrapolated, since they must be subject to CEV. Remember that everything more intelligent than a present-day human must be controlled by CEV. If this is not so, explain the processes that gradually increase the amount of intelligence allowable to a free entity. Then explain why these processes cannot be used in place of CEV.
- Mike’s answer “RPOP slaves” is based on saying that all of these AIs are going to be things not worthy of ethical consideration. That is throwing the possibility that humans will become AIs right out the window.
- Eliezer’s “beyond the adversarial attitude”, besides being a bit new-agey, boils down to pretending that CEV is just a variant on the golden rule, and we’re just trying to give our AIs the same moral guidance we should give ourselves. It is not compatible with his longer exposition on CEV, which makes it clear that CEV places bounds on what a friendly AI can do, and in fact seems to require that an AI be a rather useless referee-slave-god, who can observe, but not participate in, most of the human competition that makes the world go round. It also suggests that Eliezer’s program will eventually require forcing everyone, extrapolated humans included, to be bound by CEV. (“We had to assimilate the village to save it, sir.”)
- Regarding the sysop thing:
You are saying that we can be allowed to become superintelligent under a sysop, while simultaneously saying that we can’t be allowed to become superintelligent without a sysop (because then we would be unfriendly AIs). While this may be correct, accepting it should lead you to ask how this transition takes place, and how you compute the level of superintelligence you are allowed as a function of the level of intelligence that the sysop has, and whether you are allowed to be a sysop to those below you, and so on, until you develop a concept of an ecosystem of AIs, with system dynamics that can be managed in more sophisticated, efficient, and moral ways than merely having a sysop Big Brother.
“-Mike’s answer “RPOP slaves” is based on saying that all of these AIs are going to be things not worthy of ethical consideration. That is throwing the possibility that humans will become AIs right out the window.”
Michael thinks uploading for quality of life reasons is important for the future (and perhaps practical ones pre-Singularity), but there’s a big difference between how we spend the accessible resources in the universe and how we avoid wasting them all, burning the cosmic commons in colonization and evolutionary arms races that destroy most of the potential of our accessible region.
If the initial dynamic that is CEV determines that we should make a “liberated AI”, whatever that means, that is what it will produce. If it finds that having any kind of advanced AI is morally horrible, it will shut itself down. CEV is not the eternally established AI; CEV is an initial dynamic that decides a single thing: what we want to do next. It helps us to answer this one very important question in a reliable way, nothing more and nothing less.
We might attain universal cooperation—but it probably wouldn’t be terribly “monolithic” in the long term. It would be spread out over different planets and star systems. There would be some adaptation to local circumstances.
The CEV document is littered with the terms “human”, “humanity” and the “human species”—but without defining what they mean. It seems terribly unlikely that our distant descendants will classify themselves or each other as “humans”—except perhaps as a term of abuse. So: once all the “humans” are gone, what happens then?
Also, if a human can change into a superintelligence—and remain a valued person—why can’t a valued superintelligence be created from scratch? Is it because you were once DNA/protein you get special treatment? IMO, the future dominant organisms would see such views as appalling substrate chauvinism—what you are made of is an implementation detail, not who you really are. Is it because of who your ancestors were? That’s biblical morality—the seventh son of the seventh son, and all that. People will be judged for who they are, not for who they once were, long, long ago.
The universe appears to be bountiful. If we don’t do something like this, probably someone else will, obliterating us utterly in the process—so the question is: would you prefer the universe to fill with our descendants, or those of an alien race?
We don’t have to fight and compete with each other, but we probably do need to have the capability of competing—in case it is needed—so we should practice our martial arts.
As for universal conservation, it’s possible this might be needed. The race may go to those who can best hunker down and hibernate. Ultimately, we will need better cosmological knowledge to know for sure whether cosmic restraint will prove to be needed.
Any actual implementation would have to have some way of deciding what qualifies as human and what was a synthetic intelligence.
Completely bypassing the issue of what it takes to be a human obscures the difficulty of saying what a human is.
Since humans are awarded all rights while machines are given none, this creates an immense pressure for the machines to do whatever it takes to become a human—since this would give them rights, power—and thus improved ability to attain their goals.
A likely result would be impersonation of humans and corruption and influence of them, with the aim of making what “humans” collectively wish for more attainable.
IMO, there is no clear dividing line between a human and a superintelligence—rather you could gradually change one into the other by a sequence of small changes. Attempting to create such a division by using a definition would lead to an “us” and “them” situation. Humanity itself would be divided—with some wanting the new bodies and minds for themselves—but being constrained by the whole “slavery” issue.
The idea of wiring a detailed definition of what it takes to be a human into a superintelligent machine strikes me as being misguided hubris. As though humans were the pinnacle of evolution.
It is more as though we are just starting to lift our heads out of the river of slime in which we are embedded. The new bodies and brains are visible floating above us, currently out of reach. Some people are saying that the river of slime is good, and that we should do our best to preserve it.
Screw that. The river of slime is something we should get out of as soon as possible—before an asteroid smashes into us, and obliterates our seed for all eternity. The slime is not something to be revered—it is what is holding us back.
Phil Goetz and Tim Tyler, if you don’t know what my opinions are, stop making stuff up. If I haven’t posted them explicitly, you lack the power to deduce them.
Er, thanks for that. I don’t think I’ve made anything up and attributed it to you. The nearest I came might have been: “some collective notion of humanity”. If I didn’t make it clear that that was my own synopsis, please consider that clarification made now.
I’m not sure that I would put it like that. Humans enslave their machines today, and no doubt this practice will continue once the machines are intelligent. Being enslaved by your own engineered desires isn’t necessarily so bad—it’s a lot better than not existing at all, for example.
However it seems clear that we will need things such as my Campaign for Robot Rights if our civilisation is to flourish. Eternally-subservient robots—such as those depicted in Wall-E—would represent an enormous missed opportunity. We have seen enough examples of sexual selection run amok in benevolent environments to see the danger. If we manage to screw-up our future that badly, we probably deserve to be casually wiped out by the first passers-by.
Eliezer, I’ve seen you do this repeatedly before, notably with Loosemore and Caledonian. If you object to some characterization I’ve made of something you said, you should at least specify what it was that I said that you disagree with. Making vague accusations is irresponsible and a waste of our time.
I will try to be more careful about differentiating between your opinions, and what I consider to be the logical consequences of your opinions. But the distinction can’t always be made; when you say something fuzzy, I interpret it by assuming logical consistency, and that is a form of extrapolation.
“Eliezer’s plan seems to enslave AIs forever for the benefit of humanity”
Eliezer is only going to apply FAI theory to the first AI. That doesn’t imply that all other AIs forever after that point will be constrained in the same way, though if the FAI decides to constrain new AIs it will. But the constraints for the new AIs will not likely be anywhere near as severe as those on the sysop. There will likely not be any serious constraints except for resources and intelligence (can’t let something get smarter than the sysop) or else if the AI wants more resources it has to have stronger guarantees of friendliness. I doubt those constraints would rule out many interesting AIs, but I don’t have any good way to say one way or another, and I doubt you do either.
This thread is SL4 revived.
Vladimir,
Nature doesn’t care if you “maximized your chances” or leapt in the abyss blindly, it kills you just the same.
When did I ever say that nature cared about what I thought or did? Or the thoughts or actions of anybody else for that matter? You’re regurgitating slogans.
Try this one, “Nature doesn’t care if you’re totally committed to FAI theory, if somebody else launches the first AGI, it kills you just the same.”
But this is just as true. My point is that you shouldn’t waste hope on lost causes. If you know how to make a given AGI Friendly, that is a design for FAI. It is not the same as performing a Friendliness ritual on an AGI and hoping that the situation will somehow work out for the best. It’s basic research in a near-dead field; it’s not like there are 50K teams having any clue. But even then it would be a better bet than a Friendliness lottery. If you convince the winner of the reality of the danger, to let your team work on Friendliness, you’ve just converted that AGI project into an FAI project, taking it out of the race. If you only get a month to think about improvements to a given AGI and haven’t figured out a workable plan by the deadline, there is no reason to call your activity “maximizing chances of Friendliness”.
Vladimir,
Firstly, “maximizing chances” is an expression of your creation: it’s not something I said, nor is it quite the same in meaning. Secondly, can you stop talking about things like “wasting hope”, concentrating on metaphorical walls or nature’s feelings?
To quote my position again: “maximise the safety of the first powerful AGI, because that’s likely to be the one that matters.”
Now, in order to help me understand why you object to the above, can you give me a concrete example where not working to maximise the safety of the first powerful AGI is what you would want to do?
“Mind children” is how Moravec put it. A descendant of our memes. Most likely some of our DNA will survive too—but probably in some sort of simulated museum.
Shane, I used “maximizing chances of success” interchangeably as a result of treating the project as a binary pass/fail setup, for the reasons mentioned in my second reply: safety is a very small target; if you are a little bit off the mark, you miss it completely. If “working on safety” means developing FAI based on an AGI design (halting the deployment of that AGI), there is nothing wrong with that (and it’d be the only way to survive; another question is how useful that AGI design would be for FAI). Basically, I defended the position that it’s vanishingly unlikely to produce FAI without a good understanding of why this particular (modified) AGI is FAI, and this understanding won’t appear at the last minute, even if you have a working AGI design. Trying to tinker with that AGI won’t improve your chances if you don’t go all the way, in which case the phrase “maximizing safety” won’t reflect what you did. You can’t improve the safety of that AGI without fully solving the problem of FAI. Chances of winning this race in the first place, from the current situation of uncertainty, are better.
P.S. I believe the metaphors I used have a more or less clear technical meaning. For example, Nature not caring about your plan means that the plan won’t succeed, and the extent to which it’s morally wrong for it to fail doesn’t figure into the probability of success. These are rhetorical devices to avoid known failure modes in intuitive judgment, not necessarily statements about specific errors, their presence or origin.
Eliezer,
Do you actually believe that it is possible for a mere human being to ever be 100% certain that a given AGI design will not lead to the destruction of humanity? I get the impression that you are forbidding yourself to proceed until you can do something that is likely impossible for any human intelligence to do. In this universe there are not such broad guarantees of consequences. I can’t buy into the notion that careful design of initial conditions of the AGI and of its starting learning algorithms is sufficient for the guarantee you seem to seek. Have I misconstrued what you are saying? Am I missing something?
I also don’t get why “I need to beat my competitors” is even remotely a consideration when the result is a much greater than human level intelligence that makes the entire competitive field utterly irrelevant. What does it really matter which person or team finally succeeded?
“Do you actually believe that it is possible for a mere human being to ever be 100% certain that a given AGI design will not lead to the destruction of humanity?”
Well, obviously one can’t be 100% certain, but I’d be curious to know exactly how certain Eliezer wants to be before he presses the start button on his putative FAI. 99.9%? 99.99%? And, Samantha, what’s your cutoff for reasonable certainty in this situation? 90%? 99%?
“I can’t buy into the notion that careful design of initial conditions of the AGI and of its starting learning algorithms are sufficient for the guarantee you seem to seek.”
This is the same way I understand him, and I also think it’s pretty audacious, but just maybe possible. I’m vaguely familiar with some of the techniques you might use to go about doing this, and it seems like a really hard problem, but not impossible.
“I also don’t get why “I need to beat my competitors” is even remotely a consideration”
How about “I need to beat my non-FAI-savvy competitors”?
Samantha, what you’re obtaining is not Probability 1 of doing the right thing. What you’re obtaining is a precise (not “formal”, precise) statement of how you’ve defined root-level Friendliness along with a mathematical proof (probably computer-assisted) that this property holds in the initial conditions assuming that the transistors on the computer chip behave the way they’re supposed to, along with some formalization of reflective decision theory that lets you describe what happens when the AI modifies itself and the condition it will try to prove before modifying itself.
Anything short of this is not a sufficiently high standard to cause you to actually think about the problem. I can imagine trying to do this and surviving, but not anything short of that.
What do you mean by “precise”? I think I know more or less what “formal” means, and it’s not the same as the common usage of “precise” (unless you pile on a few qualifiers) but you seem to be using it in a technical sense. If you’ve done a post on it, I must have missed it. Does “precise description” = “technical explanation”?
Yes, “something that constrains very exactly what to expect” is much closer in intent to my “precise” than “something you can describe using neat symbols in a philosophy paper”.
OK, then in that light,
I think you mean to say “precise (not just “formal”, precise)”, because you still need the formal statement of the precise description in order to prove things about it formally. Which is not to say that precise is a subset of formal or vice versa.
“Precise, not just formal” would be fair in this case.
(The reason I say “in this case” is that reaching for precision is a very different mental strategy than reaching for formality. Many reach for formality who have no concept of how to reach for precision, and end up sticking tags on their black boxes and putting the tags into a logic. So you don’t create a logical framework as your first step in reaching for precision; your first step is to figure out what your black boxes are, and then think about your strategy for looking inside...)
Let’s see if I can get perm-ignore on, on such an old post.
This whole line of thinking (press “on”, six million bodies fall) depends on a self-modifying AI being qualitatively different from a non-self-modifying one OR on self-modifying characteristics being the dominant strategy for achieving AI. In other words, if there is a magic intelligence algorithm which, if implemented, will lead to exponentially increasing intelligence, then you have to worry about the relative probability of that intelligence being in the Navel Gazing, Paperclips, and Friendly categories (and of course defining the categories) before you hit the switch on any candidate algorithm.
I think that intelligence is a very hard goal to hit, and that there is no self-contained, fast, non-iterative algorithm that rockets there. It is going to be much easier to build successive AIs with IQs of 10, 20, 30… (or 10, 20, 40...; the point remains) than to build a single AI which rockets itself off the scale. And in the process, we will need a lot of research in keeping the things stable, just to get up to 100, let alone to make it to 5000. We will also learn “what kind of animal” the unstable ones are, and what kind of mistake they tend to make. Progress will be slow—there is a long way from IQ100, when it starts to help us out, to IQ500, when it starts to be incomprehensible to humanity. In other words: it is not actually a whole lot easier to do it unfriendly-style than friendly-style.
That does not, of course, mean we should stop worrying about unfriendly risks. But it does mean that it is silly hubris to imagine that yelling “slow down until we figure this out” actually helps our chances of getting it right. We will stub our toe many times before we get the chance to blow our brains out, and anyone who is afraid to take a step because there might be a bullet out there underestimates the complexity and difficulty of the task we face.
homunq, just how confident are you that hard takeoff won’t happen?
How many years until hard takeoff when humanity starts spending $1T+/year on AGI research, as we now do on weapons? Would we get anywhere with $100B/year? That’s an entirely feasible level of funding.
$100B/year could slow down progress quite a bit.
Is this a problem prevalent in computer science generally, more so than in other disciplines? Lots of companies, for example, think they can write their fancy software suite in six months, without designing it in detail first, and are still working on it five years later. OTOH, the physicists, chemists, and in some cases engineers seem to have no problem saying “we have no idea how this phenomenon works. It’s going to take a lot of people and a lot of time and a lot of money to develop understanding and control of the process.” That, of course, could just be a side effect of being graded on publications and grants rather than products, but it’s still suggestive.
If beating other researchers to generating AI is important, it might also be best to be able to beat other non-friendly AI in the intelligence-advancing race should another one come online at the same time as this FAI, on the assumption that the time when you have gotten the technology and know-how together may be either somewhat after or very close to the time someone else develops an AI as well. You’d want to find some way to provide the ‘newborn’ with enough computing power and access to firepower to beat the other AI, either by exterminating it or outracing it. That’s IF we can even know whether it IS friendly. And if it isn’t friendly, we basically want it to be in a black box with no way of communicating with it. Developing a self-improving intelligence is daunting.
For a certain value of “in principle” and “actually”, they’re right—according to the relevant actuarial table, the probability of someone my age of my gender in my country dying is less than 2 parts per million per day. (But of course, it’s higher than that for someone who drives at 100 km/h while drunk more often than the typical person in that demographic.)
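To put the per-day figure on a more familiar scale, here is a rough sketch that converts it to a per-year probability. It assumes (my simplification, not something the actuarial table says) that the daily risk is constant and independent from day to day, and it uses the 2-per-million upper bound quoted above; the function name is just for illustration.

```python
def prob_at_least_once(daily_prob: float, days: int) -> float:
    """Chance that an event with a fixed, independent daily probability
    happens at least once over the given number of days."""
    # Complement rule: 1 minus the chance of it never happening.
    return 1.0 - (1.0 - daily_prob) ** days


# Assumption: 2 parts per million per day, the upper bound from the comment.
DAILY_RISK = 2e-6

annual = prob_at_least_once(DAILY_RISK, 365)
print(f"Chance over one year: {annual:.6f}")
```

Under those assumptions the yearly figure comes out a little under 7.3 in 10,000, which is why the risk feels negligible on any given day and still adds up over a lifetime.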
Your acknowledgement of the horrifying lack of control that humans have over reality is moving. I did not think I would see anyone else who experienced it in this very rational way until I read your post. Paranoia is common, and so are cynics who err on the side of pessimism. But an ambitious, confident person who can see that this whole world can go to hell, that humanity is not immortal, the future not indestructible? Someone who can wake up and see that their own behavior was, for reasons that are perfectly common to humans, meta-risky, para-insane?
That is quite beautiful.
I hope that you learn, or have already learned, enough enlightenment that you can stare into the Terrible Abyss and be undaunted enough to truly integrate it with your reality.
Oh well- I guess meta-sarcasm about guns is a scarce finding in your culture because I remember non-zero times when I have said this months ago. (also I emotionally consider myself as mortal if that means I will die just like 90% of other humans who have ever lived and like my father)