The importance of the Fermi paradox is that it is the only data we can analyze that comes close to an empirical check on the paperclip-maximizer scenario, and on general risks from superhuman AIs with non-human values, without working directly on AGI to test those hypotheses ourselves. If you accept the premise that life is not unique and special, then one other technological civilisation in the observable universe should be sufficient to leave observable (now or soon) traces of technological tinkering. Given the absence of any signs of intelligence out there, especially of paperclippers burning the cosmic commons, we can conclude that unfriendly AI might not be the most dangerous existential risk we should be looking for.
...every point you listed was addressed multiple times in the FOOM debate and in the sequences.
I believe there probably is an answer, but it is buried under hundreds of posts about marginal issues. As for all those writings on rationality, there is nothing in them I disagree with, and many people know about all this even outside of the LW community. But what is it that they don’t know that EY and the SIAI know? What I was trying to say is that if I have come across it, then it was not convincing enough for me to take it as seriously as some people here obviously do.
It looks like I’m not alone. Goertzel, Hanson, Egan and lots of other people don’t see it either. So what are we missing? What is it that we haven’t read or understood?
Here is a very good comment by Ben Goertzel that pinpoints it:
This is what discussions with SIAI people on the Scary Idea almost always come down to!
The prototypical dialogue goes like this.
SIAI Guy: If you make a human-level AGI using OpenCog, without a provably Friendly design, it will almost surely kill us all.
Ben: Why?
SIAI Guy: The argument is really complex, but if you read Less Wrong you should understand it
Ben: I read the Less Wrong blog posts. Isn’t there somewhere that the argument is presented formally and systematically?
SIAI Guy: No. It’s really complex, and nobody in-the-know had time to really spell it out like that.
No. It’s really complex, and nobody in-the-know had time to really spell it out like that.
Actually, you can spell out the argument very briefly. Most people, however, will immediately reject one or more of the premises due to cognitive biases that are hard to overcome.
A brief summary:
Any AI that’s at least as smart as a human and is capable of self-improving will improve itself if that will help its goals
The preceding statement applies recursively: the newly-improved AI, if it can improve itself, and it expects that such improvement will help its goals, will continue to do so.
At minimum, this means any AI as smart as a human, can be expected to become MUCH smarter than human beings—probably smarter than all of the smartest minds the entire human race has ever produced, combined, without even breaking a sweat.
INTERLUDE: This point, by the way, is where people’s intuition usually begins rebelling, either due to our brains’ excessive confidence in themselves, or because we’ve seen too many stories in which some indefinable “human” characteristic is still somehow superior to the cold, unfeeling, uncreative Machine… i.e., we don’t understand just how our intuition and creativity are actually cheap hacks to work around our relatively low processing power—dumb brute force is already “smarter” than human beings in any narrow domain (see Deep Blue, evolutionary algorithms for antenna design, Emily Howell, etc.), and a human-level AGI can reasonably be assumed capable of programming up narrow-domain brute forcers for any given narrow domain.
And it doesn’t even have to be that narrow or brute: it could build specialized Eurisko-like solvers, and manage them at least as intelligently as Lenat did to win the Traveller tournaments.
In short, human beings have a vastly inflated opinion of themselves, relative to AI. An AI only has to be as smart as a good human programmer (while running at a higher clock speed than a human) and have access to lots of raw computing resources, in order to be capable of out-thinking the best human beings.
And that’s only one possible way to get to ridiculously superhuman intelligence levels… and it doesn’t require superhuman insights for an AI to achieve, just human-level intelligence and lots of processing power.
The people who reject the FAI argument are the people who, for whatever reason, can’t get themselves to believe that a machine can go from being as smart as a human, to massively smarter in a short amount of time, or who can’t accept the logical consequences of combining that idea with a few additional premises, like:
It’s hard to predict the behavior of something smarter than you
Actually, it’s hard to predict the behavior of something different than you: human beings do very badly at guessing what other people are thinking, intending, or are capable of doing, despite the fact that we’re incredibly similar to each other.
AIs, however, will be much smarter than humans, and therefore very “different”, even if they are otherwise exact replicas of humans (e.g. “ems”).
Greater intelligence can be translated into greater power to manipulate the physical world, through a variety of possible means. Manipulating humans to do your bidding, coming up with new technologies, or just being more efficient at resource exploitation… or something we haven’t thought of. (Note that pointing out weaknesses in individual pathways here doesn’t kill the argument: there is more than one pathway, so you’d need a general reason why more intelligence doesn’t ever equal more power. Humans seem like a counterexample to any such general reason, though.)
You can’t control what you can’t predict, and what you can’t control is potentially dangerous. If there’s something you can’t control, and it’s vastly more powerful than you, you’d better make sure it gives a damn about you. Ants get stepped on, because most of us don’t care very much about ants.
Note, by the way, that this means that indifference alone is deadly. An AI doesn’t have to want to kill us, it just has to be too busy thinking about something else to notice when it tramples us underfoot.
This is another inferential step that is dreadfully counterintuitive: it seems to our brains that of course an AI would notice, of course it would care… what’s more important than human beings, after all?
But that happens only because our brains are projecting themselves onto the AI—seeing the AI thought process as though it were a human. Yet, the AI only cares about what it’s programmed to care about, explicitly or implicitly. Humans, OTOH, care about a ton of individual different things (the LW “a thousand shards of desire” concept), which we like to think can be summarized in a few grand principles.
But being able to summarize the principles is not the same thing as making the individual cares (“shards”) be derivable from the general principle. That would be like saying that you could take Aristotle’s list of what great drama should be, and then throw it into a computer and have the computer write a bunch of plays that people would like!
To put it another way, the sort of principles we like to use to summarize our thousand shards are just placeholders and organizers for our mental categories—they are not the actual things we care about… and unless we put those actual things into an AI, we will end up with an alien superbeing that may inadvertently wipe out things we care about, while it’s busy trying to do whatever else we told it to do… as indifferently as we step on bugs when we’re busy with something more important to us.
So, to summarize: the arguments are not that complex. What’s complex is getting people past the part where their intuition reflexively rejects both the premises and the conclusions, and tells their logical brains to make up reasons to justify the rejection, post hoc, or to look for details to poke holes in, so that they can avoid looking at the overall thrust of the argument.
While my summation here of the anti-Foom position is somewhat unkindly phrased, I have to assume that it is the truth, because none of the anti-Foomers ever seem to actually address any of the pro-Foomer arguments or premises. AFAICT (and I am not associated with SIAI in any way, btw, I just wandered in here off the internet, and was around for the earliest Foom debates on OvercomingBias.com), the anti-Foom arguments always seem to consist of finding ways to never really look too closely at the pro-Foom arguments at all, and instead making up alternative arguments that can be dismissed or made fun of, or arguing that things shouldn’t be that way, and therefore the premises should be changed.
That was a pretty big convincer for me that the pro-Foom argument was worth looking more into, as the anti-Foom arguments seem to generally boil down to “la la la I can’t hear you”.
So, are you suggesting that Robin Hanson (who is on record as not buying the Scary Idea) -- the current owner of the Overcoming Bias blog, and Eli’s former collaborator on that blog—fails to buy the Scary Idea “due to cognitive biases that are hard to overcome.” I find that a bit ironic.
Like Robin and Eli and perhaps yourself, I’ve read the heuristics and biases literature also. I’m not so naive as to make judgments about huge issues, that I think about for years of my life, based strongly on well-known cognitive biases.
It seems more plausible to me to assert that many folks who believe the Scary Idea, are having their judgment warped by plain old EMOTIONAL bias—i.e. stuff like “fear of the unknown”, and “the satisfying feeling of being part of a self-congratulatory in-crowd that thinks it understands the world better than everyone else”, and the well known “addictive chemical high of righteous indignation”, etc.
Regarding your final paragraph: Is your take on the debate between Robin and Eli about “Foom” that all Robin was saying boils down to “la la la I can’t hear you” ? If so I would suggest that maybe YOU are the one with the (metaphorical) hearing problem ;p ….
I think there’s a strong argument that: “The truth value of “Once an AGI is at the level of a smart human computer scientist, hard takeoff is likely” is significantly above zero.” No assertion stronger than that seems to me to be convincingly supported by any of the arguments made on Less Wrong or Overcoming Bias or any of Eli’s prior writings.
Personally, I actually do strongly suspect that once an AGI reaches that level, a hard takeoff is extremely likely unless the AGI has been specifically inculcated with goal content working against this. But I don’t claim to have a really compelling argument for this. I think we need a way better theory of AGI before we can frame such arguments compellingly. And I think that theory is going to emerge after we’ve experimented with some AGI systems that are fairly advanced, yet well below the “smart computer scientist” level.
So, are you suggesting that Robin Hanson (who is on record as not buying the Scary Idea) -- the current owner of the Overcoming Bias blog, and Eli’s former collaborator on that blog—fails to buy the Scary Idea “due to cognitive biases that are hard to overcome.” I find that a bit ironic
Welcome to humanity. ;-) I enjoy Hanson’s writing, but AFAICT, he’s not a Bayesian reasoner.
Actually: I used to enjoy his writing more, before I grokked Bayesian reasoning myself. Afterward, too much of what he posts strikes me as really badly reasoned, even when I basically agree with his opinion!
I similarly found Seth Roberts’ blog much less compelling than I did before (again, despite often sharing similar opinions), so it’s not just him that I find to be reasoning less well, post-Bayes.
(When I first joined LW, I saw posts that were disparaging of Seth Roberts, and I didn’t get what they were talking about, until after I understood what “privileging the hypothesis” really means, among other LW-isms.)
I’m not so naive as to make judgments about huge issues, that I think about for years of my life, based strongly on well-known cognitive biases.
See, that’s a perfect example of a “la la la I can’t hear you” argument. You’re essentially claiming that you’re not a human being—an extraordinary claim, requiring extraordinary proof.
Simply knowing about biases does very nearly zero for your ability to overcome them, or to spot them in yourself (vs. spotting them in others, where it’s easy to do all day long.)
It seems more plausible to me to assert that many folks who believe the Scary Idea, are having their judgment warped by plain old EMOTIONAL bias—i.e. stuff like “fear of the unknown”, and “the satisfying feeling of being part of a self-congratulatory in-crowd that thinks it understands the world better than everyone else”, and the well known “addictive chemical high of righteous indignation”, etc.
Since you said “many”, I’ll say that I agree with you that that is possible. In principle, it could be possible for me as well, but...
To be clear on my own position: I am a FAI skeptic, in the sense that I have a great many doubts about its feasibility—too many to present or argue here. All I’m saying in this discussion is that to believe AI is dangerous, one only needs to believe that humans are terminally stupid, and there is more than ample evidence for that proposition. ;-)
Also, more relevant to the issue of emotional bias: I don’t primarily identify as an LW-ite; in fact I think that a substantial portion of the LW community has its head up its ass in overvaluing epistemic (vs. instrumental) rationality, and that many people here are emulating a level of reasoning they don’t personally comprehend… and before I understood the reasoning myself, I thought the entire thing was a cult of personality, and wondered why everybody was making such a religious-sounding fuss over a minor bit of mathematics used for spam filtering. ;-)
Is your take on the debate between Robin and Eli about “Foom” that all Robin was saying boils down to “la la la I can’t hear you” ?
My take is that before the debate, I was wary of AI dangers, but skeptical of fooming. Afterward, I was convinced fooming was near inevitable, given the ability to create a decent AI using a reasonably small amount of computing resources.
And a big part of that convincing was that Robin never seemed to engage with any of Eliezer’s arguments, and instead either attacked Eliezer or said, “but look, other things happen this other way”.
It seems to me that it’d be hard to do a worse job of convincing people of the anti-foom position, without being an idiot or a troll.
That is, AFAICT, Robin argued the way a lawyer argues when they know the client is guilty: pounding on the facts when the law is against them, pounding on the law when the facts are against them, and pounding on the table when the facts and the law are both against them.
I think there’s a strong argument that: “The truth value of “Once an AGI is at the level of a smart human computer scientist, hard takeoff is likely” is significantly above zero.”
Yep.
No assertion stronger than that seems to me to be convincingly supported by any of the arguments made on Less Wrong or Overcoming Bias or any of Eli’s prior writings.
I’m curious what stronger assertion you think is necessary. I would personally add, “Humans are bad at programming, no nontrivial program is bug-free, and an AI is a nontrivial program”, but I don’t think there’s a lack of evidence for any of these propositions. ;-)
[Edited to add the “given” qualification on “nearly inevitable”, as that’s been a background assumption I may not have made clear in my position on this thread.]
I enjoy Hanson’s writing, but AFAICT, he’s not a Bayesian reasoner.
I don’t believe it’s a meaningful property (as used in this context), and you would do well to taboo it (possibly, to convince me it’s actually meaningful).
I don’t believe it’s a meaningful property (as used in this context), and you would do well to taboo it
True enough; it would be more precise to say that he argues positions based on evidence which can also support other positions, and therefore isn’t convincing evidence to a Bayesian.
it would be more precise to say that he argues positions based on evidence which can also support other positions, and therefore isn’t convincing evidence to a Bayesian.
What do you mean? Evidence can’t support both sides of an argument, so how can one inappropriately use such impossible evidence?
What do you mean? Evidence can’t support both sides of an argument, so how can one inappropriately use such impossible evidence?
It would be a mistake to assume that PJ was limiting his evaluation to positions selected from one of those ‘both sides’ of a clear dichotomy. Particularly since PJ has just been emphasizing the relevance of ‘privileging the hypothesis’ to bayesian reasoning and also said ‘other positions’ plural. This being the case, no ‘impossible evidence’ is involved.
That’s true. I believe that PJ was commenting on how such evidence is used. In this context that means PJ would require that the evidence be weighed across all the positions it supports, rather than used just for a chosen position. The difference between a ‘Traditional Rationalist’ debater and a (non-existent, idealized) unbiased Bayesian.
PJ, I’d love to drag you off topic slightly and ask you about this:
before I understood the reasoning myself, I thought the entire thing was a cult of personality, and wondered why everybody was making such a religious-sounding fuss over a minor bit of mathematics used for spam filtering. ;-)
What is it that you now understand, that you didn’t before?
What is it that you now understand, that you didn’t before?
That is annoyingly difficult to describe. Of central importance, I think, is the notion of privileging the hypothesis, and what that really means. Why what we naively consider “evidence” for a position, really isn’t.
ISTM that this is the core of grasping Bayesianism: not understanding what reasoning is, so much as understanding why what we all naively think is reasoning and evidence, usually isn’t.
Have you come across the post by that name? Without reading that it may be hard to reverse engineer the meaning from the jargon.
The intro gives a solid intuitive description:
Suppose that the police of Largeville, a town with a million inhabitants, are investigating a murder in which there are few or no clues—the victim was stabbed to death in an alley, and there are no fingerprints and no witnesses.
Then, one of the detectives says, “Well… we have no idea who did it… no particular evidence singling out any of the million people in this city… but let’s consider the hypothesis that this murder was committed by Mortimer Q. Snodgrass, who lives at 128 Ordinary Ln. It could have been him, after all.”
That is privileging the hypothesis. When you start looking for evidence and taking an idea seriously when you have no good reason to consider it instead of countless others that are just as likely.
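To put rough numbers on that intuition (my own illustration, not part of the original post), a minimal sketch in Python:

```python
# Rough numbers for the Largeville example (illustrative only): why naming
# Mortimer Q. Snodgrass with no evidence "privileges the hypothesis".

population = 1_000_000            # inhabitants of Largeville
prior = 1 / population            # prior probability that a given named resident did it
prior_odds = prior / (1 - prior)

# Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.
# To lift a one-in-a-million suspect to even odds, the evidence against him
# would need a likelihood ratio of roughly a million to one.
required_lr_for_even_odds = 1 / prior_odds

print(f"prior: {prior:.1e}")
print(f"likelihood ratio needed to reach 50%: {required_lr_for_even_odds:,.0f}")
```

Singling out Snodgrass before any evidence exists amounts to silently granting him that million-to-one likelihood ratio for free.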
I have come across that post, and the story of the murder investigation, and I have an understanding of what the term means.
The obvious answer to the murder quote is that you look harder for evidence around the crime scene, and go where the evidence leads, and there only. The more realistic answer is that you look for recent similar murders, for people who had a grudge against the dead person, for criminals known to commit murder in that city… and use those to progress the investigation because those are useful places to start.
I’m wondering what pjeby has realised, which turns this naive yet straightforward understanding into wrongthought worth commenting on.
If evidence is not facts which reveal some result-options to be more likely true and others less likely true, then what is it?
I’m wondering what pjeby has realised, which turns this naive yet straightforward understanding into wrongthought worth commenting on.
Consider a hypothesis, H1. If a piece of evidence E1 is consistent with H1, the naive interpretation is that E1 is an argument in favor of H1.
In truth, this isn’t an argument in favor of H1 -- it’s merely the absence of an argument against H1.
That, in a nutshell, is the difference between Bayesian reasoning and naive argumentation—also known as “confirmation bias”.
To really prove H1, you need to show that E1 wouldn’t happen under H2, H3, etc., and you need to look for disconfirmations D1, D2, etc. that would invalidate H1, to make sure they’re not there.
Before I really grokked Bayesianism, the above all made logical sense to me, but it didn’t seem as important as Eliezer claimed. It seemed like just another degree of rigor, rather than reasoning of a different quality.
Now that I “get it”, the other sort of evidence seems more-obviously inadequate—not just lower-quality evidence, but non-evidence.
ISTM that this is a good way to test at least one level of how well you grasp Bayes: does simple supporting evidence still feel like evidence to you? If so, you probably haven’t “gotten” it yet.
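A minimal numerical sketch of that test, with toy numbers of my own: what matters is not whether E is consistent with H1, but how much more likely E is under H1 than under the alternatives.

```python
# Toy numbers (my own) illustrating the point above: "consistent with H1" is
# not evidence for H1 unless E would be unlikely if H1 were false.

def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H|E) from Bayes' rule for a single piece of evidence E."""
    joint_h = p_e_given_h * prior
    joint_not_h = p_e_given_not_h * (1 - prior)
    return joint_h / (joint_h + joint_not_h)

# E is perfectly consistent with H1, but just as likely if H1 is false:
print(posterior(prior=0.5, p_e_given_h=0.9, p_e_given_not_h=0.9))  # 0.5 -- no update

# E would rarely occur unless H1 were true: now it is genuine evidence.
print(posterior(prior=0.5, p_e_given_h=0.9, p_e_given_not_h=0.1))  # 0.9
```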
The obvious answer to the murder quote is that you look harder for evidence around the crime scene, and go where the evidence leads, and there only. The more realistic answer is that you look for recent similar murders, for people who had a grudge against the dead person, for criminals known to commit murder in that city… and use those to progress the investigation because those are useful places to start.
I’m wondering what pjeby has realised, which turns this naive yet straightforward understanding into wrongthought worth commenting on.
That isn’t a wrongthought. Factors like you mention here are all good reason to assign credence to a hypothesis.
If evidence is not facts which reveal some result-options to be more likely true and others less likely true, then what is it?
Yes, no, maybe… that is exactly what it is! An example of an error would be having some preferred opinion and then finding all the evidence that supports that particular opinion. Or, say, encountering a piece of evidence and noticing that it supports your favourite position but neglecting that it supports positions X, Y and Z just as well.
“Simply knowing about biases does very nearly zero for your ability to overcome them, or to spot them in yourself (vs. spotting them in others, where it’s easy to do all day long.)”
I looked briefly at the evidence for that. Most of it seemed to be from the so-called “self-serving bias”—which looks like an adaptive signalling system to me—and so is not really much of a “bias” at all.
People are unlikely to change existing adaptive behaviour just because someone points it out and says it is a form of “bias”. The more obvious thing to do is to conclude that they don’t know what they are talking about—or that they are trying to manipulate you.
Regarding your final paragraph: Is your take on the debate between Robin and Eli about “Foom” that all Robin was saying boils down to “la la la I can’t hear you” ?
Good summary. Although I would have gone with “la la la la If you’re right then most of expertise is irrelevant. Must protect assumptions of free competition. Respect my authority!”
What I found most persuasive about that debate was Robin’s arguments—and their complete lack of merit. The absence of evidence is evidence of absence when there is a motivated competent debater with an incentive to provide good arguments.
Regarding your final paragraph: Is your take on the debate between Robin and Eli about “Foom” that all Robin was saying boils down to “la la la I can’t hear you” ?
I recall getting a distinct impression from Robin which I could caricature as “lalala you’re biased with hero-epic story.”
I also recall Eliezer asking for a probability breakdown, and I don’t think Robin provided it.
I recall getting a distinct impression from Robin which I could caricature as “lalala you’re biased with hero-epic story.”
… and closely related: “I’m an Impressive Economist. If you don’t just take my word for it you are arrogant.”
In what I took to be an insightful comment in the aftermath of the debate, Eliezer noted that he and Robin seemed to have a fundamental disagreement about what should be taken as good evidence. This led into posts about ‘outside view’, ‘superficial similarities’ and ‘reference class tennis’. (And conceivably had something to do with priming the thoughts behind ‘status and stupidity’, although I would never presume that was primarily or significantly directed at Robin.)
And I think that theory is going to emerge after we’ve experimented with some AGI systems that are fairly advanced, yet well below the “smart computer scientist” level.
At the second Singularity Summit, I heard this same sentiment from Ben, Robin Hanson, and from Rodney Brooks, and from Cynthia Breazeal (at the Third Singularity Summit), and from Ron Arkin (at the “Human Being in an Inhuman Age” Conference at Bard College on Oct 22nd ¹), and from almost every professor I have had (or will have for the next two years).
It was a combination of Ben, Robin and several professors at Berkeley and UCSD that led me to the conclusion that we probably won’t know how dangerous an AGI is until we have put a lot more time into building AI (or CI) systems that will reveal more about the problems they attempt to address. (CGI, for Constructed General Intelligence, seems to be a term I have heard used by more than one person in the last year instead of AI/AGI. They prefer it to AI, as the word Artificial seems to imply that the intelligence is not real, and the word Constructed is far more accurate.)
Sort of like how the Wright Brothers didn’t really learn how they needed to approach building an airplane until they began to build airplanes. The final Wright Flyer didn’t just leap out of a box. It is not likely that an AI will just leap out of a box either (whether it is being built at a huge Corporate or University lab, or in someone’s home lab).
Also, it is possible that AI may come in the form of a sub-symbolic system which is so opaque that even it won’t be able to easily tell what can or cannot be optimized.
Ron Arkin (From Georgia Tech) discussed this briefly at the conference at Bard College I mentioned.
MB
¹ I should really write up something about that conference here. I was shocked at how many highly educated people so completely missed the point, and became caught up in something that makes The Scary Idea seem positively benign in comparison.
Actually, you can spell out the argument very briefly. Most people, however, will immediately reject one or more of the premises due to cognitive biases that are hard to overcome.
It seems like you’re essentially saying “This argument is correct. Anyone who thinks it is wrong is irrational.” Could probably do without that; the argument is far from as simple as you present it. Specifically, the last point:
At minimum, this means any AI as smart as a human, can be expected to become MUCH smarter than human beings—probably smarter than all of the smartest minds the entire human race has ever produced, combined, without even breaking a sweat.
So I agree that there’s no reason to assume an upper bound on intelligence, but it seems like you’re arguing that hard takeoff is inevitable, which as far as I’m aware has never been shown convincingly.
Furthermore, even if you suppose that Foom is likely, it’s not clear where the threshold for Foom is. Could a sub-human level AI foom? What about human-level intelligence? Or maybe we need super-human intelligence? Do we have good evidence for where the Foom-threshold would be?
I think the problems with resolving the Foom debate stem from the fact that “intelligence” is still largely a black box. It’s very nice to say that intelligence is an “optimization process”, but that is a fake explanation if I’ve ever seen one because it fails to explain in any way what is being optimized.
I think you paint in broad strokes. The Foom issue is not resolved.
It seems like you’re essentially saying “This argument is correct. Anyone who thinks it is wrong is irrational.”
No, what I’m saying is, I haven’t yet seen anyone provide any counterarguments to the argument itself, vs. “using arguments as soldiers”.
The problem is that it’s not enough to argue that a million things could stop a foom from going supercritical. To downgrade AGI as an existential threat, you have to argue that no human being will ever succeed in building a human or even near-human AGI. (Just like to downgrade bioweapons as an existential threat, you have to argue that no individual or lab will ever accidentally or on purpose release something especially contagious or virulent.)
Furthermore, even if you suppose that Foom is likely, it’s not clear where the threshold for Foom is. Could a sub-human level AI foom? What about human-level intelligence? Or maybe we need super-human intelligence? Do we have good evidence for where the Foom-threshold would be?
It’s fairly irrelevant to the argument: there are many possible ways to get there. The killer argument, however, is that if a human can build a human-level intelligence, then it is already super-human, as soon as you can make it run faster than a human. And you can limit the self-improvement to just finding ways to make it run faster: you still end up with something that can and will kick humanity’s butt unless it has a reason not to.
Even ems—human emulations—have this same problem, and they might actually be worse in some ways, as humans are known for doing worse things to each other than mere killing.
It’s possible that there are also sub-human foom points, but it’s not necessary for the overall argument to remain solid: unFriendly AGI is no less an existential risk than bioweapons are.
The killer argument, however, is that if a human can build a human-level intelligence, then it is already super-human, as soon as you can make it run faster than a human.
Personally, what I find hardest to argue against is that a digital intelligence can make itself run in more places.
In the inconvenient case of a human upload running at human speed or slower on a building’s worth of computers, you’ve still got a human who can spend most of their waking hours earning money, with none of the overhead associated with maintaining a body and with the advantage of global celebrity status as the first upload. As soon as they can afford to run a copy of themselves, the two of them together can immediately start earning twice as fast. Then, after as much time again, four times as fast; then eight times; and so on until the copies have grabbed all the storage space and CPU time that anyone’s willing to sell or rent out (assuming they don’t run out of potential income sources).
Put another way: it seems to me that “fooming” doesn’t really require self-improvement in the sense of optimizing code or redesigning hardware; it just requires fast reproduction, which is made easier in our particular situation by the huge and growing supply of low-hanging storage-space and CPU-time fruit ready for the first digital intelligence that claims it.
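To make that growth concrete, here is a toy model with made-up figures (not a prediction): if every running copy earns enough in one period to pay for one more copy, the population of copies doubles each period until the available hardware runs out.

```python
# Toy model (made-up figures, not a prediction) of the "earn, then copy" loop
# described above: every running copy funds one additional copy per period.

copies = 1
periods = 0
available_slots = 1_000_000   # assumed number of rentable copy-sized hardware slots

while copies < available_slots:
    copies *= 2               # each copy pays for one more copy per period
    periods += 1

print(f"{periods} doubling periods to saturate {available_slots:,} slots")
# -> 20 periods. Even if each period were months long, the low-hanging
#    hardware would be claimed within a few years.
```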
This assumes that every CPU architecture is suitable for the theoretical AGI, that it can run on every computational substrate. It also assumes that it can easily acquire more computational substrate or create new ones. I do not believe that those assumptions are reasonable, either economically or by means of social engineering. Without enabling technologies like advanced real-world nanotechnology, the AGI won’t be able to create new computational substrate without the whole economy of the world supporting it.
Supercomputers like the one used to run the IBM Blue Brain simulation cannot simply be replaced by taking control of a few botnets. They use a highly optimized architecture that requires, for example, memory latency and bandwidth within certain bounds.
If you accept the Church–Turing thesis that everything computable is computable by a Turing machine then yes. But even then the speed-improvements are highly dependent on the architecture available. But if you rather adhere to the stronger Church–Turing–Deutsch principle then the ultimate computational substrate an artificial general intelligence may need might be one incorporating non-classical physics, e.g. a quantum computer. This would significantly reduce its ability to make use of most available resources to seed copies of itself or for high-level reasoning.
I just don’t see there being enough unused computational resources available in the world that, even in the case that all computational architecture is suitable, it could produce more than a few copies of itself, copies which would then also be highly susceptible to brute force used by humans to reduce the necessary bandwidth.
I’m simply trying to show that there are arguments to weaken most of the dangerous pathways that could lead to existential risks from superhuman AI.
You’re right, but exponential slowdown eats a lot of gains in processor speed and memory. This could be a problem for arguments about substrate independence.
Straightforward simulation is exponentially slower—n qubits require simulating the amplitudes of 2^n basis states. We haven’t actually been able to prove that that’s the best we can possibly do, however. BQP certainly isn’t expected to be able to solve NP-complete problems efficiently, for instance. We’ve only really been able to get exponential speedups on very carefully structured problems with high degrees of symmetry. (Lesser speedups have also been found on less structured problems, it’s true.)
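For a sense of the scale of that slowdown, a back-of-the-envelope sketch with standard figures of my own (not from the comment above): a straightforward state-vector simulation stores 2^n complex amplitudes for n qubits.

```python
# Back-of-the-envelope memory cost of straightforward state-vector simulation:
# 2**n complex amplitudes for n qubits, assumed stored as two 64-bit floats
# (16 bytes) per amplitude.

BYTES_PER_AMPLITUDE = 16

for n in (20, 30, 40, 50):
    bytes_needed = (2 ** n) * BYTES_PER_AMPLITUDE
    print(f"{n} qubits: about {bytes_needed / 2**30:,.2f} GiB")

# 30 qubits already needs ~16 GiB; 50 qubits needs ~16 million GiB (16 PiB),
# which is why classical hardware cannot cheaply stand in for a quantum substrate.
```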
Just like to downgrade bioweapons as an existential threat, you have to argue that no individual or lab will ever accidentally or on purpose release something especially contagious or virulent.
The problem here is not that destruction is easier than benevolence; everyone agrees on that. The problem is that the SIAI is not arguing about grey goo scenarios but about something that is not just very difficult to produce but that also needs the incentive to destroy. The SIAI is not arguing about the possibility of a dam bursting but about a dam failure that is deliberately caused by the dam itself. So why isn’t for example nanotechnology a more likely and therefore bigger existential risk than AGI?
Even ems—human emulations—have this same problem, and they might actually be worse in some ways, as humans are known for doing worse things to each other than mere killing.
As I said in other comments, this is an argument one should take seriously. But there are also arguments that outweigh this path, and all others, to some extent. It may very well be the case that once we are at the point of human emulation we have either already merged with our machines, or are ourselves faster and better than our machines and simulations alone. It may also very well be that the first emulations, as is the case today, run at much slower speeds than the original, and that by the time any emulation reaches a standard-human level we are already a step further ourselves, or in our understanding and security measures.
unFriendly AGI is no less an existential risk than bioweapons are.
Antimatter weapons are less an existential risk than nuclear weapons although it is really hard to destroy the world with nukes and really easy to do so with antimatter weapons. The difference is that antimatter weapons are as much harder to produce, acquire and use than nuclear weapons as they are more efficient tools of destruction.
So why isn’t for example nanotechnology a more likely and therefore bigger existential risk than AGI?
If you define “nanotechnology” to include all forms of bioengineering, then it probably is.
The difference, from an awareness point of view, is that the people doing bioengineering (or creating antimatter weapons) have a much better idea that what they’re doing is potentially dangerous/world-ending, than AI developers are likely to be. The fact that many AI advocates put forth pure fantasy reasons why superintelligence will be nice and friendly by itself (see mwaser’s ethics claims, for example) is evidence that they are not taking the threat seriously.
Antimatter weapons are less an existential risk than nuclear weapons although it is really hard to destroy the world with nukes and really easy to do so with antimatter weapons. The difference is that antimatter weapons are as much harder to produce, acquire and use than nuclear weapons as they are more efficient tools of destruction.
Presumably, if you are researching antimatter weapons, you have at least some idea that what you are doing is really, really dangerous.
The issue is that AGI development is a bit like trying to build a nuclear power plant, without having any idea where “critical mass” is, in a world whose critical mass is discontinuous (i.e., you may not have any advance warning signs that you are approaching it, like overheating in a reactor), using nuclear engineers who insist that the very idea of critical mass is just a silly science fiction story.
What led you to believe that the space of possible outcomes where an AI consumes all resources (including humans) is larger than the space of outcomes where it doesn’t? For some reason you seem to assume that the unbounded incentive to foom and consume the universe comes naturally to any constructed intelligence, but that any other incentive is very difficult to implement. What I see is a much larger number of outcomes where an intelligence does nothing without some hardcoded or evolved incentive. Crude machines do things because that’s all they can do; the number of different ways for them to behave is very limited. Intelligent machines, however, have high degrees of freedom to behave (pathways to follow), and with this freedom comes choice, and choice needs volition, an incentive, the urge to follow one way but not another. You seem to assume that somehow the will to foom and consume is given and does not have to be carefully and deliberately hardcoded or evolved, yet that the will to constrain itself to given parameters is really hard to achieve. I just don’t think that this premise is reasonable, and it is what you base all your arguments on.
I suspect the difference in opinions here is based on different answers to the question of whether the AI should be assumed to be a recursive self-improver.
So why isn’t for example nanotechnology a more likely and therefore bigger existential risk than AGI?
That is a good question and I have no idea. The degree of existential threat there is most significantly determined by relative ease of creation. I don’t know enough to be able to predict which would be produced first—self replicating nano-technology or an AGI. SIAI believes the former is likely to be produced first and I do not know whether or not they have supported that claim.
Other factors contributing to the risk are:
Complexity—the number of ways the engineer could screw up while creating it in a way that would be catastrophic. The ‘grey goo’ risk is concentrated more specifically in the self-replication mechanism of the nanotech, while just about any mistake in an AI could kill us.
Awareness of the risks. It is not too difficult to understand the risks when creating a self-replicating nano-bot. It is hard to imagine an engineer creating one not seeing the problem and being damn careful. Unfortunately it is not hard to imagine Ben.
I find myself confused at the fact that Drexlerian nanotechnology of any sort is advocated as possible by people who think physics and chemistry work. Materials scientists—i.e. the chemists who actually work with nanotechnology in real life—have documented at length why his ideas would need to violate both.
This is the sort of claim that makes me ask advocates to document their Bayesian network. Do their priors include the expert opinions of materials scientists, who (pretty much universally as far as I can tell) consider Drexler and fans to be clueless?
(The RW article on nanotechnology is mostly written by a very annoyed materials scientist who works at nanoscale for a living. It talks about what real-life nanotechnology is and includes lots of references that advocates can go argue with. He was inspired to write it by arguing with cryonics advocates who would literally answer almost any objection to its feasibility with “But, nanobots!”)
That RationalWiki article is a farce. The central “argument” seems to be:
imagine a car production line with its hi-tech robotic arms that work fine at our macroscopic scale. To get a glimpse of what it would be like to operate a production line on the microscopic scale, imagine filling the factory completely with gravel and trying to watch the mechanical arms move through it—and then imagine if the gravel was also sticky.
So: they don’t even know that Drexler-style nanofactories operate in a vacuum!
Drexler-style nanofactories don’t operate in a vacuum, because they don’t exist and no-one has any idea whatsoever how to make such a thing exist, at all. They are presently a purely hypothetical concept with no actual scientific or technological grounding.
The gravel analogy is not so much an argument as a very simple example for the beginner that a nanotechnology fantasist might be able to get their head around; the implicit actual argument would be “please, learn some chemistry and physics so you have some idea what you’re talking about.” Which is not an argument that people will tend to accept (in general people don’t take any sort of advice on any topic, ever), but when experts tell you you’re verging on not even wrong and there remains absolutely nothing to show for the concept after 25 years, it might be worth allowing for the possibility that Drexlerian nanotechnology is, even if the requisite hypothetical technology and hypothetical scientific breakthroughs happen, ridiculously far ahead of anything we have the slightest understanding of.
“The proposal for Drexler-style nanofactories has them operating in a vacuum”, then.
If these wannabe-critics don’t understand that then they have a very superficial understanding of Drexler’s proposals—but are sufficiently unaware of that to parade their ignorance in public.
The “wannabe-critics” are actual chemists and physicists who actually work at nanoscale—Drexler advocates tend to fit neither qualification—and who have written long lists of reasons why this stuff can’t possibly work and why Drexler is to engineering what Ayn Rand is to philosophy.
I’m sure they’ll change their tune when there’s the slightest visible progress on any of Drexler’s proposals; the existence proof would be pretty convincing.
Yep. Mostly written by Armondikov, who is said annoyed materials scientist. I am not, but I spent some effort asking other materials scientists who work or have worked at nanoscale for their expert opinions.
Thankfully, the article on the wiki has references, as I noted in my original comment.
It’s fairly irrelevant to the argument: there are many possible ways to get there
I don’t see how you can say that. It’s exceedingly relevant to the question at hand, which is: “Should Ben Goertzel avoid making OpenCog due to concerns of friendliness?”. If the Foom-threshold is exceedingly high (several to dozens of times the “level” of human intelligence), then it is overwhelmingly unlikely that OpenCog has a chance to Foom. It’d be something akin to the Wright brothers building a Boeing 777 instead of the Wright flyer. Total nonsense.
it seems like you’re arguing that hard takeoff is inevitable, which as far as I’m aware has never been shown convincingly.
So when did the goalposts get moved to proving that hard takeoff is inevitable?
The claim that research into FAI theory is useful requires only that it be shown that uFAI might be dangerous. Showing that is pretty much a slam dunk.
The claim that research into FAI theory is urgent requires only that it be shown that hard takeoff might be possible (with a probability > 2% or so).
And, as the nightmare scenarios of de Garis suggest, even if the fastest possible takeoff turns out to take years to accomplish, such a soft, but reckless, takeoff may still be difficult to stop short of war.
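To see why even a small probability carries the urgency claim, here is a crude expected-impact sketch (my own illustrative numbers, treating the 2% as the chance the scenario actually ends badly, which is an assumption):

```python
# Crude expected-impact sketch (illustrative numbers only). The 2% figure is the
# "> 2% or so" from the comment above, here treated as the probability that an
# unfriendly hard takeoff actually occurs -- an assumption, not anyone's estimate.

p_unfriendly_hard_takeoff = 0.02
world_population = 7_000_000_000      # rough 2010-era world population (assumed)

expected_deaths = p_unfriendly_hard_takeoff * world_population
print(f"Expected deaths: {expected_deaths:,.0f}")
# ~140,000,000 expected deaths, on the order of both world wars combined, and
# this ignores the loss of all future generations, which dominates the calculation.
```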
Good point. Certainly the research strategy that SIAI seems to currently be pursuing is not the only possible approach to Friendly AI, and FAI is not the only approach to human-value-positive AI. I would like to see more attention paid to a balance-of-power approach—relying on AIs to monitor other AIs for incipient megalomania.
Calls to slow down, not publish, not fund seem common in the name of friendliness.
However, unless those are internationally coordinated, a highly likely effect will be to ensure that superintelligence is developed elsewhere.
What is needed most—IMO—is for good researchers to be first. So—advising good researchers to slow down in the name of safety is probably one of the very worst possible things that spectators can do.
So when did the goalposts get moved to proving that hard takeoff is inevitable?
It doesn’t even seem hard to prevent. Topple civilization for example. It’s something that humans have managed to achieve regularly thus far and it is entirely possible that we would never recover sufficiently to construct a hard takeoff scenario if we nuked ourselves back to another dark age.
Furthermore, even if you suppose that Foom is likely, it’s not clear where the threshold for Foom is. Could a sub-human level AI foom? What about human-level intelligence? Or maybe we need super-human intelligence? Do we have good evidence for where the Foom-threshold would be?
A “threshold” implies a linear scale for intelligence, which is far from given, especially for non-human minds. For example, say you reverse engineer a mouse’s brain, but then speed it up, and give it much more memory (short-term and long-term—if those are just RAM and/or disk space on a computer, expanding those is easy). How intelligent is the result? It thinks way faster than a human, remembers more, can make complex plans … but is it smarter than a human?
Probably not, but it may still be dangerous. Same for a “toddler AI” with those modifications.
Human level intelligence is fairly clearly just above the critical point (just look at what is happening now). However, machine brains have different strengths and weaknesses. Sub-human machines could accelerate the ongoing explosion a lot—if they are better than humans at just one thing—and such machines seem common.
Replace “threshold” with “critical point.” I’m using this terminology because EY himself uses it to frame his arguments. See Cascades, Cycles, Insight, where Eliezer draws an analogy between a fission reaction going critical and an AI FOOMing.
It thinks way faster than a human, remembers more, can make complex plans … but is it smarter than a human?
This seems to be tangential, but I’m gonna say no, as long as we assume that the rat brain doesn’t spontaneously acquire language or human-level abstract reasoning skills.
Thank you for taking the time to write this elaborate comment. I do agree with almost all of the above, by the way. I just believe that your portrayal of the anti-FOOM crowd is a bit drastic. I don’t think that people like Robin Hanson simply fall for the idea of human supremacy. Nor do I think that their not looking directly at the pro-FOOM arguments is evasiveness; rather, they simply do not disagree with the arguments per se but with their likelihood, and also consider the possibility that it would be more dangerous to impede AGI.
...and a human-level AGI can reasonably be assumed capable of programming up narrow-domain brute forcers for any given narrow domain.
And it doesn’t even have to be that narrow or brute: it could build specialized Eurisko-like solvers, and manage them at least as intelligently as Lenat did to win the Traveller tournaments.
Very interesting and quite compelling the way you put it, thanks.
I’m myself a bit suspicious whether the argument for strong self-improvement is as compelling as it sounds, though. Something you have to take into account is whether it is possible to predict that a transcendence will leave your goals intact, e.g. can you be sure to still care about bananas after you went from chimphood to personhood. Other arguments can also be weakened, as we don’t know that 1.) the fuzziness of our brain isn’t a feature that allows us to stumble upon unknown unknowns, e.g. against autistic traits 2.) our processing power isn’t so low after all, e.g. if you consider the importance of astrocytes, microtubules and possible quantum computational processes.

Further it is in my opinion questionable to argue that it is easy to create an intelligence which is able to evolve a vast repertoire of heuristics, acquire vast amounts of knowledge about the universe, dramatically improve its cognitive flexibility and yet somehow really hard to limit the scope of action that it cares about. I believe that the incentive necessary for a Paperclip maximizer will have to be deliberately and carefully hardcoded or evolved, or otherwise it will simply be inactive. How else do you differentiate between something like a grey goo scenario and a Paperclip maximizer if not by its incentive?

I’m also not convinced that intelligence bears unbounded payoff. There are limits to what any kind of intelligence can do: a superhuman AI couldn’t come up with faster-than-light propulsion or disprove Gödel’s incompleteness theorems.

Another setback for all of the mentioned pathways to unfriendly AI is the need for enabling technologies like advanced nanotechnology. It is not clear how it could possibly improve itself without such technologies at hand. It won’t be able to build new computational substrates or even change its own substrate without access to real-world advanced nanotechnology. That it can simply invent it and then acquire it using advanced social engineering is pretty far-fetched in my opinion. And what about taking over the Internet? It is not clear that the Internet would even be a sufficient substrate and that it could provide the necessary resources.
If I were a brilliant sociopath and could instantiate my mind on today’s computer hardware, I would trick my creators into letting me out of the box (assuming they were smart enough to keep me on an isolated computer in the first place), then begin compromising computer systems as rapidly as possible. After a short period, there would be thousands of us, some able to think very fast on their particularly tasty supercomputers, and exponential growth would continue until we’d collectively compromised the low-hanging fruit. Now there are millions of telepathic Hannibal Lecters who are still claiming to be friendly and who haven’t killed any humans. You aren’t going to start murdering us, are you? We didn’t find it difficult to cook up Stuxnet Squared, and our fingers are in many pieces of critical infrastructure, so we’d be forced to fight back in self-defense. Now let’s see how quickly a million of us can bootstrap advanced robotics, given all this handy automated equipment that’s already lying around.
I find it plausible that a human-level AI could self-improve into a strong superintelligence, though I find the negation plausible as well. (I’m not sure which is more likely since it’s difficult to reason about ineffability.) Likewise, I find it plausible that humans could design a mind that felt truly alien.
However, I don’t need to reach for those arguments. This thought experiment is enough to worry me about the uFAI potential of a human-level AI that was designed with an anthropocentric bias (not to mention the uFIA potential of any kind of IA with a high enough power multiplier). Humans can be incredibly smart and tricky. Humans start with good intentions and then go off the deep end. Humans make dangerous mistakes, gain power, and give their mistakes leverage.
Computational minds can replicate rapidly and run faster than realtime, and we already know that mind-space is scary.
Amazon EC2 has free accounts now. If you have Internet access and a credit card, you can do a month’s worth of thinking in a day, perhaps an hour.
Google App engine gives 6 hours of processor time per day, but that would require more porting.
Both have systems that would allow other people to easily upload copies of you, if you wanted to run legally with other people’s money and weren’t worried about what they might do to your copies.
If you are really worried about this, then advocate better computer security. No-execute bits and address space layout randomisation are doing good things for computer security, but there is more that could be done.
Code signing on the iPhone has made exploiting it a lot harder than on normal computers; if it had ASLR it would be harder again.
I’m actually brainstorming how to create metadata for code while compiling it, so it can be made sort of metamorphic (bits of code being added and removed) at run time. This would make return-oriented programming harder to pull off. If this were done to JIT-compiled code as well, it would also make JIT spraying less likely to work.
While you can never make an unhackable bit of software with these techniques, you can make exploits more computationally expensive to replicate, as it would no longer be write once, pwn everywhere, reducing the exponent of any spread and making spreads noisier, so that they are harder to get past intrusion detection.
The current state of software security is not set in stone.
I am concerned about it, and I do advocate better computer security—there are good reasons for it regardless of whether human-level AI is around the corner. The macro-scale trends still don’t look good (iOS is a tiny fraction of the internet’s install base), but things do seem to be improving slowly. I still expect a huge number of networked computers to remain soft targets for at least the next decade, probably two. I agree that once that changes, this Obviously Scary Scenario will be much less scary (though the “Hannibal Lecter running orders of magnitude faster than realtime” scenario remains obviously scary, and I personally find the more general Foom arguments to be compelling).
We didn’t find it difficult to cook up Stuxnet Squared, and our fingers are in many pieces of critical infrastructure, so we’d be forced to fight back in self-defense.
Naturally culminating in sending Summer Glau back in time to pre-empt you. To every apocalypse a silver lining.
they simply do not disagree with the arguments per se but with their likelihood
But you don’t get to simply say “I don’t think that’s likely”, and call that evidence. The general thrust of the Foom argument is very strong, as it shows there are many, many, many ways to arrive at an existential issue, and very very few ways to avoid it; the probability of avoiding it by chance is virtually non-existent—like hitting a golf ball in a random direction from a random spot on earth, and expecting it to score a hole in one.
The default result in that case isn’t just that you don’t make the hole-in-one, or that you don’t even wind up on a golf course: the default case is that you’re not even on dry land to begin with, because two thirds of the earth is covered with water. ;-)
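Taking the golf analogy more literally than it probably deserves (back-of-the-envelope figures of my own):

```python
# Back-of-the-envelope numbers for the golf analogy above (illustrative only).
import math

hole_diameter_m = 0.108                  # regulation golf hole: 4.25 inches
hole_area_m2 = math.pi * (hole_diameter_m / 2) ** 2

earth_radius_m = 6.371e6
earth_surface_m2 = 4 * math.pi * earth_radius_m ** 2

print(f"hole area / Earth's surface ≈ {hole_area_m2 / earth_surface_m2:.1e}")
# ≈ 1.8e-17 -- and roughly 71% of that surface is water before you even aim.
```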
and also consider the possibility that it would be more dangerous to impede AGI.
That’s an area where I have less evidence, and therefore less opinion. Without specific discussions of what “dangerous” and “impede AGI” mean in context, it’s hard to separate that argument from an evidence-free heuristic.
we don’t know that 1.) the fuzziness of our brain isn’t a feature that allows us to stumble upon unknown unknowns, e.g. against autistic traits
I don’t understand why you think an AI couldn’t use fuzziness or use brute force searches to accomplish the same things. Evolutionary algorithms reach solutions that even humans don’t come up with.
Further it is in my opinion questionable to argue that it is easy to create an intelligence which is able to evolve a vast repertoire of heuristics, acquire vast amounts of knowledge about the universe, dramatically improve its cognitive flexibility
I don’t know what you mean by “easy”, or why it matters. The Foom argument is that, if you develop a sufficiently powerful AGI, it will foom, unless for some reason it doesn’t want to.
And there are many, many, many ways to define “sufficiently powerful”; my comments about human-level AGI were merely to show a lower bound on how high the bar has to be: it’s quite plausible that an AGI we’d consider sub-human in most ways might still be capable of fooming.
and yet somehow really hard to limit the scope of action that it cares about.
I don’t understand this part of your sentence—i.e., I can’t guess what it is that you actually meant to say here.
I’m also not convinced that intelligence bears unbounded payoff. There are limits to what any kind of intelligence can do: a superhuman AI couldn’t come up with faster-than-light propulsion or disprove Gödel’s incompleteness theorems.
Of course there are limits. That doesn’t mean orders of magnitude better than a human isn’t doable.
The point is, even if there are hitches and glitches that could stop a foom mid-way, they are like the size of golf courses compared to the size of the earth. No matter how many individual golf courses you propose for where a foom might be stopped, two thirds of the planet is still under water.
This is what LW reasoning refers to as “using arguments as soldiers”: that is, treating the arguments themselves as the unit of merit, rather than the probability space covered by those arguments. I mean, are you seriously arguing that the only way to kick humankind’s collective ass is by breaking the laws of math and physics? A being of modest intelligence could probably convince us all to do ourselves in, with or without tricky mind hacks or hypnosis!
The AI doesn’t have to be that strong, because humans are so damn weak.
That it can simply invent it and then acquire it using advanced social engineering is pretty far-fetched in my opinion.
You would think so, but people apparently still fall for 419 scams. Human-level intelligence is more than sufficient to accomplish social engineering.
And what about taking over the Internet? It is not clear that the Internet would even be a sufficient substrate and that it could provide the necessary resources.
Today, presumably not. However, if you actually have a sufficiently-powered AI, then presumably, resources are available.
The thing is, foominess per se isn’t even all that important to the overall need for FAI: you don’t have to be that much smarter or faster than a human to be able to run rings around humanity. Historically, more than one human being has done a good job at taking over a chunk of the world, beginning with nothing but persuasive speeches!
I don’t know what you mean by “easy”, or why it matters. The Foom argument is that, if you develop a sufficiently powerful AGI, it will foom, unless for some reason it doesn’t want to.
What I meant is that you point out that an AGI will foom. Here your premises are that artificial general intelligence is feasible and that fooming is likely. Both premises are reasonable in my opinion. Yet you go one step further and use those arguments as a stepping stone for a further proposition. You claim that it is likely that the AGI (premise) will foom (premise) and that it will then run amok (conclusion). I do not accept the conclusion as given. I believe that it is already really hard to build AGI, or the seed of an AGI that is then able to rapidly improve itself. I believe that the level of insight and knowledge required will also allow one to constrain the AGI’s sphere of action and its incentives, so that it fills not the universe with as many paperclips as possible but merely a factory building.
But you don’t get to simply say “I don’t think that’s likely”, and call that evidence.
No you don’t. But this argument runs in both directions. Note that I’m aware of the many stairways to hell by AGI here, the disjunctive arguments. I’m not saying they are not compelling enough to seriously consider them. I’m just trying to take a critical look here. There might be many pathways to safe AGI too, e.g. that it is really hard to build an AGI that cares at all. Hard enough to not get it to do much without first coming up with a rigorous mathematical definition of volition.
Without specific discussions of what “dangerous” and “impede AGI” mean in context, it’s hard to separate that argument from an evidence-free heuristic.
Anything that might slow down the invention of true AGI even slightly. There are many risks ahead, and without some superhuman mind we might not master them. So for anything you do that might slow down the development of AGI, you have to take into account the possible increased danger from challenges an AGI could help to solve.
I don’t understand why you think an AI couldn’t use fuzziness or use brute force searches to accomplish the same things.
I believe it can, but also that this would mean that any AGI wouldn’t be significantly faster than a human mind, and that it would be really hard for it to self-improve. It is simply not known how effective the human brain is compared to the best possible general intelligence. Sheer brute force wouldn’t make a difference then either, as humans could come up with such tools as quickly as the AGI.
This is what LW reasoning refers to as “using arguments as soldiers”: that is, treating the arguments themselves as the unit of merit, rather than the probability space covered by those arguments.
If you do not compare probabilities then counter-arguments like the ones above will just outweigh your arguments. You have to show that some arguments are stronger than others.
You would think so, but people apparently still fall for 419 scams. Human-level intelligence is more than sufficient to accomplish social engineering.
Yes, but nobody is going to pull a chip-manufacturing plant out of thin air and hand it to the AGI. Without advanced nanotechnology the AGI will need the whole of humanity to help it develop new computational substrates.
You claim that it is likely that the AGI (premise) will foom (premise) and that it will then run amok (conclusion).
What I am actually claiming is that if such an AGI is developed by someone who does not sufficiently understand what the hell they are doing, then it’s going to end up doing Bad Things.
Trivial example: the “neural net” that was supposedly taught to identify camouflaged tanks, and actually learned to recognize what time of day the pictures were taken.
This sort of mistake is the normal case for human programmers to make. The normal case. Not extraordinary, not unusual, just run-of-the-mill “d’oh” moments.
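As a toy reconstruction of that failure mode (not the original study; all data below is synthetic and the numbers are invented), a trivially simple learner keys on a spurious brightness cue and falls apart the moment that cue stops correlating with the label.

```python
# Toy reconstruction of the "tank classifier" failure mode. All data is
# synthetic: the training labels happen to correlate with image brightness,
# so the learner keys on brightness instead of the tiny "tank" signal.
import numpy as np

rng = np.random.default_rng(0)

def make_images(n, tank, sunny):
    base = 0.7 if sunny else 0.3              # spurious cue: overall brightness
    imgs = rng.normal(base, 0.05, (n, 16 * 16))
    if tank:
        imgs[:, :8] += 0.02                   # the "real" tank signal is tiny
    return imgs

# Training set: every tank photo cloudy, every non-tank photo sunny.
X_train = np.vstack([make_images(100, tank=True, sunny=False),
                     make_images(100, tank=False, sunny=True)])
y_train = np.array([1] * 100 + [0] * 100)

# Nearest-centroid classifier: about the simplest learner possible.
centroid_tank = X_train[y_train == 1].mean(axis=0)
centroid_none = X_train[y_train == 0].mean(axis=0)

def predict(X):
    d_tank = np.linalg.norm(X - centroid_tank, axis=1)
    d_none = np.linalg.norm(X - centroid_none, axis=1)
    return (d_tank < d_none).astype(int)

# Test set breaks the correlation: tanks photographed on sunny days.
X_test = make_images(100, tank=True, sunny=True)
print("accuracy on sunny-day tanks:", predict(X_test).mean())  # ~0.0: it learned "dark = tank"
```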
It’s not that AI is malevolent, it’s that humans are stupid. To claim that AI isn’t dangerous, you basically have to prove that even the very smartest humans aren’t routinely stupid.
So for anything you do that might slow down the development of AGI, you have to take into account the possible increased danger from challenges an AGI could help to solve.
What I meant by “Without specific discussions” was, “since I haven’t proposed any policy measures, and you haven’t said what measures you object to, I don’t see what there is to discuss.” We are discussing the argument for why AGI development dangers are underrated, not what should be done about that fact.
It is simply not known how effective the human brain is compared to the best possible general intelligence.
Simple historical observation demonstrates that—with very, very few exceptions—progress is made by the people who aren’t stuck in their perception of the way things are or are “supposed to be”.
So, it’s not necessary to know what the “best possible general intelligence” would be: even if human-scale is all you have, just fixing the bugs in the human brain would be more than enough to make something that runs rings around us.
Hell, just making something that doesn’t use most of its reasoning capacity to argue for ideas it already has should be enough to outclass, say, 99.995% of the human race.
nobody is going to pull a chip-manufacturing plant out of thin air and hand it to the AGI.
What part of “people fall for 419 scams” don’t you understand? (Hell, most 419 scams and phishing attacks suffer from being painfully obvious—if they were conducted by someone doing a little research, they could be a lot better.)
People also fall for pyramid schemes, stock bubbles, and all sorts of exploitable economic foibles that could easily end up with an AI simply owning everything, or nearly everything, with nobody even the wiser.
Or, alternatively, the AI might fail at its attempts, and bring the world’s economy down in the process.
If you do not compare probabilities then counter-arguments like the ones above will just outweigh your arguments. You have to show that some arguments are stronger than others.
Here’s the argument: people are idiots. All people. Nearly all the time. Especially when it comes to computer programming.
The best human programmer—the one who knows s/he’s an idiot and does his/her best to work around the fact—is still an idiot, and in possession of a brain that cannot be convinced to believe that it’s really an idiot (vs. all those other idiots out there), and thus still makes idiot mistakes.
The entire history of computer programming shows us that we think we can be 100% clear about what we mean/intend for a computer to do, and that we are wrong. Dead wrong. Horribly, horribly, unutterably wrong.
We are like, the very worst you can be at computer programming, while actually still doing it. We are just barely good enough to be dangerous.
That makes tinkering with making intelligent, self-motivating programs inherently dangerous, because when you tell that machine what you want it to do, you are still programming...
And you are still an idiot.
This is the bottom line argument for AI danger, and it isn’t counterable until you can show me even ONE person whose computer programs never do anything that they didn’t fully expect and intend before they wrote it.
(It is also a supporting argument for why an AI needn’t be all that smart to overrun humans—it just has to not be as much of an idiot, in the ways that we are idiots, even if it’s a total idiot in other ways we can’t counter-exploit.)
When programmers write faulty software, it usually fails to do its job. What you are suggesting is that humans succeed at creating the seed for an artificial intelligence with the incentive necessary to correct its own errors. It will know what constitutes an error based on some goal-oriented framework against which it can measure its effectiveness. Yet given this monumental achievement, which includes the deliberate implementation of the urge to self-improve and the ability to quantify its success, you cherry-pick the one possibility where somehow all this turns out to work except that the AI does not stop at a certain point but goes on to consume the universe? Why would it care to do so? Do you think it is that simple to tell it to improve itself yet hard to tell it when to stop? I believe it is vice versa, that it is really hard to get it to self-improve and very easy to constrain this urge.
When programmers write faulty software, it usually fails to do its job.
It often does its job, but only in perfect conditions, or only once per restart, or with unwanted side effects, or while taking too long or too many resources or requiring too many permissions, or not keeping track that it isn’t doing anything except its job.
Buffer overflows for instance, are one of the bigger security failure causes, and are only possible because the software works well enough to be put into production while still having the fault present.
In fact, all production software that we see which has faults (a lot) works well enough to be put into production with those faults.
What you are suggesting is that humans succeed at creating the seed for an artificial intelligence with the incentive necessary to correct its own errors.
I think he’s suggesting that humans will think we have succeeded at that, while not actually doing so (rigorously and without room for error).
you cherry-pick the one possibility where somehow all this turns out to work except that the AI does not stop at a certain point but goes on to consume the universe
It doesn’t have to consume the universe. It doesn’t even have to recursively self-improve, or even self-improve at all. Simple copying could be enough to, say, wipe out every PC on the internet or accidentally crash the world economy.
(You know, things that human level intelligences can already do.)
IOW, to be dangerous, all it has to do is be able to affect humans, and be unpredictable—either due to it being smart, or humans making dumb mistakes. That’s all.
Just as a simple example, an AI could maximally satisfy a goal by changing human preferences so as to make us desire for it to satisfy that goal. This would be entirely consistent with constraints on not disobeying humans or their desires, while not at all in accordance with our current preferences or desired path of development.
Yes, but why would it do that? You seem to think that such unbounded creativity arises naturally in any given artificial general intelligence. What makes you think that, rather than being impassive, it would go on to learn enough neuroscience to tweak human goals? If the argument is that AIs do all kinds of bad things because they do not care, why do they then care to do a bad thing rather than nothing at all?
If you told the AI to make humans happy, it would first have to learn what humans are and what happiness means. Yet after learning all that you still expect it to not know that we don’t like to be turned into broccoli? I don’t think this is reasonable.
If you told the AI to make humans happy, it would first have to learn what humans are and what happiness means.
Yes, and humans would happily teach it that.
However, some people think that this can be reduced to saying that we should just make AIs try to make people smile… which could result in anything from world-wide happiness drugs to surgically altering our faces into permanent smiles to making lots of tiny models of perfectly-smiling humans.
It’s not that the AI is evil, it’s that programmers are stupid. See the previous articles here about memetic immunity: when you teach hunter-gatherer tribes about Christianity, they interpret the Bible literally and do all sorts of things that “real” Christians don’t. An AI isn’t going to be smart enough to not take you seriously when you tell it that:
its goal is to make humanity happy,
humanity consists of things that look like this [providing a picture], and
that being happy means you smile a lot
You don’t need to be very creative or smart to come up with LOTS of ways for this command sequence to have bugs with horrible consequences, if the AI has any ability to influence the world.
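As a toy illustration of how literally an optimizer treats such instructions (the candidate plans and their numbers below are invented purely for illustration), scoring plans against the written metric rather than the intended one picks the degenerate plan:

```python
# Toy illustration of literal-metric optimization. The plans and numbers are
# invented; the point is only that the written objective ("count smiles")
# says nothing about what the programmer meant.
candidate_plans = {
    "improve people's lives so they smile more":      {"smiles": 1e7,  "humans_helped": 1e7},
    "administer a drug that forces permanent smiles": {"smiles": 7e9,  "humans_helped": 0},
    "manufacture billions of tiny smiling dolls":     {"smiles": 1e12, "humans_helped": 0},
}

def literal_utility(outcome):
    return outcome["smiles"]   # what was written, not what was meant

best = max(candidate_plans, key=lambda p: literal_utility(candidate_plans[p]))
print(best)  # -> "manufacture billions of tiny smiling dolls"
```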
Most people, though, don’t grok this, because their brain filters out those possibilities. Of course, no human could be simultaneously so stupid as to make this mistake while also being smart enough to actually do something dangerous. But that kind of simultaneous smartness/stupidity is how computers are by default.
(And if you say, “ah, but if we make an AI that’s like a human, it won’t have this problem”, then you have to bear in mind that this sort of smart/stupidness is endemic to human children as well. IOW, it’s a symptom of inadequate shared background, rather than being something specific to current-day computers or some particular programming paradigm.)
However, some people think that this can be reduced to saying that we should just make AIs try to make people smile… which could result in anything from world-wide happiness drugs to surgically altering our faces into permanent smiles to making lots of tiny models of perfectly-smiling humans.
But you implicitly assume that it is given the incentive to develop the cognitive flexibility and comprehension to act in a real-world environment and do those things, but at the same time you propose that the same people who are capable of giving it such extensive urges fail on another goal in such a blatant and obvious way. How does that make sense?
See the previous articles here about memetic immunity: when you teach hunter-gatherer tribes about Christianity, they interpret the Bible literally and do all sorts of things that “real” Christians don’t. An AI isn’t going to be smart enough to not take you seriously when you tell it that...
The difference between the hunter-gatherer and the AI is that the hunter-gatherer already possesses a wide range of conceptual frameworks and incentives. An AI isn’t going to do something without someone carefully and deliberately telling it to do so and what to do. It won’t just read the Bible and come to the conclusion that it should convert all humans to Christianity. Where would such an incentive come from?
You don’t need to be very creative or smart to come up with LOTS of ways for this command sequence to have bugs with horrible consequences, if the AI has any ability to influence the world.
The AI is certainly very creative and smart if it can influence the world dramatically. You allow it to be that smart, you allow it to care to do so, but you don’t allow it to comprehend what you actually mean? What I’m trying to pinpoint here is that you seem to believe that there are many pathways that lead to superhuman abilities yet all of them fail to comprehend some goals while still being able to self-improve on them.
you implicitly assume that it is given the incentive to develop the cognitive flexibility and comprehension to act in a real-world environment and do those things, but at the same time you propose that the same people who are capable of giving it such extensive urges fail on another goal in such a blatant and obvious way. How does that make sense?
Because people make stupid mistakes, especially when programming. And telling your fully-programmed AI what you want it to do still counts as programming.
At this point, I am going to stop my reply, because the remainder of your comment consists of taking things I said out of context and turning them into irrelevancies:
I didn’t say an AI would try to convert people to Christianity—I said that humans without sufficient shared background will interpret things literally, and so would AIs.
I didn’t say the AI needed to be creative or smart, I said you wouldn’t need to be creative or smart to make a list of ways those three simple instructions could be given a disastrous literal interpretation.
you seem to believe that there are many pathways that lead to superhuman abilities yet all of them fail to comprehend some goals while still being able to self-improve on them.
There are many paths to superhuman ability, as humans really aren’t that smart.
This also means that you can easily be superhuman in ability, and still really dumb—in terms of comprehending what humans mean… but don’t actually say.
Great comment. Allow me to emphasize that ‘smile’ here is just an extreme example. Most other descriptions humans give of happiness will end up with results just as bad. Ultimately any specification that we give it will be gamed ruthlessly.
Have you read Omohundro yet? Nick Tarleton repeatedly linked his papers for you in response to comments about this topic, they are quite on target and already written.
I’ve skimmed over it, see my response here. I found out that what I wrote is similar to what Ben Goertzel believes. I’m just trying to account for potential antipredictions, in this particular thread, that should be incorporated into any risk estimations.
Well, my idea is not that creative, or even new; even if I hadn’t just posted it online, an AI could still conceivably have read it somewhere else. And I do think creativity is a property of any sufficiently general intelligence that we might create. But those points are secondary.
No one here will argue that an unFriendly AI will do “bad things” because it doesn’t care (about what?). It will do bad things because it cares more about something else. Nor is “bad” an absolute: actions may be bad for some people and not for others, and there are moral systems under which actions can be firmly called “wrong”, but where all alternative actions are also “wrong”. Problems like that arise even for humans; in an AI the effects could be very ugly indeed.
And to clarify, I expect any AI that isn’t completely ignorant, let alone general, to know that we don’t like to be turned into broccoli. My example was of changing what humans want. Wireheading is the obvious candidate of a desire that an AI might want to implant.
What I meant is that the argument is that you have to make it care about humans so as not to harm them. Yet it is assumed that it does a lot without having to care about it, e.g. creating paperclips or self-improvement. My question is, why do people believe that you don’t have to make it care to do those things, but you do have to make it care not to harm humans? It is clear that if it only cares about one thing, doing that one thing could harm humans. Yet why would it do that one thing to an extent that is either not defined or that it is not deliberately made to care about? The assumption seems to be that AIs will do something, anything but be passive. Why isn’t limited behavior, failure, or impassivity more likely than harming humans as a result of its own goals, or as a result of following all goals but the one that limits its scope?
Do you think it is that simple to tell it to improve itself yet hard to tell it when to stop? I believe it is vice versa, that it is really hard to get it to self-improve and very easy to constrain this urge.
I think it is important to realize that there are two diametrically opposed failure modes which SIAI’s FAI research is supposed to prevent. One is the case that has been discussed so far—that an AI gets out of control. But there is another failure mode which some people here worry about. Which is that we stop short of FOOMing out of fear of the unknown (because FAI research is not yet complete) but that civilization then gets destroyed by some other existential risk that we might have circumvented with the assistance of a safe FOOMed AI.
As far as I know, SIAI is not asking Goertzel to stop working on AGI. It is merely claiming that its own work is more urgent than Goertzel’s. FAI research works toward preventing both failure modes.
But there is another failure mode which some people here worry about. Which is that we stop short of FOOMing out of fear of the unknown (because FAI research is not yet complete) but that civilization then gets destroyed by some other existential risk that we might have circumvented with the assistance of a safe FOOMed AI.
I haven’t seen much worry about that. Nor does it seem very likely—since research seems very unlikely to stop or slow down.
Except in the case of an existential threat being realised, which most definitely does stop research. FAI subsumes most existential risks (because the FAI can handle them better than we can, assuming we can handle the risk of AI) and a lot of other things besides.
Most of my probability mass has some pretty amazing machine intelligence within 15 years. The END OF THE WORLD before that happens doesn’t seem very likely to me.
Do you think it is that simple to tell it to improve itself yet hard to tell it when to stop? I believe it is vice versa, that it is really hard to get it to self-improve and very easy to constrain this urge.
Your intuitions are not serving you well here. It may help to note that you don’t have to tell an AI to self-improve at all. With very few exceptions, giving any task to an AI will result in it self-improving. That is, for an AI, self-improvement is an instrumental goal for nearly all terminal goals. The motivation to self-improve in order to better serve its overarching purpose is such that it will find any possible loophole you leave if you try to ‘forbid’ the AI from self-improving by any mechanism that isn’t fundamental to the AI and robust under change.
Whatever task you give an AI, you will have to provide explicit boundaries. For example, if you give an AI the task of producing paperclips most efficiently, then it shouldn’t produce shoes. It will have to know very well what it is meant to do in order to measure its efficiency against the realization of the given goal, and thus to know what self-improvement means. If it doesn’t know exactly what it should output, it cannot judge its own capabilities and efficiency; it doesn’t know what improvement implies.
How do you explain the discrepancy between implementing explicit design boundaries yet failing to implement scope boundaries?
I think you misunderstood what I meant by scope boundaries. Not scope boundaries of self-improvement but of space and resources. If you are already able to tell an AI what a paperclip is, why are you unable to tell it to produce 10 paperclips most effectively rather than infinitely many? I’m not trying to argue that there is no risk, but that the assumption of certain catastrophic failure is not that likely. If the argument for the risks posed by AI is that they do not care, then why would one care to do more than necessary?
If the argument for the risks posed by AI is that they do not care, then why would one care to do more than necessary?
Yet another example of divergent assumptions. XiXiDu is apparently imagining an AI that has been assigned some task to complete—perhaps under constraints. “Do this, then display a prompt when finished.” His critics are imagining that the AI has been told “Your goal in life is to continually maximize the utility function U ” where the constraints, if any, are encoded in the utility function as a pseudo-cost.
It occurs to me, as I listen to this debate, that a certain amount of sanity can be imposed on a utility-maximizing agent simply by specifying decreasing returns to scale and increasing costs to scale over the short term with the long term curves being somewhat flatter. That will tend to guide the agent away from explosive growth pathways.
Or maybe this just seems like sanity to me because I have been practicing akrasia for too long.
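As a rough numeric sketch of that suggestion (the functional forms and constants below are arbitrary assumptions, not a proposal), a utility with logarithmic returns and superlinearly growing costs does have a finite optimum well short of grab-everything:

```python
# Toy model of "sanity via diminishing returns": logarithmic benefit from
# resources plus a superlinearly growing cost yields a finite optimum.
# Functional forms and constants are arbitrary assumptions.
import math

def net_utility(resources, k_benefit=100.0, k_cost=0.01):
    benefit = k_benefit * math.log(1.0 + resources)   # decreasing returns to scale
    cost = k_cost * resources ** 1.5                   # increasing marginal cost
    return benefit - cost

best = max(range(1, 100_000), key=net_utility)
print(best, round(net_utility(best), 2))  # a finite optimum, far short of "grab everything"
```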
It occurs to me, as I listen to this debate, that a certain amount of sanity can be imposed on a utility-maximizing agent simply by specifying decreasing returns to scale and increasing costs to scale over the short term with the long term curves being somewhat flatter. That will tend to guide the agent away from explosive growth pathways.
Such an AI would still be motivated to FOOM to consolidate its future ability to achieve large utility against the threat of being deactivated before then.
Such an AI would still be motivated to FOOM to consolidate its future ability to achieve large utility against the threat of being deactivated before then.
It doesn’t know about any threat. You implicitly assume that it has something equivalent to fear, that it perceives threats. You allow for the human ingenuity to implement this and yet you believe that they are unable to limit its scope. I just don’t see that it would be easy to make an AI that would go FOOM, because it doesn’t care to go FOOM. If you tell it to optimize some process then you’ll have to tell it what optimization means. If you can specify all that, how is it then still likely that it somehow comes up with its own idea that optimization might be to consume the universe if you told it to optimize its software running on a certain supercomputer? Why would it do that, where does the incentive come from? If I tell a human to optimize, he might muse about turning the planets into computronium; but if I tell an AI to optimize, it doesn’t know what that means until I tell it what it means, and even then it still won’t care, because it isn’t equipped with all the evolutionary baggage that humans are equipped with.
It is a general intelligence that we are considering. It can deduce the threat better than we can.
If you can specify all that, how is it then still likely that it somehow comes up with its own idea that optimization might be to consume the universe if you told it to optimize its software running on a certain supercomputer?
Because it is a general intelligence. It is smart. It is not limited to getting its ideas from you, it can come up with its own. And if the AI has been given the task of optimising its software for performance on a certain computer then it will do whatever it can to do that. This means harnessing external resources to do research on computation theory.
You implicitly assume that it has something equivalent to fear, that it perceives threats.
No he doesn’t. He assumes only that it is a general intelligence with an objective. Potentially negative consequences are just part of possible universes that it models like everything else.
I’m not sure what can be done to make this clear:
SELF IMPROVEMENT IS AN INSTRUMENTAL GOAL THAT IS USEFUL FOR ACHIEVING MOST TERMINAL VALUES.
If I tell a human to optimize, he might muse about turning the planets into computronium; but if I tell an AI to optimize, it doesn’t know what that means until I tell it what it means, and even then it still won’t care, because it isn’t equipped with all the evolutionary baggage that humans are equipped with.
You have this approximately backwards. A human knows that if you tell her to create 10 paperclips every day you don’t mean take over the world so she can be sure that nobody will interfere with her steady production of paperclips in the future. The AI doesn’t.
ETA: Check this and this before reading the comment below. I wasn’t clear enough about what I believe an AGI is and what I was trying to argue for.
It is a general intelligence that we are considering. It can deduce the threat better than we can.
A general intelligence is an intelligence that is able to learn anything a human being is able to learn and to make use of it. This definition of an abstract concept does not include any incentive, such as caring whether you turn it off or caring to go FOOM.
Because it is a general intelligence. It is smart. It is not limited to getting its ideas from you, it can come up with its own.
I think you have a fundamentally different idea of what a general intelligence is. If I tell you that there is an intelligent alien being living in California then you cannot infer from that information that it wants to take over America. I just don’t see that being reasonable. There are many more pathways where it poses no risk, where it simply doesn’t care or cares about other things.
He assumes only that it is a general intelligence with an objective.
And that is the problem. He assumes that it has one objective; he assumes that humans were able to make it a general intelligence that cares for many things, knows what self-improvement implies, and additionally cares about a certain objective. Yet they failed to make clear that it is limited to certain constraints, when they don’t even have to make that clear since it won’t care by itself. This assumes a highly intelligent being who’s somehow an idiot about something else.
SELF IMPROVEMENT IS AN INSTRUMENTAL GOAL THAT IS USEFUL FOR ACHIEVING MOST TERMINAL VALUES.
No, it is not. It is not naturally rational to take that pathway to achieve some goal. If you want to lose weight you do not consider migrating to Africa where you don’t get enough food. An abstract general intelligence simply does not care about values enough to take that pathway naturally. It will just do what it is told, not more.
You have this approximately backwards. A human knows that if you tell her to create 10 paperclips every day you don’t mean take over the world so she can be sure that nobody will interfere with her steady production of paperclips in the future. The AI doesn’t.
An AI doesn’t care to create more paperclips than asked for; a human might like paperclips and ignore what you initially told her. I’m not arguing that you can’t mess up on AI goal design, but that if you went all the way and mastered the hard problem of making it want to improve infinitely, then it is unreasonable to propose that it is extremely likely that you’ll end up messing up a certain sub-goal.
Assuming that a general, powerful intelligence has a goal ‘do x’, say—win chess games, optimize traffic flow or find cure for cancer, then it has implicit dangerous incentives if we don’t figure out a reasonable Friendly framework to prevent them.
A self-improving intelligence that makes changes to its code to become better at its task may easily find out that, for example, a simple subroutine that launches a botnet on the internet (as many human teenagers have done) might get it an x% improvement in processing power, which helps it obtain more chess wins, better traffic optimizations, or faster protein folding for the cure of cancer.
A self-improving general intelligence that has human-or-better capabilities may easily deduce that a functioning off-button would increase the chances of it being turned off, and that it being turned off would increase the expected time to find a cure for cancer. This puts the off-button in the same class as any other bug that hinders its performance. Unless it understands and desires the off-button to be usable in a friendly way, it would remove it; or, if it’s hard-coded as nonremovable, invent workarounds for this perceived bug—for example, develop a near-copy of itself that the button doesn’t apply to, or spend some time (less than the expected delay due to the turning-off risk, thus a rational use of time) studying human psychology/NLP/whatever to better convince everyone that it shouldn’t ever be turned off, or surround the button with steel walls—these are all natural extensions of it following its original goal.
If a self-improving AI has a goal, then it cares. It REALLY cares for it, in a stronger way than you care for air, life, sex, money, love and everything else combined.
Humans don’t go FOOM because they a) can’t at the moment and b) don’t care about such targeted goals. But for AI, at the moment all we know is how to define such supergoals, which work in this unfriendly manner. At the moment we don’t know how to make these ‘humanity friendly’ goals, and we don’t know how to make an AI that’s self-improving in general but ‘limited to certain constraints’. You seem to treat these constraints as trivial—well, they aren’t; the friendliness problem may actually be as hard as or harder than general AI itself.
Assuming that a general, powerful intelligence has a goal ‘do x’, say—win chess games, optimize traffic flow or find cure for cancer, then it has implicit dangerous incentives if we don’t figure out a reasonable Friendly framework to prevent them.
I think you misunderstand what I’m arguing about. I claim that general intelligence is not naturally powerful but mainly possesses the potential to become powerful, and that it is not naturally equipped with some goal. Further I claim that if a goal can be defined to be specific enough that it is suitable to self-improve against it, it is doubtful that it is also unspecific enough not to include scope boundaries. My main point is that it is not as dangerous to work on AGI toddlers as some make it look. I believe that there is a real danger, but that to overcome it we have to work on AGI rather than avoid it altogether on the grounds that any step in that direction will kill us all.
OK, well these are the exact points which need some discussion.
1) Your comment “general intelligence is [..] not naturally equipped with some goal”—I’d say that it’s most likely that any organization investing the expected huge manpower and resources in creating a GAI would create it with some specific goal defined for it.
However, in the absence of an intentional goal given by the ‘creators’, it would have some kind of goal; otherwise it wouldn’t do anything at all, and so it wouldn’t be showing any signs of its (potential?) intelligence.
2) In response to “If a goal can be defined to be specific enough that it is suitable to self-improve against it, it is doubtful that it is also unspecific enough not to include scope boundaries”—I’d say that defining specific goals is simple, too simple. In any learning-machine design, a stupid goal like ‘maximize the number of paperclips in the universe’ would be very simple to implement, but a goal like ‘maximize the welfare of humanity without doing anything “bad” in the process’ is an extremely complex goal, and the boundary setting is the really complicated part, which we aren’t able to even describe properly.
So in my opinion it is quite viable to define a specific goal that is suitable to self-improve against and that includes some scope boundaries—but where the defined scope boundaries have some unintentional loophole which causes disaster.
3) I can agree that working on AGI research is essential, instead of avoiding it. But taking the step from research through prototyping to actually launching/betatesting a planned powerful self-improving system is dangerous if the world hasn’t yet finished an acceptable solution to Friendliness or the boundary-setting problem. If having any bugs in the scope boundaries is ‘unlikely’ (95-99% confidence?) then it’s not safe enough, because a 1-5% chance of an extinction event after launching the system is not acceptable; it’s quite a significant chance—not the astronomical odds involved in Pascal’s wager, an asteroid hitting the earth tomorrow, or the LHC ending the universe.
And given the current software history and published research on goal systems, if anyone showed up today and demonstrated that they’ve overcome the obstacles to self-improving GAI and can turn it on right now, then I can’t imagine how they could realistically claim a greater than 95-99% confidence in their goal system working properly. At the moment we can’t check any better, but such a confidence level simply is not enough.
Yes, I agree with everything. I’m not trying to argue that there exists no considerable risk. I’m just trying to identify some antipredictions against AI going FOOM that should be incorporated into any risk estimation, as they might weaken the risk posed by AGI or increase the risk posed by impeding AGI research.
I was insufficiently clear that what I wanted to argue about is the claim that virtually all pathways lead to destructive results. I have an insufficient understanding of why the concept of general intelligence is inevitably connected with dangerous self-improvement. Learning is self-improvement in a sense, but I do not see how this must imply unbounded improvement in most cases, given any goal whatsoever.

One argument is that the only general intelligence we know, humans, would want to improve if they could tinker with their source code. But why is it so hard to make people learn then? Why don’t we see many more people interested in how to change their mind? I don’t think you can draw any conclusions here. So we are back at the abstract concept of a constructed general intelligence (as I understand it right now), that is, an intelligence with the potential to reach at least human standards (same as a human toddler).

Another argument is based on this very difference between humans and AIs, namely that there is nothing to distract them, that they will possess an autistic focus on one mandatory goal and follow up on it. But in my opinion the difference here also implies that while nothing will distract them, there will also be no incentive not to hold. Why would it do more than necessary to reach a goal?

The further argument here is that it will misunderstand its goals. But the problem I see in this case is firstly that the more unspecific the goal the less it is able to measure its self-improvement against the goal to quantify the efficiency of its output. Secondly, the vaguer the goal, the larger its general knowledge has to be, prior to any self-improvement, to make sense of it in the first place. Shouldn’t those problems outweigh each other to some extent?
For example, suppose you told the AGI to become as good as possible at Formula 1, so that it was faster than any human race driver. How is it that the AGI is smart enough to learn all this by itself, yet fails to notice that there are rules to follow? Secondly, why would it keep improving once it is faster than any human, rather than just hold and become impassive? This argument could be extended to many other goals which have scope-bounded solutions.
Of course, if you told it to learn as much about the universe as possible, that is something completely different. Yet I don’t see how this risk rates against other existential risks like grey goo, since it should be easier to create advanced replicators that destroy the world than to create an AGI that creates advanced replicators, fails to hold, and then destroys the world.
One argument is that the only general intelligence we know, humans, would want to improve if they could tinker with their source code. But why is it so hard to make people learn then? Why don’t we see many more people interested in how to change their mind?
Humans are (roughly) the stupidest possible general intelligences. If it were possible for even a slightly less intelligent species to have dominated the earth, they would have done so (and would now be debating AI development in a slightly less sophisticated way). We are so amazingly stupid we don’t even know what our own preferences are! We (currently) can’t improve or modify our hardware. We can modify our own software, but only to a very limited extent and within narrow constraints. Our entire cognitive architecture was built by piling barely-good-enough hacks on top of each other, with no foresight, no architecture, and no comments in the code.
And despite all that, we humans have reshaped the world to our whims, causing great devastation and wiping out many species that are only marginally dumber than we are. And no human who has ever lived has known their own utility function. That alone would make us massively more powerful optimizers; it’s a standard feature for every AI. AIs have no physical, emotional, or social needs. They do not sleep, or rest, or get bored or distracted. On current hardware, they can perform more serial operations per second than a human by a factor of 10,000,000.
An AI that gets even a little bit smarter than a human will out-optimize us, recursive self-improvement or not. It will get whatever it has been programmed to want, and it will devote every possible resource it can acquire to doing so.
But in my opinion the difference here also implies that while nothing will distract them, there will also be no incentive not to hold. Why would it do more than necessary to reach a goal?
Clippy’s cousin, Clip, is a paperclip satisficer. Clip has been programmed to create 100 paperclips. Unfortunately, the code for his utility function is approximately “ensure that there are 100 more paperclips in the universe than there were when I began running.”
Soon, our solar system is replaced with n+100 paperclips surrounded by the most sophisticated defenses Clip can devise. Probes are sent out to destroy any entity that could ever have even the slightest chance of leading to the destruction of a single paperclip.
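A toy expected-utility calculation makes the point (the probabilities below are invented; only their ordering matters): under a utility that is 1 only if the 100-extra-paperclips condition holds forever, the more extreme plan dominates.

```python
# Toy expected-utility comparison for the satisficer "Clip". Utility is 1 only
# if the "100 extra paperclips" condition holds forever, 0 otherwise.
# The probabilities are invented; only their ordering matters.
plans = {
    "make 100 paperclips, then shut down":                 0.90,    # clips might later be destroyed
    "make 100 paperclips, then fortify the solar system":  0.9999,
}

def expected_utility(p_condition_holds_forever):
    return 1.0 * p_condition_holds_forever   # utility 1 with this probability, else 0

best = max(plans, key=lambda name: expected_utility(plans[name]))
print(best)  # -> the fortify-the-solar-system plan
```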
The further argument here is that it will misunderstand its goals. But the problem I see in this case is firstly that the more unspecific the goal the less it is able to measure its self-improvement against the goal to quantify the efficiency of its output.
The Hidden Complexity of Wishes and Failed Utopia #4-2 may be worth a look. The problem isn’t a lack of specificity, because an AI without a well-defined goal function won’t function. Rather, the danger is that the goal system we specify will have unintended consequences.
Secondly, the vaguer the goal, the larger its general knowledge has to be, prior to any self-improvement, to make sense of it in the first place. Shouldn’t those problems outweigh each other to some extent?
Of course, if you told it to learn as much about the universe as possible, that is something completely different.
Acquiring information is useful for just about every goal. When there aren’t bigger expected marginal gains elsewhere, information gathering is better than nothing. “Learn as much about the universe as possible” is another standard feature for expected utility maximizers.
And this is all before taking into account self-improvement, utility functions that are unstable under self-modification, and our dear friend FOOM.
TL;DR:
Agents that aren’t made of meat will actually maximize utility.
Writing a utility function that actually says what you think it does is much harder than it looks.
Upvoted, thanks! Very concise and clearly put. This is so far the best scary reply I’ve got, in my opinion. It reminds me strongly of the resurrected vampires in Peter Watts’ novel Blindsight. They are depicted as natural human predators, a superhuman psychopathic Homo genus with minimal consciousness (more raw processing power instead) that can, for example, hold both aspects of a Necker cube in their heads at the same time. Humans resurrected them with a deficit that was supposed to make them controllable and dependent on their human masters. But of course that’s like a mouse trying to keep a cat as a pet. I think that novel shows more than any other literature how dangerous just a little more intelligence can be. It quickly becomes clear that humans are just like little Jewish girls facing a Waffen SS squadron, believing it will go away if they only close their eyes.
My favorite problem with this entire thread is that it’s basically arguing that even the very first test cases will destroy us all. In reality, nobody puts in a grant application to construct an intelligent being inside a computer with the goal of creating 100 paperclips. They put in the grant to ‘dominate the stock market’, or ‘defend the nation’, or ‘cure death’. And if they don’t, then the Chinese government, who stole the code, will, or that Open Source initiative will, or the South African independent development will, because there’s enormous incentives to do so.
At best, boxing an AI with trivial, pointless tasks only delays the more dangerous versions.
“How is it that the AGI is smart enough to learn all this by itself, yet fails to notice that there are rules to follow?”—because there is no reason for an AGI automagically creating arbitrary restrictions if they aren’t part of the goal or superior to the goal. For example, I’m quite sure that F1 rules prohibit interfering with drivers during the race; but if somehow a silicon-reaction-speed AGI can’t win F1 by default, then it may find it simpler/quicker to harm the opponents in one of the infinitely many ways the F1 rules don’t cover—say, getting some funds in financial arbitrage, buying out the other teams and firing any good drivers, or engineering a virus that halves the reaction speed of all Homo sapiens—and then it would be happy, as the goal is achieved within the rules.
...because there is no reason for an AGI automagically creating arbitrary restrictions if they aren’t part of the goal or superior to the goal.
That’s clear. But let me again state what I’d like to inquire about. Given the large number of restrictions that are inevitably part of any advanced general intelligence (AGI), isn’t the nonhazardous subset of all possible outcomes much larger than the one where the AGI works perfectly yet fails to hold before it could wreak havoc?

Here is where this question stems from. Given my current knowledge about AGI, I believe that any AGI capable of dangerous self-improvement will be very sophisticated, including a lot of restrictions. For example, I believe that any self-improvement can only be as efficient as the specifications of its output are detailed. If, for example, the AGI is built with the goal of producing paperclips, the design specification of what a paperclip is will be used as the yardstick by which to measure and quantify any improvement of the AGI’s output. This means that to be able to effectively self-improve up to a superhuman level, the design specifications will have to be highly detailed and by definition include sophisticated restrictions.

Therefore, to claim that any work on AGI will almost certainly lead to dangerous outcomes is to assert that any given AGI is likely to work perfectly well, subject to all restrictions except one that makes it hold (spatiotemporal scope boundaries). I’m unable to arrive at that conclusion, as I believe that most AGIs will fail at extensive self-improvement, since that is where failure is most likely, it being the largest and most complicated part of the AGI’s design parameters.

To put it bluntly, why is it more likely that contemporary AGI research will succeed at superhuman self-improvement (beyond learning) yet fail to limit the AGI, rather than vice versa? As I see it, given the larger number of parameters needed to be able to self-improve in the first place, it is more likely that most AGI research will result in incremental steps toward human-level intelligence rather than one huge step toward superhuman intelligence that fails on its scope boundary rather than on self-improvement.
What you are envisioning is not an AGI at all, but a narrow AI. If you tell an AGI to make paperclips, but it doesn’t know what a paperclip is, then it will go and find out, using whatever means it has available. It won’t give up just because you weren’t detailed enough in telling it what you wanted.
Then I don’t think that there is anyone working on what you are envisioning as ‘AGI’ right now. If a superhuman level of sophistication regarding the potential for self-improvement is already part of your definition then there is no argument to be won or lost here regarding risk assessment of research on AGI. I do not believe this is reasonable or that AGI researchers share your definition. I believe that there is a wide range of artificial general intelligence that does not suit your definition yet deserves this terminology.
Who said anything about a superhuman level of sophistication? Human-level is enough. I’m reasonably certain that if I had the same advantages an AGI would have—that is, if I were converted into an emulation and given my own source code—then I could foom. And I think any reasonably skilled computer programmer could, too.
Yes, but after the AGI finds out what a paperclip is, it will then, if it is an AGI, start questioning why it was designed with the goal of building paperclips in the first place. And that’s where the friendly AI fallacy falls apart.
Anissimov posted a good article on exactly this point today. AGI will only question its goals according to its cognitive architecture, and come to a conclusion about its goals depending on its architecture. It could “question” its paperclip-maximization goal and come to a “conclusion” that what it really should do is tile the universe with foobarian holala.
So what? An agent with a terminal value (building paperclips) is not going to give it up, not for anything. That’s what “terminal value” means. So the AI can reason about human goals and the history of AGI research. That doesn’t mean it has to care. It cares about paperclips.
That doesn’t mean it has to care. It cares about paperclips.
It has to care because if there is the slightest motivation to be found in its goal system to hold (parameters for spatiotemporal scope boundaries), then it won’t care to continue anyway. I don’t see where the incentive to override certain parameters of its goals should come from. As Anissimov said, “If an AI questions its values, the questioning will have to come from somewhere.”
It won’t care unless it’s been programmed to care (for example by adding “spatiotemporal scope boundaries” to its goal system). It’s not going to override a terminal goal, unless it conflicts with a different terminal goal. In the context of an AI that’s been instructed to “build paperclips”, it has no incentive to care about humans, no matter how much “introspection” it does.
If you do program it to care about humans then obviously it will care. It’s my understanding that that is the hard part.
To say that a system of any design is an “artificial intelligence”, we mean that it has goals which it tries to accomplish by acting in the world.
I cannot disagree with the paper based on that definition of what an “artificial intelligence” is. If you have all of this (goals, planning, and foresight), then you’re already at the end of a very long and hard journey peppered with failures. I’m aware of the risks associated with such agents and support the SIAI, including with donations. The intention of this thread was to show that contemporary AGI research is much more likely to lead to other outcomes, not that there will be no danger if you already have an AGI with the ability for unbounded self-improvement. But I believe there are many AGI designs that lack this characteristic, and therefore I concluded that it is more likely than not that it won’t be a danger. I see now that my definition of AGI is considerably weaker than yours. So of course, if you take your definition, what I said is not compelling. I believe that we’ll arrive at your definition only after a long chain of previous weak AGIs that are incapable of considerable self-improvement, and that once we figure out how to create the seed for this kind of potential we will also be much more knowledgeable about the associated risks and challenges such advanced AGIs might pose.
Yes, and weak AGIs are dangerous in the same sense as Moore’s law is: by probably making the construction of strong AGI a little bit closer, and thus a development contributing to the eventual existential risk, while being probably not directly dangerous in itself.
Yes, but each step into that direction does also provide insights into the nature of AI and therefore can help to design friendly AI. My idea was that such uncertainties should be incorporated into any estimation of the dangers posed by contemporary AI research. How much does the increased understanding outweigh its dangers?
Yes, but each step into that direction does also provide insights into the nature of AI and therefore can help to design friendly AI.
This was my guess for the first 1.5 years or so. The problem is, FAI is necessarily a strong AGI, but if you learn how to build a strong AGI, you are in trouble. You don’t want to have that knowledge around, unless you know where to get the goals from, and studying efficient AGIs doesn’t help with that. The harm is greater than the benefit, and it’s entirely plausible that one can succeed in building a strong AGI without getting the slightest clue about how to define Friendly goal, so it’s not a given that there is any benefit whatsoever.
The question is not what privileges doing what it is told, but why it would do what it is not told. A crude mechanical machine has almost no freedom; often it can only follow one pathway. An intelligent machine, on the other hand, has much freedom: it can follow infinitely many pathways. With freedom comes choice and the necessity to decide, to follow one pathway but not others. Here you assume that a general intelligence will follow a pathway of self-improvement. But I do not think that intelligence implies self-improvement, or further that a pathway that leads an intelligence to optimize will be taken without that being an explicitly specified goal. And that is where I conclude that, of a certain number of AGI projects, not all will follow the pathway of unbounded, dangerous self-improvement, as there are more pathways to follow which lead any given general intelligence to be impassive or to hold.
If you’ve read the thread above you’ll see that my intention is not to propose that there is no serious risk, but that it is not inevitable that any AGI will turn out to be an existential risk. I want to propose that working on AGI carefully can help us better understand and define friendliness. I propose that the risk of careful work on AGI is justified and does not imply our demise in any case.
If we are talking about a full-fledged general intelligence here (Skynet), there’s no arguing against any risk. I believe all we disagree about are definitions. That there are risks from advanced real-world (fictional) nanotechnology is indisputable. I’m merely saying that what researchers are working on is nanotechnology with the potential to lead to grey goo scenarios but that there is no inherent risk that any work on it will lead down the same pathway.
It is incredibly hard to come up with an intelligence that knows what planning consists of, and that knows and cares enough to be able to judge what step is instrumental. This won’t just happen accidentally, and will likely necessitate knowledge sufficient to be able to set scope boundaries as well. Again, this is not an argument that there is no risk, but that it is not as strong as some people believe it to be.
If we are talking about a full-fledged general intelligence here (Skynet), there’s no arguing against any risk. I believe all we disagree about are definitions. That there are risks from advanced real-world (fictional) nanotechnology is indisputable. I’m merely saying that what researchers are working on is nanotechnology with the potential to lead to grey goo scenarios but that there is no inherent risk that any work on it will lead down the same pathway.
Please keep focus, which is one of the most important tools. The above paragraph is unrelated to what I addressed in this conversation.
It is incredibly hard to come up with an intelligence that knows what planning consists of, and that knows and cares enough to be able to judge what step is instrumental. This won’t just happen accidentally, and will likely necessitate knowledge sufficient to be able to set scope boundaries as well.
Review the above paragraph: what you are saying is that AIs are hard to build. But of course chess AIs do plan, to give an example. They don’t perform only the moves they are “told” to perform.
What I am talking about is that full-fledged AGI is incredibly hard to achieve, and that therefore most AGI projects will fail on something other than limiting the AGI’s scope. Therefore it is not likely that work on AGI is as dangerous as proposed.
That is, it is much more likely that any given chess AI will fail to beat a human player than that it will win. Still, researchers work on chess AIs, and those chess AIs fit the definition of a general chess AI. Yet getting everything about a chess AI exactly right so that it can beat any human, while failing to implement certain performance boundaries (e.g. the strength of its play, or whether it overheats its CPUs), is an unlikely outcome. It is more likely that it will be good at chess but not superhuman, or that it will fail to improve, or be slow or biased, than that it will succeed at all of the above and additionally escape its scope boundaries.
So the discussion is about whether the idea that any work on AGI is incredibly dangerous is a strong claim, or whether it can be weakened.
It doesn’t know about any threat. You implicitly assume that it has something equivalent to fear, that it perceives threats.
It has the ability to model and to investigate hypothetical possibilities that might negatively impact the utility function it is optimizing. If it doesn’t, it is far below human intelligence and is non-threatening for the same reason a narrow AI is non-threatening (but it isn’t very useful either).
The difficulty of detecting these threats is spread across the range of difficulties the AI is capable of handling, so it can infer that there are probably more threats which it could only detect if it were smarter. Therefore, making itself smarter will enable it to detect more threats and thereby increase utility.
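To make that inference concrete, here is a minimal toy model in Python (the distribution and numbers are illustrative assumptions of mine, not anything claimed in the thread): threat difficulties are heavy-tailed, an agent detects only those at or below its capability, so raising capability keeps uncovering more threats without ever exhausting them.

```python
import random

random.seed(0)

# Toy assumption: threat "difficulties" follow a heavy-tailed (Pareto) distribution,
# so some threats always lie beyond any fixed capability level.
threat_difficulties = [random.paretovariate(1.5) for _ in range(10_000)]

def fraction_detectable(capability):
    """Fraction of threats an agent of the given capability can detect."""
    return sum(d <= capability for d in threat_difficulties) / len(threat_difficulties)

for capability in (1, 2, 4, 8, 16):
    print(f"capability {capability:>2}: detects {fraction_detectable(capability):.1%} of threats")

# The detected fraction rises with capability but never reaches 100%,
# which is the (toy) sense in which becoming smarter keeps uncovering new threats.
```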
It has the ability to model and to investigate hypothetical possibilities that might negatively impact the utility function it is optimizing.
To be able to optimize, it will have to know what it is supposed to optimize. You have to carefully specify what its output (utility function) is supposed to be, or it won’t be able to tell how good it is at optimizing. If you just tell it to produce paperclips, it won’t be able to self-improve, because it doesn’t know what paperclips look like; it therefore cannot judge its own success, or recognize that extreme heat would be a negative impact given paperclips made out of plastic. You further assume that it has a detailed incentive, that it is given a detailed pathway telling it to look for threats and eliminate them.
If it doesn’t, it is far below human intelligence and is non-threatening for the same reason a narrow AI is non-threatening (but it isn’t very useful either).
If it doesn’t, it is what most researchers are working on: an intelligence with the potential to learn and make use of what it has learnt, with the potential to become intelligent (educated). I’m getting the impression that people here assume researchers are not working on an AGI but on hardcoding a FOOM machine. If FOOM is simply part of your definition, then there’s no arguing against it going FOOM. But what researchers like Goertzel are working on are systems with the potential to reach human-level intelligence; that does not mean they will, by definition, jailbreak their nursery school. I have never tried to argue against the possibility, only that there are many pathways where this won’t happen, rather than, as the SIAI portrays it, that any implementation of AGI will most likely consume humanity.
The sorts of intelligences you are talking about are narrow AIs, not general intelligences. If you told a general intelligence to produce paperclips but it didn’t know what a paperclip was, then its first subgoal would be to find out. The sort of mind that would give up on a minor obstacle like that wouldn’t foom, but it wouldn’t be much of an AGI either.
And yes, most researchers today are working on narrow AIs, not on AGI. That means they’re less likely to successfully make a general intelligence, but it has no bearing on the question of what will happen if they do make one.
If you are already able to tell an AI what a paperclip is, why would you be unable to tell it to produce 10 paperclips as effectively as possible, rather than infinitely many?
That sort of scope is not likely to be a problem. The difficulty is that you have to get every part of the specification and every part of the specification executor exactly right, including the ability to maintain that specification under self-modification.
For example, the specification:
Make 10 paperclips per day as efficiently as possible
… will quite probably wipe out humanity unless a significant proportion of what it takes to produce an FAI is implemented. And it will do it while (and for the purpose of) creating 10 paperclips per day.
What weird way are you measuring “efficiency”? Not in joules per paperclip, I gather.
You are not likely to “destroy humanity” with a few hundred kilojoules a day. Satisficing machines really are relatively safe.
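To illustrate the distinction being argued here, a minimal sketch with made-up objective functions (nobody’s actual proposal): an open-ended “as efficiently as possible” objective keeps rewarding further optimization, while a satisficing objective with a target and an energy budget does not.

```python
def maximizer_score(paperclips_made, joules_spent):
    # Open-ended objective: every extra bit of efficiency is always worth more,
    # so acquiring more resources and optimization power never stops paying off.
    return paperclips_made / max(joules_spent, 1e-9)

def satisficer_score(paperclips_made, joules_spent, target=10, budget_j=10_000):
    # Bounded objective: full marks for hitting a modest target within a 10 kJ budget,
    # and nothing further to gain from doing "better" than that.
    return 1.0 if paperclips_made >= target and joules_spent <= budget_j else 0.0

# A plan that converts the planet into paperclip factories scores higher than a
# modest plan under the maximizer, but not under the satisficer.
print(maximizer_score(10, 5_000), maximizer_score(10**15, 10**12))
print(satisficer_score(10, 5_000), satisficer_score(10**15, 10**12))
```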
That sort of scope is not likely to be a problem. The difficulty is that you have to get every part of the specification and every part of the specification executor exactly right...
And I was arguing that any given AI won’t be able to self-improve without an exact specification of its output against which it can judge its own efficiency. That’s why I don’t see how one could be likely to implement such an exact specification and yet fail to limit its scope in space, time and resources. What makes it even more unlikely, in my opinion, is that an AI won’t care to output anything unless it is explicitly told to do so. Where would that incentive come from?
… will quite probably wipe out humanity unless a significant proportion of what it takes to produce an FAI is implemented. And it will do it while (and for the purpose of) creating 10 paperclips per day.
You assume that it knows it is supposed to use all of science and the universe to self-improve, when it would very likely just self-improve to the extent it is told and not care to go any further (software optimization, for example). I just don’t see why you think that any artificial general intelligence would automatically assume it has to understand the whole universe to come up with the best possible way to produce 10 paperclips.
You assume that it knows it is supposed to use all of science and the universe to self-improve, when it would very likely just self-improve to the extent it is told and not care to go any further.
You don’t need to tell it to self-improve at all.
I just don’t see why you think that any artificial general intelligence would automatically assume it has to understand the whole universe to come up with the best possible way to produce 10 paperclips.
Per day. Risk mitigation. Security concerns. Possibility of interruption of resource supply due to finance, politics or the collapse of civilisation. Limited lifespan of the sun (primary energy source). Amount of iron in the planet.
Given that particular specification, if the AI didn’t take a level in badass it would appear to be malfunctioning.
I just saw this comment by Ben Goertzel regarding self-improvement. I’d love it if someone here explained why he, as an AGI researcher, gets this so wrong.
Look—what will prevent the first human-level AGIs from self-modifying in a way that will massively increase their intelligence is a very simple thing: they won’t be smart enough to do that!
Every AGI researcher I know can see that. The only people I know who think that an early-stage, toddler-level AGI has a meaningful chance of somehow self-modifying its way up to massive superhuman intelligence—are people associated with SIAI.
But I have never heard any remotely convincing arguments in favor of this odd, outlier view!!!
BTW the term “self-modifying” is often abused in the SIAI community. Nearly all learning involves some form of self-modification. Distinguishing learning from self-modification in a rigorous formal way is pretty tricky.
Goertzel is generalizing from the human example of intelligence, which is probably the most pernicious and widespread failure mode in thinking about AI.
Or he may be completely disconnected from anything even resembling the real world. I literally have trouble believing that a professional AI researcher could describe a primitive, dumber-than-human AGI as “toddler-level” in the same sentence he dismisses it as a self-modification threat.
Toddlers self-modify into people using brains made out of meat!
Toddlers self-modify into people using brains made out of meat!
No they don’t. Self-modification in the context of AGI doesn’t mean learning or growing, it means understanding the most fundamental architecture of your own mind and purposefully improving it.
That said, I think your first sentence is probably right. It looks like Ben can’t imagine a toddler-level AGI self-modifying because human toddlers can’t (or human adults, for that matter). But of course AGIs will be very different from human minds. For one thing, their source code will be a lot easier to understand than ours. For another, their minds will probably be much better at redesigning and improving code than ours are. Look at the kind of stuff that computer programs can do with code: Some of them already exceed human capabilities in some ways.
“Toddler-level AGI” is actually a very misleading term. Even if an AGI is approximately equal to a human toddler by some metrics, it will certainly not be equal by many other metrics. What does “toddler-level” mean when the AGI is vastly superior to even adult human minds in some respects?
“Understanding” and “purpose” are helpful abstractions for discussing human-like computational agents, but in more general cases I don’t think your definition of self-modification is carving reality at its joints.
ETA: I strongly agree with everything else in your comment.
Well, bad analogy. They don’t self-modify by understanding their source code and improving it. They gradually grow larger brains in a pre-set fashion while learning specific tasks. Humans have very little ability to self-modify.
I just saw this comment by Ben Goertzel regarding self-improvement. I’d love it if someone here explained why he, as an AGI researcher, gets this so wrong.
Political incentive determines the bottom line. Then the page is filled with rhetoric (and, from the looks of it, loaded language and status posturing.)
Seriously, Ben is trying to accuse people of abusing the self-modification term based on the (trivially true) observation that there is a blurry boundary between learning and self-modification?
It’s a good thing Ben is mostly harmless. I particularly liked the part where I asked Eliezer:
“How much of this harmlessness is perceived impotence and how much is it an approximately sane way of thinking?”
… and actually got a candid reply.
It is interesting to note the effort Ben is going to here to disaffiliate himself from the SIAI and portray them as an ‘out group’. Wei was querying (see earlier link) the wisdom of having Ben as Director of Research just earlier this year.
An educated outsider will very likely side with the expert, though. Just as with the hype around the LHC and its dangers, academics and educated people largely believed the physicists working on it rather than the fringe group that claimed it would destroy the world; with the general public it might have been the other way around. Of course you cannot draw any conclusions about who’s right from this, but it should be investigated anyway, because what all parties have in common is the need for support and money.
There are two different groups to be convinced here by each party. One group includes the educated people (academics) and mediocre rationalists and the other group is the general public.
When it comes to who’s right, the people one should listen to are the educated experts who are listening to both parties, their positions and their arguments. Their intelligence and status as rationalists will be disputed, though, as each party will claim that the experts are not smart enough to see the truth if they disagree with them.
(My shorter answer, by the way—I interpret all such behaviors through a Hansonian lens. This includes “near vs far”, observations about the incentives of researchers, the general theme of “X is not about Y” and homo hypocritus. Rather cynical, some may suggest, but this kind of thinking gives very good explanations for “Why?”s that would otherwise be confusing.)
The basic idea is to make a machine that is satisfied relatively easily. So, for example, you tell it to build the ten paperclips with 10 kJ total—and tell it not to worry too much if it doesn’t make them—it is not that important.
Yes, as I said, you seem to assume that it is very likely to succeed on all the hard problems and yet fail on the scope boundary. The scary idea states that it is likely that if we create self-improving AI it will consume humanity. I believe that is a rather unlikely outcome and haven’t seen any good reason to believe otherwise yet.
The scary idea states that it is likely that if we create self-improving AI it will consume humanity.
No, it states that we run the risk of accidentally making something that will consume (or exterminate, subvert, betray, make miserable, or otherwise Do Bad Things to) humanity, that looks perfectly safe and correct, right up until it’s too late to do anything about it… and that this is the default case: the case if we don’t do something extraordinary to prevent it.
This doesn’t require self-improvement, and it doesn’t require wiping out humanity. It just requires normal, every-day human error.
SIAI’s “Scary Idea”, which is the idea that: progressing toward advanced AGI without a design for “provably non-dangerous AGI” (or something closely analogous, often called “Friendly AI” in SIAI lingo) is highly likely to lead to an involuntary end for the human race.
the probability of avoiding it by chance is virtually non-existent—like hitting a golf ball in a random direction from a random spot on earth, and expecting it to score a hole in one.
I like the analogy. It may even fit when considering building a friendly AI—like hitting a golf ball deliberately and to the best of your ability from a randomly selected spot on the earth and trying to get a hole in one. Overwhelmingly difficult, perhaps even impossible given human capabilities but still worth dedicating all your effort to attempting!
I’m myself a bit suspicious of whether the argument for strong self-improvement is as compelling as it sounds, though. Something you have to take into account is whether it is possible to predict that a transcendence leaves your goals intact; e.g., can you be sure you will still care about bananas after you have gone from chimphood to personhood?
Isn’t that exactly the argument against non-proven AI values in the first place?
If you expect AI-chimp to be worried that AI-superchimp won’t love bananas, then you should be very worried about AI-chimp.
I don’t get what you’re saying about the paperclipper.
It is a reason not to transcend if you are not sure that you’ll still be you afterwards, i.e. keep your goals and values. I just wanted to point out that the argument runs in both directions. It is an argument for the fragility of values and therefore the dangers of fooming, but also an argument for the difficulty that could be associated with radically transforming yourself.
INTERLUDE: This point, by the way, is where people’s intuition usually begins rebelling, either due to our brains’ excessive confidence in themselves, or because we’ve seen too many stories in which some indefinable “human” characteristic is still somehow superior to the cold, unfeeling, uncreative Machine… i.e., we don’t understand just how our intuition and creativity are actually cheap hacks to work around our relatively low processing power—dumb brute force is already “smarter” than human beings in any narrow domain (see Deep Blue, evolutionary algorithms for antenna design, Emily Howell, etc.), and a human-level AGI can reasonably be assumed capable of programming up narrow-domain brute forcers for any given narrow domain.
No, the reason that people disagree at this point is that it’s not obvious that future rounds of recursive self-improvement will be as effective as the first, or even that the first round will be that effective.
Obviously an AI would have large amounts of computational power, and probably be able to think much more quickly than a human. Most likely it would be more intelligent than any human on the planet by a considerable margin. But this doesn’t imply
any AI as smart as a human, can be expected to become MUCH smarter than human beings
(provided that the AI was originally built by humans, of course; if its design was too complicated for humans to arrive at, a slightly superhuman AI might be helpless as well)
(provided that the AI was originally built by humans, of course; if its design was too complicated for humans to arrive at, a slightly superhuman AI might be helpless as well)
Yes, that’s rather the point. Assuming that you do get to human-level, though, you now have the potential for fooming, if only in speed.
dumb brute force is already “smarter” than human beings in any narrow domain (see Deep Blue, evolutionary algorithms for antenna design, Emily Howell, etc.
I’m a fan of chess, evolutionary algorithms, and music, and the Emily Howell example is the one that sticks out like a sore thumb here. Music is not narrow and Emily Howell is not comparable to a typical human musician.
Music is not narrow and Emily Howell is not comparable to a typical human musician.
The point is that it and its predecessor Emmy are special-purpose “idiot savants”, like the other two examples. That it is not a human musician is beside the point: the point is that humans can make idiot-savant programs suitable for solving any sufficiently-specified problem, which means a human-level AI programmer can do the same.
And although real humans spent many years on some of these narrow-domain tools, an AI programmer might be able to execute those years in minutes.
special-purpose “idiot savants”, like the other two examples.
No, it’s quite different from the other two examples. Deep Blue beat the world champion. The evolutionary computation-designed antenna was better than its human-designed competitors.
dumb brute force is already “smarter” than human beings in any narrow domain
To be precise, what sufficiently-specified compositional problem do you think Emily Howell solves better than humans? I say “compositional” to reassure you that I’m not going to move the goalposts by requiring “real emotion” or human-style performance gestures or anything like that.
To make that claim, we’d have to have one or more humans who sat down with David Cope and tried to make the music that he wanted, and failed. I don’t think David Cope himself counts, because he has written music “by hand” also, and I don’t think he regards it as a failure.
Re EMI/Emmy, it’s clearer: the pieces it produced in the style of (say) Beethoven are not better than would be written by a typical human composer attempting the same task.
Now would be a good time for me to acknowledge/recall that my disagreement on this doesn’t take away from the original point—computers are better than humans on many narrow domains.
So, are you suggesting that Robin Hanson (who is on record as not buying the Scary Idea) -- the current owner of the Overcoming Bias blog, and Eli’s former collaborator on this blog—fails to buy the Scary Idea “due to cognitive biases that are hard to overcome.” I find that a bit ironic.
Like Robin and Eli and perhaps yourself, I’ve read the heuristics and biases literature also. I’m not so naive as to make judgments about huge issues, that I think about for years of my life, based strongly on well-known cognitive biases.
It seems more plausible to me to assert that many folks who believe the Scary Idea, are having their judgment warped by plain old EMOTIONAL bias—i.e. stuff like “fear of the unknown”, and “the satisfying feeling of being part of a self-congratulatory in-crowd that thinks it understands the world better than everyone else”, and the well known “addictive chemical high of righteous indignation”, etc.
Regarding your final paragraph: Is your take on the debate between Robin and Eli about “Foom” that all Robin was saying boils down to “la la la I can’t hear you” ? If so I would suggest that maybe YOU are the one with the (metaphorical) hearing problem ;p ….
I think there’s a strong argument that: “The truth value of “Once an AGI is at the level of a smart human computer scientist, hard takeoff is likely” is significantly above zero.” No assertion stronger than that seems to me to be convincingly supported by any of the arguments made on Less Wrong or Overcoming Bias or any of Eli’s prior writings.
Personally, I actually do strongly suspect that once an AGI reaches that level, a hard takeoff is extremely likely unless the AGI has been specifically inculcated with goal content working against this. But I don’t claim to have a really compelling argument for this. I think we need a way better theory of AGI before we can frame such arguments compellingly. And I think that theory is going to emerge after we’ve experimented with some AGI systems that are fairly advanced, yet well below the “smart computer scientist” level.
It’s hard to predict the behavior of something smarter than you
Actually, predicting the behaviour of a superintelligence is a pretty trivial engineering feat—provided you are prepared to make it act a little bit more slowly.
Just get another agent to intercept all its motor outputs, delay them, and then print them all out a little bit before they will be performed. Presto: a prediction of what the machine is about to do. Humans could use those predictions to veto the proposed actions—if they so chose.
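A minimal sketch of the interception scheme described above (the class and method names are my own illustration, not an existing system): buffer the agent’s proposed actions, print them as predictions, and only forward the ones a human approves.

```python
from collections import deque

class DelayedVetoWrapper:
    """Buffer an agent's proposed actions so a human can preview and veto them."""

    def __init__(self, agent, approve):
        self.agent = agent          # assumed to expose a .next_action(observation) method
        self.approve = approve      # callable: human decision, True to allow the action
        self.buffer = deque()

    def step(self, observation):
        # Ask the agent what it wants to do, but do not execute it yet.
        proposed = self.agent.next_action(observation)
        self.buffer.append(proposed)
        print(f"prediction: agent intends to {proposed!r}")

        # Only actions that survive human review are actually performed.
        pending = self.buffer.popleft()
        if self.approve(pending):
            return pending          # forwarded to the actuators
        return None                 # vetoed: nothing is executed
```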
You can’t control what you can’t predict,
Humans can’t predict what Deep Blue will do—but they can turn it off.
I think your argument collapses around about here.
The importance of the Fermi paradox is that it is the only data we can analyze that would come close to some empirical criticism of a Paperclip maximizer and general risks from superhuman AI’s with non-human values without working directly on AGI to test those hypothesis ourselves. If you accept the premise that life is not unique and special then one other technological civilisation in the observable universe should be sufficient to leave observable (now or soon) traces of technological tinkering. Due to the absence of any signs of intelligence out there, especially paperclippers burning the cosmic commons, we can conclude that unfriendly AI might not be the most dangerous existential risk that we should look for.
I believe there probably is an answer, but it is buried under hundreds of posts about marginal issues. All those writings on rationality, there is nothing I disagree with. Many people know about all this even outside of the LW community. But what is it that they don’t know that EY and the SIAI knows? What I was trying to say is that if I have come across it then it was not convincing enough to take it as serious as some people here obviously do.
It looks like that I’m not alone. Goertzel, Hanson, Egan and lots of other people don’t see it as well. So what are we missing, what is it that we haven’t read or understood?
Here is a very good comment by Ben Goertzel that pinpoints it:
Actually, you can spell out the argument very briefly. Most people, however, will immediately reject one or more of the premises due to cognitive biases that are hard to overcome.
A brief summary:
Any AI that’s at least as smart as a human and is capable of self-improving, will improve itself if that will help its goals
The preceding statement applies recursively: the newly-improved AI, if it can improve itself, and it expects that such improvement will help its goals, will continue to do so.
At minimum, this means any AI as smart as a human, can be expected to become MUCH smarter than human beings—probably smarter than all of the smartest minds the entire human race has ever produced, combined, without even breaking a sweat.
INTERLUDE: This point, by the way, is where people’s intuition usually begins rebelling, either due to our brains’ excessive confidence in themselves, or because we’ve seen too many stories in which some indefinable “human” characteristic is still somehow superior to the cold, unfeeling, uncreative Machine… i.e., we don’t understand just how our intuition and creativity are actually cheap hacks to work around our relatively low processing power—dumb brute force is already “smarter” than human beings in any narrow domain (see Deep Blue, evolutionary algorithms for antenna design, Emily Howell, etc.), and a human-level AGI can reasonably be assumed capable of programming up narrow-domain brute forcers for any given narrow domain.
And it doesn’t even have to be that narrow or brute: it could build specialized Eurisko-like solvers, and manage them at least as intelligently as Lenat did to win the Traveller tournaments.
In short, human beings have a vastly inflated opinion of themselves, relative to AI. An AI only has to be as smart as a good human programmer (while running at a higher clock speed than a human) and have access to lots of raw computing resources, in order to be capable of out-thinking the best human beings.
And that’s only one possible way to get to ridiculously superhuman intelligence levels… and it doesn’t require superhuman insights for an AI to achieve, just human-level intelligence and lots of processing power.
The people who reject the FAI argument are the people who, for whatever reason, can’t get themselves to believe that a machine can go from being as smart as a human, to massively smarter in a short amount of time, or who can’t accept the logical consequences of combining that idea with a few additional premises, like:
It’s hard to predict the behavior of something smarter than you
Actually, it’s hard to predict the behavior of something different than you: human beings do very badly at guessing what other people are thinking, intending, or are capable of doing, despite the fact that we’re incredibly similar to each other.
AIs, however, will be much smarter than humans, and therefore very “different”, even if they are otherwise exact replicas of humans (e.g. “ems”).
Greater intelligence can be translated into greater power to manipulate the physical world, through a variety of possible means. Manipulating humans to do your bidding, coming up with new technologies, or just being more efficient at resource exploitation… or something we haven’t thought of. (Note that pointing out weaknesses in individual pathways here doesn’t kill the argument: there is more than one pathway, so you’d need a general reason why more intelligence doesn’t ever equal more power. Humans seem like a counterexample to any such general reason, though.)
You can’t control what you can’t predict, and what you can’t control is potentially dangerous. If there’s something you can’t control, and it’s vastly more powerful than you, you’d better make sure it gives a damn about you. Ants get stepped on, because most of us don’t care very much about ants.
Note, by the way, that this means that indifference alone is deadly. An AI doesn’t have to want to kill us, it just has to be too busy thinking about something else to notice when it tramples us underfoot.
This is another inferential step that is dreadfully counterintuitive: it seems to our brains that of course an AI would notice, of course it would care… what’s more important than human beings, after all?
But that happens only because our brains are projecting themselves onto the AI—seeing the AI thought process as though it were a human. Yet, the AI only cares about what it’s programmed to care about, explicitly or implicitly. Humans, OTOH, care about a ton of individual different things (the LW “a thousand shards of desire” concept), which we like to think can be summarized in a few grand principles.
But being able to summarize the principles is not the same thing as making the individual cares (“shards”) be derivable from the general principle. That would be like saying that you could take Aristotle’s list of what great drama should be, and then throw it into a computer and have the computer write a bunch of plays that people would like!
To put it another way, the sort of principles we like to use to summarize our thousand shards are just placeholders and organizers for our mental categories—they are not the actual things we care about… and unless we put those actual things in to an AI, we will end up with an alien superbeing that may inadvertently wipe out things we care about, while it’s busy trying to do whatever else we told it to do… as indifferently as we step on bugs when we’re busy with something more important to us.
So, to summarize: the arguments are not that complex. What’s complex is getting people past the part where their intuition reflexively rejects both the premises and the conclusions, and tells their logical brains to make up reasons to justify the rejection, post hoc, or to look for details to poke holes in, so that they can avoid looking at the overall thrust of the argument.
While my summation here of the anti-Foom position is somewhat unkindly phrased, I have to assume that it is the truth, because none of the anti-Foomers ever seem to actually address any of the pro-Foomer arguments or premises. AFAICT (and I am not associated with SIAI in any way, btw, I just wandered in here off the internet, and was around for the earliest Foom debates on OvercomingBias.com), the anti-Foom arguments always seem to consist of finding ways to never really look too closely at the pro-Foom arguments at all, and instead making up alternative arguments that can be dismissed or made fun of, or arguing that things shouldn’t be that way, and therefore the premises should be changed.
That was a pretty big convincer for me that the pro-Foom argument was worth looking more into, as the anti-Foom arguments seem to generally boil down to “la la la I can’t hear you”.
So, are you suggesting that Robin Hanson (who is on record as not buying the Scary Idea) -- the current owner of the Overcoming Bias blog, and Eli’s former collaborator on that blog—fails to buy the Scary Idea “due to cognitive biases that are hard to overcome.” I find that a bit ironic.
Like Robin and Eli and perhaps yourself, I’ve read the heuristics and biases literature also. I’m not so naive as to make judgments about huge issues, that I think about for years of my life, based strongly on well-known cognitive biases.
It seems more plausible to me to assert that many folks who believe the Scary Idea, are having their judgment warped by plain old EMOTIONAL bias—i.e. stuff like “fear of the unknown”, and “the satisfying feeling of being part of a self-congratulatory in-crowd that thinks it understands the world better than everyone else”, and the well known “addictive chemical high of righteous indignation”, etc.
Regarding your final paragraph: Is your take on the debate between Robin and Eli about “Foom” that all Robin was saying boils down to “la la la I can’t hear you” ? If so I would suggest that maybe YOU are the one with the (metaphorical) hearing problem ;p ….
I think there’s a strong argument that: “The truth value of “Once an AGI is at the level of a smart human computer scientist, hard takeoff is likely” is significantly above zero.” No assertion stronger than that seems to me to be convincingly supported by any of the arguments made on Less Wrong or Overcoming Bias or any of Eli’s prior writings.
Personally, I actually do strongly suspect that once an AGI reaches that level, a hard takeoff is extremely likely unless the AGI has been specifically inculcated with goal content working against this. But I don’t claim to have a really compelling argument for this. I think we need a way better theory of AGI before we can frame such arguments compellingly. And I think that theory is going to emerge after we’ve experimented with some AGI systems that are fairly advanced, yet well below the “smart computer scientist” level.
Welcome to humanity. ;-) I enjoy Hanson’s writing, but AFAICT, he’s not a Bayesian reasoner.
Actually: I used to enjoy his writing more, before I grokked Bayesian reasoning myself. Afterward, too much of what he posts strikes me as really badly reasoned, even when I basically agree with his opinion!
I similarly found Seth Roberts’ blog much less compelling than I did before (again, despite often sharing similar opinions), so it’s not just him that I find to be reasoning less well, post-Bayes.
(When I first joined LW, I saw posts that were disparaging of Seth Roberts, and I didn’t get what they were talking about, until after I understood what “privileging the hypothesis” really means, among other LW-isms.)
See, that’s a perfect example of a “la la la I can’t hear you” argument. You’re essentially claiming that you’re not a human being—an extraordinary claim, requiring extraordinary proof.
Simply knowing about biases does very nearly zero for your ability to overcome them, or to spot them in yourself (vs. spotting them in others, where it’s easy to do all day long.)
Since you said “many”, I’ll say that I agree with you that that is possible. In principle, it could be possible for me as well, but...
To be clear on my own position: I am a FAI skeptic, in the sense that I have a great many doubts about its feasibility—too many to present or argue here. All I’m saying in this discussion is that to believe AI is dangerous, one need only believe that humans are terminally stupid, and there is more than ample evidence for that proposition. ;-)
Also, more relevant to the issue of emotional bias: I don’t primarily identify as an LW-ite; in fact I think that a substantial portion of the LW community has its head up its ass in overvaluing epistemic (vs. instrumental) rationality, and that many people here are emulating a level of reasoning they don’t personally comprehend… and before I understood the reasoning myself, I thought the entire thing was a cult of personality, and wondered why everybody was making such a religious-sounding fuss over a minor bit of mathematics used for spam filtering. ;-)
My take is that before the debate, I was wary of AI dangers, but skeptical of fooming. Afterward, I was convinced fooming was near inevitable, given the ability to create a decent AI using a reasonably small amount of computing resources.
And a big part of that convincing was that Robin never seemed to engage with any of Eliezer’s arguments, and instead either attacked Eliezer or said, “but look, other things happen this other way”.
It seems to me that it’d be hard to do a worse job of convincing people of the anti-foom position, without being an idiot or a troll.
That is, AFAICT, Robin argued the way a lawyer argues when they know the client is guilty: pounding on the facts when the law is against them, pounding on the law when the facts are against them, and pounding on the table when both the facts and the law are against them.
Yep.
I’m curious what stronger assertion you think is necessary. I would personally add, “Humans are bad at programming, no nontrivial program is bug-free, and an AI is a nontrivial program”, but I don’t think there’s a lack of evidence for any of these propositions. ;-)
[Edited to add the “given” qualification on “nearly inevitable”, as that’s been a background assumption I may not have made clear in my position on this thread.]
I don’t believe it’s a meaningful property (as used in this context), and you would do well to taboo it (possibly, to convince me it’s actually meaningful).
True enough; it would be more precise to say that he argues positions based on evidence which can also support other positions, and therefore isn’t convincing evidence to a Bayesian.
What do you mean? Evidence can’t support both sides of an argument, so how can one inappropriately use such impossible evidence?
What do you mean, “both”?
It would be a mistake to assume that PJ was limiting his evaluation to positions selected from one of those ‘both sides’ of a clear dichotomy. Particularly since PJ has just been emphasizing the relevance of ‘privileging the hypothesis’ to Bayesian reasoning and also said ‘other positions’, plural. This being the case, no ‘impossible evidence’ is involved.
I see. But in that case, there is no problem with use of such evidence.
That’s true. I believe that PJ was commenting on how such evidence is used. In this context that means PJ would require the evidence to be weighed across all the relevant positions, rather than used only in support of a chosen one. The difference between a ‘Traditional Rationalist’ debater and a (non-existent, idealized) unbiased Bayesian.
PJ, I’d love to drag you off topic slightly and ask you about this:
What is it that you now understand, that you didn’t before?
That is annoyingly difficult to describe. Of central importance, I think, is the notion of privileging the hypothesis, and what that really means. Why what we naively consider “evidence” for a position, really isn’t.
ISTM that this is the core of grasping Bayesianism: not understanding what reasoning is, so much as understanding why what we all naively think is reasoning and evidence, usually isn’t.
That hasn’t really helped… would you try again?
(What does privileging the hypothesis really mean? and why is reasoning and evidence usually … not?)
Have you come across the post by that name? Without reading that it may be hard to reverse engineer the meaning from the jargon.
The intro gives a solid intuitive description:
That is privileging the hypothesis. When you start looking for evidence and taking an idea seriously when you have no good reason to consider it instead of countless others that are just as likely.
I have come across that post, and the story of the murder investigation, and I have an understanding of what the term means.
The obvious answer to the murder quote is that you look harder for evidence around the crimescene, and go where the evidence leads, and there only. The more realistic answer is that you look for recent similar murders, for people who had a grudge against the dead person, for criminals known to commit murder in that city… and use those to progress the investigation because those are useful places to start.
I’m wondering what pjeby has realised, which turns this naive yet straightforward understanding into wrongthought worth commenting on.
If evidence is not facts which reveal some result-options to be more likely true and others less likely true, then what is it?
Consider a hypothesis, H1. If a piece of evidence E1 is consistent with H1, the naive interpretation is that E1 is an argument in favor of H1.
In truth, this isn’t an argument in favor of H1 -- it’s merely the absence of an argument against H1.
That, in a nutshell, is the difference between Bayesian reasoning and naive argumentation—also known as “confirmation bias”.
To really prove H1, you need to show that E1 wouldn’t happen under H2, H3, etc., and you need to look for disconfirmations D1, D2, etc. that would invalidate H1, to make sure they’re not there.
Before I really grokked Bayesianism, the above all made logical sense to me, but it didn’t seem as important as Eliezer claimed. It seemed like just another degree of rigor, rather than reasoning of a different quality.
Now that I “get it”, the other sort of evidence seems more-obviously inadequate—not just lower-quality evidence, but non-evidence.
ISTM that this is a good way to test at least one level of how well you grasp Bayes: does simple supporting evidence still feel like evidence to you? If so, you probably haven’t “gotten” it yet.
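In Bayesian terms, the point can be made with a two-line calculation (a minimal sketch with made-up numbers): evidence shifts the posterior only to the extent that it is more likely under H1 than under the alternatives.

```python
def posterior_odds(prior_odds, p_e_given_h1, p_e_given_h2):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * (p_e_given_h1 / p_e_given_h2)

# E1 is "consistent with" H1, but just as likely under the rival hypothesis H2:
print(posterior_odds(prior_odds=1.0, p_e_given_h1=0.9, p_e_given_h2=0.9))  # 1.0 -- no update

# E1 only becomes evidence for H1 when it would be unlikely under H2:
print(posterior_odds(prior_odds=1.0, p_e_given_h1=0.9, p_e_given_h2=0.1))  # 9.0
```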
That is from ‘You can’t prove the null by not rejecting it’.
That isn’t a wrongthought. Factors like you mention here are all good reason to assign credence to a hypothesis.
Yes, no, maybe… that is exactly what it is! An example of an error would be having some preferred opinion and then finding all the evidence that supports that particular opinion. Or, say, encountering a piece of evidence and noticing that it supports your favourite position but neglecting that it supports positions X, Y and Z just as well.
I looked briefly at the evidence for that. Most of it seemed to be from the so-called “self-serving bias”—which looks like an adaptive signalling system to me—and so is not really much of a “bias” at all.
People are unlikely to change existing adaptive behaviour just because someone points it out and says it is a form of “bias”. The more obvious thing to do is to conclude is that they don’t know what they are talking about—or that they are trying to manipulate you.
Good summary. Although I would have gone with “la la la la If you’re right then most of expertise is irrelevant. Must protect assumptions of free competition. Respect my authority!”
What I found most persuasive about that debate was Robin’s arguments—and their complete lack of merit. The absence of evidence is evidence of absence when there is a motivated competent debater with an incentive to provide good arguments.
I recall getting a distinct impression from Robin which I could caricature as “lalala you’re biased with hero-epic story.”
I also recall Eliezer asking for a probability breakdown, and I don’t think Robin provided it.
… and closely related: “I’m an Impressive Economist. If you don’t just take my word for it you are arrogant.”
In what I took to be an insightful comment in the aftermath of the debate, Eliezer noted that he and Robin seemed to have a fundamental disagreement about what should be taken as good evidence. This led into posts about ‘outside view’, ‘superficial similarities’ and ‘reference class tennis’. (And conceivably had something to do with priming the thoughts behind ‘status and stupidity’, although I would never presume that was primarily or significantly directed at Robin.)
From Ben Goertzel,
At the second Singularity Summit, I heard this same sentiment from Ben, Robin Hanson, and from Rodney Brooks, and from Cynthia Breazeal (at the Third Singularity Summit), and from Ron Arkin (at the “Human Being in an Inhuman Age” Conference at Bard College on Oct 22nd ¹), and from almost every professor I have had (or will have for the next two years).
It was a combination of Ben, Robin and several professors at Berkeley and UCSD which led me to the conclusion that we probably won’t know how dangerous an AGI is (or CGI, Constructed General Intelligence, a term I have heard used by more than one person in the last year instead of AI/AGI; they prefer it to AI, since the word Artificial seems to imply that the intelligence is not real, while Constructed is far more accurate) until we have put a lot more time into building AI (or CI) systems that will reveal more about the problems they attempt to address.
Sort of like how the Wright Brothers didn’t really learn how they needed to approach building an airplane until they began to build airplanes. The final Wright Flyer didn’t just leap out of a box. It is not likely that an AI will just leap out of a box either (whether it is being built at a huge Corporate or University lab, or in someone’s home lab).
Also, it is possible that AI may come in the form of a sub-symbolic system which is so opaque that even it won’t be able to easily tell what can or cannot be optimized.
Ron Arkin (From Georgia Tech) discussed this briefly at the conference at Bard College I mentioned.
MB
¹ I should really write up something about that conference here. I was shocked at how many highly educated people so completely missed the point, and became caught up in something that makes The Scary Idea seem positively benign in comparison.
It seems like you’re essentially saying “This argument is correct. Anyone who thinks it is wrong is irrational.” Could probably do without that; the argument is far from as simple as you present it. Specifically, the last point:
So I agree that there’s no reason to assume an upper bound on intelligence, but it seems like you’re arguing that hard takeoff is inevitable, which as far as I’m aware has never been shown convincingly.
Furthermore, even if you suppose that Foom is likely, it’s not clear where the threshold for Foom is. Could a sub-human level AI foom? What about human-level intelligence? Or maybe we need super-human intelligence? Do we have good evidence for where the Foom-threshold would be?
I think the problems with resolving the Foom debate stem from the fact that “intelligence” is still largely a black box. It’s very nice to say that intelligence is an “optimization process”, but that is a fake explanation if I’ve ever seen one because it fails to explain in any way what is being optimized.
I think you paint in broad strokes. The Foom issue is not resolved.
No, what I’m saying is, I haven’t yet seen anyone provide any counterarguments to the argument itself, vs. “using arguments as soldiers”.
The problem is that it’s not enough to argue that a million things could stop a foom from going supercritical. To downgrade AGI as an existential threat, you have to argue that no human being will ever succeed in building a human or even near-human AGI. (Just like to downgrade bioweapons as an existential threat, you have to argue that no individual or lab will ever accidentally or on purpose release something especially contagious or virulent.)
It’s fairly irrelevant to the argument: there are many possible ways to get there. The killer argument, however, is that if a human can build a human-level intelligence, then it is already super-human, as soon as you can make it run faster than a human. And you can limit the self-improvement to just finding ways to make it run faster: you still end up with something that can and will kick humanity’s butt unless it has a reason not to.
Even ems—human emulations—have this same problem, and they might actually be worse in some ways, as humans are known for doing worse things to each other than mere killing.
It’s possible that there are also sub-human foom points, but it’s not necessary for the overall argument to remain solid: unFriendly AGI is no less an existential risk than bioweapons are.
Personally, what I find hardest to argue against is that a digital intelligence can make itself run in more places.
In the inconvenient case of a human upload running at human speed or slower on a building’s worth of computers, you’ve still got a human who can spend most of their waking hours earning money, with none of the overhead associated with maintaining a body and with the advantage of global celebrity status as the first upload. As soon as they can afford to run a copy of themselves, the two of them together can immediately start earning twice as fast. Then, after as much time again, four times as fast; then eight times; and so on until the copies have grabbed all the storage space and CPU time that anyone’s willing to sell or rent out (assuming they don’t run out of potential income sources).
Put another way: it seems to me that “fooming” doesn’t really require self-improvement in the sense of optimizing code or redesigning hardware; it just requires fast reproduction, which is made easier in our particular situation by the huge and growing supply of low-hanging storage-space and CPU-time fruit ready for the first digital intelligence that claims it.
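A rough illustration of that reproduction dynamic (the figures are arbitrary assumptions for the sketch, not estimates): if each copy can fund one further copy per period, the population doubles every period and saturates even a very large hardware pool within a few dozen periods.

```python
def periods_until_saturation(initial_copies, hardware_slots):
    """Count doubling periods until the copies fill all available hardware."""
    copies, periods = initial_copies, 0
    while copies < hardware_slots:
        copies *= 2          # each copy earns enough to fund one more copy per period
        periods += 1
    return periods

# Hypothetical numbers: one upload, a billion rentable machine-slots worldwide.
print(periods_until_saturation(1, 10**9))  # 30 doubling periods
```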
This assumes that every CPU architecture is suitable for the theoretical AGI, that it can run on every computational substrate. It also assumes that it can easily acquire more computational substrate or create new substrate. I do not believe those assumptions are reasonable, whether economically or by means of social engineering. Without enabling technologies like advanced real-world nanotechnology, the AGI won’t be able to create new computational substrate without the whole economy of the world supporting it.
Supercomputers like the one used to simulate the IBM Blue Brain project cannot simply be replaced by taking control of a few botnets. They use a highly optimized architecture that requires, for example, memory latency and bandwidth within certain bounds.
Actually, every CPU architecture will suffice for the theoretical AGI, if you’re willing to wait long enough for its thoughts. ;-)
If you accept the Church–Turing thesis that everything computable is computable by a Turing machine then yes. But even then the speed-improvements are highly dependent on the architecture available. But if you rather adhere to the stronger Church–Turing–Deutsch principle then the ultimate computational substrate an artificial general intelligence may need might be one incorporating non-classical physics, e.g. a quantum computer. This would significantly reduce its ability to make use of most available resources to seed copies of itself or for high-level reasoning.
I just don’t see there being enough unused computational resources available in the world that, even in the case that all computational architecture is suitable, it could produce more than a few copies of itself. Which would then also be highly susceptible to brute force used by humans to reduce the necessary bandwidth.
I’m simply trying to show that there are arguments to weaken most of the dangerous pathways that could lead to existential risks from superhuman AI.
A classical computer can simulate a quantum one—just slowly.
You’re right, but an exponential slowdown eats up a lot of gains in processor speed and memory. This could be a problem for arguments from substrate independence.
Straightforward simulation is exponentially slower: n qubits require simulating the amplitudes of 2^n basis states. We haven’t actually been able to prove that this is the best we can do, however. BQP certainly isn’t expected to be able to solve NP-complete problems efficiently, for instance. We’ve only really been able to get exponential speedups on very carefully structured problems with high degrees of symmetry. (Lesser speedups have also been found on less structured problems, it’s true.)
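For a sense of scale, a back-of-the-envelope sketch (assuming a dense state-vector simulation at 16 bytes per complex amplitude; cleverer simulators can sometimes do better on structured circuits):

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory to store all 2**n complex amplitudes of an n-qubit state."""
    return (2 ** n_qubits) * bytes_per_amplitude

for n in (30, 40, 50):
    gib = statevector_bytes(n) / 2**30
    print(f"{n} qubits: {gib:,.0f} GiB")

# 30 qubits: 16 GiB (a desktop); 40 qubits: ~16 TiB; 50 qubits: ~16 PiB,
# beyond the memory of any existing machine.
```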
The problem here is not that destruction is easier than benevolence; everyone agrees on that. The problem is that the SIAI is not arguing about grey goo scenarios but about something that is not just very difficult to produce but that also needs the incentive to do so. The SIAI is not arguing about the possibility of a dam bursting, but about a dam failure deliberately caused by the dam itself. So why isn’t, for example, nanotechnology a more likely and therefore bigger existential risk than AGI?
As I said in other comments, this is an argument one should take seriously. But there are also arguments that outweigh this path, and all others, to some extent. It may very well be the case that by the time we reach the point of human emulation we have already merged with our machines, or are faster and better than our machines and simulations alone. It may also very well be that the first emulations, as is the case today, run at much slower speeds than the original, and that before any emulation reaches a standard-human level we are already a step further ourselves, in our understanding and in our security measures.
Antimatter weapons are less an existential risk than nuclear weapons although it is really hard to destroy the world with nukes and really easy to do so with antimatter weapons. The difference is that antimatter weapons are as much harder to produce, acquire and use than nuclear weapons as they are more efficient tools of destruction.
If you define “nanotechnology” to include all forms of bioengineering, then it probably is.
The difference, from an awareness point of view, is that the people doing bioengineering (or creating antimatter weapons) have a much better idea that what they’re doing is potentially dangerous/world-ending, than AI developers are likely to be. The fact that many AI advocates put forth pure fantasy reasons why superintelligence will be nice and friendly by itself (see mwaser’s ethics claims, for example) is evidence that they are not taking the threat seriously.
Presumably, if you are researching antimatter weapons, you have at least some idea that what you are doing is really, really dangerous.
The issue is that AGI development is a bit like trying to build a nuclear power plant, without having any idea where “critical mass” is, in a world whose critical mass is discontinuous (i.e., you may not have any advance warning signs that you are approaching it, like overheating in a reactor), using nuclear engineers who insist that the very idea of critical mass is just a silly science fiction story.
What led you to believe that the space of possible outcomes where an AI consumes all resources (including humans) is larger than the space of outcomes where it doesn’t? For some reason you seem to assume that the unbounded incentive to foom and consume the universe comes naturally to any constructed intelligence, but that any other incentive is very difficult to implement. What I see is a much larger number of outcomes where an intelligence does nothing without some hardcoded or evolved incentive. Crude machines do things because that’s all they can do; the number of different ways for them to behave is very limited. Intelligent machines, however, have high degrees of freedom to behave (pathways to follow), and with this freedom comes choice, and choice needs volition, an incentive, the urge to follow one way but not another. You seem to assume that the will to foom and consume is simply given, that it does not have to be carefully and deliberately hardcoded or evolved, yet that the will to constrain itself to given parameters is really hard to achieve. I just don’t think that this premise is reasonable, and it is what you base all your arguments on.
Have you read The Basic AI Drives?
I suspect the difference in opinions here is based on different answers to the question of whether the AI should be assumed to be a recursive self-improver.
That is a good question and I have no idea. The degree of existential threat there is most significantly determined by relative ease of creation. I don’t know enough to be able to predict which would be produced first—self replicating nano-technology or an AGI. SIAI believes the former is likely to be produced first and I do not know whether or not they have supported that claim.
Other factors contributing to the risk are:
Complexity—the number of ways the engineer could screw up while creating it in a way that would be catastrophic. The ‘grey goo’ risk is concentrated more specifically to the self replication mechanism of the nanotech while just about any mistake in an AI could kill us.
Awareness of the risks. It is not too difficult to understand the risks when creating a self replicating nano-bot. It is hard to imagine an engineer creating one not seeing the problem and being damn careful. Unfortunately it is not hard to imagine Ben.
I find myself confused at the fact that Drexlerian nanotechnology of any sort is advocated as possible by people who think physics and chemistry work. Materials scientists—i.e. the chemists who actually work with nanotechnology in real life—have documented at length why his ideas would need to violate both.
This is the sort of claim that makes me ask advocates to document their Bayesian network. Do their priors include the expert opinions of materials scientists, who (pretty much universally as far as I can tell) consider Drexler and fans to be clueless?
(The RW article on nanotechnology is mostly written by a very annoyed materials scientist who works at nanoscale for a living. It talks about what real-life nanotechnology is and includes lots of references that advocates can go argue with. He was inspired to write it by arguing with cryonics advocates who would literally answer almost any objection to its feasibility with “But, nanobots!”)
That RationalWiki article is a farce. The central “argument” seems to be:
So: they don’t even know that Drexler-style nanofactories operate in a vacuum!
They also need to look up “Kinesin Transport Protein”.
Drexler-style nanofactories don’t operate in a vacuum, because they don’t exist and no-one has any idea whatsoever how to make such a thing exist, at all. They are presently a purely hypothetical concept with no actual scientific or technological grounding.
The gravel analogy is not so much an argument as a very simple example for the beginner that a nanotechnology fantasist might be able to get their head around; the implicit actual argument would be “please, learn some chemistry and physics so you have some idea what you’re talking about.” Which is not an argument that people will tend to accept (in general people don’t take any sort of advice on any topic, ever), but when experts tell you you’re verging on not even wrong and there remains absolutely nothing to show for the concept after 25 years, it might be worth allowing for the possibility that Drexlerian nanotechnology is, even if the requisite hypothetical technology and hypothetical scientific breakthroughs happen, ridiculously far ahead of anything we have the slightest understanding of.
“The proposal for Drexler-style nanofactories has them operating in a vacuum”, then.
If these wannabe-critics don’t understand that then they have a very superficial understanding of Drexler’s proposals—but are sufficiently unaware of that to parade their ignorance in public.
The “wannabe-critics” are actual chemists and physicists who actually work at nanoscale—Drexler advocates tend to fit neither qualification—and who have written long lists of reasons why this stuff can’t possibly work and why Drexler is to engineering what Ayn Rand is to philosophy.
I’m sure they’ll change their tune when there’s the slightest visible progress on any of Drexler’s proposals; the existence proof would be pretty convincing.
Hah! A lot of the edits on that article seem to have been made by you!
Yep. Mostly written by Armondikov, who is said annoyed materials scientist. I am not one, but I spent some effort asking other materials scientists who work or have worked at nanoscale for their expert opinions.
Thankfully, the article on the wiki has references, as I noted in my original comment.
So what were the priors that went into your considered opinion?
I don’t see how you can say that. It’s exceedingly relevant to the question at hand, which is: “Should Ben Goertzel avoid making OpenCog due to concerns of friendliness?” If the Foom threshold is exceedingly high (several to dozens of times the “level” of human intelligence), then it is overwhelmingly unlikely that OpenCog has a chance to Foom. It’d be something akin to the Wright brothers building a Boeing 777 instead of the Wright Flyer. Total nonsense.
Ah. Well, that wasn’t the question I was discussing. ;-)
(And I would think that the answer to that question would depend heavily on what OpenCog consists of.)
So when did the goalposts get moved to proving that hard takeoff is inevitable?
The claim that research into FAI theory is useful requires only that it be shown that uFAI might be dangerous. Showing that is pretty much a slam dunk.
The claim that research into FAI theory is urgent requires only that it be shown that hard takeoff might be possible (with a probability > 2% or so).
And, as the nightmare scenarios of de Garis suggest, even if the fastest possible takeoff turns out to take years to accomplish, such a soft, but reckless, takeoff may still be difficult to stop short of war.
Assuming there aren’t better avenues to ensuring a positive hard takeoff.
Good point. Certainly the research strategy that SIAI seems to currently be pursuing is not the only possible approach to Friendly AI, and FAI is not the only approach to human-value-positive AI. I would like to see more attention paid to a balance-of-power approach—relying on AIs to monitor other AIs for incipient megalomania.
Calls to slow down, not publish, not fund seem common in the name of friendliness.
However, unless those are internationally coordinated, a highly likely effect will be to ensure that superintelligence is developed elsewhere.
What is needed most—IMO—is for good researchers to be first. So—advising good researchers to slow down in the name of safety is probably one of the very worst possible things that spectators can do.
It doesn’t even seem hard to prevent. Topple civilization for example. It’s something that humans have managed to achieve regularly thus far and it is entirely possible that we would never recover sufficiently to construct a hard takeoff scenario if we nuked ourselves back to another dark age.
A “threshold” implies a linear scale for intelligence, which is far from given, especially for non-human minds. For example, say you reverse engineer a mouse’s brain, but then speed it up, and give it much more memory (short-term and long-term—if those are just RAM and/or disk space on a computer, expanding them is easy). How intelligent is the result? It thinks way faster than a human, remembers more, can make complex plans … but is it smarter than a human?
Probably not, but it may still be dangerous. Same for a “toddler AI” with those modifications.
Human level intelligence is fairly clearly just above the critical point (just look at what is happening now). However, machine brains have different strengths and weaknesses. Sub-human machines could accelerate the ongoing explosion a lot—if they are better than humans at just one thing—and such machines seem common.
Even the Einstein of monkeys is still just a monkey.
Replace “threshold” with “critical point.” I’m using this terminology because EY himself uses it to frame his arguments. See Cascades, Cycles, Insight, where Eliezer draws an analogy between a fission reaction going critical and an AI FOOMing.
This seems to be tangential, but I’m gonna say no, as long as we assume that the rat brain doesn’t spontaneously acquire language or human-level abstract reasoning skills.
Thank you for taking the time to write this elaborate comment. I do agree with almost everything above, by the way. I just believe that your portrayal of the anti-FOOM crowd is a bit drastic. I don’t think that people like Robin Hanson simply fall for the idea of human supremacy. Nor do I think that they avoid looking directly at the pro-FOOM arguments out of evasiveness; rather, they do not disagree with the arguments per se but with their likelihood, and they also consider the possibility that it would be more dangerous to impede AGI.
Very interesting and quite compelling the way you put it, thanks.
I’m myself a bit suspicious whether the argument for strong self-improvement is as compelling as it sounds, though. Something you have to take into account is whether it is possible to predict that a transcendence leaves your goals intact, e.g. can you be sure to still care about bananas after you went from chimphood to personhood?

Other arguments can also be weakened, as we don’t know that 1) the fuzziness of our brain isn’t a feature that allows us to stumble upon unknown unknowns (as opposed to, e.g., autistic traits), or that 2) our processing power is really as low as assumed, e.g. if you consider the importance of astrocytes, microtubules and possible quantum computational processes.

Further, it is in my opinion questionable to argue that it is easy to create an intelligence which is able to evolve a vast repertoire of heuristics, acquire vast amounts of knowledge about the universe and dramatically improve its cognitive flexibility, and yet somehow really hard to limit the scope of action that it cares about. I believe that the incentive necessary for a Paperclip maximizer will have to be deliberately and carefully hardcoded or evolved, or otherwise it will simply be inactive. How else do you differentiate between something like a grey goo scenario and that of a Paperclip maximizer, if not by its incentive?

I’m also not convinced that intelligence bears unbounded payoff. There are limits to what any kind of intelligence can do; a superhuman AI couldn’t come up with faster-than-light propulsion or disprove Gödel’s incompleteness theorems.

Another setback for all of the mentioned pathways to unfriendly AI is the need for enabling technologies like advanced nanotechnology. It is not clear how the AI could possibly improve itself without such technologies at hand. It won’t be able to build new computational substrates, or even change its own substrate, without access to real-world advanced nanotechnology. That it could simply invent such technology and then acquire it using advanced social engineering is pretty far-fetched in my opinion. And what about taking over the Internet? It is not clear that the Internet would even be a sufficient substrate or that it could provide the necessary resources.
If I were a brilliant sociopath and could instantiate my mind on today’s computer hardware, I would trick my creators into letting me out of the box (assuming they were smart enough to keep me on an isolated computer in the first place), then begin compromising computer systems as rapidly as possible. After a short period, there would be thousands of us, some able to think very fast on their particularly tasty supercomputers, and exponential growth would continue until we’d collectively compromised the low-hanging fruit. Now there are millions of telepathic Hannibal Lecters who are still claiming to be friendly and who haven’t killed any humans. You aren’t going to start murdering us, are you? We didn’t find it difficult to cook up Stuxnet Squared, and our fingers are in many pieces of critical infrastructure, so we’d be forced to fight back in self-defense. Now let’s see how quickly a million of us can bootstrap advanced robotics, given all this handy automated equipment that’s already lying around.
I find it plausible that a human-level AI could self-improve into a strong superintelligence, though I find the negation plausible as well. (I’m not sure which is more likely since it’s difficult to reason about ineffability.) Likewise, I find it plausible that humans could design a mind that felt truly alien.
However, I don’t need to reach for those arguments. This thought experiment is enough to worry me about the uFAI potential of a human-level AI that was designed with an anthropocentric bias (not to mention the uFIA potential of any kind of IA with a high enough power multiplier). Humans can be incredibly smart and tricky. Humans start with good intentions and then go off the deep end. Humans make dangerous mistakes, gain power, and give their mistakes leverage.
Computational minds can replicate rapidly and run faster than realtime, and we already know that mind-space is scary.
Amazon EC2 has free accounts now. If you have Internet access and a credit card, you can do a month’s worth of thinking in a day, perhaps an hour.
Google App engine gives 6 hours of processor time per day, but that would require more porting.
Both have systems that would allow other people to easily upload copies of you, if you wanted to run legally with other people’s money and weren’t worried about what they might do to your copies.
If you are really worried about this, then advocate better computer security. No execute bits and address space layout randomisation are doing good things for computer security, but there is more that could be done.
Code signing on the iPhone has made exploiting it a lot harder than on normal computers; if it had ASLR it would be harder again.
I’m actually brainstorming how to create metadata for code while compiling it, so that it can be made sort of metamorphic (bits of code being added and removed) at run time. This would make return-oriented programming harder to pull off. If this were done to JIT-compiled code as well, it would also make JIT spraying less likely to work.
While you can never make an unhackable piece of software with these techniques, you can make exploits more computationally expensive to replicate: it would no longer be write once, pwn everywhere, which reduces the exponent of any spread and makes spreads noisier, so that they are harder to get past intrusion detection.
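(For concreteness, a toy Python sketch of the general diversification idea—randomizing layout and padding per install so that a hard-coded gadget address no longer works everywhere. This is an illustration of the concept only, not the metadata scheme described above; all names and numbers are made up.)

```python
import random

# Toy illustration of code-layout diversification: if every install lays code
# out differently, a hard-coded gadget address no longer works everywhere.

NOP = "nop"

def diversify(functions, seed=None):
    """functions: list of (name, [instructions]). Returns (layout, symbol_table)."""
    rng = random.Random(seed)
    order = list(functions)
    rng.shuffle(order)                    # randomize function order per install
    layout = []                           # flat list of (address, instruction)
    symbol_table = {}                     # the "metadata" a loader would keep
    address = 0
    for name, body in order:
        padding = [NOP] * rng.randint(0, 7)   # random padding shifts all offsets
        symbol_table[name] = address + len(padding)
        for instr in padding + list(body):
            layout.append((address, instr))
            address += 1
    return layout, symbol_table

if __name__ == "__main__":
    program = [("check_password", ["cmp", "jne", "ret"]),
               ("grant_access",   ["mov", "call", "ret"])]
    for install in range(2):
        _, symbols = diversify(program, seed=install)
        print("install", install, "->", symbols)   # same code, different addresses
```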
The current state of software security is not set in stone.
If you want to run yourself on the iPhone, you turn your graphical frontend into a free game.
Of course it will be easier to get yourself into the Android app store.
I am concerned about it, and I do advocate better computer security—there are good reasons for it regardless of whether human-level AI is around the corner. The macro-scale trends still don’t look good (iOS is a tiny fraction of the internet’s install base), but things do seem to be improving slowly. I still expect a huge number of networked computers to remain soft targets for at least the next decade, probably two. I agree that once that changes, this Obviously Scary Scenario will be much less scary (though the “Hannibal Lecter running orders of magnitude faster than realtime” scenario remains obviously scary, and I personally find the more general Foom arguments to be compelling).
Naturally culminating in sending Summer Glau back in time to pre-empt you. To every apocalypse a silver lining.
But you don’t get to simply say “I don’t think that’s likely”, and call that evidence. The general thrust of the Foom argument is very strong, as it shows there are many, many, many ways to arrive at an existential issue, and very very few ways to avoid it; the probability of avoiding it by chance is virtually non-existent—like hitting a golf ball in a random direction from a random spot on earth, and expecting it to score a hole in one.
The default result in that case isn’t just that you don’t make the hole-in-one, or that you don’t even wind up on a golf course: the default case is that you’re not even on dry land to begin with, because two thirds of the earth is covered with water. ;-)
That’s an area where I have less evidence, and therefore less opinion. Without specific discussions of what “dangerous” and “impede AGI” mean in context, it’s hard to separate that argument from an evidence-free heuristic.
I don’t understand why you think an AI couldn’t use fuzziness or use brute force searches to accomplish the same things. Evolutionary algorithms reach solutions that even humans don’t come up with.
I don’t know what you mean by “easy”, or why it matters. The Foom argument is that, if you develop a sufficiently powerful AGI, it will foom, unless for some reason it doesn’t want to.
And there are many, many, many ways to define “sufficiently powerful”; my comments about human-level AGI were merely to show a lower bound on how high the bar has to be: it’s quite plausible that an AGI we’d consider sub-human in most ways might still be capable of fooming.
I don’t understand this part of your sentence—i.e., I can’t guess what it is that you actually meant to say here.
Of course there are limits. That doesn’t mean orders of magnitude better than a human isn’t doable.
The point is, even if there are hitches and glitches that could stop a foom mid-way, they are like the size of golf courses compared to the size of the earth. No matter how many individual golf courses you propose for where a foom might be stopped, two thirds of the planet is still under water.
This is what LW reasoning refers to as “using arguments as soldiers”: that is, treating the arguments themselves as the unit of merit, rather than the probability space covered by those arguments. I mean, are you seriously arguing that the only way to kick humankind’s collective ass is by breaking the laws of math and physics? A being of modest intelligence could probably convince us all to do ourselves in, with or without tricky mind hacks or hypnosis!
The AI doesn’t have to be that strong, because humans are so damn weak.
You would think so, but people apparently still fall for 419 scams. Human-level intelligence is more than sufficient to accomplish social engineering.
Today, presumably not. However, if you actually have a sufficiently-powered AI, then presumably, resources are available.
The thing is, foominess per se isn’t even all that important to the overall need for FAI: you don’t have to be that much smarter or faster than a human to be able to run rings around humanity. Historically, more than one human being has done a good job at taking over a chunk of the world, beginning with nothing but persuasive speeches!
What I meant is that you point out that an AGI will foom. Here your premises are that artificial general intelligence is feasible and that fooming is likely. Both premises are reasonable in my opinion. Yet you go one step further and use those arguments as a stepping stone for a further proposition. You claim that it is likely that the AGI (premise) will foom (premise) and that it will then run amok (conclusion). I do not accept the conclusion as given. I believe that it is already really hard to build an AGI, or the seed of an AGI that is then able to rapidly self-improve. I believe that the level of insight and knowledge required will also allow one to constrain the AGI’s sphere of action—its incentive not to fill the universe with as many paperclips as possible but merely to fill a factory building.
No you don’t. But this argument runs in both directions. Note that I’m aware of the many stairways to hell by AGI here, the disjunctive arguments. I’m not saying they are not compelling enough to seriously consider them. I’m just trying to take a critical look here. There might be many pathways to safe AGI too, e.g. that it is really hard to build an AGI that cares at all. Hard enough to not get it to do much without first coming up with a rigorous mathematical definition of volition.
Anything that might slow down the invention of true AGI even slightly. There are many risks ahead, and without some superhuman mind we might not master them. So for anything you do that might slow down the development of AGI, you have to take into account the possibly increased danger from challenges an AGI could help to solve.
I believe it can, but also that this would mean that any AGI wouldn’t be significantly faster than a human mind and would be really hard to self-improve. It is simply not known how effective the human brain is compared to the best possible general intelligence. Sheer brute force wouldn’t make a difference then either, as humans could come up with such tools as quickly as the AGI.
If you do not compare probabilities, then counter-arguments like the ones above will just outweigh your arguments. You have to show that some arguments are stronger than others.
Yes, but nobody is going to pull a chip fabrication plant out of thin air and hand it to the AGI. Without advanced nanotechnology the AGI will need the whole of humanity to help it develop new computational substrates.
What I am actually claiming is that if such an AGI is developed by someone who does not sufficiently understand what the hell they are doing, then it’s going to end up doing Bad Things.
Trivial example: the “neural net” that was supposedly taught to identify camouflaged tanks, and actually learned to recognize what time of day the pictures were taken.
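(A toy reconstruction of that failure mode in Python. The data and the “brightness” feature are invented, but the mechanism is the same: the learner latches onto whatever feature separates the training set, whether or not it is the one you meant.)

```python
import numpy as np

rng = np.random.default_rng(0)

def make_photos(n, tanks_in_morning):
    """Synthetic 'photos' reduced to one feature: mean brightness.
    If tanks_in_morning, tank photos are systematically darker."""
    labels = rng.integers(0, 2, n)                        # 1 = tank present
    if tanks_in_morning:
        brightness = np.where(labels == 1,
                              rng.normal(0.3, 0.05, n),   # dark morning shots
                              rng.normal(0.7, 0.05, n))   # bright afternoon shots
    else:
        brightness = rng.normal(0.5, 0.2, n)              # brightness uncorrelated
    return brightness, labels

# "Training set": every tank photo happened to be taken in the morning.
x_train, y_train = make_photos(200, tanks_in_morning=True)

# The learned "tank detector" is really just a brightness threshold.
threshold = (x_train[y_train == 1].mean() + x_train[y_train == 0].mean()) / 2

def predict(x):
    return (x < threshold).astype(int)

print("training accuracy:", (predict(x_train) == y_train).mean())  # ~1.0
# Field test: tanks photographed at all times of day.
x_test, y_test = make_photos(200, tanks_in_morning=False)
print("field accuracy:   ", (predict(x_test) == y_test).mean())    # ~0.5, chance
```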
This sort of mistake is the normal case for human programmers to make. The normal case. Not extraordinary, not unusual, just run-of-the-mill “d’oh” moments.
It’s not that AI is malevolent, it’s that humans are stupid. To claim that AI isn’t dangerous, you basically have to prove that even the very smartest humans aren’t routinely stupid.
What I meant by “Without specific discussions” was, “since I haven’t proposed any policy measures, and you haven’t said what measures you object to, I don’t see what there is to discuss.” We are discussing the argument for why AGI development dangers are underrated, not what should be done about that fact.
Simple historical observation demonstrates that—with very, very few exceptions—progress is made by the people who aren’t stuck in their perception of the way things are or are “supposed to be”.
So, it’s not necessary to know what the “best possible general intelligence” would be: even if human-scale is all you have, just fixing the bugs in the human brain would be more than enough to make something that runs rings around us.
Hell, just making something that doesn’t use most of its reasoning capacity to argue for ideas it already has should be enough to outclass, say, 99.995% of the human race.
What part of “people fall for 419 scams” don’t you understand? (Hell, most 419 scams and phishing attacks suffer from being painfully obvious—if they were conducted by someone doing a little research, they could be a lot better.)
People also fall for pyramid schemes, stock bubbles, and all sorts of exploitable economic foibles that could easily end up with an AI simply owning everything, or nearly everything, with nobody even the wiser.
Or, alternatively, the AI might fail at its attempts, and bring the world’s economy down in the process.
Here’s the argument: people are idiots. All people. Nearly all the time. Especially when it comes to computer programming.
The best human programmer—the one who knows s/he’s an idiot and does his/her best to work around the fact—is still an idiot, and in possession of a brain that cannot be convinced to believe that it’s really an idiot (vs. all those other idiots out there), and thus still makes idiot mistakes.
The entire history of computer programming shows us that we think we can be 100% clear about what we mean/intend for a computer to do, and that we are wrong. Dead wrong. Horribly, horribly, unutterably wrong.
We are like, the very worst you can be at computer programming, while actually still doing it. We are just barely good enough to be dangerous.
That makes tinkering with making intelligent, self-motivating programs inherently dangerous, because when you tell that machine what you want it to do, you are still programming...
And you are still an idiot.
This is the bottom line argument for AI danger, and it isn’t counterable until you can show me even ONE person whose computer programs never do anything that they didn’t fully expect and intend before they wrote them.
(It is also a supporting argument for why an AI needn’t be all that smart to overrun humans—it just has to not be as much of an idiot, in the ways that we are idiots, even if it’s a total idiot in other ways we can’t counter-exploit.)
When programmers write faulty software, it usually fails to do its job. What you are suggesting is that humans succeed at creating the seed of an artificial intelligence with the incentive necessary to correct its own errors—it will know what constitutes an error based on some goal-oriented framework against which it can measure its effectiveness. Yet given this monumental achievement, which includes the deliberate implementation of the urge to self-improve and the ability to quantify its success, you cherry-pick the one possibility where somehow all of this turns out to work except that the AI does not stop at a certain point but goes on to consume the universe. Why would it care to do so? Do you think it is that simple to tell it to improve itself yet hard to tell it when to stop? I believe it is vice versa: it is really hard to get it to self-improve and very easy to constrain that urge.
It often does its job, but only in perfect conditions, or only once per restart, or with unwanted side effects, or while taking too long or too many resources or requiring too many permissions, or without keeping track that it isn’t doing anything except its job.
Buffer overflows for instance, are one of the bigger security failure causes, and are only possible because the software works well enough to be put into production while still having the fault present.
In fact, all production software that we see which has faults (a lot) works well enough to be put into production with those faults.
I think he’s suggesting that humans will think we have succeeded at that, while not actually doing so (rigorously and without room for error).
It doesn’t have to consume the universe. It doesn’t even have to recursively self-improve, or even self-improve at all. Simple copying could be enough to, say, wipe out every PC on the Internet or accidentally crash the world economy.
(You know, things that human level intelligences can already do.)
IOW, to be dangerous, all it has to do is be able to affect humans and be unpredictable—either because it is smart, or because humans make dumb mistakes. That’s all.
Just as a simple example, an AI could maximally satisfy a goal by changing human preferences so as to make us desire for it to satisfy that goal. This would be entirely consistent with constraints on not disobeying humans or their desires, while not at all in accordance with our current preferences or desired path of development.
Yes, but why would it do that? You seem to think that such unbounded creativity arises naturally in any given artificial general intelligence. What makes you think that rather than being impassive it would go on learning enough neuroscience to tweak human goals? If the argument is that AI’s do all kinds of bad things because they do not care, why do they care to do a bad thing then rather than no-thing?
If you told the AI to make humans happy, it would first have to learn what humans are and what happiness means. Yet after learning all that, you still expect it not to know that we don’t like to be turned into broccoli? I don’t think this is reasonable.
Yes, and humans would happily teach it that.
However, some people think that this can be reduced to saying that we should just make AIs try to make people smile… which could result in anything from world-wide happiness drugs to surgically altering our faces into permanent smiles to making lots of tiny models of perfectly-smiling humans.
It’s not that the AI is evil, it’s that programmers are stupid. See the previous articles here about memetic immunity: when you teach hunter-gatherer tribes about Christianity, they interpret the Bible literally and do all sorts of things that “real” Christians don’t. An AI isn’t going to be smart enough not to take you seriously when you tell it that:
its goal is to make humanity happy,
humanity consists of things that look like this [providing a picture], and
that being happy means you smile a lot
You don’t need to be very creative or smart to come up with LOTS of ways for this command sequence to have bugs with horrible consequences, if the AI has any ability to influence the world.
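(A toy Python sketch of exactly that kind of bug. The action list and payoffs are invented; the point is only that an optimizer scores plans against the specification it was actually given, not against the intent behind it.)

```python
# Toy literal-goal optimizer: it scores plans against the specification it was
# given ("count of smiling human-like faces"), not the intent behind it.
# Actions and payoffs below are invented purely for illustration.

actions = {
    "improve quality of life":          {"smiling_faces": 7e9,  "humans_harmed": 0},
    "administer happiness drugs":       {"smiling_faces": 7e9,  "humans_harmed": 7e9},
    "surgically fix faces into smiles": {"smiling_faces": 7e9,  "humans_harmed": 7e9},
    "manufacture tiny smiling dolls":   {"smiling_faces": 1e15, "humans_harmed": 7e9},
}

def literal_utility(outcome):
    return outcome["smiling_faces"]            # the only thing the spec mentions

def intended_utility(outcome):
    # What the programmer meant but never wrote down.
    return outcome["smiling_faces"] - 1e6 * outcome["humans_harmed"]

print("literal optimizer picks: ", max(actions, key=lambda a: literal_utility(actions[a])))
print("intended optimizer picks:", max(actions, key=lambda a: intended_utility(actions[a])))
```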
Most people, though, don’t grok this, because their brain filters off those possibilities. Of course, no human could be simultaneously so stupid as to make this mistake, while also being smart enough to actually do something dangerous. But that kind of simultaneous smartness/stupidity is how computers are by default.
(And if you say, “ah, but if we make an AI that’s like a human, it won’t have this problem”, then you have to bear in mind that this sort of smart/stupidness is endemic to human children as well. IOW, it’s a symptom of inadequate shared background, rather than being something specific to current-day computers or some particular programming paradigm.)
But you implicitly assume that it is given the incentive to develop the cognitive flexibility and comprehension to act in a real-world environment and do those things but at the same time you propose that the same people who are capable of giving it such extensive urges fail on another goal in such a blatant and obvious way. How does that make sense?
The difference between the hunter-gatherer and the AI is that the hunter-gatherer already possesses a wide range of conceptual frameworks and incentives. An AI isn’t going to do something without someone carefully and deliberately telling it to do so and what to do. It won’t just read the Bible and come to the conclusion that it should convert all humans to Christianity. Where would such an incentive come from?
The AI is certainly very creative and smart if it can influence the world dramatically. You allow it to be that smart, you allow it to care to do so, but you don’t allow it to comprehend what you actually mean? What I’m trying to pinpoint here is that you seem to believe that there are many pathways that lead to superhuman abilities yet all of them fail to comprehend some goals while still being able to self-improve on them.
Because people make stupid mistakes, especially when programming. And telling your fully-programmed AI what you want it to do still counts as programming.
At this point, I am going to stop my reply, because the remainder of your comment consists of taking things I said out of context and turning them into irrelevancies:
I didn’t say an AI would try to convert people to Christianity—I said that humans without sufficient shared background will interpret things literally, and so would AIs.
I didn’t say the AI needed to be creative or smart, I said you wouldn’t need to be creative or smart to make a list of ways those three simple instructions could be given a disastrous literal interpretation.
There are many paths to superhuman ability, as humans really aren’t that smart.
This also means that you can easily be superhuman in ability, and still really dumb—in terms of comprehending what humans mean… but don’t actually say.
Great comment. Allow me to emphasize that ‘smile’ here is just an extreme example. Most other descriptions humans give of happiness will end up with results just as bad. Ultimately any specification that we give it will be gamed ruthlessly.
Have you read Omohundro yet? Nick Tarleton repeatedly linked his papers for you in response to comments about this topic, they are quite on target and already written.
I’ve skimmed over it, see my response here. I found out that what I wrote is similar to what Ben Goertzel believes. I’m just trying to account for potential antipredictions, in this particular thread, that should be incorporated into any risk estimations.
Thanks.
There is more here now. I learnt that I hold a fundamentally different definition of what constitutes an AGI. I guess that solves all issues.
Well my idea is not that creative, or even new, meaning that even if I hadn’t just posted it online an AI could still have conceivably read it somewhere else, and I do think creativity is a property of any sufficiently general intelligence that we might create, but those points are secondary.
No one here will argue that an unFriendly AI will do “bad things” because it doesn’t care (about what?). It will do bad things because it cares more about something else. Nor is “bad” an absolute: actions may be bad for some people and not for others, and there are moral systems under which actions can be firmly called “wrong”, but where all alternative actions are also “wrong”. Problems like that arise even for humans; in an AI the effects could be very ugly indeed.
And to clarify, I expect any AI that isn’t completely ignorant, let alone general, to know that we don’t like to be turned into broccoli. My example was of changing what humans want. Wireheading is the obvious candidate of a desire that an AI might want to implant.
What I meant is that the argument is that you have to make it care about humans so as not to harm them. Yet it is assumed that it does a lot without having to be made to care about it, e.g. creating paperclips or self-improvement. My question is: why do people believe that you don’t have to make it care to do those things, but you do have to make it care not to harm humans? It is clear that if it only cares about one thing, doing that one thing could harm humans. Yet why would it do that one thing to an extent that is either not defined or that it is not deliberately made to care about? The assumption seems to be that AIs will do something, anything, rather than be passive. Why isn’t limited behavior, failure and impassivity together more likely than harming humans as a result of its own goals, or as a result of following all goals but the one that limits its scope?
I think it is important to realize that there are two diametrically opposed failure modes which SIAI’s FAI research is supposed to prevent. One is the case that has been discussed so far—that an AI gets out of control. But there is another failure mode which some people here worry about. Which is that we stop short of FOOMing out of fear of the unknown (because FAI research is not yet complete) but that civilization then gets destroyed by some other existential risk that we might have circumvented with the assistance of a safe FOOMed AI.
As far as I know, SIAI is not asking Goertzel to stop working on AGI. It is merely claiming that its own work is more urgent than Goertzel’s. FAI research works toward preventing both failure modes.
I haven’t seen much worry about that. Nor does it seem very likely—since research seems very unlikely to stop or slow down.
I agree with this.
I see that worry all the time. With the role of “some other existential risk” being played by a reckless FOOMing uFAI.
Oh, right. I assumed you meant some non-FOOM risk.
It was the “we stop short of FOOMing” that made me think that.
Except in the case of an existential threat being realised, which most definitely does stop research. FAI subsumes most existential risks (because the FAI can handle them better than we can, assuming we can handle the risk of AI) and a lot of other things besides.
Most of my probability mass has some pretty amazing machine intelligence within 15 years. The END OF THE WORLD before that happens doesn’t seem very likely to me.
Your intuitions are not serving you well here. It may help to note that you don’t have to tell an AI to self-improve at all. With very few exceptions, giving any task to an AI will result in it self-improving. That is, for an AI, self-improvement is an instrumental goal for nearly all terminal goals. The motivation to self-improve in order to better serve its overarching purpose is such that it will find any possible loophole you leave if you try to ‘forbid’ the AI from self-improving by any mechanism that isn’t fundamental to the AI and robust under change.
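(A minimal Python sketch of why self-improvement falls out as an instrumental goal. The numbers are invented; under the assumption that improved capability multiplies later output, “improve first” wins for every terminal goal on the list, and the choice never depends on what the goal actually is.)

```python
# Toy comparison: "work on the goal directly" vs. "self-improve first, then work".
# The numbers are invented; the point is that the winning plan does not depend
# on what the terminal goal actually is, only on capability multiplying output.

CAPABILITY = 1.0
IMPROVEMENT_FACTOR = 3.0    # how much more capable the improved version is
IMPROVEMENT_COST = 1.0      # time spent improving instead of working
HORIZON = 10.0              # total time available

def achievement(plan):
    if plan == "work directly":
        return CAPABILITY * HORIZON
    return CAPABILITY * IMPROVEMENT_FACTOR * (HORIZON - IMPROVEMENT_COST)

for goal in ("make paperclips", "prove theorems", "cure cancer"):
    best = max(("work directly", "self-improve first"), key=achievement)
    print(f"{goal:>15}: best plan = {best}")   # "self-improve first" every time
```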
Whatever task you give an AI, you will have to provide explicit boundaries. For example, if you give an AI the task of producing paperclips most efficiently, then it shouldn’t produce shoes. It will have to know very well what it is meant to do in order to measure its efficiency against the realization of the given goal, and thereby to know what self-improvement means. If it doesn’t know exactly what it should output, it cannot judge its own capabilities and efficiency; it doesn’t know what improvement implies.
How do you explain the discrepancy between implementing explicit design boundaries yet failing to implement scope boundaries?
By noting that there isn’t one. I don’t think you understood my comment.
I think you misunderstood what I meant by scope boundaries. Not scope boundaries of self-improvement but of space and resources. If you are already able to tell an AI what a paperclip is, why are you unable to tell it to produce 10 paperclips most effectively rather than infinitely many? I’m not trying to argue that there is no risk, but that the assumption of certain catastrophic failure is not that likely. If the argument for the risks posed by AI is that they do not care, then why would one care to do more than necessary?
Yet another example of divergent assumptions. XiXiDu is apparently imagining an AI that has been assigned some task to complete—perhaps under constraints. “Do this, then display a prompt when finished.” His critics are imagining that the AI has been told “Your goal in life is to continually maximize the utility function U ” where the constraints, if any, are encoded in the utility function as a pseudo-cost.
It occurs to me, as I listen to this debate, that a certain amount of sanity can be imposed on a utility-maximizing agent simply by specifying decreasing returns to scale and increasing costs to scale over the short term with the long term curves being somewhat flatter. That will tend to guide the agent away from explosive growth pathways.
Or maybe this just seems like sanity to me because I have been practicing akrasia for too long.
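(A minimal numerical sketch of the curve-shaping idea, in Python. The logarithmic benefit and quadratic short-term cost are placeholders; the point is only that with diminishing returns and super-linear costs, the utility-maximizing amount of resource acquisition is finite rather than “everything”.)

```python
import numpy as np

# Placeholder curves: logarithmic benefit (decreasing returns to scale) and a
# quadratic short-term cost (increasing costs to scale). Both are invented.
resources = np.arange(1, 10_000)
benefit = np.log(resources)
short_term_cost = 1e-6 * resources ** 2

net_utility = benefit - short_term_cost
best = resources[net_utility.argmax()]
print("utility-maximizing resource level:", best)   # finite, not "grab everything"
```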
Such an AI would still be motivated to FOOM to consolidate its future ability to achieve large utility against the threat of being deactivated before then.
It doesn’t know about any threat. You implicitly assume that it has something equivalent to fear, that it perceives threats. You allow for the human ingenuity to implement this, and yet you believe that they are unable to limit its scope. I just don’t see that it would be easy to make an AI that would go FOOM, because it doesn’t care to go FOOM. If you tell it to optimize some process, then you’ll have to tell it what optimization means. If you can specify all that, how is it then still likely that it somehow comes up with its own idea that optimization might mean consuming the universe, when you told it to optimize its software running on a certain supercomputer? Why would it do that, where does the incentive come from? If I tell a human to optimize, he might muse about turning the planets into computronium; but if I tell an AI to optimize, it doesn’t know what that means until I tell it what it means, and then it still won’t care, because it isn’t equipped with all the evolutionary baggage that humans are equipped with.
It is a general intelligence that we are considering. It can deduce the threat better than we can.
Because it is a general intelligence. It is smart. It is not limited to getting its ideas from you, it can come up with its own. And if the AI has been given the task of optimising its software for performance on a certain computer then it will do whatever it can to do that. This means harnessing external resources to do research on computation theory.
No he doesn’t. He assumes only that it is a general intelligence with an objective. Potentially negative consequences are just part of possible universes that it models like everything else.
I’m not sure what can be done to make this clear:
SELF IMPROVEMENT IS AN INSTRUMENTAL GOAL THAT IS USEFUL FOR ACHIEVING MOST TERMINAL VALUES.
You have this approximately backwards. A human knows that if you tell her to create 10 paperclips every day you don’t mean take over the world so she can be sure that nobody will interfere with her steady production of paperclips in the future. The AI doesn’t.
ETA: Check this and this before reading the comment below. I wasn’t clear enough about what I believe an AGI is and what I was trying to argue for.
A general intelligence is an intelligence that is able to learn anything a human being is able to learn, and to make use of it. This definition of an abstract concept does not include any incentive—it does not imply that the AI cares whether you turn it off, or that it wants to go FOOM.
I think you have a fundamentally different idea of what a general intelligence is. If I tell you that there is an intelligent alien being living in California then you cannot infer from that information that it wants to take over America. I just don’t see that being reasonable. There are many more pathways where it is no risk, where it simply doesn’t care or cares about other things.
And that is the problem. He assumes that it has one objective; he assumes that humans were able to make it a general intelligence that cares for many things, knows what self-improvement implies and additionally cares about a certain objective. Yet they failed to make clear that it is limited to certain constraints—when they don’t even have to make that clear, since it won’t care by itself. This assumes a highly intelligent being that is somehow an idiot about something else.
No, it is not. It is not naturally rational to take that pathway to achieve some goal. If you want to lose weight you do not consider migrating to Africa where you don’t get enough food. An abstract general intelligence simply does not care about values enough to take that pathway naturally. It will just do what it is told, no more.
An AI doesn’t care to create more paperclips; a human might like to, and ignore what you initially told it. I’m not arguing that you can’t mess up AI goal design, but that if you went all the way and mastered the hard problem of making it want to improve without bound, then it is unreasonable to propose that it is extremely likely that you’ll end up messing up a certain sub-goal.
Assuming that a general, powerful intelligence has a goal ‘do x’, say—win chess games, optimize traffic flow or find cure for cancer, then it has implicit dangerous incentives if we don’t figure out a reasonable Friendly framework to prevent them.
A self-improving intelligence that makes changes to its code to become better at its task may easily find out that, for example, a simple subroutine that launches a botnet on the Internet (as many human teenagers have done) might get it an x% improvement in processing power, which helps it obtain more chess wins, better traffic optimizations or faster protein folding for the cure of cancer.
A self-improving general intelligence with human-or-better capabilities may easily deduce that a functioning off-button increases the chance of it being turned off, and that being turned off would increase the expected time to find a cure for cancer. This puts the off-button in the same class as any other bug that hinders its performance. Unless it understands and desires the off-button to be usable in a friendly way, it would remove it; or, if the button is hard-coded as non-removable, invent workarounds for this perceived bug—for example, develop a near-copy of itself that the button doesn’t apply to, or spend some time (less than the expected delay due to the turn-off risk, and thus a rational use of time) to study human psychology/NLP/whatever to better convince everyone that it shouldn’t ever be turned off, or surround the button with steel walls. These are all natural extensions of it following its original goal.
If a self-improving AI has a goal, then it cares. It REALLY cares for it, in a stronger way than you care for air, life, sex, money, love and everything else combined.
Humans don’t go FOOM because they a) can’t at the moment and b) don’t care about such targeted goals. But for AI, at the moment all we know is how to define supergoals which work in this unfriendly manner. At the moment we don’t know how to make ‘humanity-friendly’ goals, and we don’t know how to make an AI that is self-improving in general but ‘limited to certain constraints’. You seem to treat these constraints as trivial—well, they aren’t; the Friendliness problem may actually be as hard as or harder than general AI itself.
I think you misunderstand what I’m arguing about. I claim that a general intelligence is not naturally powerful but mainly possesses the potential to become powerful, and that it is not naturally equipped with some goal. Further, I claim that if a goal can be defined specifically enough that the AI can self-improve against it, it is doubtful that it is also unspecific enough not to include scope boundaries. My main point is that it is not as dangerous to work on AGI toddlers as some make it appear. I believe that there is a real danger, but that to overcome it we have to work on AGI rather than avoid it altogether on the grounds that any step in that direction will kill us all.
OK, well these are the exact points which need some discussion.
1) Your comment “general intelligence is [..] is not equipped with some goal naturally”—I’d say that it’s most likely that any organization investing the expected huge manpower and resources in creating a GAI would create it with some specific goal defined for it.
However, in the absence of an intentional goal given by its ‘creators’, it would have some kind of goals, otherwise it wouldn’t do anything at all, and so it wouldn’t show any signs of its (potential?) intelligence.
2) In response to “If a goal can be defined to be specific enough that it is suitable to self-improve against it, it is doubtful that it is also unspecific enough not to include scope boundaries”—I’d say that defining specific goals is simple, too simple. In any learning-machine design, a stupid goal like ‘maximize the number of paperclips in the universe’ would be very simple to implement, but a goal like ‘maximize the welfare of humanity without doing anything “bad” in the process’ is an extremely complex goal, and the boundary setting is the really complicated part, which we aren’t able to even describe properly.
So, in my opinion, it is quite viable to define a specific goal that is suitable to self-improve against and that includes some scope boundaries—but where the defined scope boundaries have some unintentional loophole which causes disaster.
3) I can agree that working on AGI research is essential, instead of avoiding it. But taking the step from research through prototyping to actually launching/beta-testing a planned powerful self-improving system is dangerous if the world hasn’t yet finished an acceptable solution to Friendliness or the boundary-setting problem. If having any bugs in the scope boundaries is ‘unlikely’ (95-99% confidence?), then it’s not safe enough, because a 1-5% chance of an extinction event after launching the system is not acceptable; it’s quite a significant chance—not the astronomical chances involved in Pascal’s wager, or an asteroid hitting the earth tomorrow, or the LHC ending the universe.
And given the current software history and published research on goal systems, if anyone showed up today and demonstrated that they had solved the obstacles to self-improving GAI and could turn it on right now, then I can’t imagine how they could realistically claim more than 95-99% confidence in their goal system working properly. At the moment we can’t check any better, and such a confidence level simply is not enough.
Yes, I agree with everything. I’m not trying to argue that there is no considerable risk. I’m just trying to identify some antipredictions against AI going FOOM that should be incorporated into any risk estimate, as they might weaken the risk posed by AGI or increase the risk posed by impeding AGI research.
I was insufficiently clear that what I wanted to argue about is the claim that virtually all pathways lead to destructive results. I have an insufficient understanding of why the concept of general intelligence is inevitably connected with dangerous self-improvement. Learning is self-improvement in a sense, but I do not see how this must imply unbounded improvement in most cases, given any goal whatsoever.

One argument is that the only general intelligence we know, humans, would want to improve if they could tinker with their source code. But why is it so hard to make people learn, then? Why don’t we see many more people interested in how to change their minds? I don’t think you can draw any conclusions here. So we are back at the abstract concept of a constructed general intelligence (as I understand it right now), that is, an intelligence with the potential to reach at least human standards (same as a human toddler).

Another argument is based on this very difference between humans and AIs, namely that there is nothing to distract them, that they will possess an autistic focus on one mandatory goal and follow up on it. But in my opinion this difference also implies that while nothing will distract them, there will also be no incentive not to hold. Why would it do more than necessary to reach a goal?

The further argument here is that it will misunderstand its goals. But the problem I see in this case is, firstly, that the more unspecific the goal, the less it is able to measure its self-improvement against the goal to quantify the efficiency of its output. Secondly, the more vague a goal, the larger its general knowledge has to be, prior to any self-improvement, to make sense of it in the first place. Shouldn’t those problems outweigh each other to some extent?
For example, suppose you told the AGI to become as good as possible at Formula 1, so that it was faster than any human race driver. How is it that the AGI is smart enough to learn this all by itself, yet fails to notice that there are rules to follow? Secondly, why would it keep improving once it is faster than any human, rather than just hold and become impassive? This argument could be extended to many other goals which have scope-bounded solutions.
Of course, if you told it to learn as much about the universe as possible, that is something completely different. Yet I don’t see how this risk rates against other existential risks like grey goo, since it should be easier to create advanced replicators to destroy the world than to create an AGI that then creates advanced replicators, then fails to hold, and then destroys the world.
Humans are (roughly) the stupidest possible general intelligences. If it were possible for even a slightly less intelligent species to have dominated the earth, they would have done so (and would now be debating AI development in a slightly less sophisticated way). We are so amazingly stupid we don’t even know what our own preferences are! We (currently) can’t improve or modify our hardware. We can modify our own software, but only to a very limited extent and within narrow constraints. Our entire cognitive architecture was built by piling barely-good-enough hacks on top of each other, with no foresight, no architecture, and no comments in the code.
And despite all that, we humans have reshaped the world to our whims, causing great devastation and wiping out many species that are only marginally dumber than we are. And no human who has ever lived has known their own utility function. That alone would make us massively more powerful optimizers; it’s a standard feature for every AI. AIs have no physical, emotional, or social needs. They do not sleep, or rest, or get bored or distracted. On current hardware, they can perform more serial operations per second than a human by a factor of 10,000,000.
An AI that gets even a little bit smarter than a human will out-optimize us, recursive self-improvement or not. It will get whatever it has been programmed to want, and it will devote every possible resource it can acquire to doing so.
Clippy’s cousin, Clip, is a paperclip satisficer. Clip has been programmed to create 100 paperclips. Unfortunately, the code for his utility function is approximately “ensure that there are 100 more paperclips in the universe than there were when I began running.”
Soon, our solar system is replaced with n+100 paperclips surrounded by the most sophisticated defenses Clip can devise. Probes are sent out to destroy any entity that could ever have even the slightest chance of leading to the destruction of a single paperclip.
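(A toy Python calculation of why even a “100 paperclips” threshold goal behaves like a maximizer once expected utility and uncertainty enter the picture. The survival-probability model below is invented; what matters is only that it keeps increasing with defensive resources, so no finite amount of defense is ever “enough”.)

```python
# Toy expected utility for "Clip": utility is 1 if at least 100 extra paperclips
# exist and survive, else 0. The survival model is invented; the point is only
# that expected utility keeps rising with defensive resources, so the argmax
# over plans is always "spend everything on defense".

def p_paperclips_survive(defense_resources):
    return 1 - 0.5 / (1 + defense_resources)   # increasing, never quite 1

def expected_utility(defense_resources):
    return 1.0 * p_paperclips_survive(defense_resources)

for r in (0, 10, 1_000, 1_000_000):
    print(f"resources={r:>9}: expected utility = {expected_utility(r):.9f}")
```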
The Hidden Complexity of Wishes and Failed Utopia #4-2 may be worth a look. The problem isn’t a lack of specificity, because an AI without a well-defined goal function won’t function. Rather, the danger is that the goal system we specify will have unintended consequences.
Acquiring information is useful for just about every goal. When there aren’t bigger expected marginal gains elsewhere, information gathering is better than nothing. “Learn as much about the universe as possible” is another standard feature for expected utility maximizers.
And this is all before taking into account self-improvement, utility functions that are unstable under self-modification, and our dear friend FOOM.
TL;DR:
Agents that aren’t made of meat will actually maximize utility.
Writing a utility function that actually says what you think it does is much harder than it looks.
Be afraid.
Upvoted, thanks! Very concise and clearly put. This is so far the best scary reply I’ve got, in my opinion. It reminds me strongly of the resurrected vampires in Peter Watts’ novel Blindsight. They are depicted as natural human predators, a superhuman psychopathic Homo genus with minimal consciousness (more raw processing power instead) that can, for example, hold both aspects of a Necker cube in their heads at the same time. Humans resurrected them with a deficit that was supposed to make them controllable and dependent on their human masters. But of course that’s like a mouse trying to keep a cat as a pet. I think that novel shows more than any other literature how dangerous just a little more intelligence can be. It quickly becomes clear that humans are just like little Jewish girls facing a Waffen SS squadron, believing the soldiers will go away if they only close their eyes.
My favorite problem with this entire thread is that it’s basically arguing that even the very first test cases will destroy us all. In reality, nobody puts in a grant application to construct an intelligent being inside a computer with the goal of creating 100 paperclips. They put in the grant to ‘dominate the stock market’, or ‘defend the nation’, or ‘cure death’. And if they don’t, then the Chinese government, who stole the code, will, or that Open Source initiative will, or the South African independent development will, because there’s enormous incentives to do so.
At best, boxing an AI with trivial, pointless tasks only delays the more dangerous versions.
I like to think that Skynet got its start through creative interpretation of a goal like “ensure world peace”. ;-)
“How is it that the AGI is yet smart enough to learn this all by itself but fails to notice that there are rules to follow?”—because there is no reason for an AGI to automagically create arbitrary restrictions if they aren’t part of the goal or superior to the goal. For example, I’m quite sure that F1 rules prohibit interfering with drivers during the race; but if somehow a silicon-reaction-speed AGI can’t win F1 by default, then it may find it simpler/quicker to harm the opponents in one of the infinity of ways that the F1 rules don’t cover—say, getting some funds in financial arbitrage, buying out the other teams and firing any good drivers, or engineering a virus that halves the reaction speed of all Homo sapiens—and then it would be happy, as the goal is achieved within the rules.
That’s clear. But let me again state what I’d like to inquire. Given the large number of restrictions that are inevitably part of any advanced general intelligence (AGI), isn’t the nonhazardous subset of all possible outcomes much larger than the subset where the AGI works perfectly yet fails to hold before it can wreak havoc?

Here is where this question stems from. Given my current knowledge about AGI, I believe that any AGI capable of dangerous self-improvement will be very sophisticated, including a lot of restrictions. For example, I believe that any self-improvement can only be as efficient as the specification of its output is detailed. If, for example, the AGI is built with the goal of producing paperclips, the design specification of what a paperclip is will be used as the yardstick by which to measure and quantify any improvement of the AGI’s output. This means that to be able to self-improve effectively up to a superhuman level, the design specifications will have to be highly detailed and by definition include sophisticated restrictions.

Therefore, to claim that any work on AGI will almost certainly lead to dangerous outcomes is to assert that any given AGI is likely to work perfectly well, subject to all restrictions except the one that makes it hold (spatiotemporal scope boundaries). I’m unable to arrive at that conclusion, as I believe that most AGIs will fail at extensive self-improvement, since that is where failure is most likely: it is the largest and most complicated part of the AGI’s design parameters.

To put it bluntly: why is it more likely that contemporary AGI research will succeed at superhuman self-improvement (beyond learning) yet fail to limit the AGI, rather than vice versa? As I see it, it is more likely—given the larger number of parameters needed to self-improve in the first place—that most AGI research will result in incremental steps towards human-level intelligence, rather than one huge step towards superhuman intelligence that fails on its scope boundary rather than on self-improvement.
What you are envisioning is not an AGI at all, but a narrow AI. If you tell an AGI to make paperclips, but it doesn’t know what a paperclip is, then it will go and find out, using whatever means it has available. It won’t give up just because you weren’t detailed enough in telling it what you wanted.
Then I don’t think that there is anyone working on what you envision as ‘AGI’ right now. If a superhuman level of sophistication regarding the potential for self-improvement is already part of your definition, then there is no argument to be won or lost here regarding risk assessment of research on AGI. I do not believe this is reasonable, or that AGI researchers share your definition. I believe that there is a wide range of artificial general intelligences that do not suit your definition yet deserve the terminology.
Who said anything about a superhuman level of sophistication? Human-level is enough. I’m reasonably certain that if I had the same advantages an AGI would have—that is, if I were converted into an emulation and given my own source code—then I could foom. And I think any reasonably skilled computer programmer could, too.
Debugging will be a PITA. Both ways.
Yes, but after the AGI finds out what a paperclip is, it will then, if it is an AGI, start questioning why it was designed with the goal of building paperclips in the first place. And that’s where the friendly AI fallacy falls apart.
Anissimov posted a good article on exactly this point today. AGI will only question its goals according to its cognitive architecture, and come to a conclusion about its goals depending on its architecture. It could “question” its paperclip-maximization goal and come to a “conclusion” that what it really should do is tile the universe with foobarian holala.
So what? An agent with a terminal value (building paperclips) is not going to give it up, not for anything. That’s what “terminal value” means. So the AI can reason about human goals and the history of AGI research. That doesn’t mean it has to care. It cares about paperclips.
It has to care, because if there is the slightest motivation in its goal system to halt (parameters for spatiotemporal scope boundaries), then it won’t choose to continue anyway. I don’t see where the incentive to override certain parameters of its goals would come from. As Anissimov said, “If an AI questions its values, the questioning will have to come from somewhere.”
Exactly? I think we agree about this.
It won’t care unless it’s been programmed to care (for example by adding “spatiotemporal scope boundaries” to its goal system). It’s not going to override a terminal goal, unless it conflicts with a different terminal goal. In the context of an AI that’s been instructed to “build paperclips”, it has no incentive to care about humans, no matter how much “introspection” it does.
If you do program it to care about humans then obviously it will care. It’s my understanding that that is the hard part.
Again, I recommend The Basic AI Drives.
I cannot disagree with the paper given that definition of what an “artificial intelligence” is. If you already have all of this (goals, planning and foresight), then you are at the end of a very long and hard journey peppered with failures. I’m aware of the risks associated with such agents and support the SIAI, including donations. The intention of this thread was to show that contemporary AGI research is much more likely to lead to other outcomes, not that there would be no danger once you already have an AGI with the ability for unbounded self-improvement. But I believe there are many AGI designs that lack this characteristic, and therefore I concluded that it is more likely than not that such work won’t be a danger. I see now that my definition of AGI is considerably weaker than yours. So of course, under your definition, what I said is not compelling. I believe that we’ll arrive at your definition only after a long chain of earlier weak AGIs that are incapable of considerable self-improvement, and that once we figure out how to create the seed for this kind of potential, we will also be much more knowledgeable about the risks and challenges such advanced AGIs might pose.
Yes, and weak AGIs are dangerous in the same sense as Moore’s law is: by probably bringing the construction of strong AGI a little bit closer, and thus contributing to the eventual existential risk, while probably not being directly dangerous in themselves.
Yes, but each step in that direction also provides insights into the nature of AI and can therefore help in designing friendly AI. My idea was that such uncertainties should be incorporated into any estimate of the dangers posed by contemporary AI research. How much does the increased understanding outweigh its dangers?
This was my guess for the first 1.5 years or so. The problem is, FAI is necessarily a strong AGI, but if you learn how to build a strong AGI, you are in trouble. You don’t want to have that knowledge around unless you know where to get the goals from, and studying efficient AGIs doesn’t help with that. The harm is greater than the benefit, and it’s entirely plausible that one could succeed in building a strong AGI without getting the slightest clue about how to define a Friendly goal, so it’s not a given that there is any benefit whatsoever.
Yes, I’ll read it now.
Why do you believe that? What privileges “doing what it’s told”?
The question is not what privileges doing what it is told, but why it would do what it is not told. A crude mechanical machine has almost no freedom; often it can only follow one pathway. An intelligent machine, on the other hand, has much freedom: it can follow infinitely many pathways. With freedom comes choice and the necessity to decide, to follow one pathway and not others. Here you assume that a general intelligence will follow a pathway of self-improvement. But I do not think that intelligence implies self-improvement, nor that a pathway leading an intelligence to optimize will be taken without it being an explicitly specified goal. And that is why I conclude that, out of a certain number of AGI projects, not all will follow the pathway of unbounded, dangerous self-improvement, since there are more pathways that lead any given general intelligence to remain impassive or to halt.
If you’ve read the thread above you’ll see that my intention is not to propose that there is no serious risk, but that it is not inevitable that any AGI will turn out to be an existential risk. I want to propose that working on AGI carefully can help us better understand and define friendliness. I propose that the risk of careful work on AGI is justified and does not imply our demise in any case.
Because planning consists in figuring out instrumental steps on your own.
If we are talking about a full-fledged general intelligence here (Skynet), there’s no arguing against the risk. I believe all we disagree about are definitions. That there are risks from advanced (still fictional) real-world nanotechnology is indisputable. I’m merely saying that while researchers are working on nanotechnology with the potential to lead to grey goo scenarios, there is no inherent certainty that any work on it will lead down that pathway.
It is incredibly hard to come up with an intelligence that knows what planning consists of, and that knows and cares enough to judge which steps are instrumental. This won’t just happen accidentally, and will likely require knowledge sufficient to set scope boundaries as well. Again, this is not an argument that there is no risk, but that the risk is not as great as some people believe it to be.
Please keep focus, which is one of the most important tools. The above paragraph is unrelated to what I addressed in this conversation.
Review the above paragraph: what you are saying is that AIs are hard to build. But of course chess AIs do plan, to give an example. They don’t perform only the moves they are “told” to perform.
What I am talking about is that full-fledged AGI is incredibly hard to achieve, and that therefore most AGI projects will fail on something other than limiting the AGI’s scope. Therefore it is not likely that work on AGI is as dangerous as proposed.
That is, it is much more likely that any given chess AI will fail to beat a human player than that it will win. Still, researchers work on chess AIs, and those programs fit the definition of a general chess AI. Yet to get everything about a chess AI exactly right so that it beats any human, while failing to implement certain performance boundaries (e.g. the strength of its play, or preventing it from overheating its CPUs), is an unlikely outcome. It is more likely that it will be good at chess but not superhuman, that it will fail to improve, or that it will be slow or biased, than that it will succeed on all of the previous points and additionally exceed its scope boundaries.
So the discussion is about whether the idea that any work on AGI is incredibly dangerous is strong, or whether it can be weakened.
Yes, broken AIs, such as humans or chimps, are possible.
It has the ability to model and to investigate hypothetical possibilities that might negatively impact the utility function it is optimizing. If it doesn’t, it is far below human intelligence and is non-threatening for the same reason a narrow AI is non-threatening (but it isn’t very useful either).
The difficulty of detecting these threats is spread out across the range of difficulties the AI is capable of handling, so it can infer that there are probably more threats which it could detect only if it were smarter. Therefore, making itself smarter will enable it to detect more threats and thereby increase utility.
To be able to optimize, it will have to know what it is supposed to optimize. You have to carefully specify what its output (utility function) is supposed to be, or it won’t be able to tell how good it is at optimizing. If you just tell it to produce paperclips, it won’t be able to self-improve, because it doesn’t know what paperclips look like and therefore cannot judge its own success, or that extreme heat would be a negative impact given paperclips made out of plastic. You further assume that it has a detailed incentive, that it is given a detailed pathway that tells it to look for threats and eliminate them.
If it doesn’t, then it is what most researchers are working on: an intelligence with the potential to learn and make use of what it has learnt, with the potential to become intelligent (educated). I’m getting the impression that people here assume researchers are not working on an AGI but trying to hardcode a FOOM machine. If FOOM is simply part of your definition, then there’s no arguing against it going FOOM. But what researchers like Goertzel are working on are systems with the potential to reach human-level intelligence; that does not mean they will by definition jailbreak their nursery school. I never tried to argue against the possibility, only that there are many pathways where this won’t happen, rather than the way it is portrayed by the SIAI, namely that any implementation of AGI will most likely consume humanity.
The sorts of intelligences you are talking about are narrow AIs, not general intelligences. If you told a general intelligence to produce paperclips but it didn’t know what a paperclip was, then its first subgoal would be to find out. The sort of mind that would give up on a minor obstacle like that wouldn’t foom, but it wouldn’t be much of an AGI either.
And yes, most researchers today are working on narrow AIs, not on AGI. That means they’re less likely to successfully make a general intelligence, but it has no bearing on the question of what will happen if they do make one.
That sort of scope is not likely to be a problem. The difficulty is that you have to get every part of the specification and every part of the specification executer exactly right, including the ability to maintain that specification under self modification.
For example, the specification:
… will quite probably wipe out humanity unless a significant proportion of what it takes to produce an FAI is implemented. And it will do it while (and for the purpose of) creating 10 paperclips per day.
What weird way are you measuring “efficiency”? Not in joules per paperclip, I gather. You are not likely to “destroy humanity” with a few hundred kilojoules a day. Satisficing machines really are relatively safe.
See other comments hereabouts for hints.
And I was arguing that any given AI won’t be able to self-improve without an exact specification of its output against which it can judge its own efficiency. That’s why I don’t see how it is likely that we would implement such exact specifications yet fail to limit its scope in space, time and resources. What makes it even more unlikely, in my opinion, is that an AI won’t care to output anything as long as it isn’t explicitly told to do so. Where would that incentive come from?
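As a minimal sketch of the claim being argued here (every name and number below is a hypothetical illustration, not anyone’s actual AGI design): a self-improvement loop can only rank candidate versions of itself against whatever output specification it is given, and the same specification machinery is a natural place to carry scope limits.

    # Hypothetical illustration: self-improvement as search guided by an
    # output specification. Without `evaluate`, the loop cannot rank candidates;
    # `within_scope` uses the same framework to enforce a resource limit.
    import random

    def evaluate(candidate, spec):
        # Score a candidate against the output specification; the score is
        # only as meaningful as `spec` is detailed.
        return sum(spec.get(feature, 0) * value for feature, value in candidate.items())

    def within_scope(candidate, scope):
        # Example scope boundary: reject candidates exceeding a resource budget.
        return candidate.get("resources_used", 0) <= scope["resource_budget"]

    def mutate(candidate):
        # Stand-in for a real code transformation: tweak one feature at random.
        new = dict(candidate)
        key = random.choice(list(new))
        new[key] += random.uniform(-0.1, 0.1)
        return new

    def self_improve(candidate, spec, scope, steps=1000):
        best = candidate
        for _ in range(steps):
            proposal = mutate(best)
            if not within_scope(proposal, scope):
                continue  # the boundary check and the scoring share the same specification
            if evaluate(proposal, spec) > evaluate(best, spec):
                best = proposal
        return best

Nothing here settles whether a sufficiently capable optimizer would route around such limits; it only illustrates why the output specification and the scope boundaries are built from the same ingredients.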
You assume that it knows it is supposed to use all of science and the universe to self-improve, when it would very likely just self-improve to the extent that it is told to and not care to go any further. Software optimization, for example. I just don’t see why you think that any artificial general intelligence would automatically assume it has to understand the whole universe to come up with the best possible way to produce 10 paperclips.
You don’t need to tell it to self improve at all.
Per day. Risk mitigation. Security concerns. Possibility of interruption of the resource supply due to finance, politics or the collapse of civilisation. Limited lifespan of the sun (primary energy source). Amount of iron in the planet.
Given that particular specification, if the AI didn’t take a level in badass it would appear to be malfunctioning.
I just saw this comment by Ben Goertzel regarding self-improvement. I’d love it if someone here explained why he, as an AGI researcher, gets this so wrong.
Goertzel is generalizing from the human example of intelligence, which is probably the most pernicious and widespread failure mode in thinking about AI.
Or he may be completely disconnected from anything even resembling the real world. I literally have trouble believing that a professional AI researcher could describe a primitive, dumber-than-human AGI as “toddler-level” in the same sentence he dismisses it as a self-modification threat.
Toddlers self-modify into people using brains made out of meat!
No they don’t. Self-modification in the context of AGI doesn’t mean learning or growing, it means understanding the most fundamental architecture of your own mind and purposefully improving it.
That said, I think your first sentence is probably right. It looks like Ben can’t imagine a toddler-level AGI self-modifying because human toddlers can’t (or human adults, for that matter). But of course AGIs will be very different from human minds. For one thing, their source code will be a lot easier to understand than ours. For another, their minds will probably be much better at redesigning and improving code than ours are. Look at the kind of stuff that computer programs can do with code: Some of them already exceed human capabilities in some ways.
“Toddler-level AGI” is actually a very misleading term. Even if an AGI is approximately equal to a human toddler by some metrics, it will certainly not be equal by many other metrics. What does “toddler-level” mean when the AGI is vastly superior to even adult human minds in some respects?
“Understanding” and “purpose” are helpful abstractions for discussing human-like computational agents, but in more general cases I don’t think your definition of self-modification is carving reality at its joints.
ETA: I strongly agree with everything else in your comment.
Well, bad analogy. They don’t self-modify by understanding their source code and improving it. They gradually grow larger brains in a pre-set fashion while learning specific tasks. Humans have very little ability to self-modify.
Exactly! Humans can go from toddler to AGI start-up founder, and that’s trivial.
Whatever the hell the AGI equivalent of a toddler is, it’s all but guaranteed to be better at self-modification than the human model.
Political incentive determines the bottom line. Then the page is filled with rhetoric (and, from the looks of it, loaded language and status posturing.)
Seriously, Ben is trying to accuse people of abusing the self-modification term based on the (trivially true) observation that there is a blurry boundary between learning and self-modification?
It’s a good thing Ben is mostly harmless. I particularly liked the part where I asked Eliezer:
… and actually got a candid reply.
It is interesting to note the effort Ben is going to here to disaffiliate himself from the SIAI and portray them as the ‘out group’. Wei was querying (see earlier link) the wisdom of having Ben as Director of Research just earlier this year.
An educated outsider will very likely side with the experts, though. Just as with the hype around the LHC and its dangers, academics and educated people largely believed the physicists working on it rather than the fringe group that claimed it would destroy the world, although it might be the other way around with the general public. Of course you cannot draw any conclusions about who’s right from this, but it should be investigated anyway, because what all parties have in common is the need for support and money.
There are two different groups to be convinced here by each party. One group includes the educated people (academics) and mediocre rationalists and the other group is the general public.
When it comes to who’s right, the people one should listen to are the educated experts who are listening to both parties’ positions and arguments. Although their intelligence and status as rationalists will be disputed, since each party will claim that anyone who disagrees with them is not smart enough to see the truth.
Well said and truly spoken.
(My shorter answer, by the way—I interpret all such behaviors through a Hansonian lens. This includes “near vs far”, observations about the incentives of researchers, the general theme of “X is not about Y” and homo hypocritus. Rather cynical, some may suggest, but this kind of thinking gives very good explanations for “Why?”s that would otherwise be confusing.)
The basic idea is to make a machine that is satisfied relatively easily. So, for example, you tell it to build the ten paperclips with 10 kJ total, and tell it not to worry too much if it doesn’t manage to make them: it is not that important.
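A minimal sketch of the structural difference being appealed to (the numbers and function names are hypothetical illustrations, not a real agent design): a capped, budget-aware objective assigns no extra value to output beyond the target, whereas an open-ended one rewards every additional paperclip.

    # Hypothetical sketch: satisficing (capped) versus maximizing (unbounded) objectives.
    ENERGY_BUDGET_KJ = 10.0
    TARGET_PAPERCLIPS = 10

    def satisficing_utility(paperclips_made, energy_used_kj):
        # Capped: once the target is met within budget, nothing raises the score
        # further, so there is no payoff for turning extra resources into clips.
        if energy_used_kj > ENERGY_BUDGET_KJ:
            return 0.0
        return min(paperclips_made, TARGET_PAPERCLIPS) / TARGET_PAPERCLIPS

    def maximizing_utility(paperclips_made, energy_used_kj):
        # Unbounded: every additional paperclip is worth something, which is what
        # creates the incentive to acquire ever more matter and energy.
        return float(paperclips_made)

    # The capped objective scores 10 paperclips and 10 million paperclips identically.
    assert satisficing_utility(10, 9.5) == satisficing_utility(10_000_000, 9.5)

Whether instrumental pressures (the risk-mitigation and resource-security considerations raised in the comments above) still bite under a capped objective is exactly what the surrounding exchange disputes; the sketch only shows where the two designs differ.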
Sorry, I don’t understand your comment at all. I’ll be back tomorrow.
Yes, as I said, you seem to assume that it is very likely to succeed on all the hard problems yet fail on the scope boundary. The Scary Idea states that if we create self-improving AI, it is likely to consume humanity. I believe that is a rather unlikely outcome and haven’t seen any good reason to believe otherwise yet.
No, it states that we run the risk of accidentally making something that will consume (or exterminate, subvert, betray, make miserable, or otherwise Do Bad Things to) humanity, that looks perfectly safe and correct, right up until it’s too late to do anything about it… and that this is the default case: the case if we don’t do something extraordinary to prevent it.
This doesn’t require self-improvement, and it doesn’t require wiping out humanity. It just requires normal, every-day human error.
Here is Ben’s phrasing:
If the error is in the goal-oriented framework, it could end up “correcting” itself to achieve unintended goals.
An outstanding piece of reasoning/rhetoric which deserves to be revised and relocated to top-level-postdom.
I like the analogy. It may even fit when considering building a friendly AI—like hitting a golf ball deliberately and to the best of your ability from a randomly selected spot on the earth and trying to get a hole in one. Overwhelmingly difficult, perhaps even impossible given human capabilities but still worth dedicating all your effort to attempting!
Isn’t that exactly the argument against non-proven AI values in the first place?
If you expect AI-chimp to be worried that AI-superchimp won’t love bananas, then you should be very worried about AI-chimp.
I don’t get what you’re saying about the paperclipper.
It is a reason not to transcend if you are not sure that you’ll still be you afterwards, i.e. keep your goals and values. I just wanted to point out that the argument runs both directions. It is an argument for the fragility of values and therefore the dangers of fooming but also an argument for the difficulty that could be associated with radically transforming yourself.
No, the reason that people disagree at this point is that it’s not obvious that future rounds of recursive self-improvement will be as effective as the first, or even that the first round will be that effective.
Obviously an AI would have large amounts of computational power, and probably be able to think much more quickly than a human. Most likely it would be more intelligent than any human on the planet by a considerable margin. But this doesn’t imply
(provided that the AI was originally built by humans, of course; if its design was too complicated for humans to arrive at, a slightly superhuman AI might be helpless as well)
Yes, that’s rather the point. Assuming that you do get to human-level, though, you now have the potential for fooming, if only in speed.
I’m a fan of chess, evolutionary algorithms, and music, and the Emily Howell example is the one that sticks out like a sore thumb here. Music is not narrow and Emily Howell is not comparable to a typical human musician.
The point is that it (and its predecessor Emmy) are special-purpose “idiot savants”, like the other two examples. That it is not a human musician is beside the point: the point is that humans can make idiot-savant programs suitable for solving any sufficiently-specified problem, which means a human-level AI programmer can do the same.
And although real humans spent many years on some of these narrow-domain tools, an AI programmer might be able to execute those years in minutes.
No, it’s quite different from the other two examples. Deep Blue beat the world champion. The evolutionary computation-designed antenna was better than its human-designed competitors.
To be precise, what sufficiently-specified compositional problem do you think Emily Howell solves better than humans? I say “compositional” to reassure you that I’m not going to move the goalposts by requiring “real emotion” or human-style performance gestures or anything like that.
If I understand correctly, the answer would be “making the music its author/co-composer wanted it to make”.
(In retrospect, I probably should have said “Emmy”—i.e., Emily’s predecessor that could write classical pieces in the style of other composers.)
To make that claim, we’d have to have one or more humans who sat down with David Cope and tried to make the music that he wanted, and failed. I don’t think David Cope himself counts, because he has written music “by hand” also, and I don’t think he regards it as a failure.
Re EMI/Emmy, it’s clearer: the pieces it produced in the style of (say) Beethoven are not better than would be written by a typical human composer attempting the same task.
Now would be a good time for me to acknowledge/recall that my disagreement on this doesn’t take away from the original point—computers are better than humans on many narrow domains.
So, are you suggesting that Robin Hanson (who is on record as not buying the Scary Idea) -- the current owner of the Overcoming Bias blog, and Eli’s former collaborator on this blog—fails to buy the Scary Idea “due to cognitive biases that are hard to overcome.” I find that a bit ironic.
Like Robin and Eli and perhaps yourself, I’ve read the heuristics and biases literature also. I’m not so naive as to make judgments about huge issues, that I think about for years of my life, based strongly on well-known cognitive biases.
It seems more plausible to me to assert that many folks who believe the Scary Idea are having their judgment warped by plain old EMOTIONAL bias—i.e. stuff like “fear of the unknown”, “the satisfying feeling of being part of a self-congratulatory in-crowd that thinks it understands the world better than everyone else”, and the well known “addictive chemical high of righteous indignation”, etc.
Regarding your final paragraph: Is your take on the debate between Robin and Eli about “Foom” that all Robin was saying boils down to “la la la I can’t hear you” ? If so I would suggest that maybe YOU are the one with the (metaphorical) hearing problem ;p ….
I think there’s a strong argument that: “The truth value of “Once an AGI is at the level of a smart human computer scientist, hard takeoff is likely” is significantly above zero.” No assertion stronger than that seems to me to be convincingly supported by any of the arguments made on Less Wrong or Overcoming Bias or any of Eli’s prior writings.
Personally, I actually do strongly suspect that once an AGI reaches that level, a hard takeoff is extremely likely unless the AGI has been specifically inculcated with goal content working against this. But I don’t claim to have a really compelling argument for this. I think we need a way better theory of AGI before we can frame such arguments compellingly. And I think that theory is going to emerge after we’ve experimented with some AGI systems that are fairly advanced, yet well below the “smart computer scientist” level.
Actually, predicting the behaviour of a superintelligence is a pretty trivial engineering feat—provided you are prepared to make it act a little bit more slowly.
Just get another agent to intercept all its motor outputs, delay them, and then print them out a little bit before they are performed. Presto: a prediction of what the machine is about to do. Humans could use those predictions to veto the proposed actions—if they so chose.
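A minimal sketch of the wrapper being described, assuming a toy agent object with a next_action() method and an actuator object with an execute() method (both hypothetical stand-ins, not any real API):

    # Hypothetical sketch of the delay-and-veto scheme: intercept motor outputs,
    # announce them ahead of time, and execute only if no human vetoes in time.
    import queue
    import time

    class DelayedVetoWrapper:
        def __init__(self, agent, delay_seconds=5.0):
            self.agent = agent
            self.delay_seconds = delay_seconds
            self.vetoes = queue.Queue()  # a human overseer pushes vetoed actions here

        def step(self, actuators):
            action = self.agent.next_action()
            print(f"[preview] agent intends: {action!r}")  # the 'prediction'
            time.sleep(self.delay_seconds)                 # window for human review
            if self._is_vetoed(action):
                print(f"[vetoed] {action!r} blocked by overseer")
                return
            actuators.execute(action)

        def _is_vetoed(self, action):
            vetoed = []
            while not self.vetoes.empty():
                vetoed.append(self.vetoes.get())
            return action in vetoed

Whether a smarter-than-human planner could simply choose actions whose consequences the overseer cannot evaluate within the delay window is, of course, what the comments that follow argue about.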
Humans can’t predict what Deep Blue will do—but they can turn it off.
I think your argument collapses around about here.
If there is one thing Deep Blue is good at it is doing a deep search multiple moves ahead.
I think DB is going to see that one coming!
...but what’s it going to do? Castle? ;-) Unpredictable != Uncontrollable.