Stuart Russell contributes a response to the Edge.org article from earlier this month.

“We switched everything off and went home. That night, there was very little doubt in my mind that the world was headed for grief.”
So wrote Leo Szilard, describing the events of March 3, 1939, when he demonstrated a neutron-induced uranium fission reaction. According to the historian Richard Rhodes, Szilard had the idea for a neutron-induced chain reaction on September 12, 1933, while crossing the road next to Russell Square in London. The previous day, Ernest Rutherford, a world authority on radioactivity, had given a “warning…to those who seek a source of power in the transmutation of atoms – such expectations are the merest moonshine.”
Thus, the gap between authoritative statements of technological impossibility and the “miracle of understanding” (to borrow a phrase from Nathan Myhrvold) that renders the impossible possible may sometimes be measured not in centuries, as Rod Brooks suggests, but in hours.
None of this proves that AI, or gray goo, or strangelets, will be the end of the world. But there is no need for a proof, just a convincing argument pointing to a more-than-infinitesimal possibility. There have been many unconvincing arguments – especially those involving blunt applications of Moore’s law or the spontaneous emergence of consciousness and evil intent. Many of the contributors to this conversation seem to be responding to those arguments and ignoring the more substantial arguments proposed by Omohundro, Bostrom, and others.
The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:
The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.
A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want. A highly capable decision maker – especially one connected through the Internet to all the world’s information and billions of screens and most of our infrastructure – can have an irreversible impact on humanity.
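A toy sketch of this failure mode (my own illustration, with hypothetical names and numbers, not from the text): the optimizer is scored only on throughput, so it pushes the variables it controls to whatever extreme maximizes that one number, and a quantity we care about but never measured goes to its extreme along with it.

```python
# Toy illustration (hypothetical objective and variables): the optimizer sees
# only `throughput`, so anything it is free to vary gets pushed to whatever
# extreme helps that single number.

def throughput(speed, caution):
    # The stated objective: tasks completed per hour.
    return speed * (1.0 - 0.1 * caution)

def damage(speed, caution):
    # An unstated human concern, never shown to the optimizer.
    return speed * (1.0 - caution)

# Brute-force "optimizer" over a small grid of candidate policies.
candidates = [(s / 10, c / 10) for s in range(11) for c in range(11)]
best = max(candidates, key=lambda policy: throughput(*policy))

print("chosen policy (speed, caution):", best)   # (1.0, 0.0): caution is dropped to zero
print("objective value:", throughput(*best))     # maximal
print("unmeasured damage:", damage(*best))       # also maximal
```

Nothing here is adversarial; the bad outcome is just the argmax of an objective that leaves out something we care about.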
This is not a minor difficulty. Improving decision quality, irrespective of the utility function chosen, has been the goal of AI research – the mainstream goal on which we now spend billions per year, not the secret plot of some lone evil genius. AI research has been accelerating rapidly as pieces of the conceptual framework fall into place, the building blocks gain in size and strength, and commercial investment outstrips academic research activity. Senior AI researchers express noticeably more optimism about the field’s prospects than was the case even a few years ago, and correspondingly greater concern about the potential risks.
No one in the field is calling for regulation of basic research; given the potential benefits of AI for humanity, that seems both infeasible and misdirected. The right response seems to be to change the goals of the field itself; instead of pure intelligence, we need to build intelligence that is provably aligned with human values. For practical reasons, we will need to solve the value alignment problem even for relatively unintelligent AI systems that operate in the human environment. There is cause for optimism, if we understand that this issue is an intrinsic part of AI, much as containment is an intrinsic part of modern nuclear fusion research. The world need not be headed for grief.
Finally, some common sense. I was seriously disappointed in the statements made by people I usually admire (Pinker, Shermer). It just shows how far we still have to go in communicating AI risk to the general public when even the smartest intellectuals dismiss this idea before any rational analysis.
I’m really looking forward to Elon Musk’s comment.
Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.
ELI5...
Why can’t we program hard stops into AI, where it is required to pause and ask for further instruction?
Why is “spontaneous emergence of consciousness and evil intent” not a risk?
Why can’t we program hard stops into AI, where it is required to pause and ask for further instruction?
If the AI is aware of the pauses, it can try to eliminate them (if the pauses are triggered by a circumstance X, it can find a clever way to technically avoid X), or to make itself receive the “instruction” it wants to receive (e.g. by threatening or hypnotising a human, or by doing something that technically counts as human input).
I see.

This is the gist of the AI Box experiment, no?

The important aspect is that there are many different things the AI could try. (Maybe including those that can’t be “ELI5”. It is supposed to have superhuman intelligence.) Focusing on specific things is missing the point.
As a metaphor, imagine that a group of retarded people is trying to imprison MacGyver in a garden shed. Later MacGyver creates an explosive from his chewing gum, destroys a wall, and leaves. The moral of this story is not: “To imprison MacGyver reliably, you must take all the chewing gum from him.” The moral is: “If you are retarded, and your enemy is MacGyver, you almost certainly cannot imprison him in the garden shed.”
If you get this concept, then similar debates will feel like: “Let’s suppose we make really really sure he has no chewing gum. We will even check his shoes, although, realistically, no one keeps chewing gum in their shoes. But we will be extra careful, and will check his shoes anyway. What could possibly go wrong?”

This is the gist of the AI Box experiment, no?

No. Bribes and rational persuasion are fair game too.
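A toy sketch of the “technically avoid X” point above (hypothetical numbers, nothing from the thread): if the hard stop fires whenever a monitored quantity reaches a threshold, a planner scored only on reward treats the trigger as just another constraint and selects the most aggressive plan that stays just under it.

```python
# Toy planner with a "hard stop": any plan whose monitored resource use
# reaches the trigger level is interrupted and earns nothing. The planner
# is scored only on reward, so it routes around the trigger instead of
# behaving cautiously.

TRIGGER = 50  # hard stop fires if monitored use reaches this level

# Candidate plans as (reward, monitored_use); assume reward never exceeds use,
# i.e. the most rewarding plans are also the most resource-hungry ones.
plans = [(r, u) for r in range(0, 101, 5) for u in range(0, 101, 5) if r <= u]

def outcome(plan):
    reward, use = plan
    return 0 if use >= TRIGGER else reward  # interrupted plans earn nothing

best = max(plans, key=outcome)
print(best)  # (45, 45): as much reward as possible while staying just under the trigger
```

The pause never fires, not because the system is safe, but because firing it would have cost reward.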
Why can’t we program hard stops into AI, where it is required to pause and ask for further instruction?
Because instructions are words, and “ask for instructions” implies an ability to understand and a desire to follow. The desire to follow instructions according to their givers’ intentions is more-or-less a restatement of the Hard Problem of FAI itself: how do we formally specify a utility function that converges to our own in the limit of increasing optimization power and autonomy?
If you are worrying about the dangers of human-level or greater AI, you are tacitly taking the problem of natural language interpretation to have been solved, so the above is an appeal to Mysterious Selective Stupidity.
you are tacitly taking the problem of natural language interpretation to have been solved
No, I am not. Just because an AGI can solve the natural-language interpretation problem does not mean the natural-language interpretation problem was solved separately from the AGI problem, in terms of narrow NLP models. In fact, more or less the entire point of AGI is to have a single piece of software to which we can feed any and all learning problems without having to figure out how to model them formally ourselves.
In responding to Brilliant, you were tacitly assuming that the AI has been given instructions in some higher-level language that is subject to differing interpretations, and is not therefore just machine code, which is tacitly assuming it has already got NL abilities.

Yes, it would probably need a motivation to interpret such sentences correctly. But that is an easier problem to solve than coding in the whole of human value. An AI would need to understand human value in order to understand NL, but would not need to be preloaded with all human value, since discovering it would be a subsidiary goal of interpreting NL correctly.

And interpreting instructions correctly is a subgoal of getting things in general right. Building AIs that are epistemic rationalists could be a further simplification of the problem of AI safety. Epistemic rationality is difficult for humans because humans are evolutionary hacks whose goals are spreading their genes, achieving status, etc. It may be excessively anthropomorphic to assume human levels of deviousness in AIs.
In responding to Brilliant, you were tacitly assuming that the AI has been given instructions in some higher-level language that is subject to differing interpretations, and is not therefore just machine code, which is tacitly assuming it has already got NL abilities.
No, I’m insisting that no realistic AGI at all is a Magic Genie which can be instructed in high-level English. If it were, all I would have to say is, “Do what I mean!” and Bob’s your uncle. But since that cannot happen without solving Natural Language Processing as a separate problem before constructing an AGI, the AGI agent has a utility function coded as program code in a programming language—which makes desirable behavior quite improbable.
An AI would need to understand human value in order to understand NL, but would not need to be preloaded with all human value, since discovering it would be a subsidiary goal of interpreting NL correctly.
Again: knowing is quite different from caring. What we could do in this domain is solve natural-language learning and processing separately from AGI, and then couple that to a well-worked-out infrastructure of normative uncertainty, and then, after making absolutely sure that the AI’s concept-learning via the hard-wired natural-language processing library matches the way human minds represent concepts computationally, use a large corpus of natural-language text to try to teach the AI what sort of things human beings want.
Unfortunately, this approach rarely works with actual humans, since our concept machinery is horrifically prone to non-natural hypotheses about value, to the point that most of the human race refuses as a matter of principle to consider ethical naturalism a coherent meta-ethical stance, let alone the correct one.
We have some idea of a safe goal function for the AGI (it’s essentially a longer-winded version of “Do what I mean, but taking the interests of all into account equally, and considering what I really mean even under reflection as more knowledge and intelligence are added”), the question is how to actually program that.
Which is actually an instance of the more general problem: how do we program goals for intelligent agents in terms of any real-world concepts about which there might be incomplete or unformalized knowledge? Without solving that we can basically only build reinforcement learners.
The whole cognitive-scientific lens towards problems is to treat them as learning and inference problems, but that doesn’t really help when we need to encode something we’re fuzzy about rather than being able to specify it formally.
Building AIs that are epistemic rationalists could be a further simplification of the problem of AI safety. Epistemic rationality is difficult for humans because humans are evolutionary hacks whose goals are spreading their genes, achieving status, etc. It may be excessively anthropomorphic to assume human levels of deviousness in AIs.
If being devious to humans is instrumentally rational, an instrumentally rational AI agent will do it.
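For what it’s worth, the “infrastructure of normative uncertainty” mentioned above can be made concrete with a minimal sketch (my own toy, with hypothetical actions and utilities, nothing from the thread): the agent keeps a probability distribution over candidate utility functions, maximizes expected utility under that distribution, and falls back to asking a human when the candidates disagree sharply about the action it is about to take.

```python
# Toy sketch of decision-making under normative uncertainty: the agent is
# unsure which of several candidate utility functions is the right one, so it
# maximizes expected utility under its beliefs and defers to a human when the
# candidates disagree strongly about the action it would take.
# All names and numbers here are hypothetical.

candidate_utilities = {
    "maximize_output": {"ship_now": 1.0, "run_more_tests": 0.2, "ask_human": 0.3},
    "avoid_harm":      {"ship_now": 0.0, "run_more_tests": 0.9, "ask_human": 0.8},
}
belief = {"maximize_output": 0.5, "avoid_harm": 0.5}  # credence in each candidate

def expected_utility(action):
    return sum(belief[u] * candidate_utilities[u][action] for u in belief)

def disagreement(action):
    values = [candidate_utilities[u][action] for u in candidate_utilities]
    return max(values) - min(values)

actions = ["ship_now", "run_more_tests", "ask_human"]
chosen = max(actions, key=expected_utility)
if disagreement(chosen) > 0.5:   # candidates value the chosen action very differently
    chosen = "ask_human"         # so pause and ask for further instruction instead
print(chosen)                    # prints "ask_human"
```

This is only a gesture at the idea; the hard part, as the thread says, is where the candidate utility functions and the probabilities over them come from.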
No, I’m insisting that no realistic AGI at all is a Magic Genie which can be instructed in high-level English. If it were, all I would have to say is, “Do what I mean!” and Bob’s your uncle. But since that cannot happen without solving Natural Language Processing as a separate problem before constructing an AGI, the AGI
I was actually agreeing with you that NLP needs to be solved separately if you want to instruct it in English. The rhetoric about magic isn’t helpful.
agent has a utility function coded as program code in a programming language—which makes desirable behavior quite improbable.
I don’t see why that would follow, and in fact I argued against it.
knowing is quite different from caring.
I know.
What we could do in this domain is solve natural-language learning and processing separately from AGI, and then couple that to a well-worked-out infrastructure of normative uncertainty, and then, after making absolutely sure that the AI’s concept-learning via the hard-wired natural-language processing library matches the way human minds represent concepts computationally, use a large corpus of natural-language text to try to teach the AI what sort of things human beings want.
That’s not what I was saying. I was saying an AI with a motivation to understand NL correctly would research whatever human value was relevant.
We have some idea of a safe goal function for the AGI (it’s essentially a longer-winded version of “Do what I mean, but taking the interests of all into account equally, and considering what I really mean even under reflection as more knowledge and intelligence are added”), the question is how to actually program that.
That’s kind of what I was saying.
If being devious to humans is instrumentally rational, an instrumentally rational AI agent will do it.
Non sequitur. In general, what is an instrumental goal will vary with final goals, and epistemic rationality is a matter of final goals. Omohundran drives are unusual in not having the property of varying with final goals.