Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.
ELI5...
Why can’t we program hard stops into AI, where it is required to pause and ask for further instruction?
Why is “spontaneous emergence of consciousness and evil intent” not a risk?
Why can’t we program hard stops into AI, where it is required to pause and ask for further instruction?
If the AI is aware of the pauses, it can try to eliminate them (if the pauses are triggered by a circumstance X, it can find a clever way to technically avoid X), or to make itself receive the “instruction” it wants to receive (e.g. by threating or hypnotising a human, or by doing something that technically counts as human input).
The important aspect is that there are many different things the AI could try. (Maybe including those that can’t be “ELI5”. It is supposed to have superhuman intelligence.) Focusing on specific things is missing the point.
As a metaphor, imagine that a group of retarded people is trying to imprison MacGyver in a garden shed. Later MacGyver creates an explosive from his chewing gum, destroys a wall, and leaves. The moral of this story is not: “To imprison MacGyver reliably, you must take all the chewing gum from him.” The moral is: “If you are retarded, and your enemy is MacGyver, you almost certainly cannot imprison him in the garden shed.”
If you get this concept, then similar debates will feel like: “Let’s suppose we make really really sure he has no chewing gum. We will even check his shoes, although, realistically, no one keeps chewing gum in their shoes. But we will be extra careful, and will check his shoes anyway. What could possibly go wrong?”
Why can’t we program hard stops into AI, where it is required to pause and ask for further instruction?
Because instructions are words, and “ask for instructions” implies an ability to understand and a desire to follow. The desire to follow instructions according to their givers’ intentions is more-or-less a restatement of the Hard Problem of FAI itself: how do we formally specify a utility function that converges to our own in the limit of increasing optimization power and autonomy?
If you are worrying about the dangers of human level or greater AI, you are tacitly taking the problem of natural language interpretation to have been solved, so the above is an appeal to Mysterious Selective Stupidity.
you are tacitly taking the problem of natural language interpretation to have been solved
No, I am not. Just because an AGI can solve the natural-language interpretation problem does not mean the natural-language interpretation problem was solved separately from the AGI problem, in terms of narrow NLP models. In fact, more or less the entire point of AGI is to have a single piece of software to which we can feed any and all learning problems without having to figure out how to model them formally ourselves.
In responding to Brilliant, you were tacitly assuming that the AI has been given instructions in some higher level language that is subject to differing interpretations, and is not therefore just machine code, which US tacitly assuming it has already got .NL abilities.
Yes, it would probably need a motivation to interest such sentences correctly. But that us an easier problem to solve than coding un the whole of human value. An AI would need to understand human value in order to understand NL, but would not need to be preloaded with all human value, since discovering it would be a subsidiary goal of interpreting NL correctly.
And interpreting instructions correctly is a subgoal of getting things in general right. Building AIs that are epistemic rationalists could be a further simplification of the problem of AI safety. Epistemic rationality is difficult for humans because humans are evolutionary hacks whose goals are spreading their genes, achieving status, etc.It may be excessively anthropomorphic to assume human levels of deviousness in AIs.
In responding to Brilliant, you were tacitly assuming that the AI has been given instructions in some higher level language that is subject to differing interpretations, and is not therefore just machine code, which US tacitly assuming it has already got .NL abilities.
No, I’m insisting that no realistic AGI at all is a Magic Genie which can be instructed in high-level English. If it were, all I would have to say is, “Do what I mean!” and Bob’s your uncle. But since that cannot happen without solving Natural Language Processing as a separate problem before constructing an AGI, the AGI agent has a utility function coded as program code in a programming language—which makes desirable behavior quite improbable.
An AI would need to understand human value in order to understand NL, but would not need to be preloaded with all human value, since discovering it would be a subsidiary goal of interpreting NL correctly.
Again: knowing is quite different from caring. What we could do in this domain is solve natural-language learning and processing separately from AGI, and then couple that to a well-worked-out infrastructure of normative uncertainty, and then, after making absolutely sure that the AI’s concept-learning via the hard-wired natural-language processing library matches the way human minds represent concepts computationally, use a large corpus of natural-language text to try to teach the AI what sort of things human beings want.
Unfortunately, this approach rarely works with actual humans, since our concept machinery is horrifically prone to non-natural hypotheses about value, to the point that most of the human race refuses as a matter of principle to consider ethical naturalism a coherent meta-ethical stance, let alone the correct one.
We have some idea of a safe goal function for the AGI (it’s essentially a longer-winded version of “Do what I mean, but taking the interests of all into account equally, and considering what I really mean even under reflection as more knowledge and intelligence are added”), the question is how to actually program that.
Which is actually an instance of the more general problem: how do we program goals for intelligent agents in terms of any real-world concepts about which there might be incomplete or unformalized knowledge? Without solving that we can basically only build reinforcement learners.
The whole cognitive-scientific lens towards problems is to treat them as learning and inference problems, but that doesn’t really help when we need to encode something we’re fuzzy about rather than being able to specify it formally.
Building AIs that are epistemic rationalists could be a further simplification of the problem of AI safety. Epistemic rationality is difficult for humans because humans are evolutionary hacks whose goals are spreading their genes, achieving status, etc.It may be excessively anthropomorphic to assume human levels of deviousness in AIs.
If being devious to humans is instrumentally rational, an instrumentally rational AI agent will do it.
No, I’m insisting that no realistic AGI at all is a Magic Genie which can be instructed in high-level English. If it were, all I would have to say is, “Do what I mean!” and Bob’s your uncle. But since that cannot happen without solving Natural Language Processing as a separate problem before constructing an AGI, the AGI
I was actually agreeing with you that NLP needs to be solved separately if you want to instruct it in English. The rhetoric about magic isn’t helpful.
agent has a utility function coded as program code in a programming language—which makes desirable behavior quite improbable.
I don’t see why that would follow, and in fact I argued against it.
knowing is quite different from caring.
I know.
What we could do in this domain is solve natural-language learning and processing separately from AGI, and then couple that to a well-worked-out infrastructure of normative uncertainty, and then, after making absolutely sure that the AI’s concept-learning via the hard-wired natural-language processing library matches the way human minds represent concepts computationally, use a large corpus of natural-language text to try to teach the AI what sort of things human beings want.
That’s not what I was saying. I was saying an AI with a motivation to understand .NL correctly would research whatever human value was relevant.
We have some idea of a safe goal function for the AGI (it’s essentially a longer-winded version of “Do what I mean, but taking the interests of all into account equally, and considering what I really meaneven under reflection as more knowledge and intelligence are added”), the question is how to actually program that
That’s kind of what I was saying.
If being devious to humans is instrumentally rational, an instrumentally rational AI agent will do it.
Non sequitur. In general, what is an instrumental goal will vary with final goals, and epistemic rationality is a matter of final goals. Omohundran drives are unusual in not having the property of varying with final goals.
ELI5...
Why can’t we program hard stops into AI, where it is required to pause and ask for further instruction?
Why is “spontaneous emergence of consciousness and evil intent” not a risk?
If the AI is aware of the pauses, it can try to eliminate them (if the pauses are triggered by a circumstance X, it can find a clever way to technically avoid X), or to make itself receive the “instruction” it wants to receive (e.g. by threating or hypnotising a human, or by doing something that technically counts as human input).
I see.
This is the gist of the AI Box experiment, no?
The important aspect is that there are many different things the AI could try. (Maybe including those that can’t be “ELI5”. It is supposed to have superhuman intelligence.) Focusing on specific things is missing the point.
As a metaphor, imagine that a group of retarded people is trying to imprison MacGyver in a garden shed. Later MacGyver creates an explosive from his chewing gum, destroys a wall, and leaves. The moral of this story is not: “To imprison MacGyver reliably, you must take all the chewing gum from him.” The moral is: “If you are retarded, and your enemy is MacGyver, you almost certainly cannot imprison him in the garden shed.”
If you get this concept, then similar debates will feel like: “Let’s suppose we make really really sure he has no chewing gum. We will even check his shoes, although, realistically, no one keeps chewing gum in their shoes. But we will be extra careful, and will check his shoes anyway. What could possibly go wrong?”
No. Bribes and rational persuasion are fair game too.
Because instructions are words, and “ask for instructions” implies an ability to understand and a desire to follow. The desire to follow instructions according to their givers’ intentions is more-or-less a restatement of the Hard Problem of FAI itself: how do we formally specify a utility function that converges to our own in the limit of increasing optimization power and autonomy?
If you are worrying about the dangers of human level or greater AI, you are tacitly taking the problem of natural language interpretation to have been solved, so the above is an appeal to Mysterious Selective Stupidity.
No, I am not. Just because an AGI can solve the natural-language interpretation problem does not mean the natural-language interpretation problem was solved separately from the AGI problem, in terms of narrow NLP models. In fact, more or less the entire point of AGI is to have a single piece of software to which we can feed any and all learning problems without having to figure out how to model them formally ourselves.
In responding to Brilliant, you were tacitly assuming that the AI has been given instructions in some higher level language that is subject to differing interpretations, and is not therefore just machine code, which US tacitly assuming it has already got .NL abilities.
Yes, it would probably need a motivation to interest such sentences correctly. But that us an easier problem to solve than coding un the whole of human value. An AI would need to understand human value in order to understand NL, but would not need to be preloaded with all human value, since discovering it would be a subsidiary goal of interpreting NL correctly.
And interpreting instructions correctly is a subgoal of getting things in general right. Building AIs that are epistemic rationalists could be a further simplification of the problem of AI safety. Epistemic rationality is difficult for humans because humans are evolutionary hacks whose goals are spreading their genes, achieving status, etc.It may be excessively anthropomorphic to assume human levels of deviousness in AIs.
No, I’m insisting that no realistic AGI at all is a Magic Genie which can be instructed in high-level English. If it were, all I would have to say is, “Do what I mean!” and Bob’s your uncle. But since that cannot happen without solving Natural Language Processing as a separate problem before constructing an AGI, the AGI agent has a utility function coded as program code in a programming language—which makes desirable behavior quite improbable.
Again: knowing is quite different from caring. What we could do in this domain is solve natural-language learning and processing separately from AGI, and then couple that to a well-worked-out infrastructure of normative uncertainty, and then, after making absolutely sure that the AI’s concept-learning via the hard-wired natural-language processing library matches the way human minds represent concepts computationally, use a large corpus of natural-language text to try to teach the AI what sort of things human beings want.
Unfortunately, this approach rarely works with actual humans, since our concept machinery is horrifically prone to non-natural hypotheses about value, to the point that most of the human race refuses as a matter of principle to consider ethical naturalism a coherent meta-ethical stance, let alone the correct one.
We have some idea of a safe goal function for the AGI (it’s essentially a longer-winded version of “Do what I mean, but taking the interests of all into account equally, and considering what I really mean even under reflection as more knowledge and intelligence are added”), the question is how to actually program that.
Which is actually an instance of the more general problem: how do we program goals for intelligent agents in terms of any real-world concepts about which there might be incomplete or unformalized knowledge? Without solving that we can basically only build reinforcement learners.
The whole cognitive-scientific lens towards problems is to treat them as learning and inference problems, but that doesn’t really help when we need to encode something we’re fuzzy about rather than being able to specify it formally.
If being devious to humans is instrumentally rational, an instrumentally rational AI agent will do it.
I was actually agreeing with you that NLP needs to be solved separately if you want to instruct it in English. The rhetoric about magic isn’t helpful.
I don’t see why that would follow, and in fact I argued against it.
I know.
That’s not what I was saying. I was saying an AI with a motivation to understand .NL correctly would research whatever human value was relevant.
That’s kind of what I was saying.
Non sequitur. In general, what is an instrumental goal will vary with final goals, and epistemic rationality is a matter of final goals. Omohundran drives are unusual in not having the property of varying with final goals.