In responding to Brilliant, you were tacitly assuming that the AI has been given instructions in some higher-level language that is subject to differing interpretations, and is therefore not just machine code, which is tacitly assuming it has already got NL abilities.
Yes, it would probably need a motivation to interpret such sentences correctly. But that is an easier problem to solve than coding in the whole of human value. An AI would need to understand human value in order to understand NL, but would not need to be preloaded with all human value, since discovering it would be a subsidiary goal of interpreting NL correctly.
And interpreting instructions correctly is a subgoal of getting things in general right. Building AIs that are epistemic rationalists could be a further simplification of the problem of AI safety. Epistemic rationality is difficult for humans because humans are evolutionary hacks whose goals are spreading their genes, achieving status, etc. It may be excessively anthropomorphic to assume human levels of deviousness in AIs.
In responding to Brilliant, you were tacitly assuming that the AI has been given instructions in some higher-level language that is subject to differing interpretations, and is therefore not just machine code, which is tacitly assuming it has already got NL abilities.
No, I’m insisting that no realistic AGI at all is a Magic Genie which can be instructed in high-level English. If it were, all I would have to say is, “Do what I mean!” and Bob’s your uncle. But since that cannot happen without solving Natural Language Processing as a separate problem before constructing an AGI, the AGI agent has a utility function coded as program code in a programming language—which makes desirable behavior quite improbable.
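To make the “utility function coded as program code” point concrete, here is a minimal Python sketch; every name in it (WorldState, smiles_detected, and so on) is hypothetical and purely illustrative, not anyone’s proposed design. The point is only that whatever the code computes is what gets optimized, not the informal intent behind it.

```python
# Purely illustrative: a "utility function as program code".
# All names here are invented for the example.

from dataclasses import dataclass

@dataclass
class WorldState:
    smiles_detected: int      # a proxy the sensor model can actually count
    humans_wellbeing: float   # the unformalized thing the designer meant

def utility(state: WorldState) -> float:
    # What got written down is a proxy over the world model's fields,
    # with no reference to the designer's informal intent.
    return float(state.smiles_detected)

# An optimizer over this code simply pushes the proxy up.
candidates = [
    WorldState(smiles_detected=10, humans_wellbeing=0.1),
    WorldState(smiles_detected=2, humans_wellbeing=0.9),
]
print(max(candidates, key=utility))  # picks high-smiles, low-wellbeing
```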
An AI would need to understand human value in order to understand NL, but would not need to be preloaded with all human value, since discovering it would be a subsidiary goal of interpreting NL correctly.
Again: knowing is quite different from caring. What we could do in this domain is solve natural-language learning and processing separately from AGI, and then couple that to a well-worked-out infrastructure of normative uncertainty, and then, after making absolutely sure that the AI’s concept-learning via the hard-wired natural-language processing library matches the way human minds represent concepts computationally, use a large corpus of natural-language text to try to teach the AI what sort of things human beings want.
Unfortunately, this approach rarely works with actual humans, since our concept machinery is horrifically prone to non-natural hypotheses about value, to the point that most of the human race refuses as a matter of principle to consider ethical naturalism a coherent meta-ethical stance, let alone the correct one.
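For what it is worth, here is a rough skeleton of that pipeline in Python, with every component stubbed out and every name invented; it only fixes the ordering of the steps (separate NL system, concept-representation check, value learning under normative uncertainty), not how to actually do any of them.

```python
# Rough skeleton only; all names are invented and all hard parts are stubs.

from typing import Dict, List


class NLUnderstanding:
    """Stand-in for a natural-language system built separately from the AGI."""

    def concepts_in(self, text: str) -> List[str]:
        return text.lower().split()  # trivial placeholder


def concepts_match_human_representation(nl: NLUnderstanding) -> bool:
    # The "making absolutely sure" step: does the system's concept learning
    # match how human minds represent concepts? Stubbed out here.
    return True


def learn_value_weights(nl: NLUnderstanding, corpus: List[str]) -> Dict[str, float]:
    # Normative uncertainty: a weighting over candidate values rather than
    # one hard-coded goal. Placeholder: normalized counts over the corpus.
    counts: Dict[str, float] = {}
    for document in corpus:
        for concept in nl.concepts_in(document):
            counts[concept] = counts.get(concept, 0.0) + 1.0
    total = sum(counts.values())
    return {concept: count / total for concept, count in counts.items()}


nl = NLUnderstanding()
if concepts_match_human_representation(nl):
    print(learn_value_weights(nl, ["people want fairness", "people want safety"]))
```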
We have some idea of a safe goal function for the AGI (it’s essentially a longer-winded version of “Do what I mean, but taking the interests of all into account equally, and considering what I really mean even under reflection as more knowledge and intelligence are added”); the question is how to actually program that.
Which is actually an instance of the more general problem: how do we program goals for intelligent agents in terms of any real-world concepts about which there might be incomplete or unformalized knowledge? Without solving that we can basically only build reinforcement learners.
The whole cognitive-scientific lens towards problems is to treat them as learning and inference problems, but that doesn’t really help when we need to encode something we’re fuzzy about rather than being able to specify it formally.
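As a toy illustration of the “only reinforcement learners” point: the sole channel through which such an agent can be told what we want is a hand-written scalar reward over things it can already observe, so any concept we cannot formalize never reaches it at all. Everything below is invented for the example.

```python
# Toy tabular learner; the reward function is the only place "what we want"
# can enter, and it has to be written over already-formalized observables.

import random

states, actions = range(3), range(2)
q = {(s, a): 0.0 for s in states for a in actions}

def reward(state: int, action: int) -> float:
    # A hand-coded scalar proxy; "what the designers actually meant"
    # appears nowhere in the program.
    return 1.0 if (state, action) == (2, 1) else 0.0

alpha, gamma = 0.1, 0.9
state = 0
for _ in range(2000):
    action = random.choice(actions)
    next_state = random.choice(states)  # toy dynamics
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward(state, action) + gamma * best_next - q[(state, action)])
    state = next_state

print(max(q, key=q.get))  # the learner ends up optimizing the proxy it was given
```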
Building AIs that are epistemic rationalists could be a further simplification of the problem of AI safety. Epistemic rationality is difficult for humans because humans are evolutionary hacks whose goals are spreading their genes, achieving status, etc. It may be excessively anthropomorphic to assume human levels of deviousness in AIs.
If being devious to humans is instrumentally rational, an instrumentally rational AI agent will do it.
No, I’m insisting that no realistic AGI at all is a Magic Genie which can be instructed in high-level English. If it were, all I would have to say is, “Do what I mean!” and Bob’s your uncle. But since that cannot happen without solving Natural Language Processing as a separate problem before constructing an AGI …
I was actually agreeing with you that NLP needs to be solved separately if you want to instruct it in English. The rhetoric about magic isn’t helpful.
… the AGI agent has a utility function coded as program code in a programming language—which makes desirable behavior quite improbable.
I don’t see why that would follow, and in fact I argued against it.
knowing is quite different from caring.
I know.
What we could do in this domain is solve natural-language learning and processing separately from AGI, and then couple that to a well-worked-out infrastructure of normative uncertainty, and then, after making absolutely sure that the AI’s concept-learning via the hard-wired natural-language processing library matches the way human minds represent concepts computationally, use a large corpus of natural-language text to try to teach the AI what sort of things human beings want.
That’s not what I was saying. I was saying an AI with a motivation to understand NL correctly would research whatever human value was relevant.
We have some idea of a safe goal function for the AGI (it’s essentially a longer-winded version of “Do what I mean, but taking the interests of all into account equally, and considering what I really mean even under reflection as more knowledge and intelligence are added”); the question is how to actually program that.
That’s kind of what I was saying.
If being devious to humans is instrumentally rational, an instrumentally rational AI agent will do it.
Non sequitur. In general, what is an instrumental goal will vary with final goals, and epistemic rationality is a matter of final goals. Omohundro drives are unusual in not having the property of varying with final goals.