I can think of several good reasons why values might not be incorporated into a system as “ought” beliefs. If my AI isn’t very good at reasoning, I might, for instance, find it simpler to construct a black-box “does this action have consequence X” property-checker and incorporate that into the system somewhere. The rest of the system has no access to the internals of the black box—it just supplies a proposed course of action and gets back a YES or a NO.
You ask whether there’s “any AI research ongoing that would support the notion that AIs wouldn’t have some sort of systematic categorization of beliefs and values?”
Most of what’s currently published at major AI research conferences describes systems that don’t have any such systematic characterization. Suppose we built a super-duper Watson that passed the Turing test and had some limited capacity to improve itself by, e.g., going out and fetching new information from the Internet. That soft of system strikes me as the likeliest one to meet the bar of “AGI” in the next few years. It isn’t particularly far from current research.
Before you quibble about whether that’s the kind of system we’re talking about—I haven’t seen a good definition of “self-improving” program, and I suspect it is not at all straightforward to define. Among other reasons, I don’t know a good definition that separates ‘code’ and ‘data’. So if you don’t like the example above, you should make sure that there’s a clear difference between choosing what inputs to read (which modifies internal state) and choosing what code to load (which also modifies internal state).
As to the human example: my sense is that humans don’t get locked to any one set of goals; that goals continue to evolve, without much careful pruning, over a human lifetime. Expecting an AI to tinker with its goals for a while, and then stop, is asking it to do something that neither natural intelligences or existing software seems to do or even be capable of doing.
Suppose we built a super-duper Watson that passed the Turing test and had some limited capacity to improve itself by, e.g., going out and fetching new information from the Internet. That sort of system strikes me as the likeliest one to meet the bar of “AGI” in the next few years. It isn’t particularly far from current research.
This seems like a plausible way of blowing up the universe, but not in the next few years. This kind of thing requires a lot of development, I’d give it 30-60 years at least.
Most of what’s currently published at major AI research conferences describes systems that don’t have any such systematic characterization. Suppose we built a super-duper Watson
… I think we’re having a major breakdown of communication because to my understanding Watson does exactly what you just claimed no AI at research conferences is doing.
Before you quibble about whether that’s the kind of system we’re talking about—I haven’t seen a good definition of “self-improving” program, and I suspect it is not at all straightforward to define.
I’m sure. But there’s a few generally sound assertions we can make:
To be self-improving the machine must be able to examine its own code / be “metacognitive.”
To be self-improving the machine must be able to produce a target state.
From these two the notion of value fixation in such an AI would become trivial. Even if that version of the AI would have man-made value-fixation, what about the AI it itself codes? If the AI were actually smarter than us, that wouldn’t exactly be the safest route to take. Even Asimov’s Three Laws yielded a Zeroth Law.
Expecting an AI to tinker with its goals for a while, and then stop,
Don’t anthropomorphize. :)
If you’ll recall from my description, I have no such expectation. Instead, I spoke of recursive refinement causing apparent fixation in the form of “gravity” or “stickiness” towards a specific set of values.
Why is this unlike how humans normally are? Well, we don’t have much access to our own actual values.
I can think of several good reasons why values might not be incorporated into a system as “ought” beliefs. If my AI isn’t very good at reasoning, I might, for instance, find it simpler to construct a black-box “does this action have consequence X” property-checker and incorporate that into the system somewhere. The rest of the system has no access to the internals of the black box—it just supplies a proposed course of action and gets back a YES or a NO.
You ask whether there’s “any AI research ongoing that would support the notion that AIs wouldn’t have some sort of systematic categorization of beliefs and values?”
Most of what’s currently published at major AI research conferences describes systems that don’t have any such systematic characterization. Suppose we built a super-duper Watson that passed the Turing test and had some limited capacity to improve itself by, e.g., going out and fetching new information from the Internet. That soft of system strikes me as the likeliest one to meet the bar of “AGI” in the next few years. It isn’t particularly far from current research.
Before you quibble about whether that’s the kind of system we’re talking about—I haven’t seen a good definition of “self-improving” program, and I suspect it is not at all straightforward to define. Among other reasons, I don’t know a good definition that separates ‘code’ and ‘data’. So if you don’t like the example above, you should make sure that there’s a clear difference between choosing what inputs to read (which modifies internal state) and choosing what code to load (which also modifies internal state).
As to the human example: my sense is that humans don’t get locked to any one set of goals; that goals continue to evolve, without much careful pruning, over a human lifetime. Expecting an AI to tinker with its goals for a while, and then stop, is asking it to do something that neither natural intelligences or existing software seems to do or even be capable of doing.
This seems like a plausible way of blowing up the universe, but not in the next few years. This kind of thing requires a lot of development, I’d give it 30-60 years at least.
… I think we’re having a major breakdown of communication because to my understanding Watson does exactly what you just claimed no AI at research conferences is doing.
I’m sure. But there’s a few generally sound assertions we can make:
To be self-improving the machine must be able to examine its own code / be “metacognitive.”
To be self-improving the machine must be able to produce a target state.
From these two the notion of value fixation in such an AI would become trivial. Even if that version of the AI would have man-made value-fixation, what about the AI it itself codes? If the AI were actually smarter than us, that wouldn’t exactly be the safest route to take. Even Asimov’s Three Laws yielded a Zeroth Law.
Don’t anthropomorphize. :)
If you’ll recall from my description, I have no such expectation. Instead, I spoke of recursive refinement causing apparent fixation in the form of “gravity” or “stickiness” towards a specific set of values.
Why is this unlike how humans normally are? Well, we don’t have much access to our own actual values.