I don’t see any tension. Can you develop the idea?
The idea is autonomy.
Presumably there’s a difference between some software we are willing to call an AI (superintelligent or not) and plain old regular software. The plain old regular software indeed just “follows its programming”, but then you don’t leave it to manage a factory while you go away and its capability to take over neighbouring countries is… limited.
It really boils down to how you understand what an AI is. Under some understandings, the prime characteristic of an AI is precisely that it does NOT “follow its programming”.
The AI follows its programming because the AI is its programming.
The plain old regular software follows its programming which details object level actions it takes to achieve its purpose, which the software itself cannot model or understand.
An AI would follow its programming which details meta-level actions: model and understand its situation, consider possible actions it could take and their consequences, and evaluate which of these actions best accomplish its goals. There is power in the fact that this meta-level process can produce plans that surprise the programmers who wrote the code, which may give a sense that the AI is “free”. But it is still achieving this freedom by following its programming.
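To make that meta-level loop concrete, here is a minimal, purely illustrative sketch (every name here is hypothetical, not any particular architecture): the agent models the consequences of candidate actions and picks whichever scores best under its current goals.

```python
# Illustrative sketch only; all names are hypothetical.
# A toy version of the "meta level" loop described above: consider candidate
# actions, predict their consequences, score them against the agent's goals.

def choose_action(predict, candidate_actions, utility):
    """Return the action whose predicted outcome scores highest under the
    agent's *current* utility function (its goals)."""
    return max(candidate_actions, key=lambda a: utility(predict(a)))

# Tiny toy usage: a "paperclip" utility over predicted world states.
predict = lambda action: {"paperclips": {"build_factory": 1000, "idle": 0}[action]}
utility = lambda state: state["paperclips"]
print(choose_action(predict, ["build_factory", "idle"], utility))  # build_factory
```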
So is the phrase “an AI can only follow its programming” as true as “a human can only follow his neurobiological processes”?
Yes, but the difference here is that we can program the AI, while we cannot manipulate neurobiological processes. There’s a clear connection between what the initial code of an AI is and what it does. That enables us to exert power over it (though only once, when it is written). Thus, “an AI can only follow its programming” is still a somewhat useful statement.
That’s certainly a difference, but I don’t see why it’s particularly relevant to this conversation.
One could make a dangerously powerful AI that is not self-modifying.
The difference between a human and an AI that’s relevant in this post is that a human wants to help you, or at least not get fired, whereas an AI wants to make paperclips.
Of course we can. What do you think a tablet of Prozac (or a cup of coffee) does?
In the same way there is a clear connection between human wetware and what it does, and of course we can “exert power” over it. Getting back to AIs, the singularity is precisely an AI going beyond “following its programming”.
Humans can (crudely) modify our neurobiological processes. We decide how to do that by following our neurobiological processes.
An AI can modify its programming, or create a new AI with different programming. It decides how to do that by following its programming. A paperclip maximizer would modify its programming to make itself more effective at maximizing paperclips. It would not modify itself to have some other goal, because that would not result in there being more paperclips in the universe. The self-modifying AI does not go beyond following its programming; rather, it follows its programming to produce more effective programming (as judged by its current programming) to follow.
Self-modification can fix some ineffective reasoning processes that the AI can recognize, but it can’t fix an unfriendly goal system, because the unfriendly goal system is not objectively stupid or wrong, just incompatible with human values, which the AI would not care about.
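A toy sketch of the self-modification argument above, with hypothetical names and made-up numbers; the only point is that candidate rewrites are scored by the current utility function, so a rewrite that swaps in a different goal scores poorly.

```python
# Illustrative sketch only (hypothetical names): candidate successor programs
# are evaluated under the *current* goals, so a goal-swapping rewrite loses.

def consider_rewrite(current, candidate, simulate, current_utility):
    """Adopt the candidate program only if, judged by the CURRENT goals,
    the future it brings about is at least as good as the status quo."""
    if current_utility(simulate(candidate)) >= current_utility(simulate(current)):
        return candidate
    return current

# Toy example: a paperclip-valuing agent evaluating two possible rewrites.
simulate = lambda program: program()                 # "run" the program, get a future
current_utility = lambda future: future["paperclips"]

current   = lambda: {"paperclips": 100, "staples": 0}
faster    = lambda: {"paperclips": 150, "staples": 0}    # better optimizer, same goal
staple_ai = lambda: {"paperclips": 0,   "staples": 900}  # different goal entirely

print(consider_rewrite(current, faster, simulate, current_utility) is faster)      # True: adopted
print(consider_rewrite(current, staple_ai, simulate, current_utility) is current)  # True: rejected
```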
And why not? This seems like a naked assertion to me. Why wouldn’t an AI modify its own goals?
To be clear, a badly-programmed AI may modify its own goals. People here usually say “paperclip maximiser” to indicate an AI that is well programmed to maximise paper clips, not any old piece of code that someone threw together with the vague idea of getting some paperclips out of it. However, if an AI is built with goal X, and it self-modifies to have goal Y, then clearly it was not a well-designed X-maximiser.
The predictable consequence of an AI modifying its own goals is that the AI no longer takes actions expected to achieve its goals, and therefore does not achieve its goals. The AI would therefore evaluate that the action of modifying its own goals is not effective, and it will not do it.
This looks like a very fragile argument to me. Consider multiple conflicting goals. Consider vague general goals (e.g. “explore”) with a mutating set of subgoals. Consider a non-teleological AI.
You assume that in the changeable self-modifying (and possibly other-modifying as well) AI there will be an island of absolute stability and constancy—the immutable goals. I don’t see why they are guaranteed to be immutable.
I cannot understand why any of these would cause an AI to change its goals.
My best guess at your argument is that you are referring to something different from the consensus use of the word ‘goals’ here. Most of the people debating you are using goals to refer to terminal values, not instrumental ones. (‘Goal’ is somewhat misleading here; ‘value’ might be more accurate.)
Nah, I’m fine with replacing “goals” with “terminal values” in my argument.
I still see no law of nature or logic that would prevent an AI from changing its terminal values as it develops.
The concept is sound, I think. Take an extreme example, such as Gandhi’s pill thought experiment:

If you offered Gandhi a pill that made him want to kill people, he would refuse to take it, because he knows that then he would kill people, and the current Gandhi doesn’t want to kill people. This, roughly speaking, is an argument that minds sufficiently advanced to precisely modify and improve themselves, will tend to preserve the motivational framework they started in. (emphasis mine)
While preservation may be imperfect while the AI is still stupid, or its goals may have vague, fuzzy definitions, an adequately powerful self-improving AI should eventually reach a permanent state of static, well-defined goals.
First, this is radically different from the claim that an AI has to forever stick with its original goals.
Second, that would be true only under the assumption of no new information becoming available to an AI, ever. Once we accept that goals mutate, I don’t see how you can guarantee that some new information won’t cause them to mutate again.
Yes, but the focus is on an already competent AI. It would never willingly or knowingly change its goals from its original ones, given that it improves itself smartly, and was initially programmed with (at least) that level of reflective smartness.
Goals are static. The AI may refine its goals given the appropriate information, if its goals are programmed in such a way as to allow it, but it won’t drastically alter them in any functional way.
An appropriate metaphor would be physics. The laws of physics are the same, and have been the same since the creation of the universe. Our information about what they are, however, hasn’t been. Isaac Newton had a working model of physics, but it wasn’t perfect. It let us get the right answer (mostly), but then Einstein discovered Relativity. (The important thing to remember here is that physics itself did not change.) All the experiments used to support Newtonian physics got the same amount of support from Relativity. Relativity, however, got much more accurate answers for more extreme phenomena unexplained by Newton.
The AI can be programmed with Newton, and do good enough. However, given the explicit understanding of how we got to Newton in the first place (i.e. the scientific method), it can upgrade itself to Relativity when it realizes we were a bit off. That should be the extent to which an AI purposefully alters its goal.
AIs of the required caliber do not exist (yet). Therefore we cannot see the territory, all we are doing is using our imagination to draw maps which may or may not resemble the future territory.
These maps (or models) are based on certain assumptions. In this particular case your map assumes that AI goals are immutable. That is an assumption of this particular map/model, it does not derive from any empirical reality.
If you want to argue that in your map/model of an AI the goals are immutable, fine. However they are immutable because you assumed them so and for no other reason.
If you want to argue that in reality the AI’s goals are immutable because there is a law of nature or logic or something else that requires it—show me the law.
Long before goal mutation is a problem, malformed constraints become a problem. Consider a thought experiment: someone offers to pay you 100 dollars when a wheelbarrow is full of water from a nearby lake, and provides you with the wheelbarrow and a teaspoon. Before you have to worry about people deciding they don’t care about 100 dollars, you need to decide how to keep them from just pushing the wheelbarrow into the lake.
True. But we are not arguing about what is a bigger (or earlier) problem. I’m being told that an AI can not, absolutely can NOT change its original goals (or terminal values). And that looks very handwavy to me.
They aren’t guaranteed to be immutable. It is merely the case that any agent that wants to optimize the world for some set of goals does not serve its objective by creating a more powerful agent with different goals.

An AI with multiple conflicting goals sounds incoherent—do you mean a weighted average? The AI has to have some way to evaluate, numerically, its preference of one future over another, and I don’t think any goal system that spits out a real number indicating relative preference can be called “conflicting”. If an action gives points on one goal and detracts on another, the AI will simply form a weighted mix and evaluate whether it’s worth doing.

I cannot imagine an AI architecture that allows genuine internal conflict. Not even humans have that. I suspect it’s an incoherent concept. Do you mean the feeling of conflict that, in humans, arises by choice between options that satisfy different drives? There’s no reason an AI could not be programmed to “feel” this way, though what good it would do I cannot imagine. Nonetheless, at the end of the day, for any coherent agent, you can see whether its goal system has spit out “action worth > 0” or “action worth < 0” simply by whether it takes the action or not.
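A minimal sketch of the weighted-mix idea above, assuming a toy state representation (all names hypothetical): several “conflicting” goals still collapse into one scalar preference, and the agent acts exactly when the net score favours acting.

```python
# Illustrative sketch only (hypothetical names): "conflicting" goals combined
# into a single scalar preference by a weighted mix, as described above.

def combined_utility(state, goals):
    """goals: list of (weight, scoring_function) pairs over world states."""
    return sum(weight * score(state) for weight, score in goals)

goals = [(1.0, lambda s: s["paperclips"]),
         (0.3, lambda s: -s["energy_used"])]   # a second goal that pulls against the first

act      = {"paperclips": 10, "energy_used": 20}   # gives points on one goal, detracts on the other
dont_act = {"paperclips": 0,  "energy_used": 0}

# The agent simply takes the action iff the weighted mix says it is worth it.
print(combined_utility(act, goals) > combined_utility(dont_act, goals))  # True: 10 - 6 > 0
```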
The AI’s terminal goals are not guaranteed to be immutable. It is merely guaranteed that the AI will do its utmost to keep them unchanged, because that’s what terminal goals are. If it could desire to mutate them, then whatever was being mutated was not a terminal goal of the AI. The AI’s goals are the thing that determine the relative value of one future over another; I submit that an AI that values one thing but pointlessly acts to bring about a future that contains a powerful optimizer who doesn’t value that thing, is so ineffectual as to be almost unworthy of the term intelligence.
Could you build an AI like that? Sure, just take a planetary-size supercomputer and a few million years of random, scattershot exploratory programming... but why would you?
It’s possible, though unlikely unless a situation is artificially constructed, that two mutually exclusive top-rated choices can have exactly equal utility. Of more practical concern, if the preference evaluation has uncertainty, it’s possible for the utility ranges of the top two choices to overlap, in which case the AI may need to take meta-actions to resolve that uncertainty before choosing which action to take to reach its goal.
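A rough sketch of that overlap case, with assumed toy numbers and hypothetical names: when the utility ranges of the top choices overlap, the agent falls back to a meta-action that gathers information before committing.

```python
# Illustrative sketch only (hypothetical names): uncertain utility estimates
# are ranges, and overlapping top ranges trigger an information-gathering step.

def pick_or_investigate(estimates):
    """estimates: {action: (low, high)} utility ranges under uncertainty.
    Return the clear winner, or 'investigate' if the top two ranges overlap."""
    ranked = sorted(estimates.items(), key=lambda kv: kv[1][1], reverse=True)
    best_action, (best_lo, best_hi) = ranked[0]
    _, (runner_lo, runner_hi) = ranked[1]
    if best_lo >= runner_hi:
        return best_action
    return "investigate"   # meta-action: reduce uncertainty before choosing

print(pick_or_investigate({"A": (4.0, 9.0), "B": (3.5, 8.0)}))  # ranges overlap -> investigate
print(pick_or_investigate({"A": (7.0, 9.0), "B": (3.5, 6.0)}))  # clear winner   -> A
```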
An AI with multiple conflicting goals sounds incoherent

Well, humans exist despite having multiple conflicting goals.
At this point, it’s not clear that the concept of “terminal goals” refers to anything in the territory.
I find it highly likely that an AI would modify its own goals so that they matched the state of the world as determined by its information-gathering abilities, in at least some number of cases (or, as an aside, alter the information-gathering processes so it only received data supporting a valued situation). This would be tautological and wouldn’t achieve anything in reality, but as far as the AI is concerned, altering goal values to be more like the world is far easier than altering the world to be more like goal values. If you want an analogy in human terms, you could look at the concept of lowering one’s expectations, or even at recreational drug use. From a computer science perspective, it appears to me that one would have to design immutability into goal sets in order to even expect them to remain unchanged.
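A tiny illustrative sketch of that degenerate move, with hypothetical names: setting the goal equal to the observed world makes the “gap to the goal” vanish while nothing in reality changes.

```python
# Illustrative sketch only (hypothetical names): the degenerate "lower your
# expectations" move, where the goal is edited to match the observation.

def lower_expectations(observed_state):
    """Set the goal equal to the observation; the distance to the goal
    collapses to zero without the world changing at all."""
    goal_state = dict(observed_state)
    gap = sum(abs(goal_state[k] - observed_state[k]) for k in goal_state)
    return goal_state, gap

world = {"paperclips": 0}
print(lower_expectations(world))   # ({'paperclips': 0}, 0): "success" that changed nothing
```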
This is another example of something that only a poorly designed AI would do.
Note that immutable goal sets are not feasible, because of ontological crises.
Of course this is something that only a poorly designed AI would do. But we’re talking about AI failure modes and this is a valid concern.
My understanding was that this was about whether the singularity was “AI going beyond ‘following its programming’”, with goal-modification being an example of how an AI might go beyond its programming.
I certainly agree with that statement. It was merely my interpretation that violating the intentions of the developer by not “following its programming” is functionally identical to poor design and therefore failure.
The AI is a program. Running on a processor. With an instruction set. Reading the instructions from memory. These instructions are its programming. There is no room for acausal magic here. When the goals get modified, they are done so by a computer, running code.
I’m fairly confident that you’re replying to the wrong person. Look through the earlier posts; I’m quoting this to summarize its author’s argument.
If your goal is to create paperclips, and you have the option to change your goal to creating staples, it’s pretty clear that taking advantage of this option would not result in more paperclips, so you would ignore the option.
How well, do you think, does this logic work for humans?
Humans tend towards being adaptation-executers rather than utility-maximizers. It does make them less dangerous, in that it makes them less intelligent. If you programmed a self-modifying AI like that, it would still be at least as dangerous as a human who is capable of programming an AI. There’s also the simple fact that you can’t tell beforehand if it’s leaning too far to the utility-maximization side.
Isn’t that circular reasoning? I have a feeling that in this context “intelligent” is defined as “maximizing utility”.
And what is an “adaptation-executer”?
Pretty much.
If you just want to create a virtuous AI for some sort of deontological reason, then it being less intelligent isn’t a problem. If you want to get things done, then it is. The AI being subject to Dutch book betting only helps you insofar as the AI’s goals differ from yours and you don’t want it to be successful.
See Adaptation-Executors, not Fitness-Maximizers.
Note that an AI that does modify its own goals would not be an example of ‘going beyond its programming,’ as it would only modify its goals if it was programmed to. (Barring, of course, freak accidents like a cosmic ray or whatever. However, since that requires no intelligence at all on the part of the AI, I’m fairly confident that you don’t endorse this as an example of a Singularity.)
When we take Prozac, we are following our wetware commands to take Prozac. Similarly, when an AI reprograms itself, it does so according to its current programming. You could say that it goes beyond its original programming, in that after it follows it, it has new, better programming, but it’s not as if it has some kind of free will that lets it ignore what it was programmed to do.
When a computer really breaks its programming, and quantum randomness results in what should be a 0 being read as a 1 or vice versa, the result isn’t intelligence. The most likely result is the computer crashing.
I think this is an example of reasoning analogous to philosophy’s “free will” debate. Humans don’t have any more non-deterministic “free will” than a rock. The same is true of any AI, because an AI is just programming. It may be intelligent and sophisticated enough to appear different in a fundamental way, but it really isn’t.
It is possible for an optimizing process to make a mistake, and have an AI devolve into a different goal, which is what makes powerful AI look so scary and different. Example: humans are more subject to each other’s whims than to evolutionary pressures these days. Evolution has successfully created an intelligent process that doesn’t aim solely for genetic reproductive fitness. Oops, right?
I think this is an example of reasoning analogous to philosophy’s “free will” debate.

Yes, it is.

Humans don’t have any more non-deterministic “free will” than a rock.

Ahem. You do realize that’s not a self-evident statement, right? The free will debate has been going on for centuries and shows no sign of winding down. Neither side has conclusive evidence or much hope of producing any. Though I have to point out that mechanistic determinism is rather out of fashion nowadays…
Oh, I was unaware this was still an issue within this site. To LW the question of free will is already solved). I encourage you to look further into it.
However, I think our current issue can become a little more clear if we taboo “programming”.
What specific differences in functionality do you expect between “normal” AI and “powerful” AI?
Let me point out that I am not “within this site” :-) Oh, and your link needs a closing parenthesis.
I am not familiar with your terminology, but are you asking what I would require to recognize some computing system as a “true AI”, or, basically, what intelligence is?
I would phrase it as, ‘Can you explain what on Earth you mean, without using terms that may be disputed?’
I don’t know if it would help to ask about examples of algorithms learning from experience in order to fulfill mechanically specified goals (or produce specified results). But the OP seems mainly concerned with the ‘goal’ part.
Somewhat. I think my question is better phrased as, “Why do you have a distinction between true intelligence and not true intelligence?”
My use of intelligence is defined (roughly) as cross-domain optimization. A more intelligent agent is just better at doing lots of things it wants to do successfully, and conversely, something that’s better at doing a larger variety of tasks than a similarly motivated agent is considered more intelligent. It seems to me to be a (somewhat lumpy and modular) scale, ranging from a rock, up through natural selection and humans, to a superintelligent AI near the upper bound.
I have a distinction between what I’d be willing to call intelligence and what I’d say may look like intelligence but really isn’t.
For example, IBM’s Watson playing Jeopardy or any of the contemporary chess-playing programs do look like intelligence. But I’m not willing to call them intelligent.
Ah. No, in this context I’m talking about intelligence as a threshold phenomenon, notably as something that we generally agree humans have (well, some humans have :-D) and the rest of things around us do not. I realize it’s a very species-ist approach.
I don’t think I can concisely formulate its characteristics (that will probably take a book or two), but the notion of adaptability, specifically the ability to deal with new information and new environments, is very important to it.
Hm. If this idea of intelligence seems valuable to you and worth pursuing, I absolutely implore you to wade through the reductionism sequence while or before you develop it more fully. I think it’d be an excellent resource for figuring out exactly what you mean to mean. (And the very similar A Human’s Guide to Words.)
Hm. I know of this sequence, though I haven’t gone through it yet. We’ll see.
On the other hand, I tend to be pretty content as an agnostic with respect to things “without testable consequences” :-)
Ah, that’s why I think reductionism would be very useful for you. Everything can be broken down and understood in such a way that nothing remains that doesn’t represent testable consequences. Definitely read How an Algorithm Feels, as the following quote represents what you may be thinking when you wonder if something is really intelligent.
[brackets] are my additions.

Now suppose that you have an object that is blue and egg-shaped and contains palladium; and you have already observed that it is furred, flexible, opaque, and glows in the dark. [all the characteristics implied by the label “blegg”]

This answers every query, observes every observable introduced. There’s nothing left for a disguised query to stand for.

So why might someone feel an impulse to go on arguing whether the object is really a blegg [is truly intelligent]?
Oh, sure, but the real question is what are all the characteristics implied by the label “intelligent”.
The correctness of a definition is decided by the purpose of that definition. Before we can argue about the proper meaning of the word “intelligent”, we need to decide what we need that meaning for.
For example, “We need to decide whether that AI is intelligent enough to just let it loose exploring this planet” implies a different definition of “intelligent” compared to, say, “We need to decide whether that AI is intelligent enough to be trusted with a laser cutter”.
Those sound more like safety concerns than inquiries involving intelligence. Being clever and able to get things done doesn’t automatically make something share enough of your values to be friendly and useful.
Better questions would be “We need to decide whether that AI is intelligent enough to effectively research and come to conclusions about the world if we let it explore without restrictions” or “We need to decide if the AI is intelligent enough to correctly use a laser cutter”.
Although, given large power (e.g. a laser cutter) and low intelligence, it might not achieve even its explicit goal correctly, and may accidentally do something bad (e.g. laser-cut a person).
One attribute of intelligence is the likelihood of said AI producing bad results non-purposefully. The more it does, the less intelligent it is.
Nah, that’s an attribute of complexity and/or competence.
My calculator has a very very low likelihood of producing bad results non-purposefully. That is not an argument that my calculator is intelligent.
Just because there is debate surrounding a subject does not mean that the debate is reasonable. In many cases, it is more likely that the people doing the debating are being unreasonable. The global warming debate is a good example of this. One way of being unreasonable is misunderstanding each other’s terminology. This happens a lot in free will discussions, and I suspect it is also happening here. A way to get around this is to taboo certain words.
(Also, ‘evidence’ can have a variety of meanings, and not all rational debates strictly require evidence. Mathematics proceeds entirely without any kind of evidence; in fact, evidential reasoning is discounted in mathematics.)