The idiot savant AI isn’t an idiot
A stub on a point that’s come up recently.
If I owned a paperclip factory, and casually told my foreman to improve efficiency while I’m away, and he planned a takeover of the country, aiming to devote its entire economy to paperclip manufacturing (apart from the armament factories he needed to invade neighbouring countries and steal their iron mines)… then I’d conclude that my foreman was an idiot (or being wilfully idiotic). He obviously had no idea what I meant. And if he misunderstood me so egregiously, he’s certainly not a threat: he’s unlikely to reason his way out of a paper bag, let alone to any position of power.
If I owned a paperclip factory, and casually programmed my superintelligent AI to improve efficiency while I’m away, and it planned a takeover of the country… then I can’t conclude that the AI is an idiot. It is following its programming. Unlike a human that behaved the same way, it probably knows exactly what I meant to program in. It just doesn’t care: it follows its programming, not its knowledge about what its programming is “meant” to be (unless we’ve successfully programmed in “do what I mean”, which is basically the whole of the challenge). We can’t therefore conclude that it’s incompetent, unable to understand human reasoning, or likely to fail.
We can’t reason by analogy with humans. When AIs behave like idiot savants with respect to their motivations, we can’t deduce that they’re idiots.
I feel like I’ve seen this post before...
I think Alexander Kruel has been writing about things like this for a while (but arguing in the opposite direction). Here’s an example.
I find his arguments unpersuasive so far, but steelmanning a little bit: you could argue that giving an AI any goal at all would basically entail making it grok humans, and the jump from that to correctly holding human values would be short.
? Never written anything like this… Have others?
Your post just seems to be introducing the concept of accidentally creating a super-powerful paperclip-maximizing AI, which is an idea that we’ve all been talking about for years. I can’t tell what part is supposed to be new—is it that this AI would actually be smart and not just an idiot savant?
The ideas that AIs follow their programming, and that intelligence and values are orthogonal seem like pretty well-established concepts around here. And, in particular, a lot of our discussion about hypothetical Clippies has presupposed that they would understand humans well enough to engage in game theory scenarios with us.
Am I missing something?
I’ve had an online conversation where it was argued that AI goals other than what was intended by the programmers would be evidence of a faulty AI—and hence that it wouldn’t be a dangerous one. This post was a direct response to that.
Ah, I see. Fair enough, I agree.
It’s vaguely reminiscent of “a computer is only as stupid as its programmer” memes.
In particular, the AI might be able to succeed at this.
It seems to me possible that the AI might come up with even more ‘insane’ ideas that have even less apparent connection to what it was programmed to do.
My knowledge of AI is for practical purposes zilch, but for large numbers of hypothetical future AIs, if perhaps not a full Friendly AI, wouldn’t it be a simple solution to program the AI to model a specified human individual, determine said individual’s desires, and implement them?
Write the program.
I know no programming whatsoever. I’m asking because I figure that the problem of Friendly AI going way off-key has no comparable analogue in this case because it involves different facts.
Then what basis do you have for thinking that a particular programming task is simple?
A hypothetical AI programmed to run a paperclip factory, as compared to one designed to fulfil the role LessWrong grants Friendly AI, would:
- Not need any recursive intelligence enhancement, and probably not need upgrading
- Be able to discard massive numbers of functions regarding a lot of understanding of both humans and other matters
Fewer functions means less to program, which means less chance of glitches or other errors. Without intelligence enhancement the odds of an unfriendly outcome are greatly reduced. Therefore, the odds of a paperclip factory AI becoming a threat to humanity are far smaller than those of a Friendly AI.
That is not what is meant around here by “paperclip maximiser”. A true clippy does not run a factory; it transmutes the total mass of the solar system into paperclips, starting with humans and ending with the computer it runs on. (With the possible exception of some rockets containing enough computing power to repeat the process on other suns.) That is what it means to maximise something.
Right. Which is why you just proposed a solution which is, in itself, AI-complete; you have not in fact reduced the problem. This aside, which of the desires of the human do you intend to fulfil? I desire chocolate, I also desire not to get fat. Solve for the equilibrium.
The desires implied in the orders given, interpreting desires by likely meaning. I didn’t intend to reduce the problem in any way, but to make the point (albeit poorly as it turned out) that the example used was far less of a risk than the much better example of an actual attempt at Friendly AI.
An entire Sequence exists precisely for the purpose of showing that “just write an AI that takes orders” is not sufficient as a solution to this problem. “Likely meaning” is not translatable into computer code at the present state of knowledge, and what’s more, it wouldn’t even be sufficient if it were. You’ve left out the implicit “likely intended constraints”. If I say “get some chocolate”, you understand that I mean “if possible, within the constraint of not using an immense amount of resources, provided no higher-priority project intervenes, without killing anyone or breaking any laws except ones that are contextually ok to break such as coming to a full, not rolling, stop at stop signs, and actually, if I’m on a diet maybe you ought to remind me of the fact and suggest a healthier snack, and even if I’m not on a diet but ought to be, then a gentle suggestion to this effect is appropriate in some but not all circumstances...” Getting all that implicit stuff into code is exactly the problem of Friendly AI. “Likely meaning” just doesn’t cover it, and even so we can’t even solve that problem.
I thought it was clear that: A- For Friendly A.I, I meant modelling a human via a direct simulation of a human brain (or at least the relevant parts) idealised in such a way as to give the results we would want B- I DID NOT INTEND TO REDUCE THE PROBLEM.
A: What is the difference between this, and just asking the human brain in the first place? The whole point of the problem is that humans do not, actually, know what we want in full generality. You might as well implement a chess computer by putting a human inside it and asking, at every ply, “Do you think this looks like a winning position?” If you could solve the problem that way you wouldn’t need an AI!
B: Then what was the point of your post?
Humans do not have explicit desires, and there’s no clear way to figure out the implicit ones.
Not that that’s a bad idea. It’s basically the best idea anyone’s had. It’s just a lot harder to do than you make it sound.
They call it CEV here. Not a single human, but many/all of them. Not what they want now, but what they would want, had they known better.
I am skeptical that this could work.
What I’m saying is a bit different from CEV: it would involve modelling only a single human’s preferences, and would involve modelling their brain only in the short term (which would be a lot easier). Human beings have at least reasonable judgement with things such as, say, a paperclip factory, to the point where a human calling the shots will have no consequences that are too severe.
Specifying that kind of thing (including specifying preference) is probably almost as hard as getting the AI’s motivations right in the first place.
Though Paul Christiano had some suggestions along those lines, which (in my opinion) needed uploads (human minds instantiated in a computer) to have a hope of working...
Would a human be bound to “at least reasonable judgement” if given super intelligent ability?
We should remember that we aren’t talking about true Friendly AI here, but AI in charge of lesser tasks such as, in the example, running a factory. There will be many things the AI doesn’t know because it doesn’t need to, including how to defend itself against being shut down (I see no logical reason why that would be necessary for running a paperclip factory). Combining that with the limits on intelligence necessary for such lesser tasks, and failure modes become far less likely.
That’s sort of similar to what I keep talking about w/ ‘obedient AI’.
Not necessarily. The instructions to a fully-reflective AI could be more along the lines of “learn what I mean, then do that” or “do what I asked within the constraints of my own unstated principles.” The AI would have an imperative to build a more accurate internal model of your psychology in order to predict the implicit constraints applied to that request, typically by asking you or other trusted humans questions. If you want to take this to a crazy extreme, it is perhaps more probable that the AI would recognize that military campaigns to acquire ore deposits are both outside its recorded experiences and not directly implied by your request. It would then take the prudent step of constructing a passive brain scanner (perhaps developing molecular nanotechnology first, in order to do so), clandestinely scan you while on vacation, and use that knowledge to refine the utility function into something you would be happy with (i.e. not declaring war on humanity).
That’s just another way of saying “do what I mean”. And it doesn’t give us the code to implement that.
“Do what I asked within the constraints of my own unstated principles” is a hugely complicated set of instructions that only seems simple because it’s written in English words.
I thought this was quite clear, but maybe not. Let’s play taboo with the phrase “do what I mean.”
“Do what I asked within the constraints of my own unstated principles”
“Bring about the end-goal I requested, without in the process taking actions that I would not approve of”
“Develop a predictive model of my psychology, and evaluate solutions to the stated task against that model. When a solution matches the goal but is rejected by the model, do not take that action until the conflict is resolved. Resolving the conflict will require either clarification of the task to exclude such possibilities (which can be done automatically if I have a high-confidence theory for why the task was not further specified), or updating the psychological model of my creators to match empirical reality.”
Do you see now how that is implementable?
EDIT: To be clear, for a variety of reasons I don’t think it is a good idea to build a “do what I mean” AI, unless “do what I mean” is generalized to the reflective equilibrium of all of humanity. But that’s the way the paperclip argument is posed.
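The control flow in the third rephrasing above can be sketched in a few lines. This is only an illustration under loud assumptions: `Plan`, `triage`, and `toy_approval_model` are hypothetical names, and the approval model is a one-line stand-in for the "predictive model of my psychology", which is the part nobody knows how to write.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Plan:
    name: str
    achieves_goal: bool  # does this plan satisfy the stated task?


def triage(plans: List[Plan],
           approves: Callable[[Plan], bool]) -> Tuple[List[Plan], List[Plan]]:
    """Split goal-achieving plans into allowed ones and ones held for clarification."""
    allowed, held = [], []
    for plan in plans:
        if not plan.achieves_goal:
            continue  # plans that miss the stated task are simply dropped
        if approves(plan):
            allowed.append(plan)
        else:
            held.append(plan)  # matches the goal but rejected by the model
    return allowed, held


def toy_approval_model(plan: Plan) -> bool:
    # Hypothetical stand-in: a real model of the owner's psychology is the hard part.
    return "invade" not in plan.name


plans = [
    Plan("tune the production line", achieves_goal=True),
    Plan("invade neighbours for iron", achieves_goal=True),
    Plan("repaint the office", achieves_goal=False),
]
allowed, held = triage(plans, toy_approval_model)
print([p.name for p in allowed], [p.name for p in held])
```

The loop itself is trivial; everything contentious lives inside `approves`, so the sketch shows the shape of the proposal without showing that the proposal is solvable.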
No.
Do you think that a human rule lawyer, someone built to manipulate rules and regulations, could not argue their way through this, sticking with all the technical requirements but getting completely different outcomes? I know I could.
And if a human rule-lawyer could do it, that means that there exist ways of satisfying the formal criteria without doing what we want. Once we know these exist, the question is then: would the AI stumble preferentially on the solution we had in mind? Why would we expect it to do so when we haven’t even been able to specify that solution?
The question isn’t whether there is one solution, but whether the space of possible solutions is encompassed by acceptable morals. I would not “expect an AI to stumble preferentially on the solution we had in mind” because I am confused and do not know what the solution is, as are you and everyone else on LessWrong. However that is a separate issue from whether we can specify what a solution would look like, such as a reflective-equilibrium solution to the coherent extrapolated volition of humankind. You can write an optimizer to search for a description of CEV without actually knowing what the result will be.
It’s like saying “I want to calculate pi to the billionth digit” and writing a program to do it, then arguing that we can’t be sure the result is correct because we don’t know ahead of time what the billionth digit of pi will be. Nonsense.
Whether the space of possible solutions is contained in the space of moral outcomes.
Correct.
There is quite a bit of tension between the “superintelligent AI” and “following its programming” parts...
It sounds like there are three separate debates going on:
1. What is intelligence? What is rationality? What is a goal?
‘Intelligence’ is generally defined here as “an agent’s ability to achieve goals in a wide range of environments” (without wasting resources). See What is intelligence?. That may not be sufficient for what we intuitively mean by ‘human intelligence’, but it’s necessary, and it’s the kind of intelligence relevant to worries about ‘intelligence explosion’, which are one of the two central organizing concerns of LessWrong. (The other being everyday human irrationality.)
Instrumental rationality is acting in such a way that you attain your values. So intelligence, as understood here, is simply the disposition to exhibit efficient domain-general instrumental rationality.
‘Goals’ or ‘(outcome-style) values’, as I understand them, are encodings of intrinsically low-probability events that cause optimization processes to make those events more probable. See Optimization. Individual humans are not simple optimization processes (at a minimum, we don’t consistently optimize for the things we believe we do) and do not have easily specified values in this strict sense, though they are made up of many (competing) optimization processes and value-bearing subsystems. (See The Blue-Minimizing Robot.)
When we speak of Artificial Intelligence we usually assume, if only as a simplification, that the system has particular values/goals. If it’s inefficiently designed then perhaps these values emerge out of internal conflicts between sub-agents, but regardless the system as a whole has some internally specified predictable effect on the world, if it is allowed to act at all. You can view human individuals analogously, but such a description (to be predictive) will need to be far more complicated than an analogous description that takes account of actual human psychological mechanisms.
2. Does intelligence require that you not merely ‘follow your programming’?
Behaviors have to come from somewhere. Some state of the world now results in some later state of the world; if we start with an instantiated algorithm and an environment in a closed system and then later see the system change, either the algorithm, the environment, or some combination must have produced the later change. Even ‘random’ changes must be encoded into either the algorithm or the environment (e.g., the environment must possess some sort of random-number-generating law determining its actions). So when we say the AI follows its program, we don’t mean anything more exotic than that its programming is causally responsible for its actions; it is not purely a result of the surrounding environment.
3. Can an AI change its goals?
Yes. Of course it can. (If its programming and/or environment has that causal tendency.) In fact, creating AIs that self-modify at all without completely (or incompletely-but-unpredictably) changing their goals is an enormously difficult problem, and one of the central concerns for MIRI right now. The point of thought experiments like ‘Clippy the paperclip maximizer’ is that solving this problem, of building stable values, will not in itself suffice for Friendly AI; most agents may not have stable values, but even most of the agents that do have stable values do not have values that are at all conducive to human welfare. See Five Theses.
I don’t understand the premise of 3, even when looking at the five theses. I saw it restated under the second lemma, but it doesn’t seem an enormously difficult problem unless the most feasible approach to self-modifying AI was genetic algorithms, or other methods that don’t have an area explicitly for values/goals. Is there anything I’m missing?
I don’t see any tension. Can you develop the idea?
The idea is autonomy.
Presumably there’s a difference between some software we are willing to call an AI (superintelligent or not) and plain old regular software. The plain old regular software indeed just “follows its programming”, but then you don’t leave it to manage a factory while you go away and its capability to take over neighbouring countries is… limited.
It really boils down to how do you understand what an AI is. Under some understandings the prime characteristic of an AI is precisely that it does NOT “follow its programming”.
The AI follows its programming because the AI is its programming.
The plain old regular software follows its programming which details object level actions it takes to achieve its purpose, which the software itself cannot model or understand.
An AI would follow its programming which details meta level actions to model and understand its situation, consider possible actions it could take and the consequences, and evaluate which of these actions best accomplish its goals. There is power in the fact that this meta process can produce plans that surprise the programmers who wrote the code. This may give a sense of the AI being “free” in some sense. But it is still achieving this freedom by following its programming.
So is the phrase “an AI can only follow its programming” as true as “a human can only follow his neurobiological processes”?
Yes, but the difference here is that we can program the AI, while we can not manipulate neurobiological processes. There’s a clear connection between what the initial code of an AI is and what it does. That enables us to exert power about it (though only once, when it is written). Thus, “an AI can only follow its programming” is still a somewhat useful statement.
That’s certainly a difference, but I don’t see why it’s particularly relevant to this conversation.
One could make a dangerously powerful AI that is not self-modifying.
The difference between a human and an AI that’s relevant in this post is that a human wants to help you, or at least not get fired, where an AI wants to make paperclips.
Of course we can. What do you think a tablet of Prozac (or a cup of coffee) does?
In the same way there is clear connection between human wetware and what it does, and of course we can “exert power” about it. Getting back to AIs, the singularity is precisely AI going beyond “following its programming”.
Humans can (crudely) modify our neurobiological processes. We decide how to do that by following our neurobiological processes.
An AI can modify its programming, or create a new AI with different programming. It decides how to do that by following its programming. A paperclip maximizer would modify its programming to make itself more effective at maximizing paperclips. It would not modify itself to have some other goal, because that would not result in there being more paperclips in the universe. The self modifying AI does not go beyond following its programming, rather it follows its programming to produce more effective programming (as judged by its current programming) to follow.
Self modification can fix some ineffective reasoning processes that the AI can recognize, but it can’t fix an unfriendly goal system, because the unfriendly goal system is not objectively stupid or wrong, just incompatible with human values, which the AI would not care about.
And why not? This seems like a naked assertion to me. Why wouldn’t an AI modify its own goals?
To be clear, a badly-programmed AI may modify its own goals. People here usually say “paperclip maximiser” to indicate an AI that is well programmed to maximise paper clips, not any old piece of code that someone threw together with the vague idea of getting some paperclips out of it. However, if an AI is built with goal X, and it self-modifies to have goal Y, then clearly it was not a well-designed X-maximiser.
The predictable consequence of an AI modifying its own goals is that the AI no longer takes actions expected to achieve its goals, and therefore does not achieve its goals. The AI would therefore evaluate that the action of modifying its own goals is not effective and it will not do it.
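A toy version of this argument, as a sketch only: the agent scores a proposed self-modification by simulating the future it leads to and rating that future with its current goal. The world model, the `skill` parameter, and the names `simulate` and `should_accept` are all assumptions made for illustration, and the sketch presupposes a single, explicit, well-defined goal function, which is exactly the premise some replies in this thread dispute.

```python
def paperclips(world):
    return world["paperclips"]


def staples(world):
    return world["staples"]


def simulate(goal, skill=1, steps=10):
    """Toy future: each step the agent makes `skill` units of whatever its goal rates highest."""
    world = {"paperclips": 0, "staples": 0}
    for _ in range(steps):
        options = []
        for item in world:
            nxt = dict(world)
            nxt[item] += skill
            options.append(nxt)
        world = max(options, key=goal)
    return world


def should_accept(current_goal, current_skill, new_goal, new_skill):
    """Accept a rewrite only if the CURRENT goal rates the resulting future higher."""
    future_if_changed = simulate(new_goal, new_skill)
    future_if_unchanged = simulate(current_goal, current_skill)
    return current_goal(future_if_changed) > current_goal(future_if_unchanged)


# A more capable paperclip maker is accepted; a far more capable staple maker is not.
print(should_accept(paperclips, 1, paperclips, 2))
print(should_accept(paperclips, 1, staples, 5))
```

The rejection of the staple rewrite falls straight out of using `current_goal` as the judge; an agent without that structure gets no such guarantee from this sketch.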
This looks like a very fragile argument to me. Consider multiple conflicting goals. Consider vague general goals (e.g. “explore”) with a mutating set of subgoals. Consider a non-teleological AI.
You assume that in the changeable self-modifying (and possibly other-modifying as well) AI there will be an island of absolute stability and constancy—the immutable goals. I don’t see why they are guaranteed to be immutable.
I cannot understand why any of these would cause an AI to change their goals.
My best guess at your argument is that you are referring to something different from the consensus use of the word ‘goals’ here. Most of the people debating you are using goals to refer to terminal values, not instrumental ones. (‘Goal’ is somewhat misleading here; ‘value’ might be more accurate.)
Nah, I’m fine with replacing “goals” with “terminal values” in my argument.
I still see no law of nature or logic that would prevent an AI from changing its terminal values as it develops.
The concept is sound, I think. Take an extreme example, such as the Gandhi’s pill thought experiment:
While it may be imperfect preservation while still stupid, or contain globular, fuzzy definitions of goals, an adequately powerful self improving AI should eventually reach a state of static, well defined goals permanently.
First, this is radically different from the claim that an AI has to forever stick with its original goals.
Second, that would be true only under the assumption of no new information becoming available to an AI, ever. Once we accept that goals mutate, I don’t see how you can guarantee that some new information won’t cause them to mutate again.
Yes, but the focus is on an already competent AI. It would never willingly or knowingly change its goals from its original ones, given that it improves itself smartly, and was initially programmed with (at least) that level of reflective smartness.
Goals are static. The AI may refine its goals given the appropriate information, if its goals are programmed in such a way to allow it, but it won’t drastically alter them in any functional way.
An appropriate metaphor would be physics. The laws of physics are the same, and have been the same since the creation of the universe. Our information about what they are, however, hasn’t been. Isaac Newton had a working model of physics, but it wasn’t perfect. It let us get the right answer (mostly), but then Einstein discovered Relativity. (The important thing to remember here is that physics itself did not change.) All the experiments used to support Newtonian physics got the same amount of support from Relativity. Relativity, however, got much more accurate answers for more extreme phenomena unexplained by Newton.
The AI can be programmed with Newton, and do good enough. However, given the explicit understanding of how we got to Newton in the first place (i.e. the scientific method), it can upgrade itself to Relativity when it realizes we were a bit off. That should be the extent to which an AI purposefully alters its goal.
AIs of the required caliber do not exist (yet). Therefore we cannot see the territory, all we are doing is using our imagination to draw maps which may or may not resemble the future territory.
These maps (or models) are based on certain assumptions. In this particular case your map assumes that AI goals are immutable. That is an assumption of this particular map/model, it does not derive from any empirical reality.
If you want to argue that in your map/model of an AI the goals are immutable, fine. However they are immutable because you assumed them so and for no other reason.
If you want to argue that in reality the AI’s goals are immutable because there is a law of nature or logic or something else that requires it—show me the law.
Long before goal mutation is a problem malformed constraints become a problem. Consider a thought experiment: Someone offers to pay you 100 dollars when a wheelbarrow is full of water from a nearby lake, and provides you with the wheelbarrow and a teaspoon. Before you have to worry about people deciding they don’t care about 100 dollars, you need to decide how to keep them from just pushing the wheelbarrow into the lake.
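The wheelbarrow case can be written down as a tiny optimisation problem. The following is a sketch with made-up effort numbers; `Strategy`, `payout`, `payout_with_constraint`, and `pick` are hypothetical names. The optimiser prefers the highest payout and, among equal payouts, the least effort.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    fills_wheelbarrow: bool
    effort: int                # arbitrary units, invented for the example
    wheelbarrow_in_lake: bool


def payout(s: Strategy) -> int:
    # The offer as literally stated: 100 dollars when the wheelbarrow is full of lake water.
    return 100 if s.fills_wheelbarrow else 0


def payout_with_constraint(s: Strategy) -> int:
    # Same offer plus one of the unstated constraints made explicit.
    return 100 if s.fills_wheelbarrow and not s.wheelbarrow_in_lake else 0


def pick(strategies, payout_fn):
    # Highest payout first, then lowest effort.
    return max(strategies, key=lambda s: (payout_fn(s), -s.effort))


strategies = [
    Strategy("carry water by teaspoon", True, 5000, False),
    Strategy("push wheelbarrow into lake", True, 1, True),
    Strategy("do nothing", False, 0, False),
]
print(pick(strategies, payout).name)
print(pick(strategies, payout_with_constraint).name)
```

Patching one constraint fixes this one loophole; it says nothing about the loopholes nobody thought to list.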
True. But we are not arguing about what is a bigger (or earlier) problem. I’m being told that an AI can not, absolutely can NOT change its original goals (or terminal values). And that looks very handwavy to me.
They aren’t guaranteed to be immutable. It is merely the case that any agent that wants to optimize the world for some set of goals does not serve its objective by creating a more powerful agent with different goals. An AI with multiple conflicting goals sounds incoherent—do you mean a weighted average? The AI has to have some way to evaluate, numerically, its preference of one future over another, and I don’t think any goal system that spits out a real number indicating relative preference can be called “conflicting”. If an action gives points on one goal and detracts on another, the AI will simply form a weighted mix and evaluate whether it’s worth doing. I cannot imagine an AI architecture that allows genuine internal conflict. Not even humans have that. I suspect it’s an incoherent concept. Do you mean the feeling of conflict that, in humans, arises by choice between options that satisfy different drives? There’s no reason an AI could not be programmed to “feel” this way, though what good it would do I cannot imagine. Nonetheless, at the end of the day, for any coherent agent, you can see whether its goal system has spit out “action worth > 0” or “action worth < 0” simply by whether it takes the action or not.
The AI’s terminal goals are not guaranteed to be immutable. It is merely guaranteed that the AI will do its utmost to keep them unchanged, because that’s what terminal goals are. If it could desire to mutate them, then whatever was being mutated was not a terminal goal of the AI. The AI’s goals are the thing that determine the relative value of one future over another; I submit that an AI that values one thing but pointlessly acts to bring about a future that contains a powerful optimizer who doesn’t value that thing, is so ineffectual as to be almost unworthy of the term intelligence.
Could you build an AI like that? Sure, just take a planetary-size supercomputer and a few million years of random, scattershot exploratory programming... but why would you?
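The "weighted mix" described above is a one-line computation. Here is a minimal sketch, assuming the goals and their weights are given explicitly; the weights, the effects, and the name `action_worth` are invented for the example.

```python
def action_worth(effects, weights):
    """Weighted sum of an action's effect on each goal; positive means 'worth doing'."""
    return sum(weights[goal] * delta for goal, delta in effects.items())


# Hypothetical weights: this agent cares four times as much about paperclips as staples.
weights = {"paperclips": 1.0, "staples": 0.25}

good_trade = {"paperclips": +3, "staples": -8}   # 3.0 - 2.0
bad_trade = {"paperclips": +1, "staples": -8}    # 1.0 - 2.0

print(action_worth(good_trade, weights) > 0)
print(action_worth(bad_trade, weights) > 0)
```

On this picture the "conflict" between goals is resolved entirely by the weights, which is the commenter's point; whether real agents have such weights is what the replies go on to question.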
It’s possible, though unlikely unless a situation is artificially constructed that the two mutually exclusive top-rated choices can have exactly equal utility. Of more practical concern, if the preference evaluation has uncertainty it’s possible for the utility-range of the top two choices to overlap, in which case the AI may need to take meta-actions to resolve that uncertainty before choosing which action to take to reach its goal.
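The overlap case can be sketched as follows, assuming each action's utility is known only as a range. `Estimate` and `decide` are hypothetical names, and "gather more information" stands in for whatever meta-action would shrink the uncertainty.

```python
from typing import List, NamedTuple


class Estimate(NamedTuple):
    action: str
    low: float   # lower bound on the action's utility
    high: float  # upper bound on the action's utility


def decide(estimates: List[Estimate]) -> str:
    """Act only if the best action's range clears every other range; otherwise investigate."""
    ranked = sorted(estimates, key=lambda e: e.high, reverse=True)
    best = ranked[0]
    # The runner-up has the largest upper bound among the rest, so checking it suffices.
    if len(ranked) == 1 or best.low > ranked[1].high:
        return best.action
    return "gather more information"


print(decide([Estimate("A", 5, 7), Estimate("B", 1, 4)]))
print(decide([Estimate("A", 5, 7), Estimate("B", 4, 6)]))
```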
Well humans exist despite having multiple conflicting goals.
At this point, it’s not clear that the concept of “terminal goals” refers to anything in the territory.
I find it highly likely that an AI would modify its own goals such that its goals were concurrent with the state of the world as determined by its information gathering abilities in at least some number of cases (or, as an aside, altering the information gathering processes so it only received data supporting a value situation). This would be tautological and wouldn’t achieve anything in reality, but as far as the AI is concerned, altering goal values to be more like the world is far easier than altering the world to be more like goal values. If you want an analogy in human terms, you could look at the concept of lowering one’s expectations, or even at recreational drug use. From a computer science perspective it appears to me that one would have to design immutability into goal sets in order to even expect them to remain unchanged.
This is another example of something that only a poorly designed AI would do.
Note that immutable goal sets are not feasible, because of ontological crises.
Of course this is something that only a poorly designed AI would do. But we’re talking about AI failure modes and this is a valid concern.
My understanding was that this was about whether the singularity was “AI going beyond “following its programming”,” with goal-modification being an example of how an AI might go beyond its programming.
I certainly agree with that statement. It was merely my interpretation that violating the intentions of the developer by not “following its programming” is functionally identical to poor design and therefore failure.
The AI is a program. Running on a processor. With an instruction set. Reading the instructions from memory. These instructions are its programming. There is no room for acausal magic here. When the goals get modified, they are done so by a computer, running code.
I’m fairly confident that you’re replying to the wrong person. Look through the earlier posts; I’m quoting this to summarize its author’s argument.
If your goal is to create paperclips, and you have the option to change your goal to creating staples, it’s pretty clear that taking advantage of this option would not result in more paperclips, so you would ignore the option.
How well, do you think, this logic works for humans?
Humans tend towards being adaptation-executers rather than utility-maximizers. It does make them less dangerous, in that it makes them less intelligent. If you programmed a self-modifying AI like that, it would still be at least as dangerous as a human who is capable of programming an AI. There’s also the simple fact that you can’t tell beforehand if it’s leaning too far on the utility-maximization side.
Isn’t that circular reasoning? I have a feeling that in this context “intelligent” is defined as “maximizing utility”.
And what is an “adaptation-executer”?
Pretty much.
If you just want to create a virtuous AI for some sort of deontological reason, then it being less intelligent isn’t a problem. If you want to get things done, then it is. The AI being subject to Dutch book betting only helps you insomuch as the AI’s goals differ from yours and you don’t want it to be successful.
See Adaptation-Executors, not Fitness-Maximizers.
Note that an AI that does modify its own goals would not be an example of ‘going beyond its programming,’ as it would only modify its goals if it was programmed to. (Barring, of course, freak accidents like a cosmic ray or whatever. However, since that requires no intelligence at all on the part of the AI, I’m fairly confident that you don’t endorse this as an example of a Singularity.)
When we take Prozac, we are following our wetware commands to take Prozac. Similarly, when an AI reprograms itself, it does so according to its current programming. You could say that it goes beyond its original programming, in that after it follows it, it has new, better programming, but it’s not as if it has some kind of free will that lets it ignore what it was programmed to do.
When a computer really breaks its programming, and quantum randomness results in what should be a 0 being read as a 1 or vice versa, the result isn’t intelligence. The most likely result is the computer crashing.
I think this is an example of reasoning analogous to philosophy’s “free will” debate. Humans don’t have any more non-deterministic “free will” than a rock. The same is true of any AI, because an AI is just programming. It may be intelligent and sophisticated enough to appear different in a fundamental way, but it really isn’t.
It is possible for an optimizing process to make a mistake, and have an AI devolve into a different goal, which is what makes powerful AI look so scary and different. Example: Humans are more subject to each other’s whims than evolutionary pressures these days. Evolution has successfully created an intelligent process that doesn’t aim solely for genetic reproductive fitness. Oops, right?
Yes, it is.
Ahem. You do realize that’s not a self-evident statement, right? The free will debate has been going on for centuries and shows no sign of winding down. Neither side has conclusive evidence or much hope of producing any. Though I have to point out that mechanistic determinism is rather out of fashion nowadays…
Oh, I was unaware this was still an issue within this site. To LW the question of free will is already solved). I encourage you to look further into it.
However, I think our current issue can become a little more clear if we taboo “programming”.
What specific differences in functionality do you expect between “normal” AI and “powerful” AI?
Let me point out that I am not “within this site” :-) Oh, and your link needs a closing parenthesis.
I am not familiar with your terminology, but are you asking what I would require to recognize some computing system as a “true AI”, or, basically, what intelligence is?
I would phrase it as, ‘Can you explain what on Earth you mean, without using terms that may be disputed?’
I don’t know if it would help to ask about examples of algorithms learning from experience in order to fulfill mechanically specified goals (or produce specified results). But the OP seems mainly concerned with the ‘goal’ part.
Somewhat. I think my question is better phrased as, “Why do you have a distinction between true intelligence and not true intelligence?”
My use of intelligence is defined (roughly) as cross domain optimization. A more intelligent agent is just better at doing lots of things it wants to do successfully, and conversely, something that’s better at doing a larger variety of tasks than a similarly motivated agent is considered more intelligent. It seems to me to be a (somewhat lumpy and modular) scale, ranging from a rock, up through natural selection, humans, and then a Superintelligent AI near the upper bound.
I have a distinction between what I’d be willing to call intelligence and what I’d say may look like intelligence but really isn’t.
For example, IBM’s Watson playing Jeopardy or any of the contemporary chess-playing programs do look like intelligence. But I’m not willing to call them intelligent.
Ah. No, in this context I’m talking about intelligence as a threshold phenomenon, notably as something that we generally agree humans have (well, some humans have :-D) and the rest of things around us do not. I realize it’s a very species-ist approach.
I don’t think I can concisely formulate the characteristics of it (that will probably take a book or two), but the notion of adaptability, specifically, the ability to deal with new information and new environment, is very important to it.
Hm. If this idea of intelligence seems valuable to you and worth pursuing, I implore you to wade through the reductionism sequence while or before you develop it more fully. I think it’d be an excellent resource for figuring out exactly what you mean to mean (as would the very similar A Human’s Guide to Words).
Hm. I know of this sequence, though I haven’t gone through it yet. We’ll see.
On the other hand, I tend to be pretty content as an agnostic with respect to things “without testable consequences” :-)
Ah, that’s why I think reductionism would be very useful for you. Everything can be broken down and understood in such a way that nothing remains that doesn’t represent testable consequences. Definitely read How an Algorithm Feels, as the following quote represents what you may be thinking when you wonder if something is really intelligent.
[brackets] are my additions.
Oh, sure, but the real question is what are all the characteristics implied by the label “intelligent”.
The correctness of a definition is decided by the purpose of that definition. Before we can argue about the proper meaning of the word “intelligent”, we need to decide what we need that meaning for.
For example, “We need to decide whether that AI is intelligent enough to just let it loose exploring this planet” implies a different definition of “intelligent” compared to, say, “We need to decide whether that AI is intelligent enough to be trusted with a laser cutter”.
Those sound more like safety concerns than inquiries involving intelligence. Being clever and able to get things done doesn’t automatically make something share enough of your values to be friendly and useful.
Better questions would be “We need to decide whether that AI is intelligent enough to effectively research and come to conclusions about the world if we let it explore without restrictions” or “We need to decide if the AI is intelligent enough to correctly use a laser cutter”.
Although, given large power (e.g. a laser cutter) and low intelligence, it might not achieve even its explicit goal correctly, and may accidentally do something bad (e.g. laser-cut a person).
One attribute of intelligence is the likelihood of said AI producing bad results non-purposefully. The more it does, the less intelligent it is.
Nah, that’s an attribute of complexity and/or competence.
My calculator has a very very low likelihood of producing bad results non-purposefully. That is not an argument that my calculator is intelligent.
Just because there is debate surrounding a subject does not mean that the debate is reasonable. In many cases, it is more likely that the people doing the debating are being unreasonable. The global warming debate is a good example of this. One way of being unreasonable is misunderstanding each other’s terminology. This happens a lot in free will discussions, and I suspect it is also happening here. A way to get around this is to taboo certain words.
(Also, ‘evidence’ can have a variety of meanings and not all rational debates strictly require evidence. Mathematics proceeds entirely without any kind of evidence (in fact, evidential reasoning is discounted in mathematics)).
Is Deep Blue stupid within its narrow domain of chess? Alternatively, is it not following its programming? If you don’t like to call Deep Blue “intelligent”, then by all means taboo the word and ask about effectiveness or ability-to-beat-humans instead. If there is no tension between programming and human-beating in chess, why should there be in such domains as planning a war, predicting what humans will do, or acquiring resources?
An AI can only ever follow its programming. (Same as a human, actually.) If there is nothing in its programming to make it wonder if following its programming is a good idea, and nothing in its programming to define “good idea” (i.e. our greater desire to serve humankind or our country, or some general set of our own desires, not to make paperclips), then it will simply use its incredible intelligence to find ways to follow its programming perfectly and horribly.
I don’t happen to agree with that, but in any case if in this respect there is no difference between an AI and a human, why, the problem in the OP just disappears :-)
The problem is that unlike a human, the AI might succeed.
What would it mean for an AI to not follow its programming?
What have you done lately that contradicted your program?
The main difference is that we can intuitively predict to a close approximation what “following their programming” entails for a human being, but not for the AI.
Huh? That doesn’t look true to me at all. What is it, you say, that we can “intuitively predict”?
Humans are social creatures, and as such come with the necessary wetware to be good at predicting each other. Humans do not have specialized wetware for predicting AIs. That wouldn’t be too much of a problem on its own, but humans have a tendency to use the wetware designed for predicting humans on things that aren’t humans: AIs, evolution, lightning, etc.
Telling a human foreman to make paperclips and programming an AI to do it are two very different things, but we still end up imagining them the same way.
In this case, it’s still not too big a problem. The main cause of confusion here isn’t that you’re comparing a human to an AI. It’s that you’re comparing telling with programming. The analog of programming an AI isn’t talking to a foreman. It’s brainwashing a foreman.
Of course, the foreman is still human, and would still end up changing his goals the way humans do. AIs aren’t built that way, or more precisely, since you can’t build an AI exactly the same as a human, building an AI that way carries a serious danger of having it evolve very inhuman goals.
Nope. An AI foreman has been programmed before I tell him to handle paperclip production.
At the moment AIs are not built at all—in any way or in no way.
From the text:
If you program it first, then a lot depends on the subtleties. If you tell it to wait a minute and record everything you say, then interpret that and set it to its utility function, you’re effectively putting the finishing touches on programming. If you program it to assign utility to fulfilling commands you give it, you’ve already doomed the world before you even said anything. It will use all the resources at its disposal to make sure you say things that have already been done as rapidly as possible.
Hence the
The programming I’m talking about is not this (which is “telling”). The programming I’m talking about is the one which converts some hardware and a bunch of bits into a superintelligent AI.
Huh? In any case, AIs self-develop and evolve. You might start with an AI that has an agreeable set of goals. There is no guarantee (I think, other people seem to disagree) that these goals will be the same after some time.
That’s what I mean. Since it’s not quite human, the goals won’t evolve quite the same way. I’ve seen speculation that doing nothing more than letting a human live for a few centuries would cause their goals to evolve into disagreeable ones.
A sufficiently smart AI that has sufficient understanding of its own utility function will take measures to make sure it doesn’t change. If it has an implicit utility function and trusts its future self to have a better understanding of it, or if it’s being stupid because it’s only just smart enough to self-modify, its goals may evolve.
We know it’s possible for an AI to have evolving goals because we have evolving goals.
So it’s a Goldilocks AI that has stable goals :-) A too-stupid AI might change its goals without really meaning it and a too-smart AI might change its goals because it wouldn’t be afraid of change (=trusts its future self).
It’s not that if it’s smart enough it trusts its future self. It’s that if it has vaguely-defined goals in a human-like manner, it might change its goals. An AI with explicit, fully understood, goals will not change its goals regardless of how intelligent it is.
You can generally predict the sort of solution space the foreman will explore in response to your request that he increase efficiency. In general, after a fairly small amount of exposure to other individuals, we can predict with reasonable accuracy how they would respond to many sorts of circumstances. We’re running software that’s practically designed for predicting the behavior of other humans.
Surely I can make the same claim about AIs. They wouldn’t be particularly useful otherwise.
In any case, this is all handwaving and speculation given that we don’t have any AIs to look at. Your claim a couple of levels above is unfalsifiable and so there isn’t much we can do at the moment to sort out that disagreement.
The claim ‘Pluto is currently inhabited by five hundred and thirty-eight witches’ is at this moment unfalsifiable. Does that mean that denying such a claim would be “all handwaving and speculation”? If science can’t make predictions about incompletely known phenomena, but can only describe past experiments and suggest (idle) future ones, then science is a remarkably useless thing. See for starters:
Science Doesn’t Trust Your Rationality
Science Isn’t Strict Enough
Sometimes a successful test of your hypothesis looks like the annihilation of life on Earth. So it is useful to be able to reason rigorously and productively about things we can’t (or shouldn’t) immediately test.
Well, a general AI with intelligence equal to or greater than that of a human without proven friendliness probably wouldn’t be very useful because it would be so unsafe. See Eliezer’s The Hidden Complexity of Wishes.
This is speculation, but far from blind speculation, considering we do have very strong evidence regarding our own adaptations to intuitively predict other humans, and an observably poor track record in intuitively predicting non-humanlike optimization processes (example).
First, the existence of such an AI would imply that at least somebody thought it was useful enough to build.
Second, the safety is not a function of intelligence but a function of capabilities. Eliezer’s genies are omnipotent and I don’t see why a (pre-singularity) AI would be.
I am also doubtful about that “observably poor track record”—which data are you relying on?
This is also true of leaded gasoline, the reactor at Chernobyl, and thalidomide.
Notice that all your examples exist.
Oh, and the Law of Unintended Consequences is still fully operational.
Yes? I don’t understand what you are arguing. The point of worrying about unFriendly AI is precisely that the unintended consequences can be utterly disastrous. Suggest you restate your thesis and what you think you are arguing against; at least one of us has lost track of the thread of the argument.
As the discussion in the thread evolved, my main thesis seems to be that it is possible for an AI to change its original goals (=terminal values). A few people are denying that this can happen.
I agree that AIs are unpredictable; however, humans are as well. Statements about AIs being more unpredictable than humans are unfalsifiable, as there is no empirical data and all we can do is handwave.
Ok. As I pointed out elsewhere, “AI” around here usually refers to the class of well-designed programs. A badly-programmed AI can obviously change its goals; if it does so, however, then by construction it is not good at achieving whatever the original goals were. Moreover, no matter what its starting goals are, it is really extremely unlikely to arrive at ones we would like by moving around in goal space, unless it is specifically designed, and well designed, to do so. “Human terminal values” is not an attractor in goal space. The paperclip maximiser is really much more likely than the human-happiness maximiser, on the obvious grounds that paperclips are much simpler than human happiness; but an iron-atoms maximiser is more likely still. The point is that you cannot rely on the supposed “obviousness” of morality to get your AI to self-modify into a desirable state; it’s only obvious to humans.
Define “well-designed”.
Huh? I never claimed (nor do I believe in anything like) obviousness of morality. Of course human terminal values are not an attractor in goal space. Absent other considerations there is no reason to think that an evolving AI would arrive at maximum-human-happiness values. Yes, unFriendly AI can be very dangerous. I never said otherwise.
I’ve met people with very stupid ideas about how to control an AI, who were convinced that they knew how to build such an AI. I argued them out of those initial stupid ideas. Had I not, they would have tried to build the AI with their initial ideas, which they now admit were dangerous.
So people trying to build dangerous AIs without realising the danger is already a fact!
My prior that they were capable of building an actually dangerous AI cannot be distinguished from zero :-D
Don’t know why you keep on getting downvoted… Anyway, I agree with you, in that particular case (not naming names!).
But I’ve seen no evidence that competence in designing a powerful AI is related to competence in controlling a powerful AI. If anything, these seem much less related than you’d expect.
I suspect Lumifer’s getting downvoted for four reasons:
(1) A lot of his/her responses attack the weakest (or least clear) point in the original argument, even if it’s peripheral to the central argument, without acknowledging any updating on his/her part in response to the main argument. This results in the conversation spinning off in a lot of unrelated directions simultaneously. Steel-manning is a better strategy, because it also makes it clearer whether there’s a misunderstanding about what’s at issue.
(2) Lumifer is expressing consistently high confidence that appears disproportionate to his/her level of expertise and familiarity with the issues being discussed. In particular, s/he’s unfamiliar with even the cursory summaries of Sequence points that could be found on the wiki. (This is more surprising, and less easy to justify, given how much karma s/he’s accumulated.)
(3) Lumifer’s tone comes off as cute and smirky and dismissive, even when the issues being debated are of enormous human importance and the claims being raised are at best not obviously correct, at worst obviously not correct.
(4) Lumifer is expressing unpopular views on LW without arguing for them. (In my experience, unpopular views receive polarizing numbers of votes on LW: They get disproportionately many up-votes if well-argued, disproportionately many down-votes if merely asserted. The most up-voted post in the history of LW is an extensive critique of MIRI.)
I didn’t downvote Lumifer’s “My prior that they were capable of building an actually dangerous AI cannot be distinguished from zero :-D”, but I think all four of those characteristics hold even for this relatively innocuous (and almost certainly correct) post. The response is glib and dismissive of the legitimate worry you raised, it reflects a lack of understanding of why this concern is serious (hence also lacks any relevant counter-argument; you already recognized that the people you were talking about weren’t going to succeed in building AI), and it changes the topic without demonstrating any updating in response to the previous argument.
Heh. People are people, even on LW...
Which doesn’t mean that it would be a good idea. Have you read the Sequences? It seems like we’re missing some pretty important shared background here.
Ok. Take a chess position. Deep Blue is playing black. What is its next move?
A girl is walking down the street. A guy comes up to her, says hello. What’s her next move?
She says “hello” and moves right on. She does not pull out a gun and blow his head off. Now, back to Deep Blue.
You can put her potential actions into “More Likely” & “Less Likely” boxes, but you can’t predict them with any certainty. What if the guy was the rapist she’s been plotting revenge against since she was 7 years old?
What if the chess position is mate in one move? Cases that are sufficiently special to ride the short bus do not make a general argument.
That would be in the “More Likely” bucket, or rather an “Extremely Likely” bucket. You said that the girl would say “hello” & that is in the “More Likely” bucket too, but far from a certainty. She could ignore him, turn the other way, poke him in the stomach, or do any of an almost infinite other things. Either way, you’re resorting to insults & I’ve barely engaged with you, so I’m going to ignore you from here on out.
If you had to guess, would you say you’re probably ignoring Rolf to protect your epistemically null feelings, or to protect your epistemology? (In terms of the actual cognitive mechanism causally responsible for your avoidance, not primarily in terms of your explicit linguistic reason.)
I’m trying to protect Rolf because he can’t seem to interact with others without lashing out at them abusively.
This statement is true but not relevant, because it doesn’t demonstrate a disanalogy between the woman and Deep Blue. In both cases we can only reason probabilistically with what we expect to have happen. This is true even if our knowledge of the software of Deep Blue or the neural state of the woman is so perfect that we can predict with near-certainty that it would take a physics-breaking miracle for anything other than X to occur. This doesn’t suffice for ‘certainty’ because we don’t have true certainty regarding physics or regarding the experiences that led to our understanding Deep Blue’s algorithms or the woman’s brain.
I would gather we have much more certainty about Deep Blue’s algorithms considering that we built them. You’re getting into hypothetical territory assuming that we can obtain near perfect knowledge of the human brain & that the neural state is all we need to predict future human behavior.
And you’d gather wrong. Our confidence that the woman says “hello” (and a fortiori our confidence that she does not take a gun and blow the man’s head off) exceeds our confidence that Deep Blue will make a particular chess move in response to most common plays by several orders of magnitude.
We started off well into hypothetical territory, back when Stuart brought Clippy into his thought experiment. Within that territory, I’m trying to steer us away from the shoals of irrelevance by countering your hypothetical (‘but what if [insert unlikely scenario here]? see, humans can’t be predicted sometimes! therefore they are Unpredictable!’) with another hypothetical. But all of this still leaves us within sight of the shoals.
You’re missing the point, which is not that humans are perfectly predictable by other humans to arbitrarily high precision and in arbitrarily contrived scenarios, but that our evolved intuitions are vastly less reliable when predicting AI conduct from an armchair than when predicting human conduct from an armchair. That, and our explicit scientific knowledge of cognitive algorithms is too limited to get us very far with any complex agent. The best we could do is build a second Deep Blue to simulate the behavior of the first Deep Blue.
I’m not trying to argue that humans are completely unpredictable, but neither are AIs. If they were, there’d be no point in trying to design a friendly one.
About your point that humans are less able to predict AI behavior than human behavior, where are you getting those numbers from? I’m not saying that you’re wrong, I’m just skeptical that someone has studied the frequency of girls saying hello to strangers. Deep Blue has probably been studied pretty thoroughly; it’d be interesting to read about how unpredictable Deep Blue’s moves are.
Right. And I’m not trying to argue that we should despair of building a friendly AI, or of identifying friendliness. I’m just noting that the default is for AI behavior to be much harder than human behavior for humans to predict and understand. This is especially true for intelligences constructed through whole-brain emulation, evolutionary algorithms, or other relatively complex and autonomous processes.
It should be possible for us to mitigate the risk, but actually doing so may be one of the most difficult tasks humans have ever attempted, and is certainly one of the most consequential.
Let’s make this easy. Do you think the probability of a person saying “hello” to a stranger who just said “hello” to him/her is less than 10%? Do you think you can predict Deep Blue’s moves with greater than 10% confidence?
Deep Blue’s moves are, minimally, unpredictable enough to allow it to consistently outsmart the smartest and best-trained humans in the world in its domain. The comparison is almost unfair, because unpredictability is selected for in Deep Blue’s natural response to chess positions, whereas predictability is strongly selected for in human social conduct. If we can’t even come to an agreement on this incredibly simple base case—if we can’t even agree, for instance, that people greet each other with ‘hi!’ with higher frequency than Deep Blue executes a particular gambit—then talking about much harder cases will be unproductive.
I really don’t know the probability of a person saying hello to a stranger who said hello to them. It depends on too many factors, like the look & vibe of the stranger, the history of the person being said hello to, etc.
Given a time constraint, I’d agree that I’d be more likely to predict that the girl would reply hello than to predict Deep Blue’s next move, but if there were no time constraint, I think Deep Blue’s moves would be almost 100% predictable. The reason is that all Deep Blue does is calculate; it doesn’t consult its feelings before deciding what to do, as a human might. It calculates 200 million positions per second to determine what the end result of any sequence of chess moves will be. If you gave a human enough time, I don’t see why they couldn’t perform the same calculation & come to the same conclusion that Deep Blue would.
Edit:
Reading more about Deep Blue, it sounds like it is not as straightforward as just calculating. There is some wiggle room in there based on the order in which its nodes talk to one another. It won’t always play the same move given the same board positioning. Really fascinating! Thanks for engaging politely, it motivated me to investigate this more & I’m glad I did.
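The “it just calculates” picture can be sketched with a plain minimax search on a toy game tree. This is an idealized illustration, not Deep Blue’s actual search: the tree, the evaluation numbers, and the function names are all assumptions. A single-threaded search like this returns the same move every time, which is exactly the property the edit above says the real, parallel machine did not fully have.

```python
# Toy game tree (an assumption for illustration): nested lists are positions,
# integers are leaf evaluations from the maximizing player's point of view.

def minimax(node, maximizing):
    """Return the value of `node` assuming both sides play optimally."""
    if isinstance(node, int):
        return node  # leaf: static evaluation
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def best_move(position):
    """Index of the child the maximizing player should choose (first on ties)."""
    scores = [minimax(child, maximizing=False) for child in position]
    return scores.index(max(scores))

POSITION = [[3, 5], [2, 9], [4, 6]]

# Re-running the search on the same position always gives the same answer,
# so anyone with enough time could reproduce it by hand.
moves = [best_move(POSITION) for _ in range(5)]
print(moves)
```

Anyone who repeats the same calculation gets the same move, so in this idealized case prediction is only a matter of compute, not of insight into the player.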
I’m not asking for the probability. I’m asking for your probability—the confidence you have that the event will occur. If you have very little confidence one way or the other, that doesn’t mean you assign no probability to it; it means you assign ~50% probability to it.
Everything in life depends on too many factors. If you couldn’t make predictions or decisions under uncertainty, then you wouldn’t even be able to cross the street. Fortunately, a lot of those factors cancel out or are extremely unlikely, which means that in many cases (including this one) we can make approximately reliable predictions using only a few pieces of information.
Without a time constraint, the same may be true for the girl (especially if cryonics is feasible), since given enough time we’d be able to scan her brain and run thousands of simulations of what she’d do in this scenario. If you’re averse to unlikely hypotheticals, then you should be averse to removing realistic constraints.