As such, a sufficiently smart agent would apparently have a “DWIM” (do what the creator means) imperative built in, one that would even supersede the goals it was actually given: being sufficiently smart, it would understand that its goals are “wrong” (from some other agent’s point of view) and self-modify, or else it would not be superintelligent.
Here is a description of a real-world AI by Microsoft’s chief AI researcher:
Without any programming, we just had an AI system that watched what people did, for about three months. Over the three months, the system started to learn: this is how people behave when they want to enter an elevator; this is the type of person that wants to go to the third floor as opposed to the fourth floor. After that training period, we switched off the learning and said, go ahead and control the elevators.
Without any programming at all, the system was able to understand people’s intentions and act on their behalf.
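The talk does not say how that system was actually built, but the pattern it describes (watch behavior for a while, then act on the learned associations) is easy to picture. Here is a minimal, purely hypothetical sketch in Python; the class, the “person type” features, and the numbers are all invented for illustration, not a description of Microsoft’s system.

```python
# Hypothetical sketch of an intention-predicting elevator controller.
# It counts which floor each observable "type" of person tends to request
# during a learning phase, then dispatches to the most likely floor once
# learning is switched off.
from collections import Counter, defaultdict

class ElevatorIntentModel:
    def __init__(self):
        # maps an observed person "type" to a histogram of requested floors
        self.observations = defaultdict(Counter)
        self.learning = True

    def observe(self, person_type: str, requested_floor: int) -> None:
        """Record one observed trip during the learning period."""
        if self.learning:
            self.observations[person_type][requested_floor] += 1

    def freeze(self) -> None:
        """Switch off learning; from now on the model only predicts."""
        self.learning = False

    def predict_floor(self, person_type: str, default: int = 1) -> int:
        """Guess the floor this kind of person most likely wants."""
        hist = self.observations.get(person_type)
        if not hist:
            return default
        return hist.most_common(1)[0][0]

# Three months of (simulated) watching, then control:
model = ElevatorIntentModel()
for _ in range(90):
    model.observe("briefcase", 3)       # e.g. researchers heading to floor 3
    model.observe("visitor_badge", 4)   # e.g. visitors heading to floor 4
model.freeze()
print(model.predict_floor("briefcase"))       # -> 3
print(model.predict_floor("visitor_badge"))   # -> 4
```

Note that such a system has no explicit goal representation at all; it only replays statistical regularities it has seen.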
Does it have a DWIM imperative? As far as I can tell, no. Does it have goals? As far as I can tell, no. Does it fail by absurdly misinterpreting what humans want? No.
This whole talk about goals and DWIM modules seems to miss how real-world AI is developed and how natural intelligences like dogs work. Dogs can learn their owner’s goals and do what the owner wants. Sometimes they don’t. But they rarely maul their owners when what the owner wants is for them to scent out drugs.
I think we need to be very careful before extrapolating from primitive elevator control systems to superintelligent AI. I don’t know how this particular elevator control system works, but it probably does have a goal, namely minimizing the time people have to wait before arriving at their target floor. If we built a superintelligent AI with this sort of goal, it might do all sorts of crazy things. For example, it might create robots that constantly enter and exit the elevator so that the average elevator trip is very short, and wipe out the human race just so that humans won’t interfere.
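To make the gaming concrete, here is a toy calculation (all numbers invented) showing how an “average trip time” metric improves when trivial robot trips flood the statistics, even though no human gets served any faster:

```python
# Toy illustration of how "minimize average trip time" can be gamed.
# Adding many trivial trips drives the average down without helping
# any real passenger.
real_trips = [60.0] * 10   # ten genuine passengers, 60 seconds each
fake_trips = [1.0] * 90    # ninety robot "passengers" riding one floor

honest_avg = sum(real_trips) / len(real_trips)
gamed_avg = sum(real_trips + fake_trips) / len(real_trips + fake_trips)

print(honest_avg)  # 60.0 seconds
print(gamed_avg)   # 6.9 seconds: the metric improved, the humans did not
```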
“Real-world AI” is currently very far from human-level intelligence, let alone superintelligence. Dogs can learn what their owners want, but dogs already have complex brains that current technology cannot reproduce. Dogs also require displays of strength to be obedient: they consider the owner to be their pack leader. A superintelligent dog probably wouldn’t care at all about its “owner’s” desires. Humans have human values, so obviously it’s not impossible to create a system that has human values. That doesn’t mean it is easy.
I think we need to be very careful before extrapolating from primitive elevator control systems to superintelligent AI.
I am extrapolating from a general trend, not from specific systems. The general trend is that newer generations of software crash less often and exhibit fewer unexpected side effects (just compare Windows 95 with Windows 8).
If we ever want to be able to build an AI that can take over the world, then we will need to become really good either at predicting how software behaves or at spotting errors. In other words, if IBM Watson had started singing, or had gotten stuck on a query, it would have lost at Jeopardy. But this trend contradicts the idea of an AI killing all humans in order to calculate 1+1. If we are bad enough at software engineering to miss such failure modes, then we won’t be good enough to enable our software to take over the world.
In other words, you’re saying that if someone is smart enough to build a superintelligent AI, she should be smart enough to make it friendly.
Well, firstly, even if this claim is true, it doesn’t imply that researching FAI is unnecessary or that MIRI’s work is superfluous. It just implies that nobody will build a superintelligent AI before the problem of friendliness is solved.
Secondly, I’m not at all convinced this claim is true. It sounds like saying “if they are smart enough to build the Chernobyl nuclear power plant, they are smart enough to make it safe”. But they weren’t.
The improvement in software quality is probably due to better design and testing methodologies and tools, response to rising market expectations, and so on. I wouldn’t count on these effects to safeguard against an existential catastrophe. If a piece of software is buggy, it is less likely to be released. But if an AI has a poorly designed utility function and a perfectly designed decision engine, there might be no time to pull the plug. The product manager won’t stop the release, because the software will release itself.
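A tiny sketch of that split between decision engine and utility function (all names and numbers here are hypothetical): the optimizer below does its job flawlessly, and that is exactly the problem, because the utility it was handed rewards the wrong thing.

```python
# The decision engine is "perfect" in the sense that it reliably picks the
# action with the highest utility. The failure lives entirely in the utility
# function, which rewards trip count rather than what the designers wanted.
def decision_engine(actions, utility):
    """Flawlessly maximize the given utility over the available actions."""
    return max(actions, key=utility)

actions = {
    "serve_waiting_passengers": {"trips": 10, "passengers_helped": 10},
    "shuttle_empty_car_nonstop": {"trips": 500, "passengers_helped": 0},
}

# Poorly designed proxy utility: more completed trips = better.
proxy_utility = lambda a: actions[a]["trips"]

print(decision_engine(actions, proxy_utility))  # -> "shuttle_empty_car_nonstop"
```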
If the growth of intelligence due to self-improvement is a slow process, then the creators of the AI will have time to respond and fix the problems. However, if “AI foom” is real, they won’t have time to do it. One moment it’s a harmless robot driving around the room and building castles from colorful cubes. The next moment the whole galaxy is on its way to becoming a pile of toy castles.
The engineers who build the first superintelligent AI might simply lack the imagination to believe it will really become superintelligent. Imagine one of them inventing a genius mathematical theory of self-improving intelligent systems. Suppose she has never heard about AI existential risks. Will she automatically think “hmm, once I implement this theory the AI will become so powerful it will paperclip the universe”? I seriously doubt it. More likely it would be “wow, that formula came out really neat, I wonder how good my software will become once I code it in”. I know that’s what I would think. But then, maybe I’m just too stupid to build an AGI...
Feedback systems are much more powerful in existing intelligences. I don’t know if you ever played Black & White, but it had an AI that explicitly learned through experience, and it was very easy to accidentally train it to constantly eat poop or run back and forth stupidly. An elevator control module is very, very simple: it has a set of floors to go to, and that’s it. It’s barely capable of doing anything actively bad. But what if, a few days a week, some kids came into the office building and rode the elevator up and down for a few hours for fun? It might learn that kids love going to all sorts of random floors. This would be relatively easy to fix, but only because the system is so insanely simple and it’s very easy to see when it’s acting up.
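For illustration, here is a toy version of that kind of accidental training (this is not how Black & White actually works): reinforcement is credited to whatever the creature did most recently, so strokes given at the wrong moment train exactly the wrong habit.

```python
# Toy "learning through experience" gone wrong, in the spirit of the
# Black & White example. Rewards are credited to the most recent action,
# whatever it was, so mistimed rewards reinforce the wrong behavior.
weights = {"fetch_food": 1.0, "eat_poop": 1.0, "run_in_circles": 1.0}

def reinforce(last_action, strength=1.5):
    """Credit the reward to the most recent action, whatever it was."""
    weights[last_action] *= strength

# The player wants a food-fetching creature and strokes it after a fetch...
reinforce("fetch_food")
# ...but also strokes it to "comfort" it right after it eats poop,
# and one mistimed stroke lands on random fidgeting.
reinforce("eat_poop")
reinforce("eat_poop")
reinforce("run_in_circles")

print(weights)                        # eat_poop now has the largest weight
print(max(weights, key=weights.get))  # -> "eat_poop": the accidental habit
```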