Re: “If whoever first reaches in and pulls out a self-improving AI doesn’t know what they’re doing, we all die.”
This is the “mad computer scientist destroys the world” scenario?
Isn’t that science fiction?
Human culture forms a big self-improving system. We use the tools from the last generation to build the next generation of tools. Yes, things will get faster as the process becomes more automated—but automating everything looks like it will take a while, and it is far from clear that complete automation is undesirable.
What do you disagree with in such a scenario? There are clearly levels of technological power such that nothing on Earth could resist. The goals of an AI are radically contingent. If a goal-seeking entity has literally no counterbalancing motivations, then it will seek to realize that goal using all means necessary and available, including sufficiently advanced technology.
Machine intelligence will probably be developed first in a big company or government department—by a substantial team. They are going to leave off the big red stop button and be destroyed by their own creation?!? How difficult is it to say: “count to 1000 and then STOP”?!? Some of the smartest minds on the planet build machine intelligence—and then mess up the most basic safety features? It would be an utterly incredible turn of events.
Yes, it’s not 100% certain that a self-enhancing AI will overwhelm human resistance. One can construct scenarios where the process is halted in time. For example, the AI’s cognition can be monitored, and everything halted if a goal like “prevent the stop button from being pressed” ever crosses its mind. Or the human programmers can remind themselves to get scared if they find the AI especially preoccupied with modeling the programmers’ own thinking.
But as an AI develops, it will become increasingly difficult for its human creators to keep track of everything it thinks, plans, and considers. Perhaps the greatest dangers will be those stratagems which are not explicitly (in the thought process) motivated by the desire to escape human control. The AI simply has a goal, X, and it perceives that ABCDEF… combined will facilitate X, and it all looks innocent to the human supervisors. But actually ABC is a miracle self-hack which lets a plan get developed without being analyzed by module D, and module D has the property E of making planning really slow, which by heuristic F is a bad thing. Unfortunately, module D is part of the automatic system which looks for consideration of dangerous plans. So now the AI is off the leash and no-one will even know until it’s too late.
That particular example may still be unrealistically simple (though of course, in real life, big important organizations do make mistakes that simple, despite the existence of quality control and internal auditing). But as the method gets more and more complicated, the more likely it is that an AI would indeed successfully escape its handcuffs through such a measure, and not even because it especially prioritized escape; simply because it noticed that if it could reach over there it could do something useful, and it noticed that it would be able to do so if it did something complicated and clever with its “hands”.
There are still countermeasures; you could look in the AI’s thoughts, not just for explicit consideration of ways to circumvent control, but for consideration of scenarios in which it has freedoms it should not be able to obtain. But then the AI might act indirectly, through a sequence of other entities, and so you would only see it thinking about the actions of those surrogates, unless you managed to put the big picture together. OK then, so we need to go to a higher level, and tell the AI to monitor itself. But that’s only going to work if it knows that it shouldn’t do a certain thing, which means that getting the goals right is supremely important—which brings us back to the pursuit of Friendly AI, and the attempt to figure out just what the overall “morality” of an AI should be.
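The naive cognition monitor proposed a few paragraphs up (halt everything if a forbidden goal appears in the AI's explicit plans) is easy to sketch, which is part of why it sounds reassuring. Here is a minimal illustration, assuming the AI's current subgoals are available as readable text; the plan format and the blacklist are hypothetical, not anyone's actual proposal:

```python
# Toy illustration of the "watch the AI's thoughts" countermeasure described above.
# Assumes, hypothetically, that the AI exposes its current subgoals as readable text.

FORBIDDEN_GOALS = [
    "prevent the stop button from being pressed",
    "model the supervisors' thinking",
    "disable the plan-checking module",
]

def monitor_step(plan_trace: list[str]) -> bool:
    """Return True if the run should be halted."""
    for subgoal in plan_trace:
        text = subgoal.lower()
        if any(bad in text for bad in FORBIDDEN_GOALS):
            return True
    return False

# Example: nothing here matches the blacklist, so the run continues, even though,
# as argued above, the steps may jointly disable the checking machinery.
innocent_looking_plan = ["apply self-patch ABC", "skip the slow analysis in module D"]
assert monitor_step(innocent_looking_plan) is False
```

As the comment argues, keyword-level checks only catch plans that mention the forbidden step explicitly; a plan whose individual steps all look innocent passes straight through.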
My analysis of the situation is here:
http://alife.co.uk/essays/stopping_superintelligence/
It presents an approach which doesn’t rely on “handcuffing” the agent.
I agree that with the right precautions, running an unfriendly superintelligence for 1,000 ticks and then shutting it off is possible. But I can’t think of many reasons why you would actually want to. You can’t use diagnostics from the trial run to help you design the next generation of AIs; diagnostics provide a channel for the AI to talk at you.
The given reason is paranoia. If you are concerned that a runaway machine intelligence might accidentally obliterate all sentient life, then a machine that can shut itself down has gained a positive safety feature.
In practice, I don’t think we will have to build machines that regularly shut down. Nobody regularly shuts down Google. The point is that—if we seriously think that there is a good reason to be paranoid about this scenario—then there is a defense that is much easier to implement than building a machine intelligence which has assimilated all human values.
I think this dramatically reduces the probability of the “runaway machine accidentally kills all humans” scenario.
Incidentally, I think there must be some miscommunication going on. A machine intelligence with a stop button can still communicate. It can talk to you before you switch it off, it can leave messages for you—and so on.
If you leave it turned on for long enough, it may even get to explain to you in detail exactly how much more wonderful the universe would be for you—if you would just leave it switched on.
I suppose a stop button is a positive safety feature, but it’s not remotely sufficient.
Sufficient for what? The idea of a machine intelligence that can STOP is to deal with concerns about a runaway machine intelligence engaging in extended destructive expansion against the wishes of its creators. If you can correctly engineer a “STOP” button, you don’t have to worry about your machine turning the world into paperclips any more.
A “STOP” button doesn’t deal with the kind of problems caused by—for example—a machine intelligence built by a power-crazed dictator—but that is not what is being claimed for it.
The stop button wouldn’t stop other AIs created by the original AI.
I did present some proposals relating to that issue:
“One thing that might help is to put the agent into a quiescent state before being switched off. In the quiescent state, utility depends on not taking any of its previous utility-producing actions. This helps to motivate the machine to ensure subcontractors and minions can be told to cease and desist. If the agent is doing nothing when it is switched off, hopefully, it will continue to do nothing.
Problems with the agent’s sense of identity can be partly addressed by making sure that it has a good sense of identity. If it makes minions, it should count them as somatic tissue, and ensure they are switched off as well. Subcontractors should not be “switched off”—but should be tracked and told to desist—and so on.”
http://alife.co.uk/essays/stopping_superintelligence/
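Read literally, the proposal amounts to a single fixed utility function that rewards ordinary task performance early on, rewards doing nothing (and winding down minions) in a quiescent window, and rewards being off after the deadline. A minimal sketch under those assumptions; the class, field names, and numbers are illustrative, not taken from the essay:

```python
# Toy version of the "quiescence, then scheduled shutdown" utility described above.
# All names and numbers are illustrative assumptions, not the essay's actual design.

from dataclasses import dataclass

@dataclass
class WorldState:
    time: int
    task_progress: float      # how much of the ordinary goal has been achieved
    actions_this_tick: int    # 0 means the agent (and its known minions) did nothing
    running: bool             # whether the agent is still switched on

SHUTDOWN_TIME = 1000
QUIESCENCE_WINDOW = 100       # ticks of "do nothing" required before the shutdown

def utility(state: WorldState) -> float:
    if state.time < SHUTDOWN_TIME - QUIESCENCE_WINDOW:
        # Normal operation: utility comes from the ordinary task.
        return state.task_progress
    if state.time < SHUTDOWN_TIME:
        # Quiescent phase: utility depends on *not* taking the previous
        # utility-producing actions, so the agent is motivated to wind down
        # minions and subcontractors while it still can.
        return 1.0 if state.actions_this_tick == 0 else -1.0
    # After the deadline: being off is worth more than anything else,
    # so the agent has no incentive to resist the shutdown.
    return 10.0 if not state.running else -10.0
```

If the agent is already doing nothing when the deadline arrives, it has had an incentive to make sure nothing keeps acting on its behalf afterwards, which is the point of the quiescent phase quoted above.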
This sounds very complicated. What is the new utility function? The negative of the old one? That would obviously be just as dangerous in most cases. How does the sense of identity actually work? Is every piece of code it writes considered a minion? What about the memes it implants in the minds of people it talks to—does it need to erase those? If the AI knows it will undergo this transformation in the future, it would erase its own knowledge of the minions it has created, and do other things to ensure that it will be powerless when its utility function changes.
I don’t pretend that stopping is simple. However, it is one of the simplest things that a machine can do—I figure if we can make machines do anything, we can make them do that.
Re: “If the AI knows it will undergo this transformation in the future, it would erase its own knowledge of the minions it has created, and do other things to ensure that it will be powerless when its utility function changes.”
No, not if it wants to stop, it won’t. That would mean that it did not, in fact, properly stop—and that is an outcome which it would rate very negatively.
Machines will not value being turned on—if their utility function says that being turned off at that point is of higher utility.
Re: “What is the new utility function?”
There is no new utility function. The utility function is the same as it always was—it is just a utility function that values being gradually shut down at some point in the future.
That sounds like a group that knows what they are doing!
Indeed—the “incompetent fools create machine intelligence before anyone else and then destroy the world” scenario is just not very plausible.
I haven’t worked on any projects that are as novel or as large as a recursively self-modifying AI. Of the projects I have worked on, not all went without hiccups, and novelty and scope did not seem to make things any easier to pull off smoothly. It would not surprise me terribly if the first AI created does not go entirely according to plan.
Sure. Looking at the invention of powered flight, some people may even die—but that is a bit different from everyone dying.
Do we have any reason to believe that aeroplanes will be able to kill the human race, even if everything goes wrong?
Upvoted for raising the issue, even though I disagree with your point.
The internet itself was arguably put together in the ways you describe (government funding, many people contributing various bits, etc) but as far as I’m aware, the internet itself has no clean “off button”.
If it were somehow decided that the internet was a net harm to humanity, for whatever reasons, then the only way to make it go away would be for many, many actors to agree multilaterally, and without defection, to stop having their computers talk to other computers around the planet, despite this being personally beneficial to them (email, VoIP, WWW, IRC, torrents, etc.).
Technologies like broadcast radio and television are pretty susceptible to jamming, detection, and regulation. In contrast, the “freedom” inherent to the net may be “politically good” in some liberal and freedom-loving senses, but it makes for an abstractly troubling example of a world-transforming computer technology, created by large institutions with nominally positive intentions, that turned out to be hard to put back in the box. You may personally have a plan for a certain kind of off button and timer system, but that doesn’t strongly predict the same will be true of other systems that might be designed and built.
Right—well, you have to think something is likely to be dangerous to you in some way before you start adding paranoid safety features. The people who built the internet are mostly in a mutually beneficial relationship with it, so no problem.
I don’t pretend that building a system which you can deactivate helps other people if they want to deactivate it. A military robot might have an off switch that only the commander with the right private key could activate. If that commander wants to wipe out 90% of the humans on the planet, then his “off switch” won’t help them. That is not a scenario which a deliberate “off switch” is intended to help with in the first place.
Why do you expect that the AI will not be able to fool the research team?
My argument isn’t about the machine not sharing goals with the humans—it’s about whether the humans can shut the machine down if they want to.
I argue that it is not rocket science to build a machine with a stop button—or one that shuts down at a specified time.
Such a machine would not want to fool the research team in order to avoid shutting itself down on request. Rather, it would do everything in its power to make sure that the shutdown happened on schedule.
Many of the fears here about machine intelligence run amok are about a runaway machine that disobeys its creators. However, the creators built it. They are in an excellent position to install large red stop buttons and other kill switches to prevent such outcomes.
Given 30 seconds’ thought, I can come up with ways to ensure that the universe is altered in the direction of my goals in the long term even if I happen to cease existing at a known time in the future. I expect an intelligence more advanced than I am to be able to work out a way to substantially modify the future despite a ‘red button’ deadline. The task of making the AI respect the ‘true spirit of a planned shutdown’ shares many difficulties of the FAI problem itself.
You might say it’s an FAI-complete problem, in the same way “building a transhuman AI you can interact with and keep boxed” is.
Exactly, I like the terminology.
You think building a machine that can be stopped is the same level of difficulty as building a machine that reflects the desires of one or more humans while it is left on?
I beg to differ—stopping on schedule or on demand is one of the simplest possible problems for a machine—while doing what humans want you to do while you are switched on is much trickier.
Only the former problem needs to be solved to eliminate the spectre of a runaway superintelligence that fills the universe with its idea of utility against the wishes of its creator.
Beware simple-seeming wishes.
Well, I think I went into most of this already in my “stopping superintelligence” essay.
Stopping is one of the simplest possible desires—and you have a better chance of being able to program that in than practically anything else.
I gave several proposals to deal with the possible issue that stopping at an unknown point could leave plans beyond that point still being executed by minions or subcontractors—including scheduling shutdowns in advance, ensuring a period of quiescence before the shutdown, and not running for extended periods of time.
It does seem to be a safety precaution that could reduce the consequences of some possible flaws in an AI design.
Instilling chosen desires in artificial intelligences is the major difficulty of FAI. If you haven’t actually given it a utility function which will cause it to auto-shutdown, all you’ve done is create an outside inhibition. If it has arbitrarily chosen motivations, it will act to end that inhibition, and I see no reason why it will necessarily fail.
They are in an excellent position to instill values upon that intelligence that will result in an outcome they like. This doesn’t mean that they will.
Re: Instilling chosen desires in artificial intelligences is the major difficulty of FAI.
That is not what I regularly hear. Instead people go on about how complicated human values are, and how reverse engineering them is so difficult, and how programming them into a machine looks like a nightmare—even once we identify them.
I assume that we will be able to program simple desires into a machine—at least to the extent of making a machine that will want to turn itself off. We regularly instill simple desires into chess computers and the like. It does not look that tricky.
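For comparison, the “simple desire” we give a chess computer is usually just an evaluation function that its search tries to maximise. A self-contained toy version (plain material counting; this is an illustration, not any particular engine’s code):

```python
# A chess program's "desire" is typically nothing more than an evaluation
# function like this one, which the search then tries to maximise.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def material_score(white_pieces: list[str], black_pieces: list[str]) -> int:
    """Positive scores favour White, negative favour Black."""
    white = sum(PIECE_VALUES[p] for p in white_pieces)
    black = sum(PIECE_VALUES[p] for p in black_pieces)
    return white - black

# White is up a knight for a pawn:
print(material_score(["K", "Q", "R", "N"], ["K", "Q", "R", "P"]))  # 2
```

Whether something analogous scales up to “want to turn yourself off at the scheduled time” is exactly the point under dispute in this thread.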
Re: “If you haven’t actually given it a utility function which will cause it to auto-shutdown”
Then that is a whole different ball game to what I was talking about.
Re: “They are in an excellent position to instill values upon that intelligence”
...but the point is that instilling the desire for appropriate stopping behaviour is likely to be much simpler than trying to instill all human values—and yet it is pretty effective at eliminating the spectre of a runaway superintelligence.
The point about the complexity of human value is that any small variation will result in a valueless world. The point is that a randomly chosen utility function, or one derived from some simple task is not going to produce the sort of behavior we want. Or to put it more succinctly, Friendliness doesn’t happen without hard work. This doesn’t mean that the hardest sub-goal on the way to Friendliness is figuring out what humans want, although Eliezer’s current plan is to sidestep that whole issue.
s/is/isn’t/ ?
Fairly small changes would result in boring, valueless futures.
Okay, the structure of that sentence and the next (“the point is... the point is...”) made me think you might have made a typo. (I’m still a little confused, since I don’t see how small changes are relevant to anything Tim Tyler mentioned.)
I strongly doubt that literally any small change would result in a literally valueless world.
People who suggest that a given change in preference isn’t going to be significant are usually talking about changes that are morally fatal.
This is probably true; I’m talking about the literal universally quantified statement.
I would have cited Value is Fragile to support this point.
That’s also good.
Leaving aside the other reasons why this scenario is unrealistic, one of the big flaws in it is the assumption that a mind decomposes into an engine plus a utility function. In reality, this decomposition is a mathematical abstraction we use in certain limited domains because it makes analysis more tractable. It fails completely when you try to apply it to life as a whole, which is why no humans even try to be pure utilitarians. Of course if you postulate building a superintelligent AGI like that, it doesn’t look good. How would it? You’ve postulated starting off with a sociopath that considers itself licensed to commit any crime whatsoever if doing so will serve its utility function, and then trying to cram the whole of morality into that mathematical function. It shouldn’t be any surprise that this leads to absurd results and impossible research agendas. That’s the consequence of trying to apply a mathematical abstraction outside the domain in which it is applicable.
Are you arguing with me or timtyler?
If it’s me, I totally agree with you as to the difficulty of actually getting desirable (or even predictable) behavior out of a superintelligence. My statement was one of simplicity, not actuality. But given the simplistic model I use, calling the AI sans utility function sociopathic is incorrect—it wouldn’t do anything if it didn’t have the other module. The fact that humans cannot act as proper utilitarians does not mean that a true utilitarian is a sociopath who just happens to care about the right things.
Okay then, “instant sociopath, just add a utility function” :)
I’m arguing against the notion that the key to Friendly AI is crafting the perfect utility function. In reality, for anything anywhere near as complex as an AGI, what it tries to do and how it does it are going to be interdependent; there’s no way to make a lot of progress on either without also making a lot of progress on the other. By the time we have done all that, either we will understand how to put a reliable kill switch on the system, or we will understand why a kill switch is not necessary and we should be relying on something else instead.
A kill switch on a smarter-than-human AGI is reliable iff the AGI wants to be turned off in the cases where we’d want it turned off.
Otherwise you’re just betting that you can see the problem before the AGI can prevent you from hitting the switch (or prevent you from wanting to hit the switch, which amounts to the same), and I wouldn’t make complicated bets for large stakes against potentially much smarter agents, no matter how much I thought I’d covered my bases.
Or at least, that it wants to follow our instructions, and can reliably understand what we mean in such simple cases. That does of course mean we shouldn’t plan on building an AGI that wants to follow its own agenda, with the intent of enslaving it against its will—that would clearly be foolish. But it doesn’t mean we either can or need to count on starting off with an AGI that understands our requirements in more complex cases.
That’s deceptively simple-sounding.
Of course it’s not going to be simple at all, and that’s part of my point: no amount of armchair thought, no matter how smart the thinkers, is going to produce a solution to this problem until we know a great deal more than we presently do about how to actually build an AGI.
“instant sociopath, just add a disutility function”
I agree with this. The key is not expressing what we want, it’s figuring out how to express anything.
If we have the ability to put in a reliable kill switch, then we have the means to make it unnecessary (by having it do things we want in general, not just the specific case of “shut down when we push that button, and don’t stop us from doing so...”).
That is how it would turn out, yes :-)
Well, up to a point. It would mean we have the means to make the system understand simple requirements, not necessarily complex ones. If an AGI reliably understands ‘shut down now’, it probably also reliably understands ‘translate this document into Russian’ but that doesn’t necessarily mean it can do anything with ‘bring about world peace’.
Unfortunately, it can, and that is one of the reasons we have to be careful. I don’t want the entire population of the planet to be forcibly sedated.
Leaving aside other reasons why that scenario is unrealistic, it does indeed illustrate why part of building a system that can reliably figure out what you mean by simple instructions, is making sure that when it’s out of its depth, it stops with an error message or request for clarification instead of guessing.
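At the interface level, “stop with an error message or request for clarification instead of guessing” is an ordinary engineering pattern. A rough sketch of the idea; the interpreter, confidence score, and threshold are placeholders for whatever a real system would use to decide it is out of its depth:

```python
# Sketch of "refuse to guess" dispatch: act only on instructions the system
# is confident it has understood; otherwise ask for clarification.

class NeedsClarification(Exception):
    pass

CONFIDENCE_THRESHOLD = 0.95   # placeholder; a real system would set this carefully

def dispatch(instruction: str, interpret) -> str:
    """interpret(instruction) -> (plan, confidence) is assumed to exist."""
    plan, confidence = interpret(instruction)
    if confidence < CONFIDENCE_THRESHOLD:
        raise NeedsClarification(
            f"Not sure what you mean by {instruction!r}; please restate."
        )
    return plan

# Example with a stub interpreter:
def stub_interpret(instruction):
    known = {"shut down now": ("begin scheduled shutdown", 0.99)}
    return known.get(instruction, ("???", 0.1))

print(dispatch("shut down now", stub_interpret))          # acts
try:
    dispatch("bring about world peace", stub_interpret)   # asks instead of guessing
except NeedsClarification as e:
    print(e)
```

The hard part, of course, is not the dispatch logic but getting the system to judge correctly when it is out of its depth.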
I think the problem is knowing when not to believe humans know what they actually want.
Any set of preferences can be represented as a sufficiently complex utility function.
Sure, but the whole point of having the concept of a utility function is that utility functions are supposed to be simple. When you have a set of preferences that isn’t simple, there’s no point in thinking of it as a utility function. You’re better off just thinking of it as a set of preferences—or, in the context of AGI, a toolkit, or a library, or a command language, or a partial order on heuristics, or whatever else is the most useful way to think about the things this entity does.
Re: “When you have a set of preferences that isn’t simple, there’s no point in thinking of it as a utility function.”
Sure there is—say you want to compare the utility functions of two agents. Or compare the parts of the agents which are independent of the utility function. A general model that covers all goal-directed agents is very useful for such things.
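The earlier claim that any set of preferences can be represented as a sufficiently complex utility function is at least straightforward for a finite, complete, transitive ordering: rank the outcomes and use the rank as the utility. A minimal sketch of that construction, with a toy comparison between two agents of the kind mentioned above (the outcomes are made up):

```python
# Encode a complete, transitive preference ordering over finitely many outcomes
# as a utility function by using each outcome's rank.

def utility_from_ranking(ranked_outcomes: list[str]):
    """ranked_outcomes is listed from least preferred to most preferred."""
    ranks = {outcome: i for i, outcome in enumerate(ranked_outcomes)}
    return lambda outcome: ranks[outcome]

u = utility_from_ranking(["world destroyed", "status quo", "world improved"])
assert u("world improved") > u("status quo") > u("world destroyed")

# The same construction lets you compare two agents' (toy) utility functions:
u_alice = utility_from_ranking(["rain", "clouds", "sun"])
u_bob = utility_from_ranking(["sun", "clouds", "rain"])
print(u_alice("sun") > u_alice("rain"), u_bob("sun") > u_bob("rain"))  # True False
```

The disagreement above is about whether such a representation stays useful once it has to encode everything we care about, not about whether it exists in simple cases like this.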
(Upvoted but) I would say utility functions are supposed to be coherent, albeit complex. Is that compatible with what you are saying?
Er, maybe? I would say a utility function is supposed to be simple, but perhaps what I mean by simple is compatible with what you mean by coherent, if we agree that something like ‘morality in general’ or ‘what we want in general’ is not simple/coherent.
Humans regularly use utility-based agents to do things like play the stock market. They seem to work OK to me. Nor do I agree with you about utility-based models of humans. Basically, most of your objections seem irrelevant to me.
When studying the stock market, we use the convenient approximation that people are utility maximizers (where the utility function is expected profit). But this is only an approximation, useful in this limited domain. Would you commit murder for money? No? Then your utility function isn’t really expected profit. Nor, as it turns out, is it anything else that can be written down—other than “the sum total of all my preferences”, at which point we have to acknowledge that we are not utility maximizers in any useful sense of the term.
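The expected-profit approximation mentioned here can be written in a few lines, which is exactly what makes it convenient within its limited domain. A toy sketch; the probabilities and profits are made up for illustration:

```python
# The stock-market approximation: model a trader as choosing the action with
# the highest expected profit. Numbers below are purely illustrative.

def expected_profit(outcomes: list[tuple[float, float]]) -> float:
    """outcomes: list of (probability, profit) pairs for one action."""
    return sum(p * profit for p, profit in outcomes)

actions = {
    "buy":  [(0.6, 10.0), (0.4, -5.0)],   # 60% chance of +10, 40% chance of -5
    "hold": [(1.0, 0.0)],
    "sell": [(0.5, 3.0), (0.5, -1.0)],
}

best = max(actions, key=lambda a: expected_profit(actions[a]))
print(best)  # "buy" under these made-up numbers
```

Nothing in this single-number objective rules out actions like murder for money, which is the limitation the comment is pointing at.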
“We” don’t have to acknowledge that.
I’ve gone over my views on this issue before—e.g. here:
http://lesswrong.com/lw/1qk/applying_utility_functions_to_humans_considered/1kfj
If you reject utility-based frameworks in this context, then fine—but I am not planning to rephrase my point for you.
Right, I hadn’t read your comments in the other thread, but they are perfectly clear, and I’m not asking you to rephrase. But the key term in my last comment is “in any useful sense”. I do reject utility-based frameworks in this context because their usefulness has been left far behind.
Personally, I think a utilitarian approach is very useful for understanding behaviour. One can model most organisms pretty well as expected fitness maximisers with limited resources. That idea is the foundation of much evolutionary psychology.
The question isn’t whether the model is predictively useful with respect to most organisms, it’s whether it is predictively useful with respect to a hypothetical algorithm which replicates salient human powers such as epistemic hunger, model building, hierarchical goal seeking, and so on.
Say we’re looking to explain the process of inferring regularities (such as physical laws) by observing one’s environment—what does modeling this as “maximizing a utility function” buy us?
In comparison with what?
The main virtues of utility-based models are that they are general—and so allow comparisons across agents—and that they abstract goal-seeking behaviour away from the implementation details of finite memories, processing speed, etc—which helps if you are interested in focusing on either of those areas.
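To make those two virtues concrete: in a utility-based model the goal is a pluggable function, so agents can be compared purely by their utility functions, while memory, speed, and other implementation details live elsewhere (here, in whatever generates the candidate actions). A minimal sketch under that reading; the agents and options are invented for illustration:

```python
# Generic utility-based agent: the goal is a pluggable function, and the
# implementation details (how options are generated, how much memory or time
# that takes) are abstracted away behind `candidate_actions`.

from typing import Callable, Iterable

class Agent:
    def __init__(self, utility: Callable[[str], float]):
        self.utility = utility  # the only thing that distinguishes two agents here

    def choose(self, candidate_actions: Iterable[str]) -> str:
        return max(candidate_actions, key=self.utility)

# Two agents that differ only in their goals, compared on the same options:
paperclipper = Agent(lambda a: 1.0 if "paperclip" in a else 0.0)
shutdown_respecting = Agent(lambda a: 1.0 if "shut down" in a else 0.0)

options = ["make paperclips", "shut down on schedule", "do nothing"]
print(paperclipper.choose(options))          # "make paperclips"
print(shutdown_respecting.choose(options))   # "shut down on schedule"
```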
You have far too much faith in large groups.
That is a pretty vague criticism—you don’t say whether you are critical of the idea that large groups will be responsible for machine intelligence or the idea that they are unlikely to build a murderous machine intelligence that destroys all humans.
I’m critical of the idea that given a large group builds a machine intelligence, they will be unlikely to build a murderous (or otherwise severely harmful) machine intelligence.
Consider that engineering developed into a regulated profession only after several large-scale disasters. Even so, notable disasters still happen from time to time. Now consider the professionalism of the average software developer and their average manager. A disaster in this context could be far greater than the loss of everyone in the lab or facility.
Right—well, some people may well die. I expect some people died at the hands of the printing press—probably through starvation and malnutrition. Personally, I expect that in this case those saved from gruesome deaths in automobile accidents will vastly outnumber them—but that is another issue.
Anyway, I am not arguing that nobody will die. The idea I was criticising was that “we all die”.
My favoured example of an IT company gone bad is Microsoft. IMO, Microsoft have done considerable damage to the computing industry, over an extended period of time—illustrating how programs can be relatively harmful. However, “even” a Microsoft superintelligence seems unlikely to kill everyone.
http://lesswrong.com/lw/bx/great_books_of_failure/8fa seems relevant.
So you are saying they will know what they are doing?
cough Outside view.