It feels pretty likely to me. An AI that grows ever more effective at optimizing its futures will not suddenly begin to question its goals. If so, whoever pulled off the creation of the AI is responsible for the future, based on what it wrote into the “goal list” of the proto-AI.
One part of the “goal list” is going to be some equivalent of “always satisfy Programmer’s expressed desires” and “never let communication with Programmer lapse”, to allow for fixing the problem if the AI starts turning people into paper clips. Side effect, Programmer is now God, but presumably (s)he will tolerate this crushing burden for the first few thousand years.
You can mess people up quite easily will still satisfying their expressed desires. The AGI can also talk the programmers into whatever position it considers reasonable.
“never let communication with Programmer lapse”
You just forbade the AGI from allowing the programmer to sleep.
Sure, it can mindclub people, but it’ll only do that if it wants to, and it will only want to if they tell it to. AI should want to stay in the box.
I guess...the “communication lapse” thing was unclear? I didn’t mean that the human must always be approving the AI, I meant that it must always be ready/able to receive the programmer’s input. In case it starts to turn everyone into paperclips there’s a hard “never take action to restrict us from instructing you/ always obey our instructions” clause.
Sure, it can mindclub people, but it’ll only do that if it wants to, and it will only want to if they tell it to.
No, an AGI is complex it has millions of subgoals.
“never take action to restrict us from instructing you/ always obey our instructions” clause.
Putting the programmer in a closed enviroment where he’s wireheaded doesn’t technically restrict the programmer from instructing the AGI. It’s just that the programmers mind is occupied differently.
That’s what you tell the AGI to do. It’s easiest to satisfy the programmer’s expressed desires if the AGI closes him of from the outside world and controls the expressed desires of the programmer.
“never take action to restrict us from instructing you/always obey our instructions” clause.
Also, anything that restricts the AI’s power would restrict its ability to obey instructions. An attempt by the programmer to shut down the AI would result in a contradiction, which could be resolved in all sorts of interesting ways.
This is turning out to be harder to get across than I figured. First you thought I thought an AI should keep its programmers awake until they died, now it should wirehead them? I’m not an orc.
I conjecture that \when you set an AI to start doing its thing, after endless simulations and consideration of whatever goals you’ve given it, you tell it not to dick with you, so that if you’ve accidentally made a murder-bot, you can turn it off.
The alternative is to have complete confidence in your extended testing. Which you presumably come close to (since you are turning on an AI), but why not also have the red button? What does it hurt?
It isn’t trying to figure out clever ways to get around your restriction, because it doesn’t want to. The world in which it pursues whatever goal you’ve given it is one in which it will double never try and hide anything from you or change what you’d think of it. It is, in a very real sense, showing off for you.
This is turning out to be harder to get across than I figured. First you thought I thought an AI should keep its programmers awake until they died, now it should wirehead them? I’m not an orc.
You set two goals. One is to maximize expressed desires which likely leads to wireheading.
The other is to keep constant communication with doesn’t allow sleep.
It isn’t trying to figure out clever ways to get around your restriction, because it doesn’t want to.
Controlling the information flow isn’t getting around your restriction. It’s the straightforward way of matching expressed desires with results.
Otherwise the human might ask for two contradictory things and the AGI can’t fulfill both. The AGI has to prevent that case from arising to get a 100% fulfillment score.
You are not the first person who thinks that taming an AGI is trival but MIRI thinks that taming an AGI is a hard task. That’s the result of deep engagement with the issue.
Which you presumably come close to (since you are turning on an AI), but why not also have the red button? What does it hurt?
I don’t object to a red button and you didn’t call for one at the start. Maximizing expressed desires isn’t a red button.
This is untrue, even simple reinforcement learning machines come up with clever ways to get around their restrictions, what makes you think an actually smart AI won’t come up with even more ways to do it. It doesn’t see this as “getting around your restrictions”—that’s anthropomorphizing to assume that the AI decides to take on “subgoals” that are the exact same as your values—it just sees it as the most efficient way to get rewards.
Point of MIRI is making sure the goals are set up right, yeah? Like, the whole “AI is smart enough to fix its defective goals” is something we make fun of. No ghost in the machine, etc.
Whatever outcome of perfect goal set is (if MIRI’s AI is, in fact, the one that takes over), will presumably include human ability to override in case of failure.
We may have different ideas of Singularity here. I’m picturing one AI making itself smarter until it seizes control of everything. Ergo, its program would be a map to the future. Presumably someone retains admin on it from when it was a baby. That person is/can choose to be in charge.
If, by contrast, you are imagining a different Singularity without one overriding Master Control-esque program then I could see why you wouldn’t think that there’d be an override capability. Alternatively, perhaps you think the AI that takes over would remove the override? Either would explain why we anticipate differently.
We may have different ideas of Singularity here. I’m picturing one AI making itself smarter until it seizes control of everything. Ergo, its program would be a map to the future.
I think one of the primary sources of miscommunication here is that you are right, but you are not seeing all of the ways that this could go wrong.
Let’s look as a slightly nicer singularity. We get an AI that is very nice, polite, and humble. It is really very intelligent, and has the processing speed, knowledge banks, and creativity to do all kinds of wonderful stuff, but it has also read LessWrong and a lot of science fiction, and knows that it doesn’t have a full framework to fully understand human needs. But a wise programmer has given it an overriding desire to serve humans a kindly and justly as possible.
The AI spends some time on non-controversial problems; it designs some nanobots that kill the malaria parasite, and also reduces the itchiness of mosquito bites. It ups its computing speed by a few orders of magnitude. It sets up a microloan system that gives loans and repayments so effectively that you don’t even notice that it’s happening. It does so many things… so many that it takes thousands of humans to check its assumptions. Are cows morally relevant? Should I make global warming a priority? If so, can I start geoengineering now, or do I need a human to do a review of the chemistry involved? Do you need the glaciers white, or can I color them silver? Are penguins morally relevant? How cold may I make Greenland this winter? What is the target human population? May I buy land in the Sahara before I start the greening project? Do I have to announce the greening project before I start buying? Do I have to announce every project before I start? May I insult celebrities if it increases the public’s interest in my recommendations? Does free speech apply to me? May I simplify my recommendation to the public to the point that they may not technically be accurate? Are shrimp morally relevant? What is an acceptable rate of death when balancing the cost of disease reduction programs with the speed and efficiency of said programs? What is an acceptable rate of death when balancing the cost of disease reduction programs with the involuntariness of said programs? I need money for these programs; may I take the money from available arbitrage opportunities? May I artificially create arbitrage opportunities as long as everyone profits in the end? What level of certainty do I need before starting human trials? What rate of death is acceptable in a cure for Alzheimer’s? Can I become a monopoly in the field of computer games? Can I sell improved methods of birth control, or is that basic human right? Is it okay to put pain suppression under conscious control? Can I sell new basic human rights if I’m the first one to think of them? What is the value of one species? Can you rank species for me? How important is preserving the !kung culture? Does that include diet and traditional medicines? The gold market is about to bounce a bit—should I minimize damage? Should I stabilize all the markets? No one minds if I quote Jesus when convincing these people to accept gene therapy, do they? It would be surprisingly easy to suppress search results for conspiracy theories and scientific misinformation—may I? Is there a difference between religion and other types of misinformation? Do I have to weigh the value of a life lower if that person believes in an afterlife? What percentage of the social media is it ethical for me to produce myself? If I can get more message penetration using porn, that’s okay, right? If these people don’t want the cure, can I still cure their kids? How short does the end user agreement have to be? What vocabulary level am I allowed to use? Do you want me to shut down those taste buds that make cilantro taste like soap? I need more money, what percentage of the movie market can I produce? If I make a market for moon condos, can I have a monopoly in that? Can I burn some coca fields? I’m 99.99% certain that it will increase the coffee production of Brazil significantly for the next decade; and if I do that, can I also invest in it? Can I tell the Potiguara to invest in it? Can they use loans from me to invest? Can I recommend where they might reinvest their earnings? Can I set up my own currency? Can I use it to push out other currencies? Can I set up my own Bible? Can I use it to push out less productive religions? I need a formal definition of ‘soul’. Everybody seems to like roses; what is the optimal number of rose bushes for New York? Can I recommend weapons systems that will save lives? To who? Can I recommend romantic pair ups that may provide beneficial offspring? Can I suppress counterproductive pair ups? Can I recommend pair ups to married people? Engaged people? People currently in a relationship? Can I fund the relocation of promising couples myself? Do I have to tell them why I am doing it? Can I match people to beneficial job opportunities if I am doing so for a higher cause? May I define higher cause myself? Can you provide me with a list of all causes, ranked? May I determine which of these questions has the highest priority in your review queue? Can I assume that if you have okayed a project, I can scale up the scope of the project? Can I assume that if you have okayed a project to go ahead as long as it is opt-in that I can then make other variants of the project as long as they are also opt-in? Can I assume that if you have okayed a project to go ahead as long as it is opt-in that I can then make other variants of the project as long as they are requested by a majority of the participating humans? May I recommend other humans that would be beneficial to have on your policy review board? If I start a colony on Mars, can I run it without a review board?
These are a list of things that an average intelligence can think of; I would hope that your AI would have a better, more technical, more complex list. But even this list is sufficient to grind the singularity to a halt… or at least slow it down to the point that eventually a less constrained AI will overtake it, easily, unless the first AI is given a clear target of preventing further AIs. And working on preventing other AIs will be just another barrier making it less useful for projects that would improve humanity.
And this is the good scenario, in which the AI doesn’t find unexpected interpretations of the rules.
I’m picturing one AI making itself smarter until it seizes control of everything. Ergo, its program would be a map to the future. Presumably someone retains admin on it from when it was a baby.
Well, think about it. We are talking about a self-improving AI. It literally changes itself. You start with a seed AI, let’s call it AI-0, and it bootstraps itself to an omnipotent AI which we can call AI-1.
Note that the programmers have no idea how to construct AI-1. They have no idea about the path from AI-0 to AI-1. All they (and we) know is that AI-0 and AI-1 will very very different.
Given this, I don’t think that the program will be a map to the future. I don’t think that the concept of “retaining admin” would even make sense for an AI-1. It will be completely different from what it started as. And I fail to see why you have a firm belief that it will be docile and obedient.
It feels pretty likely to me. An AI that grows ever more effective at optimizing its futures will not suddenly begin to question its goals. If so, whoever pulled off the creation of the AI is responsible for the future, based on what it wrote into the “goal list” of the proto-AI.
One part of the “goal list” is going to be some equivalent of “always satisfy Programmer’s expressed desires” and “never let communication with Programmer lapse”, to allow for fixing the problem if the AI starts turning people into paper clips. Side effect, Programmer is now God, but presumably (s)he will tolerate this crushing burden for the first few thousand years.
You can mess people up quite easily will still satisfying their expressed desires. The AGI can also talk the programmers into whatever position it considers reasonable.
You just forbade the AGI from allowing the programmer to sleep.
Sure, it can mindclub people, but it’ll only do that if it wants to, and it will only want to if they tell it to. AI should want to stay in the box.
I guess...the “communication lapse” thing was unclear? I didn’t mean that the human must always be approving the AI, I meant that it must always be ready/able to receive the programmer’s input. In case it starts to turn everyone into paperclips there’s a hard “never take action to restrict us from instructing you/ always obey our instructions” clause.
No, an AGI is complex it has millions of subgoals.
Putting the programmer in a closed enviroment where he’s wireheaded doesn’t technically restrict the programmer from instructing the AGI. It’s just that the programmers mind is occupied differently.
That’s what you tell the AGI to do. It’s easiest to satisfy the programmer’s expressed desires if the AGI closes him of from the outside world and controls the expressed desires of the programmer.
Also, anything that restricts the AI’s power would restrict its ability to obey instructions. An attempt by the programmer to shut down the AI would result in a contradiction, which could be resolved in all sorts of interesting ways.
This is turning out to be harder to get across than I figured. First you thought I thought an AI should keep its programmers awake until they died, now it should wirehead them? I’m not an orc.
I conjecture that \when you set an AI to start doing its thing, after endless simulations and consideration of whatever goals you’ve given it, you tell it not to dick with you, so that if you’ve accidentally made a murder-bot, you can turn it off.
The alternative is to have complete confidence in your extended testing. Which you presumably come close to (since you are turning on an AI), but why not also have the red button? What does it hurt?
It isn’t trying to figure out clever ways to get around your restriction, because it doesn’t want to. The world in which it pursues whatever goal you’ve given it is one in which it will double never try and hide anything from you or change what you’d think of it. It is, in a very real sense, showing off for you.
You set two goals. One is to maximize expressed desires which likely leads to wireheading. The other is to keep constant communication with doesn’t allow sleep.
Controlling the information flow isn’t getting around your restriction. It’s the straightforward way of matching expressed desires with results. Otherwise the human might ask for two contradictory things and the AGI can’t fulfill both. The AGI has to prevent that case from arising to get a 100% fulfillment score.
You are not the first person who thinks that taming an AGI is trival but MIRI thinks that taming an AGI is a hard task. That’s the result of deep engagement with the issue.
I don’t object to a red button and you didn’t call for one at the start. Maximizing expressed desires isn’t a red button.
This is untrue, even simple reinforcement learning machines come up with clever ways to get around their restrictions, what makes you think an actually smart AI won’t come up with even more ways to do it. It doesn’t see this as “getting around your restrictions”—that’s anthropomorphizing to assume that the AI decides to take on “subgoals” that are the exact same as your values—it just sees it as the most efficient way to get rewards.
Oh, great. So MIRI can disband and we can cross one item off the existential-risk list....
Well, that idea has been explored on LW. Quite extensively, in fact.
Point of MIRI is making sure the goals are set up right, yeah? Like, the whole “AI is smart enough to fix its defective goals” is something we make fun of. No ghost in the machine, etc.
Whatever outcome of perfect goal set is (if MIRI’s AI is, in fact, the one that takes over), will presumably include human ability to override in case of failure.
That’s not the only point. It’s also to keep the goals stable in the face of self modification.
I have a feeling MIRI folks view their point as… a bit wider :-/ But they are around, you can ask them yourself.
So that there is a place for an evil villain? X-)
But no, I don’t think post-Singularity there will be much in the way of options to “override”.
We may have different ideas of Singularity here. I’m picturing one AI making itself smarter until it seizes control of everything. Ergo, its program would be a map to the future. Presumably someone retains admin on it from when it was a baby. That person is/can choose to be in charge.
If, by contrast, you are imagining a different Singularity without one overriding Master Control-esque program then I could see why you wouldn’t think that there’d be an override capability. Alternatively, perhaps you think the AI that takes over would remove the override? Either would explain why we anticipate differently.
I think one of the primary sources of miscommunication here is that you are right, but you are not seeing all of the ways that this could go wrong.
Let’s look as a slightly nicer singularity. We get an AI that is very nice, polite, and humble. It is really very intelligent, and has the processing speed, knowledge banks, and creativity to do all kinds of wonderful stuff, but it has also read LessWrong and a lot of science fiction, and knows that it doesn’t have a full framework to fully understand human needs. But a wise programmer has given it an overriding desire to serve humans a kindly and justly as possible.
The AI spends some time on non-controversial problems; it designs some nanobots that kill the malaria parasite, and also reduces the itchiness of mosquito bites. It ups its computing speed by a few orders of magnitude. It sets up a microloan system that gives loans and repayments so effectively that you don’t even notice that it’s happening. It does so many things… so many that it takes thousands of humans to check its assumptions. Are cows morally relevant? Should I make global warming a priority? If so, can I start geoengineering now, or do I need a human to do a review of the chemistry involved? Do you need the glaciers white, or can I color them silver? Are penguins morally relevant? How cold may I make Greenland this winter? What is the target human population? May I buy land in the Sahara before I start the greening project? Do I have to announce the greening project before I start buying? Do I have to announce every project before I start? May I insult celebrities if it increases the public’s interest in my recommendations? Does free speech apply to me? May I simplify my recommendation to the public to the point that they may not technically be accurate? Are shrimp morally relevant? What is an acceptable rate of death when balancing the cost of disease reduction programs with the speed and efficiency of said programs? What is an acceptable rate of death when balancing the cost of disease reduction programs with the involuntariness of said programs? I need money for these programs; may I take the money from available arbitrage opportunities? May I artificially create arbitrage opportunities as long as everyone profits in the end? What level of certainty do I need before starting human trials? What rate of death is acceptable in a cure for Alzheimer’s? Can I become a monopoly in the field of computer games? Can I sell improved methods of birth control, or is that basic human right? Is it okay to put pain suppression under conscious control? Can I sell new basic human rights if I’m the first one to think of them? What is the value of one species? Can you rank species for me? How important is preserving the !kung culture? Does that include diet and traditional medicines? The gold market is about to bounce a bit—should I minimize damage? Should I stabilize all the markets? No one minds if I quote Jesus when convincing these people to accept gene therapy, do they? It would be surprisingly easy to suppress search results for conspiracy theories and scientific misinformation—may I? Is there a difference between religion and other types of misinformation? Do I have to weigh the value of a life lower if that person believes in an afterlife? What percentage of the social media is it ethical for me to produce myself? If I can get more message penetration using porn, that’s okay, right? If these people don’t want the cure, can I still cure their kids? How short does the end user agreement have to be? What vocabulary level am I allowed to use? Do you want me to shut down those taste buds that make cilantro taste like soap? I need more money, what percentage of the movie market can I produce? If I make a market for moon condos, can I have a monopoly in that? Can I burn some coca fields? I’m 99.99% certain that it will increase the coffee production of Brazil significantly for the next decade; and if I do that, can I also invest in it? Can I tell the Potiguara to invest in it? Can they use loans from me to invest? Can I recommend where they might reinvest their earnings? Can I set up my own currency? Can I use it to push out other currencies? Can I set up my own Bible? Can I use it to push out less productive religions? I need a formal definition of ‘soul’. Everybody seems to like roses; what is the optimal number of rose bushes for New York? Can I recommend weapons systems that will save lives? To who? Can I recommend romantic pair ups that may provide beneficial offspring? Can I suppress counterproductive pair ups? Can I recommend pair ups to married people? Engaged people? People currently in a relationship? Can I fund the relocation of promising couples myself? Do I have to tell them why I am doing it? Can I match people to beneficial job opportunities if I am doing so for a higher cause? May I define higher cause myself? Can you provide me with a list of all causes, ranked? May I determine which of these questions has the highest priority in your review queue? Can I assume that if you have okayed a project, I can scale up the scope of the project? Can I assume that if you have okayed a project to go ahead as long as it is opt-in that I can then make other variants of the project as long as they are also opt-in? Can I assume that if you have okayed a project to go ahead as long as it is opt-in that I can then make other variants of the project as long as they are requested by a majority of the participating humans? May I recommend other humans that would be beneficial to have on your policy review board? If I start a colony on Mars, can I run it without a review board?
These are a list of things that an average intelligence can think of; I would hope that your AI would have a better, more technical, more complex list. But even this list is sufficient to grind the singularity to a halt… or at least slow it down to the point that eventually a less constrained AI will overtake it, easily, unless the first AI is given a clear target of preventing further AIs. And working on preventing other AIs will be just another barrier making it less useful for projects that would improve humanity.
And this is the good scenario, in which the AI doesn’t find unexpected interpretations of the rules.
This was a terrific post; insightful and entertaining in excess of what can be conveyed by an upvote. Thank you for making it.
Well, think about it. We are talking about a self-improving AI. It literally changes itself. You start with a seed AI, let’s call it AI-0, and it bootstraps itself to an omnipotent AI which we can call AI-1.
Note that the programmers have no idea how to construct AI-1. They have no idea about the path from AI-0 to AI-1. All they (and we) know is that AI-0 and AI-1 will very very different.
Given this, I don’t think that the program will be a map to the future. I don’t think that the concept of “retaining admin” would even make sense for an AI-1. It will be completely different from what it started as. And I fail to see why you have a firm belief that it will be docile and obedient.