The point of MIRI is making sure the goals are set up right, yeah? Like, the whole “the AI is smart enough to fix its defective goals” idea is something we make fun of. No ghost in the machine, etc.
Whatever the outcome of a perfect goal set is (if MIRI’s AI is, in fact, the one that takes over), it will presumably include a human ability to override in case of failure.
That’s not the only point. It’s also to keep the goals stable in the face of self-modification.
I have a feeling MIRI folks view their point as… a bit wider :-/ But they are around; you can ask them yourself.
So that there is a place for an evil villain? X-)
But no, I don’t think post-Singularity there will be much in the way of options to “override”.
We may have different ideas of the Singularity here. I’m picturing one AI making itself smarter until it seizes control of everything. Ergo, its program would be a map to the future. Presumably someone retains admin on it from when it was a baby. That person is, or can choose to be, in charge.
If, by contrast, you are imagining a different Singularity without one overriding Master Control-esque program, then I could see why you wouldn’t think that there’d be an override capability. Alternatively, perhaps you think the AI that takes over would remove the override? Either would explain why we anticipate differently.
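To be concrete about what I mean by “retains admin”, I’m imagining something like the minimal sketch below (the class and method names are invented purely for illustration, not a real proposal): the agent’s main loop checks a human-held override before doing anything, and that check sits outside whatever the optimizer is allowed to rewrite.

```python
# Minimal sketch of "someone retains admin" (illustrative only; all names
# here are made up for this example).

class AdminConsole:
    """Held by whoever had root on the seed AI when it was a baby."""
    def __init__(self):
        self._halted = False

    def halt(self):
        self._halted = True

    @property
    def halted(self):
        return self._halted


class OverseenAI:
    def __init__(self, console: AdminConsole):
        self.console = console

    def propose_action(self):
        # Stand-in for whatever planning the AI actually does.
        return "plant trees in the Sahara"

    def execute(self, action):
        print(f"executing: {action}")

    def run(self):
        # The override is consulted before every single action.
        while not self.console.halted:
            self.execute(self.propose_action())


if __name__ == "__main__":
    console = AdminConsole()
    ai = OverseenAI(console)
    console.halt()   # the human flips the switch...
    ai.run()         # ...and the AI does nothing at all
```

Whether a check like that would still exist after the AI has rewritten itself a few thousand times is, I take it, exactly where we anticipate differently.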
I think one of the primary sources of miscommunication here is that you are right, but you are not seeing all of the ways that this could go wrong.
Let’s look at a slightly nicer singularity. We get an AI that is very nice, polite, and humble. It is really very intelligent, and has the processing speed, knowledge banks, and creativity to do all kinds of wonderful stuff, but it has also read LessWrong and a lot of science fiction, and knows that it doesn’t have a full framework for understanding human needs. But a wise programmer has given it an overriding desire to serve humans as kindly and justly as possible.
The AI spends some time on non-controversial problems; it designs some nanobots that kill the malaria parasite, and also reduce the itchiness of mosquito bites. It ups its computing speed by a few orders of magnitude. It sets up a microloan system that handles loans and repayments so effectively that you don’t even notice that it’s happening. It does so many things… so many that it takes thousands of humans to check its assumptions. Are cows morally relevant? Should I make global warming a priority? If so, can I start geoengineering now, or do I need a human to do a review of the chemistry involved? Do you need the glaciers white, or can I color them silver? Are penguins morally relevant? How cold may I make Greenland this winter? What is the target human population? May I buy land in the Sahara before I start the greening project? Do I have to announce the greening project before I start buying? Do I have to announce every project before I start? May I insult celebrities if it increases the public’s interest in my recommendations? Does free speech apply to me? May I simplify my recommendations to the public to the point that they may not technically be accurate? Are shrimp morally relevant? What is an acceptable rate of death when balancing the cost of disease reduction programs with the speed and efficiency of said programs? What is an acceptable rate of death when balancing the cost of disease reduction programs with the involuntariness of said programs? I need money for these programs; may I take the money from available arbitrage opportunities? May I artificially create arbitrage opportunities as long as everyone profits in the end? What level of certainty do I need before starting human trials? What rate of death is acceptable in a cure for Alzheimer’s? Can I become a monopoly in the field of computer games? Can I sell improved methods of birth control, or is that a basic human right? Is it okay to put pain suppression under conscious control? Can I sell new basic human rights if I’m the first one to think of them? What is the value of one species? Can you rank species for me? How important is preserving the !Kung culture? Does that include diet and traditional medicines? The gold market is about to bounce a bit—should I minimize damage? Should I stabilize all the markets? No one minds if I quote Jesus when convincing these people to accept gene therapy, do they? It would be surprisingly easy to suppress search results for conspiracy theories and scientific misinformation—may I? Is there a difference between religion and other types of misinformation? Do I have to weigh the value of a life lower if that person believes in an afterlife? What percentage of social media is it ethical for me to produce myself? If I can get more message penetration using porn, that’s okay, right? If these people don’t want the cure, can I still cure their kids? How short does the end user agreement have to be? What vocabulary level am I allowed to use? Do you want me to shut down those taste buds that make cilantro taste like soap? I need more money; what percentage of the movie market can I produce? If I make a market for moon condos, can I have a monopoly in that? Can I burn some coca fields? I’m 99.99% certain that it will increase the coffee production of Brazil significantly for the next decade; and if I do that, can I also invest in it? Can I tell the Potiguara to invest in it? Can they use loans from me to invest? Can I recommend where they might reinvest their earnings?
Can I set up my own currency? Can I use it to push out other currencies? Can I set up my own Bible? Can I use it to push out less productive religions? I need a formal definition of ‘soul’. Everybody seems to like roses; what is the optimal number of rose bushes for New York? Can I recommend weapons systems that will save lives? To whom? Can I recommend romantic pair-ups that may provide beneficial offspring? Can I suppress counterproductive pair-ups? Can I recommend pair-ups to married people? Engaged people? People currently in a relationship? Can I fund the relocation of promising couples myself? Do I have to tell them why I am doing it? Can I match people to beneficial job opportunities if I am doing so for a higher cause? May I define ‘higher cause’ myself? Can you provide me with a list of all causes, ranked? May I determine which of these questions has the highest priority in your review queue? Can I assume that if you have okayed a project, I can scale up the scope of the project? Can I assume that if you have okayed a project to go ahead as long as it is opt-in, I can then make other variants of the project as long as they are also opt-in? Can I assume that if you have okayed a project to go ahead as long as it is opt-in, I can then make other variants of the project as long as they are requested by a majority of the participating humans? May I recommend other humans who would be beneficial to have on your policy review board? If I start a colony on Mars, can I run it without a review board?
This is a list of things that an average intelligence can think of; I would hope that your AI would have a better, more technical, more complex list. But even this list is sufficient to grind the singularity to a halt… or at least to slow it down to the point that a less constrained AI will eventually overtake it, easily, unless the first AI is given a clear target of preventing further AIs. And working on preventing other AIs will be just another barrier making it less useful for projects that would improve humanity.
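To put rough numbers on that bottleneck, here is a toy simulation (every figure in it is made up for illustration): if the AI generates clarifying questions even modestly faster than its review board can answer them, the backlog grows without bound, and the pace at which projects actually launch is set by human review throughput rather than by anything about the AI.

```python
# Toy model of the human-review bottleneck (all numbers are invented).

def simulate(days, questions_per_day, answers_per_reviewer_per_day, reviewers):
    """Track how many questions get answered versus left waiting."""
    capacity = answers_per_reviewer_per_day * reviewers
    backlog = 0
    answered = 0
    for _ in range(days):
        backlog += questions_per_day          # the AI keeps asking
        cleared = min(backlog, capacity)      # the humans keep up as best they can
        backlog -= cleared
        answered += cleared
    return answered, backlog


if __name__ == "__main__":
    # Say the AI asks 50,000 questions a day and 1,000 reviewers each
    # manage to answer 20 of them per day:
    answered, backlog = simulate(
        days=365,
        questions_per_day=50_000,
        answers_per_reviewer_per_day=20,
        reviewers=1_000,
    )
    print(f"after one year: {answered:,} answered, {backlog:,} still waiting")
    # after one year: 7,300,000 answered, 10,950,000 still waiting
```

Making the AI faster changes nothing in this picture; only adding reviewers, or taking questions out of the queue, moves the needle.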
And this is the good scenario, in which the AI doesn’t find unexpected interpretations of the rules.
This was a terrific post: insightful and entertaining in excess of what can be conveyed by an upvote. Thank you for making it.
Well, think about it. We are talking about a self-improving AI. It literally changes itself. You start with a seed AI, let’s call it AI-0, and it bootstraps itself into an omnipotent AI, which we can call AI-1.
Note that the programmers have no idea how to construct AI-1. They have no idea about the path from AI-0 to AI-1. All they (and we) know is that AI-0 and AI-1 will be very, very different.
Given this, I don’t think that the program will be a map to the future. I don’t think that the concept of “retaining admin” would even make sense for an AI-1. It will be completely different from what it started as. And I fail to see why you have a firm belief that it will be docile and obedient.
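If it helps, here is the shape of that point as a toy sketch (purely illustrative; nothing in it is meant as a real model of self-improvement): each generation designs a more capable successor, and unless carrying the admin hook forward is an explicit, checked requirement of every rewrite, there is no particular reason to expect it to still be there by the time you reach AI-1.

```python
# Toy picture of the AI-0 -> AI-1 bootstrap (purely illustrative).
# Each generation is modeled as a dict, and "self-improvement" as the
# system proposing its own successor.

import random

def propose_successor(current):
    """The system designs a smarter version of itself."""
    return {
        "capability": current["capability"] * 2,
        # The improvement criterion says nothing about the override, so
        # whether it survives each rewrite is essentially incidental.
        "has_admin_override": current["has_admin_override"] and random.random() < 0.9,
    }


def bootstrap(generations=30, seed=0):
    random.seed(seed)
    ai = {"capability": 1.0, "has_admin_override": True}   # AI-0
    for _ in range(generations):
        ai = propose_successor(ai)
    return ai                                               # something like AI-1


if __name__ == "__main__":
    ai_1 = bootstrap()
    print(f"capability grew by a factor of {ai_1['capability']:.3g}")
    print(f"admin override still present: {ai_1['has_admin_override']}")
    # Even with a 90% chance of carrying the hook through any one rewrite,
    # the odds of it surviving 30 of them are about 0.9**30, roughly 4%.
```

Which is also why keeping the goals stable in the face of self-modification is a problem in the first place: anything you want AI-1 to still have, every intermediate version has to preserve on purpose.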