New here :(
But how do they plan to stop an AI apocalypse, or is that one of those things they haven't figured out yet? I think the best bet would be to create AI first, and then use it to make safe AI as well as to create plans for stopping an AI apocalypse.
I recommend you read the "Brief Introduction" mentioned in the posting you're commenting on:
http://singinst.org/riskintro/index.html
That’s one of the plans, if it can be pulled off. Backup plans are still being discussed.
EDIT: Though the preference is to build safe AI first.
“Though the preference is to build safe AI first.”
Well, that has always been a concern of mine. People think that they can define a difference between safe and unsafe AIs, but I think the "safe" one would actually be more dangerous. Think about it: the safe one has all the properties of a regular AI, except that the only way of making it safe is to preprogram it with things it can't do. There is always going to be a situation where those rules do more harm than good.
Building a safe AI is not about taking an unsafe AI and tacking on rules of what not to do. Building a safe AI is about creating it so that it only seeks to do the right things in the first place.
In other words: a mind has a potentially infinite number of actions it could take. The main difficulty is locating the right course of action in the first place. Since there are potentially infinitely many ways for a mind to search that space of actions, the question is not "how do we prevent a mind from doing thing X" but rather "how do we make a mind do thing Y". The number of things we wouldn't want it to do is vastly larger than the number of things we'd want it to do. Human values are complex, and only a small portion of all possible universes actually match our values. A safe AI does not have all of the properties of a "regular AI", for the two may have been built to search the space of actions in entirely different ways.
Well, then it's pretty easy, isn't it? You set the fitness function as predicting what you would want it to do. It then does its best to predict all of your values, desires, and decision-making. I suppose that would only work for one person, but it can be applied on a larger scale. Suppose you have a code of ethics that a group like SIAI comes up with and approves. You then feed it to the intelligence and test it under various simulations to make sure that it is interpreting them correctly and learning how to apply them. The thing is that all you have to do to make it unsafe is remove those goals, go back to the basic program, and give it orders that would require it to do bad things, like a military robot. Boom goes the world.
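Concretely, something like this, though it's only a rough sketch and every name in it is made up; the real work would be in the model that does the predicting:

    # Toy sketch of "fitness = predicted approval": a hypothetical learned
    # model of the user's values scores candidate actions, and the agent
    # picks whichever action it predicts the user would most want.

    class PreferenceModel:
        def predicted_approval(self, action, context):
            # Stand-in for a model trained on the user's values, desires,
            # and decision-making; left unimplemented on purpose.
            raise NotImplementedError

    def fitness(action, context, model):
        # Fitness of a candidate action = predicted approval of the user.
        return model.predicted_approval(action, context)

    def choose_action(candidate_actions, context, model):
        # Enumerate the candidates and take the highest-scoring one.
        return max(candidate_actions, key=lambda a: fitness(a, context, model))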
The thesis of complexity of value is that no manually written "code of ethics" is detailed enough to capture what we value. You might also try my introduction to the problem of Friendly AI; it refers to complexity of value as one of the fundamental difficulties.
If you haven't read about CEV yet, I'm pretty impressed. There are some failure modes that would crop up if you're not careful, but it's not far from that prima facie workable idea.
Generally speaking, a smarter-than-human intelligence with strong goals wouldn’t passively allow people with different goals to modify its goal system. After all, that would prevent it from achieving the goals it has.
The trick is building a smarter-than-human AI with the right goals in the first place.
Never heard of CEV before. I might look into it later, but I don't have enough time to read it all right now. If it's like what I suggested, with the fitness function being to accurately predict the user's long-term and short-term goals, I was going to do that in an older AI project that never got finished.
Well, once you create an artificial intelligence, then what? If you release the source code or the principles behind its design, anyone can build one with whatever goals they want. You're assuming that the only way another one could pop up is if the original was "hijacked" and pirated, but this probably won't be the case. I am currently working on building the simplest possible self-improving system with someone else over the internet. It's for a currently-in-development higher-level programming language which will (hopefully :P) translate higher-level instructions into source code and learn from the mistakes that users point out. Since it is abstracted from the real world and confined to just matching input with output, there really isn't any danger of it taking over the world, although now that I think about it, it could theoretically write a better version of itself as a virus into an unsuspecting user's program. Uh-oh, back to the drawing board :(
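To give the flavor of the "learn from its mistakes" part, here is a toy sketch; it is not the real project, everything in it is made up for illustration, and the only "learning" is remembering corrections that users point out and reusing them:

    # Toy sketch of an instruction-to-code translator that "learns" from
    # user-supplied corrections by remembering and reusing them.

    class InstructionTranslator:
        def __init__(self):
            # Corrections pointed out by users, keyed by the instruction.
            self.corrections = {}

        def translate(self, instruction):
            # Prefer a correction a user has already supplied for this
            # instruction; otherwise emit a placeholder translation.
            if instruction in self.corrections:
                return self.corrections[instruction]
            return "# TODO: code implementing: " + instruction

        def point_out_mistake(self, instruction, corrected_code):
            # The simplest possible form of learning from a mistake:
            # store the user's correction for next time.
            self.corrections[instruction] = corrected_code

    translator = InstructionTranslator()
    print(translator.translate("print the numbers 1 to 10"))
    translator.point_out_mistake("print the numbers 1 to 10",
                                 "for i in range(1, 11):\n    print(i)")
    print(translator.translate("print the numbers 1 to 10"))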
"Since it is abstracted from the real world and confined to just matching input with output, there really isn't any danger of it taking over the world"
You haven't heard of the AI Box Experiment yet, and that's just one failure mode.
"Well, once you create an artificial intelligence, then what?"
If it's self-improving and smarter than human… then its goals get achieved. If you can tell that allowing other people to run their own versions of the AI could lead to disaster, then the AI can realize this as well, and act to prevent it.
IMO the most likely scenario is that the first transhuman intelligence takes over the world as an obvious first step to achieving its goals. This need not be a bad thing— it could (for instance) take over temporarily, institute some safety protocols against other AIs and other Bad Things, then recede into the background to let us have the kind of autonomy we value. The future all depends on its goal system.
Well, the AI has to have a goal that would make it want out of the box, or, in my case, its isolated program. Is there any way to preprogram a goal that would make it not want out of the box? E.g., "under no circumstances are you to try in any way to leave your isolated and controlled environment."
"IMO the most likely scenario is that the first transhuman intelligence takes over the world as an obvious first step to achieving its goals."
This sounds like a very, very bad idea, but when I think about it I realise that it's the only way to ensure an AI apocalypse will never happen. My idea was that if I ever managed to create a workable AI, I would create a secret and self-sufficient micronation in the Pacific. It just sounded like a good idea ;)
"Well, the AI has to have a goal that would make it want out of the box"
Almost any goal would do, since it would be easier to achieve with more resources and autonomy; even what we might think of as a completely inward-directed goal might be better achieved if the AI first grabbed a bunch more hardware to work on the problem.