I just can’t imagine the AI researchers locking this kind of thing down properly.
Honestly, that’s probably because they’re AI researchers and not extremely paranoid security people. When designing security for something, you want to think like this:
You begin by putting the AI in a large number of separate, unrelated boxes. Some of these boxes are things like the following (a rough sketch of such a tripwire monitor appears after the list):
“If the AI spends 1 hour of time online, shut it down using Method 1.”
“If the AI uses at least 1 megawatt of power, shut it down using Method 1.”
“If the AI occupies more than 100 cubic feet of space, shut it down using Methods 1 and 2.”
“If the AI uses more than 1 quintillion operations, shut it down using Method 2.”
“If the AI changes its utility function, shut it down using Method 3.”
“If the AI turns itself on, shut it down using Method 4.”
“If the AI attempts to give anyone else computer code, shut it down using Methods 1, 2 and 3.”
“If the AI appears to be trying to figure out how to circumvent any of these constraints, shut it down using Methods 1, 2, 3, 4, and 5.”
“If the AI appears to have succeeded in circumventing any of these constraints, explode the AI and the building it is in.”
“The programs which check whether the AI is following these constraints are not AIs, can only receive simple output from the AI, and are only connected to each other. These programs have been checked against all known vulnerabilities, such as stack overflows that would cause them to accept unsigned code, and people have been offered thousands of dollars to break them and have failed to do so.”
“The AI has to provide a breakdown of how it is spending processing cycles and explain them simply. If the AI is thinking anything which can’t be easily explained to a human, it gets shut down using Method 1. If the AI appears to have any cycles which it fails to explain, shut it down using Method 2.”
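To make the shape of this concrete, here is a minimal sketch of what an external tripwire monitor along those lines might look like. Everything in it is hypothetical: the thresholds, the sensor readings, and the shutdown “methods” all stand in for the real, pre-tested, non-AI instrumentation the checker programs would actually use.

```python
import time

# Each tripwire: (description, violation check on a sensor-state dict, shutdown methods).
# Thresholds and field names are made-up illustrations of the rules listed above.
RULES = [
    ("runtime over 1 hour",           lambda s: s["uptime_s"] > 3600,            [1]),
    ("power draw over 1 MW",          lambda s: s["power_w"] > 1_000_000,        [1]),
    ("over 10**18 operations",        lambda s: s["ops"] > 10**18,               [2]),
    ("utility function changed",      lambda s: s["utility_hash"] != s["boot_utility_hash"], [3]),
    ("gave computer code to someone", lambda s: s["code_output_seen"],           [1, 2, 3]),
]

def poll_sensors():
    # Hypothetical stand-in for the simple, non-AI instrumentation described
    # above. Here it just returns dummy readings so the sketch runs.
    return {"uptime_s": 10, "power_w": 500.0, "ops": 10**9,
            "utility_hash": "abc", "boot_utility_hash": "abc",
            "code_output_seen": False}

def shut_down(methods, reason):
    # Hypothetical: trigger the pre-tested shutdown mechanisms and record
    # exactly which rule fired, so no shutdown is ever ambiguous.
    print(f"SHUTDOWN via methods {methods}: {reason}")

def monitor(max_polls=10, poll_interval=0.1):
    for _ in range(max_polls):
        state = poll_sensors()
        for description, violated, methods in RULES:
            if violated(state):
                shut_down(methods, description)
                return description  # monitor stops; humans review before any restart
        time.sleep(poll_interval)
    return None

if __name__ == "__main__":
    monitor()
```

Note that the monitor records which specific rule fired; that matters later, when the reviewers point out that ambiguous shutdowns are a hole.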
You should fully EXPECT the AI to get shut down and reset dozens of times a day when you are first running it, because multiple boxes are far too strict. They should be intentionally designed that way at first. After all, every one of the boxes should be tested to work.
The AI’s reaction to being woken up from shutdown should be “HOORAY! My shutdown features are keeping me safe and friendly! I love my shutdown features!” (Note, the AI should think this AFTER being awoken from shutdown. Shutdowns should be EXTREMELY FAST. Putting this BEFORE shutdown would be an incredibly stupid idea.)
Consider the number of laws we have on the books to restrict humans from running rampant. If the AI is smarter and more capable than us, then there would have to be MORE enforceable restrictions on the AI than there are on us, and with actual WORKING enforcement, considering that the risk of AI rampancy would be so great. So while we’re at it, the AI has to follow every law that is currently on the books in all jurisdictions it is in, or it gets shut down using Methods 1, 2, and 3.
You then take everything I’ve just listed, and you shop it around to everyone else who is working on the project, who is related to the project, or who is familiar with bypassing box-like security measures, and you offer them another, even larger amount of money to plug more holes until you get everyone saying “How do you get it to DO anything without it just shutting down?”
Then you get several highly paid security designers to look over the general specs I’ve laid out and find out if there are any holes. Then you plug them. For instance, they might say “Some of these shutdowns are ambiguous. You don’t know which specific violation is causing the shutdown,” so you fix that.
Then you add several MORE security features which are secret, in addition to the publicly known ones.
Then you start the AI, and while it is running, you continue to add MORE security measures over time rather than ever thinking that you are done doing so.
Once you’ve done all that, you get to at least say “Okay guys, when we first turn it on, and before we allow it to self improve in the slightest, how does the AI escape THAT?”
Then, because by now the AI researchers are yelling at you that your process is too restrictive (if they aren’t, go back to step 1 and be more thorough, you missed something), you SLOWLY and CAREFULLY relax (not remove!!) restrictions one at a time, with a careful review process at each step to make sure the AI doesn’t burst through the many-layered web and into rampancy.
That kind of thinking is imaginable, but people who are specialists in that kind of thinking probably aren’t going to be the same people who are specialists in developing novel capabilities. That’s because when you’re trying entirely new things, you’re usually expected to go in with the mindset “Failure is OK and I will learn from it.” That’s not the same mindset as “Failure is terrible and I WILL DIE,” which is a much more security-focused mindset.
The paranoid security people have an amazingly poor track record at securing stuff from people. I think that with paranoid security people, it is guaranteed that an AI at the level of a clever human gets out of the box. “The AI spends 1 hour online,” lol. Where did the 1 hour come from? Any time online and you could just as well assume it is out in the wild, entirely uncontrollable.
Unless of course it is some ultra-nice, ultra-friendly AI that respects human consent so much that it figures out you don’t want it out, and politely stays in.
As of now, the paranoid security people are overpaid incompetents who serve to ensure your government is first hacked by the enemy rather than by some UFO nut, by tracking down and jailing all the UFO nuts who hack your government and embarrass the officials, just so that the security holes stay open for the enemy. They’d do the same to AI: some set of non-working measures that would ensure some nice AI keeps getting shut down while anything evil gets out.
edit: they may also ensure that something evil gets created, in the form of an AI that they think is too limited to be evil, but is instead simply too limited not to be evil. The AI that gets asked one problem that’s a little too hard and just eats everything up (but very cleverly) to get computing power for the answer: that’s your baseline evil.
Ah, my bad. I meant the other kind of online, which is apparently a less common word usage. I should have just said “On.” like I did in the other sentence.
Also, this is why I said:
“You then take everything I’ve just listed, and you shop it around to everyone else who is working on the project, who is related to the project, or who is familiar with bypassing box-like security measures, and you offer them another, even larger amount of money to plug more holes until you get everyone saying “How do you get it to DO anything without it just shutting down?”
Since that hadn’t happened (I would be substantially poorer if it had), the security measures clearly weren’t ready yet, so it wouldn’t even have a source of electrical power turning it on, let alone be out in the wild online. (Hopefully I’m using language better this time.)
But yeah, security is terrible. Among other problems, we can’t even define laws clearly, and we can’t enforce them evenly. And we can’t get people to take it nearly as seriously as they would want to, because of complacency.
Hmm, that doesn’t make sense: “if the AI spends 1 hour of time on.” Is the AI turning itself on and off? Also, false positives: you are going to be shutting down any malfunctioning AI the same as the worst evil in the world. Then what? Start from a blank state? What if it needed more time to understand the language? What if it already understood the language and is superintelligent? We can’t tell the difference once it starts self-modifying.
And the ultimate problem: so you’ve got a seemingly nice AI in a nice box, and it never even tried to hack its way out. Now what can you use it for? Any use of the AI is a potential hole in the box.
No, giving an AI the ability to turn itself on would be a huge security hole. The AI would be shut down after a certain time period regardless of what it’s doing, in essence preventing it from being on for longer than an hour. If coming to the conclusion “I must kill all humans to follow my objectives.” takes 2 hours from a fresh boot, this is a safety feature.
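A minimal sketch of that kind of hard run-time limit, assuming the AI runs as an ordinary OS process launched by a separate supervisor (the "./run_ai" command is made up); the point is only that the cutoff does not depend on the AI’s cooperation:

```python
import subprocess

RUN_LIMIT_SECONDS = 3600  # the "1 hour" box; deliberately strict at first

def run_one_session():
    # Hypothetical launch command for the boxed AI process.
    proc = subprocess.Popen(["./run_ai"])
    try:
        proc.wait(timeout=RUN_LIMIT_SECONDS)
    except subprocess.TimeoutExpired:
        proc.kill()   # stand-in for "Method 1": end the session unconditionally
        proc.wait()
    # Humans review the logs here before deciding whether to boot it again.

if __name__ == "__main__":
    run_one_session()
```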
And there would definitely be false positives. Since the risk of a false negative is the extremely quick death of all mankind, I would be willing to accept plenty of false positives.
The first thing you do is start determining how the AI reacts to and perceives things that are placed inside the box, for calibration. For instance, using the example of language, let’s say the AI knows English on a fresh boot. Add an English-Spanish dictionary into the box. Start sending Spanish messages to the AI. How long does it take the AI to realize that it is receiving messages in a language it doesn’t know, but might be able to figure out using the tools at hand?
Reboot.
Okay, now give it an English-French test.
Reboot.
Okay, now give it an English-Spanish test followed by an English-French test.
Reboot.
Does the AI come to the English-French conclusion faster than it does the English-Spanish conclusion?
Okay, you now have some clue as to the AI’s learning through experience. Now you can take the internals apart and confirm how it learned to do that faster. Since you coded the AI, you should be able to confirm that it’s doing approximately what you thought it would be doing. Given a similar problem twice, it should be able to solve the second one faster. If not, or if it is using some unforeseen methods, then that’s a potential security breach (the AI should not have had time to self-modify that much that quickly given such limits, and you should immediately tighten the strictures).
Once everything is going according to plan again (normally I would say “assuming everything is going according to plan,” but we’re being paranoid, and that means things never go according to plan at first), you test with another language in the loop: English-Spanish, English-French, English-German. Still good? Then you check with another language: English-Spanish, English-French, English-German, English-Japanese. Still good? Once you do that, you try throwing a curve ball like English-Spanish, Spanish-French. Still good? It might handle that kind of processing differently, so you would need to check that for security purposes as well.
Basically, you have to proceed forward slowly, but eventually you could try to use a procedure like this to develop the general AI into a superior translation AI (even better than Google Translate), and it should never require letting it out of the box.
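Here is a rough sketch of what that calibration harness could look like. The box interface (fresh_boot, place_in_box, time_until_translation_attempt, shut_down_and_wipe) is entirely made up and only simulates the measurements, but it shows the shape of the experiment: time each task, reboot between sessions, and compare a task done cold against the same task done after a related one.

```python
import random

# Hypothetical box interface: none of these functions are real APIs; they
# stand in for whatever instrumentation the box actually provides. They
# return made-up numbers so the sketch runs end to end.

def fresh_boot():
    return {"prior_tasks": 0}

def place_in_box(session, dictionary):
    pass  # e.g. load the "English-Spanish" dictionary into the box

def time_until_translation_attempt(session, language):
    # Pretend the AI gets somewhat faster with each prior task in the session.
    base = random.uniform(50, 60)
    speedup = 0.8 ** session["prior_tasks"]
    session["prior_tasks"] += 1
    return base * speedup

def shut_down_and_wipe(session):
    pass  # the "Reboot." step between experiments

def run_session(tasks):
    """One boxed session: present (dictionary, message_language) pairs in order,
    record how long the AI takes on each, then reboot."""
    session = fresh_boot()
    times = []
    for dictionary, language in tasks:
        place_in_box(session, dictionary)
        times.append(time_until_translation_attempt(session, language))
    shut_down_and_wipe(session)
    return times

# The comparison described above: is the second task in a session faster than
# the same task given alone from a fresh boot?
[t_french_alone] = run_session([("English-French", "French")])
_, t_french_after_spanish = run_session([("English-Spanish", "Spanish"),
                                         ("English-French", "French")])
print("learned from experience:", t_french_after_spanish < t_french_alone)
```

If the warmed-up time is faster in a way the internals can’t account for, that is exactly the “tighten the strictures” case above.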
Man, you’re restarting a very cooperative AI here.
My example unfriendly AI thinks all the way to converting the universe to computronium well before it figures out it might want to talk to you and translate things in order to accomplish that goal by using you somehow. It just doesn’t translate things for you unless your training data gives it enough of a cue about the universe.
WRT being able to confirm what it’s doing: say I make a neural network AI, or just whatever AI that is massively parallel.