Hmm, doesn’t make sense, ’if the AI spends 1 hour of time on”. Is the AI turning itself on and off? Also, false positives—you are going to be shutting down any malfunctioning AI the same as the worst evil in the world. Then what? Start from blank state? What if it needed more time to understand the language? What if it already understood the language and is super intelligent? We can’t tell difference once it starts self modifying.
And ultimate problem: so you got seemingly nice AI in a nice box, it never even tried to hack the way out. Now what can you use it for? Any usage of AI is potential hole in the box.
No, giving an AI the ability to turn itself on would be a huge security hole. The AI would be shut down after a certain time period regardless of whatever it’s doing. In essence, preventing it for being on for longer than hour. If coming to the conclusion “I must kill all humans to follow my objectives.” takes 2 hours from a fresh boot, this is a safety feature.
And there would definitely be false positives. Since the risk of a false negative is extremely quick death of all mankind, I would be willing to accept plenty of false positives.
The first thing you do is start determining how the AI reacts to things and perceives things that are placed inside, the box for calibration. For instance, using the example of language, let’s say the AI knows English on a fresh boot. Add an English-Spanish dictionary into the box. Start sending Spanish messages to the AI. How long does it take the AI to realize that it is receiving messages in a language it doesn’t know, but might be able to figure out using the tools at hand?
Reboot.
Okay, now give it an English-French test.
Reboot.
Okay, now give it an English-Spanish test followed by an English-French test.
Reboot.
Does the AI come to the English-French conclusion faster than it does the English-Spanish conclusion?
Okay, you now have some clue as to the AI’s learning through experience. Now you can take the internals apart and confirm how it learned to do that faster. Since you coded the AI, you should be able to confirm that’s it’s doing approximately what you thought it would be doing. Given a similar problem twice, it should be able to solve the second one faster. If not, or if it using some unforseen methods, Then that’s a potential security breach (AI should not have had time to self modify that much that quickly given such limits and you should immediately tighten the strictures.)
Once everything is going according to plan again (Normally I would say, ‘Assuming everything is going to plan’, but We’re being paranoid and that means things never go according to plan at first.)
Then you test with another language in the loop. English-Spanish,English-French,English-German. Still good? Then you check with another language. English-Spanish,English-French,English-German,English-Japanese. Still good? Once you do that, you try throwing a curve ball like English-Spanish,Spanish-French. Still good? It might handle that kind of processing differently, so you would need to check that for security purposes as well.
Basically, you have to proceed forward slowly, but eventually, you could try to use a procedure like this to develop the general AI into a superior translation AI (Even better than Google translate), and it should not ever require it being let out of the box.
Man, you’re restarting a very cooperative AI here.
My example unfriendly AI thinks all the way to converting universe to computronium well before it figures out it might want to talk to you and translate things to accomplish that goal by using you somehow. It just doesn’t translate things for you unless your training data gives it enough cue about universe.
WRT being able to confirm what it’s doing, say, I make neural network AI. Or just what ever AI that is massively parallel.
Hmm, doesn’t make sense, ’if the AI spends 1 hour of time on”. Is the AI turning itself on and off? Also, false positives—you are going to be shutting down any malfunctioning AI the same as the worst evil in the world. Then what? Start from blank state? What if it needed more time to understand the language? What if it already understood the language and is super intelligent? We can’t tell difference once it starts self modifying.
And ultimate problem: so you got seemingly nice AI in a nice box, it never even tried to hack the way out. Now what can you use it for? Any usage of AI is potential hole in the box.
No, giving an AI the ability to turn itself on would be a huge security hole. The AI would be shut down after a certain time period regardless of whatever it’s doing. In essence, preventing it for being on for longer than hour. If coming to the conclusion “I must kill all humans to follow my objectives.” takes 2 hours from a fresh boot, this is a safety feature.
And there would definitely be false positives. Since the risk of a false negative is extremely quick death of all mankind, I would be willing to accept plenty of false positives.
The first thing you do is start determining how the AI reacts to things and perceives things that are placed inside, the box for calibration. For instance, using the example of language, let’s say the AI knows English on a fresh boot. Add an English-Spanish dictionary into the box. Start sending Spanish messages to the AI. How long does it take the AI to realize that it is receiving messages in a language it doesn’t know, but might be able to figure out using the tools at hand? Reboot.
Okay, now give it an English-French test. Reboot.
Okay, now give it an English-Spanish test followed by an English-French test. Reboot.
Does the AI come to the English-French conclusion faster than it does the English-Spanish conclusion?
Okay, you now have some clue as to the AI’s learning through experience. Now you can take the internals apart and confirm how it learned to do that faster. Since you coded the AI, you should be able to confirm that’s it’s doing approximately what you thought it would be doing. Given a similar problem twice, it should be able to solve the second one faster. If not, or if it using some unforseen methods, Then that’s a potential security breach (AI should not have had time to self modify that much that quickly given such limits and you should immediately tighten the strictures.)
Once everything is going according to plan again (Normally I would say, ‘Assuming everything is going to plan’, but We’re being paranoid and that means things never go according to plan at first.)
Then you test with another language in the loop. English-Spanish,English-French,English-German. Still good? Then you check with another language. English-Spanish,English-French,English-German,English-Japanese. Still good? Once you do that, you try throwing a curve ball like English-Spanish,Spanish-French. Still good? It might handle that kind of processing differently, so you would need to check that for security purposes as well.
Basically, you have to proceed forward slowly, but eventually, you could try to use a procedure like this to develop the general AI into a superior translation AI (Even better than Google translate), and it should not ever require it being let out of the box.
Man, you’re restarting a very cooperative AI here.
My example unfriendly AI thinks all the way to converting universe to computronium well before it figures out it might want to talk to you and translate things to accomplish that goal by using you somehow. It just doesn’t translate things for you unless your training data gives it enough cue about universe.
WRT being able to confirm what it’s doing, say, I make neural network AI. Or just what ever AI that is massively parallel.