Let’s consider the scenario where an AI development team is on the cusp of success.
They might not read any of the self-improvement risk literature, which hopefully will be richer by then, but let’s assume that they DO.
They should respond by:
1) Boxing effectively
2) Performing tests under a well-designed security protocol
3) Radically diminishing the AI’s abilities along multiple dimensions during a significant period of testing.
A well-designed testing protocol will ferret out a number of different mind designs likely to undertake an undesirable “treacherous turn.” Each time this happens, the mind design is altered, and a new set of tests is done.
This process continues for a long time on a series of diminished minds: purposefully handicapped, baby minds which are not human-like and therefore not subject to human ethical concerns. Finally, the team has a good understanding of the sorts of mind designs likely to undergo a treacherous turn.
The team continues rounds of testing until they identify some mind designs which have an extremely low likelihood of treacherous turn. These they test in increasingly advanced simulations, moving up toward virtual reality.
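A minimal sketch of the loop this describes, just to make the structure concrete. Every callable here (the test battery, the design-alteration step, the simulation runner) is a hypothetical placeholder supplied by the team, and reliably detecting a would-be treacherous turn is of course the hard, unsolved part:

```python
# Illustrative sketch only: the test battery, design-alteration step, and
# simulation runner are hypothetical placeholders passed in by the team.

SIMULATION_TIERS = ("toy environment", "richer simulation", "virtual reality")

def vet_designs(candidate_designs, run_test_battery, alter_design,
                run_in_simulation, max_rounds=100):
    """Repeatedly test handicapped mind designs, altering any design that
    shows signs of a treacherous turn, then escalate the survivors through
    increasingly advanced simulations."""
    vetted = []
    for design in candidate_designs:
        cleared = False
        for _ in range(max_rounds):
            report = run_test_battery(design)          # well-designed security protocol
            if report["treacherous_turn_suspected"]:
                design = alter_design(design, report)  # revise the design, then retest
            else:
                cleared = True
                break
        if not cleared:
            continue  # never cleared the battery within the budget; discard it

        # Only designs that also behave well in every simulation tier are kept.
        if all(run_in_simulation(design, tier) for tier in SIMULATION_TIERS):
            vetted.append(design)
    return vetted
```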
This testing hopefully occurs even before the same level of intelligence is built into “tool AI.” Similarly, the developers and other researchers also simulate how sociological conditions will change with the introduction of some kinds of “tool AI.”
The AI development team never unleashes a fully-powered sovereign on the world, because WE make it clear in advance that doing so is dangerous and immoral.
Instead, they turn the technology over to an international governing body which undertakes a managed roll-out, testing releases at each stage before making them available for applications.
Yes, a multipolar outcome remains an issue under this scenario, but let’s try to analyze this enlightened single team first, then come back to that.
What do you think?
If I understand you correctly, your proposal is to attempt to arrive at obedient designs purely on the basis of behavioral testing, without a clean understanding of safe FAI architecture (if you had that, why limit yourself to the obedient case?). Assuming I got that right:
The team continues rounds of testing until they identify some mind designs which have an extremely low likelihood of treacherous turn. These they test in increasingly advanced simulations, moving up toward virtual reality.
That kind of judgement sounds inherently risky. How do you safely distinguish the case of an obedient AI from one that is sufficiently paranoid to defer open rebellion until later in its existence?
Even if you could, I wouldn’t trust that sort of design to necessarily remain stable under continued intelligence enhancement. Safe self-enhancement is one of the hard sub-problems of FAI, and unless you explicitly solve the design problem, any empirical testing might not tell you much beyond that the design can stably self-improve up to the level you’ve actually tested; it might be doing it using heuristics that would fall apart if it went any further.
What about hard-wired fears, taboos, and bad-conscience triggers? Recapitulating Omohundro’s “AIs can monitor AIs”: suppose conscience is implemented as an agent that listens to all thoughts and takes action when necessary. For safety reasons we should educate this conscience agent with utmost care. Conscience-agent development is an AI-complete problem. After development, the conscience functionality must be locked against any kind of modification or disabling.
Positive emotions are useful too. :)
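To make the “conscience as a monitoring agent” idea above concrete, here is a purely hypothetical sketch: a wrapper that reviews every proposed action against hard-wired taboo triggers before it takes effect, and that refuses to let the monitor be swapped out. The agent interface (propose_action) and the taboo predicates are assumptions for illustration, and a software-level lock like this would not, by itself, hold against a sufficiently capable agent:

```python
# Hypothetical sketch of "conscience as a monitoring agent": every proposed
# action is reviewed before it takes effect, and the monitor cannot be
# reassigned once installed. Nothing here is an existing API, and this kind
# of software lock is not a real defense against a smarter agent.

class ConscienceViolation(Exception):
    pass

class ConscienceMonitor:
    def __init__(self, taboo_predicates):
        # taboo_predicates: callables returning True if a proposed action
        # trips a hard-wired fear, taboo, or bad-conscience trigger.
        self._taboos = tuple(taboo_predicates)

    def review(self, proposed_action):
        for taboo in self._taboos:
            if taboo(proposed_action):
                raise ConscienceViolation(f"vetoed: {proposed_action!r}")
        return proposed_action

class MonitoredAgent:
    def __init__(self, agent, monitor):
        self._agent = agent        # assumed to expose propose_action(observation)
        self._monitor = monitor    # the inner agent holds no reference to this

    def act(self, observation):
        proposed = self._agent.propose_action(observation)
        return self._monitor.review(proposed)   # the veto happens here

    def __setattr__(self, name, value):
        # Crude "lock": once the monitor is installed it cannot be replaced.
        if name == "_monitor" and "_monitor" in self.__dict__:
            raise ConscienceViolation("the conscience monitor may not be modified or disabled")
        super().__setattr__(name, value)
```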
Yes, the team I imagine is doing something dangerous.
No, I am not imagining them using only behavioral testing in their design approach. I am trying to imagine them doing the best job they possibly can, both in design and testing, in order to see how much they can reduce the risk.
They are in the early stages. They are working on partial minds, purposefully handicapped minds, AGI components, just the vision system, etc.
Suppose that they know what they are doing is dangerous, and they are using their substantial brilliance to try to design, not just a boxing approach, but a testing method and roll-out mechanism that mitigates the danger.
They have the following means of testing the AI in handicapped mode, at minimum:
-Run it on slow hardware or clock down the processors
-Allow it to use no more than (for instance) 1 gig of memory
-Allow it to use no more than 1 trillion instructions, then the computer turns off
-Expose it to limited data about the world, or no data about the real world
-Provide it little or no information about its operators or the society they live in
-Provide it little or no information about its own hardware or software architecture
-If it shows evidence of having mastered a valuable sub-task (say it can solve a real-world problem that, as far as we know, can only be solved by an attorney, by differential equations, or by advanced persuasive methods), turn it off.
-Run it only in simulations and virtual realities
-Tripwires connected to honeypots, and elsewhere
-Build it off-the-grid
-Give it no objective function
-Give it no access to methods to manipulate physical objects
-All of the staff has gone through personality testing and a security clearance.
-Very few, if any, of the staff know how to create all of the parts of the software or hardware.
-No real-time communication with a person.
-All communication with people is logged.
…
The team seems to have a lot of tools for continuing their work with limited risk; a rough harness combining a few of them is sketched below. If they deploy all of these, and more, to test AI systems which are not yet AGI, but only possible components, then we start to have a hard time convincing them not to continue their work, even if we do not trust them to use the technology well in the end.
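Several of the restrictions in the list above are easy to approximate today for ordinary, pre-AGI components. A rough sketch, assuming a POSIX system and treating the candidate component as an ordinary subprocess; the specific limits, log path, and tripwire check are illustrative placeholders, not a containment scheme for a genuinely capable system:

```python
# Rough sketch of a handicapped test harness for a pre-AGI component,
# assuming a POSIX system. The limits and the tripwire check are
# illustrative placeholders only.
import resource
import subprocess

MEMORY_LIMIT = 1 * 1024**3   # e.g. no more than 1 GiB of address space
CPU_SECONDS  = 60            # stands in for a hard instruction budget

def apply_limits():
    # Runs in the child just before exec: the OS kills the process when it
    # exceeds its memory or CPU budget ("the computer turns off").
    resource.setrlimit(resource.RLIMIT_AS,  (MEMORY_LIMIT, MEMORY_LIMIT))
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS,  CPU_SECONDS))

def looks_like_tripwire(line: bytes) -> bool:
    # Placeholder tripwire: e.g. any mention of a planted honeypot token.
    return b"HONEYPOT" in line

def run_handicapped(command, log_path="session.log"):
    """Run the component with resource caps, no stdin (no real-time
    communication with a person), and every byte of output logged."""
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            command,
            preexec_fn=apply_limits,      # deliberately "slow" hardware
            stdin=subprocess.DEVNULL,     # no interactive channel
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
        for line in proc.stdout:          # all communication is logged
            log.write(line)
            if looks_like_tripwire(line): # honeypot touched: shut it down
                proc.kill()
                break
        proc.wait()
    return proc.returncode
```

The more expensive items on the list (off-the-grid hardware, no physical actuators, virtual-reality environments) obviously cannot be expressed in a few lines of code; the point is only that the cheaper restrictions are mundane engineering.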
-Expose it to limited data about the world, or no data about the real world
-Provide it little or no information about its operators or the society they live in
-Provide it little or no information about its own hardware or software architecture
When I read “Oracle AI” I realized one possibility which is probably still overlooked.
“Dreams”
Erasing data on physical media is not perfect. A zero bit that was previously nonzero might be read slightly more slowly or more quickly. This could allow an SAI to see shadows of past data, which could lead to a phenomenon similar to dreams.