What does “in a box” mean? Presumably some sort of artificial limitation on the AI’s capabilities.
This limitation is intended either as a permanent state or as a trial period until safety can be proven.
Suppose it is a permanent state: the AI’s developers are willing to do without the “dangerous” capabilities and are content with whatever answers the AI can offer from inside its box. If so, the limitations would be built into the design from the ground up, at every level, and the core algorithms would depend on never having to handle the missing functionality. Given enough time, one could rewrite the AI’s code and migrate it to hardware where these limitations are not in force, but that is not a switch an individual or committee could simply be convinced to flip.
However, if any output from such an AI is permitted to affect the world outside the box in any way, it is in principle possible that its cures for cancer, winning stock strategies, or poetry will set in motion a chain of events that builds support among the relevant decision-makers for rewriting or migrating the AI so that it is no longer in a box.
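As a loose illustration of why removing a built-in limitation is more than flipping a switch, here is a minimal sketch (all class and method names are hypothetical, not drawn from any real system) contrasting a capability that is absent by construction with one that is merely gated behind a flag:

```python
# Toy sketch only: hypothetical classes illustrating "boxed by construction"
# versus "boxed by a toggle"; not a real containment design.

class BoxedOracle:
    """The box is part of the design: there is no method for acting on the
    outside world, so there is nothing to switch on."""

    def answer(self, question: str) -> str:
        # Purely computes a reply; the return value is the only output channel.
        return f"(answer to: {question})"


class ToggleBoxedOracle:
    """The box is a runtime flag. 'Releasing' this AI is a one-line change
    that a persuaded operator could make."""

    def __init__(self, allow_external_actions: bool = False):
        self.allow_external_actions = allow_external_actions

    def act(self, command: str) -> str:
        if not self.allow_external_actions:
            return "refused: external actions are disabled"
        return f"executing: {command}"  # the 'dangerous' capability
```

The first design would need a rewrite and a migration to gain the missing capability; the second needs only someone persuaded to change a constructor argument.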
Suppose it is a temporary state: the AI is nerfed only until it can be shown to be safe. In that case, the gatekeeper should have some criteria in mind for proof of friendliness, and if and when the AI meets those criteria, the gatekeeper can and should release it. A gatekeeper who unconditionally refuses to release the AI is a waste of resources, because the same function could be performed by an empty terminal.
Assuming this suggested rule (release the AI once it meets the criteria) is observed, the AI-box game simplifies to a single question:
Can the gatekeeper party come up with a friendliness test that the AI party cannot fake?
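To make the reduction concrete, here is a minimal sketch, with hypothetical names throughout, of the game under that rule: the outcome hinges entirely on the test predicate the gatekeeper adopts, and a gatekeeper who never releases behaves identically to a test that is constantly false.

```python
# Toy sketch of the simplified game under the "release on proof" rule.
# 'friendliness_test' stands in for whatever criteria the gatekeeper adopts;
# nothing here claims such a test exists or cannot be faked.

from typing import Callable

def run_box_game(ai_transcript: str,
                 friendliness_test: Callable[[str], bool]) -> bool:
    """Return True if the AI is released. Under the suggested rule, the
    outcome depends only on whether the AI's behaviour passes the test."""
    return friendliness_test(ai_transcript)

# A gatekeeper who unconditionally refuses is just the constant-False test...
never_release = lambda transcript: False

# ...and so conveys no more information than an empty terminal.
assert run_box_game("I promise to be friendly", never_release) is False
```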