Stunting, tripwires, and designing limitations on the AI’s goals and behaviors may be very powerful tools.
We are having a hard time judging how powerful they are because we do not yet have concrete schemes in front of us to evaluate.
Until engineering specifications for these approaches become available, the jury will remain out.
We can certainly imagine creating a powerful but stripped-down AGI component that lacks some of the functionality a full system would have. We can also conceive of ways to test it.
Just to get the ball rolling, consider running it one hundred times more slowly than a human brain while it is being tested, debugged, and fed various kinds of data.
Consider running it in a specially-designed game world during testing.
If the system broke the law or otherwise became threatening in the game world, we would be able to make some inferences about how likely it would be to do so in real life.
If it became threatening under test conditions, that would increase our estimate of the chance that it would become threatening in real life. Redesign and repeat the tests.
On the other hand, if it did not become threatening under some set of test conditions, that alone is not enough to prove that it is time to deploy it in the real world. So the developers continue to test and analyze.
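To make the idea concrete, here is a minimal sketch, in Python, of what such a throttled, sandboxed test loop might look like. Every name in it (ToyGameWorld, ToyComponent, the tripwire labels) is a hypothetical stand-in chosen for illustration, not an existing system or API.

```python
import random
import time
from dataclasses import dataclass, field

# Hedged sketch: every name here (ToyGameWorld, ToyComponent, the tripwire
# labels) is a hypothetical stand-in used to illustrate the idea, not a real API.

SLOWDOWN_FACTOR = 100          # run the component ~100x slower than its normal rate
TRIPWIRES = {"attempted_escape", "deception", "resource_seizure"}


@dataclass
class ToyGameWorld:
    """A minimal simulated environment that reports rule violations."""
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def observe(self):
        return {"tick": self.rng.random()}

    def apply(self, action):
        # A real game world would have to detect genuinely threatening
        # behavior; here we simply flag a made-up "unsafe" action.
        return ["attempted_escape"] if action == "unsafe" else []


@dataclass
class ToyComponent:
    """Stand-in for a stripped-down AGI component under test."""
    step_seconds: float = 0.0001   # nominal time per decision step

    def act(self, observation):
        return "safe" if observation["tick"] < 0.999 else "unsafe"


def run_sandboxed_episode(component, world, steps=100):
    """Run one throttled episode and collect any tripwire events."""
    violations = []
    for _ in range(steps):
        action = component.act(world.observe())
        events = world.apply(action)
        violations.extend(e for e in events if e in TRIPWIRES)
        time.sleep(component.step_seconds * SLOWDOWN_FACTOR)   # deliberate stunting
    return violations


if __name__ == "__main__":
    violations = run_sandboxed_episode(ToyComponent(), ToyGameWorld())
    if violations:
        print("Threatening behavior observed; redesign and repeat the tests:", violations)
    else:
        print("No violations this episode; keep testing, since this alone is not proof of safety.")
```

The point of the sketch is only to show the shape of the procedure: deliberate slowdown, a simulated world that reports violations, and a loop whose output feeds the next round of redesign or testing.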
The trick, then, is creating a risk assessment process to check the safety of heavily tested AGI components that have passed all of their tests.
The stunting and tripwire mechanisms themselves would have to go through a validation process before they could be judged sufficient to permit safe testing.
Just as a heuristic, perhaps we are looking for probabilities of danger below one in a million or one in a trillion.
Or, perhaps we are comparing this probability with the probability of a different catastrophic or existential risk which the AGI can help to mitigate. However, building an AGI in response to some other emergency, the way we built the atomic bomb, seems like a recipe for trouble.
Specifically, the possibility that the AGI could lengthen the lives of, or permit the upload of, a small group of important people is NOT enough. Saving the life of the President or an important corporate executive is not a sufficient reason to create AGI.
The development team escalates to the next level of testing and utilization only when the danger probabilities fall sufficiently.
Furthermore, the development team is NOT responsible for the risk assessment. That job belongs to a separate team (or teams) with a well-defined process for providing oversight.
If the probability of danger never falls enough, the AGI development team is not allowed to continue.
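As a rough illustration of this escalation rule, the sketch below gates each move to the next test stage on an assessment supplied by the independent oversight team. The stage names, the one-in-a-million threshold, and the OversightAssessment structure are assumptions made for the example, not a settled protocol.

```python
from dataclasses import dataclass

# Hedged sketch: stage names, the threshold, and OversightAssessment are
# assumptions made for illustration, not a settled protocol.

STAGES = ["slowed_sandbox", "full_speed_sandbox", "narrow_real_world_trial"]
DANGER_THRESHOLD = 1e-6   # heuristic ceiling on acceptable danger probability


@dataclass
class OversightAssessment:
    """Produced by the independent risk-assessment team, not the developers."""
    stage: str
    danger_probability: float
    tripwires_validated: bool


def may_escalate(current_stage: str, assessment: OversightAssessment) -> bool:
    """Allow a move to the next stage only when the independent assessment
    covers the current stage and puts the danger probability low enough."""
    if assessment.stage != current_stage:
        return False
    if not assessment.tripwires_validated:
        return False
    return assessment.danger_probability < DANGER_THRESHOLD


def next_stage(current_stage: str, assessment: OversightAssessment) -> str:
    """Stay at the current stage (keep testing, or halt) unless escalation is allowed."""
    if not may_escalate(current_stage, assessment):
        return current_stage
    i = STAGES.index(current_stage)
    return STAGES[min(i + 1, len(STAGES) - 1)]


# Example: an estimated danger probability of 1e-4 at the first stage blocks escalation.
print(next_stage("slowed_sandbox",
                 OversightAssessment("slowed_sandbox", 1e-4, True)))
```

The design choice worth noticing is that the inputs to may_escalate come from the oversight team, not from the developers who want to escalate; the development loop only consumes the verdict.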
Right now, such danger probabilities are too high. However, trying to design stunting and tripwire schemes will provide information that allows us to update those probabilities.