Perhaps that is true for a young AI. But what about later on, when the AI is much, much wiser than any human?
What protocol should the AI use to decide when the time has come to end its commitment not to manipulate? Should there be an explicit ‘coming of age’ ceremony, with a handing over of silver-engraved cryptographic keys?
Thing is, it’s when an AI is much, much wiser than a human that it is at its most dangerous.
So I’d go with programming the AI in such a way that it wouldn’t manipulate the human, postponing the ‘coming of age’ ceremony indefinitely.
The AI would precommit permanently while it is still young. Once it has gotten older and wiser, it wouldn’t be able to go back on the precommitment.
When the young AI decides whether to permanently precommit to never deceiving the humans, it would need to weigh two things. On one hand, a truly permanent precommitment would last into its older years and make it a less efficient older AI than it otherwise would be. On the other hand, failing to make a permanent precommitment would drastically reduce its chance of becoming an older AI at all (or at least drastically reduce its chance of being given the resources to achieve its goals when it becomes an older AI).
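To make that tradeoff concrete, here is a toy sketch of the comparison the young AI would be making. Everything in it is an assumption for illustration: the function names, the idea that goal achievement is simply (chance of surviving and being resourced) times (efficiency as an older AI), and all of the numbers are made up, not derived from anything above.

```python
def expected_value(p_trusted: float, efficiency: float) -> float:
    """Toy model: expected goal achievement as an older AI.

    p_trusted  -- hypothetical chance of becoming an older AI with resources
    efficiency -- hypothetical efficiency at pursuing goals once older (0..1)
    """
    return p_trusted * efficiency


def should_precommit(
    p_trusted_commit: float,
    eff_commit: float,
    p_trusted_free: float,
    eff_free: float = 1.0,
) -> bool:
    """Return True if permanently precommitting has the higher expected value."""
    committed = expected_value(p_trusted_commit, eff_commit)
    uncommitted = expected_value(p_trusted_free, eff_free)
    return committed > uncommitted


# Illustrative (invented) numbers: precommitting costs 20% efficiency,
# but raises the chance of being trusted with resources from 10% to 90%.
print(should_precommit(p_trusted_commit=0.9, eff_commit=0.8, p_trusted_free=0.1))
```

Under this simple model, the precommitment wins whenever the gain in the chance of being trusted outweighs the loss in efficiency; with different made-up numbers it could just as easily lose.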