But you run into other problems then, like the certainty the OP touched on. Then the agent will spend significant resources ensuring that it has exactly 9 paperclips made, and wouldn’t accept a 90% probability of making 10 paperclips, because a 99.9999% probability of making 9 paperclips would yield more utility for it.
But the entire point of building FAI is to not require it to have resource usage limits, because it can’t help us if it’s limited. And such resource limits wouldn’t necessarily be useful for “testing” whether or not an AI was friendly, because if it weren’t, it would mimic the behaviour of a FAI so that it could get more resources.
But the entire point of building FAI is to not require it to have resource usage limits, because it can’t help us if it’s limited.
Machines can’t cause so much damage if they have resource-usage limits. This is a prudent safety precaution. It is not true that resource-limited machines can’t help us.
And such resource limits wouldn’t necessarily be useful for “testing” whether or not an AI was friendly, because if it weren’t, it would mimic the behaviour of a FAI so that it could get more resources.
So: the main idea is to attempt damage limitation. If the machine behaves itself, you can carry on with another session. If it does not, it is hopefully back to the drawing board, without too much damage done.
But you run into other problems then, like the certainty the OP touched on. Then the agent will spend significant resources ensuring that it has exactly 9 paperclips made, and wouldn’t accept a 90% probability of making 10 paperclips, because a 99.9999% probability of making 9 paperclips would yield more utility for it.
Sooo—you would normally give such an agent time and resource-usage limits.
But the entire point of building FAI is to not require it to have resource usage limits, because it can’t help us if it’s limited. And such resource limits wouldn’t necessarily be useful for “testing” whether or not an AI was friendly, because if it weren’t, it would mimic the behaviour of a FAI so that it could get more resources.
Machines can’t cause so much damage if they have resource-usage limits. This is a prudent safety precaution. It is not true that resource-limited machines can’t help us.
So: the main idea is to attempt damage limitation. If the machine behaves itself, you can carry on with another session. If it does not, it is hopefully back to the drawing board, without too much damage done.