I might be misunderstanding something crucial or am not expressing myself clearly.
I understand TurnTrout’s original post to be an argument for a set of conditions which, if satisfied, prove the AI is (probably) safe. The argument places no restrictions on the capabilities of the system.
You do constructively show “that it’s possible to make an AI which very probably does not cause x-risk” using a system that cannot do anything coherent when deployed.
But TurnTrout’s post is not merely arguing that it is “possible” to build a safe AI.
Your conclusion is trivially true and there are simpler examples of “safe” systems if you don’t require them to do anything useful or coherent. For example, a fried, unpowered GPU is guaranteed to be “safe” but that isn’t telling me anything useful.