That is mostly an argument by intuition. To make it more rigorous and transparent, here is a constructive proof:
Let the curriculum be sampled uniformly at random. This has ~no mutual information with the world. Therefore the AI does not learn any methods in the world that can cause human extinction.
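To make the mutual-information step explicit (this is one way to formalise it; the notation is illustrative rather than taken from TurnTrout's post): treat the sampled curriculum $C$ and the state of the world $W$ as random variables. If $C$ is drawn uniformly at random, independently of $W$, then $p(c, w) = p(c)\,p(w)$ and

$$I(C; W) = H(C) - H(C \mid W) = H(C) - H(C) = 0,$$

which is the idealised, exactly-zero case that the "~no mutual information" claim gestures at.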
Does this not mean the AI has learnt no methods that provide any economic benefit either?
Yes. TurnTrout’s intuitive argument did not contain any premises that implied the AI learnt any methods that provide any economic benefit, so I thought it wouldn’t be necessary to include that in the constructive proof either.
Right, but that isn’t a good safety case because such an AI hasn’t learnt about the world and isn’t capable of doing anything useful. I don’t see why anyone would dedicate resources to training such a machine.
I didn’t understand TurnTrout’s original argument to be limited to only “trivially safe” (i.e. non-functional) AI systems.
How did you understand the argument instead?
I can see that the condition you’ve given, that a “curriculum be sampled uniformly at random” with no mutual information with the real world, is sufficient for a curriculum to satisfy Premise 1 of TurnTrout’s argument.
But it isn’t immediately obvious to me that it is also a necessary condition (and therefore equivalent to Premise 1).
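To spell out the distinction I’m drawing (a sketch in my own notation, not wording from either post): write $X$ for “the curriculum is sampled uniformly at random, with no mutual information with the world” and $P_1$ for Premise 1. What seems clear is $X \Rightarrow P_1$ (sufficiency); equivalence would additionally require $P_1 \Rightarrow X$ (necessity), and it is that second direction I don’t yet see.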
I’m not claiming to have shown something equivalent to Premise 1; I’m claiming to have shown something equivalent to the conclusion of the proof (that it’s possible to make an AI which very probably does not cause x-risk), inspired by the general idea of the proof but simplifying it and making it constructive so that it is more rigorous and transparent.
I might be misunderstanding something crucial, or not expressing myself clearly.
I understand TurnTrout’s original post to be an argument for a set of conditions which, if satisfied, prove the AI is (probably) safe. There are no restrictions on the capabilities of the system given in the argument.
You do constructively show “that it’s possible to make an AI which very probably does not cause x-risk” using a system that cannot do anything coherent when deployed.
But TurnTrout’s post is not merely arguing that it is “possible” to build a safe AI.
Your conclusion is trivially true, and there are simpler examples of “safe” systems if you don’t require them to do anything useful or coherent. For example, a fried, unpowered GPU is guaranteed to be “safe”, but that isn’t telling me anything useful.