I think that’s the deformation of a fundamental theorem (« there exists a universal Turing machine, i.e. it can run any program ») into a practical belief (« an intelligence can pick its values at random »), with a motte-and-bailey game on the meaning of “can”, where the motte is the fundamental theorem and the bailey is the orthogonality thesis.
(Thanks for the link to your own take, i.e. you think it’s the bailey that is the deformation.)
Consider the sense in which humans are not aligned with each other. We can’t formulate what “our goals” are. The question of what it even means to secure alignment is fraught with philosophical difficulties.
It’s part of the appeal, isn’t it?
If the oversight AI responsible for such decisions about a slightly stronger AI is not even existentially dangerous, it’s likely to do a bad job of solving this problem.
I don’t get the logic here. Typo?
So I’m not claiming sudden changes, only intractability of what we are trying to do
That’s a fair point, but the intractability of a problem usually goes with the tractability of a slightly relaxed problem. In other words, it can be both fundamentally impossible to please everyone and fundamentally easy to control paperclip maximizers.
And also an aligned AI doesn’t make the world safe until there is a new equilibrium of power, which is a point they don’t address, but is still a major source of existential risk. For example, imagine giving multiple literal humans the power of being superintelligent AIs, with no issues of misalignment between them and their power. This is not a safe world until it settles, at which point humanity might not be there anymore. This is something that should be planned in more detail than what we get by not considering it at all.
Well said.
All significant risks are anthropogenic.
You think all significant risks are known?
Also, it seems clear how to intentionally construct a paperclip maximizer: you search for actions whose expected futures have more paperclips, then perform those actions. So a paperclip maximizer is at least not logically incoherent.
Indeed, the inconsistency appears only with superintelligent paperclip maximizers. I can be petty with my wife. I don’t expect a much better me would be.
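For concreteness, here is a minimal sketch of the construction quoted above (search for actions whose expected futures have more paperclips, then perform those actions). Everything in it is an assumption made for illustration: the toy world model `toy_simulate`, the action names, and the helper names `expected_paperclips` and `choose_action` are hypothetical and come from neither the quoted comment nor any real system.

```python
import random

def expected_paperclips(action, simulate, n_samples=1000, seed=0):
    """Monte Carlo estimate of the expected paperclip count after `action`."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = sum(simulate(action, rng) for _ in range(n_samples))
    return total / n_samples

def choose_action(actions, simulate, n_samples=1000):
    """Return the action whose estimated expected paperclip count is highest."""
    return max(actions, key=lambda a: expected_paperclips(a, simulate, n_samples))

def toy_simulate(action, rng):
    """Hypothetical world model: paperclips produced in one sampled future."""
    if action == "build_factory":
        # Risky: 10 paperclips half the time, none otherwise (expected 5).
        return 10 if rng.random() < 0.5 else 0
    if action == "bend_wire":
        # Safe: always 3 paperclips.
        return 3
    return 0  # any other action produces nothing

if __name__ == "__main__":
    best = choose_action(["idle", "bend_wire", "build_factory"], toy_simulate)
    print(best)
```

This only shows that the loop is easy to write down for a toy model. It says nothing about whether the same loop stays coherent for a superintelligent agent.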