Thus to deny the Orthogonality thesis is to assert that there is a goal system G such that, among other things:

(1) There cannot exist any efficient real-world algorithm with goal G.

(2) If a being with arbitrarily high resources, intelligence, time, and goal G were to try to design an efficient real-world algorithm with the same goal, it must fail.

(3) If a human society were highly motivated to design an efficient real-world algorithm with goal G, and were given a million years to do so along with huge amounts of resources, training, and knowledge about AI, it must fail.

(4) If a high-resource human society were highly motivated to achieve the goals of G, then it could not do so (here the human society is seen as the algorithm).

(5) Same as above, for any hypothetical alien societies.

(6) There cannot exist any pattern of reinforcement learning that would train a highly efficient real-world intelligence to follow goal G.

(7) There cannot exist any evolutionary or environmental pressures that would cause highly efficient real-world intelligences to evolve to follow goal G.

All of these seem extraordinarily strong claims to make!
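To fix notation, here is one way to read the strongest of these claims, (1); the predicates "Eff", "RW", and "Goal" are placeholders I am introducing for "efficient", "real-world", and "has goal", not definitions from the original:

\[
\exists G \;\; \neg\exists A \;\big(\mathrm{Eff}(A) \wedge \mathrm{RW}(A) \wedge \mathrm{Goal}(A) = G\big).
\]

Claims (2) through (7) then spell out the same impossibility for each known route by which such an A could arise (deliberate design, social organization, reinforcement learning, evolution).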
When I try arguing for anti-orthogonality, all those different conditions on G do not appear to add strength. The claim of anti-orthogonality is, after all, that most goal systems are inconsistent in the same sense in which goals like "I want to be stupider" or "I want to prove Goedel's statement..." are inconsistent, even though the inconsistency is not immediately apparent. And then all of the conditions follow immediately.
The "human society" conditions (3) and (4) are supposed to argue in favor of there being no impossible Gs, but in fact they argue for the opposite: obviously, only very few Gs would be acceptable as long-term goals for human societies.
This point also highlights another important difference: the anti-orthogonality thesis can be weaker than "there cannot exist any efficient real-world algorithm with goal G". Instead, it can be "any efficient real-world algorithm with goal G is value-unstable", meaning that if any value drift, however small, is allowed, then the system will in a short time drift away from G to the "right goal system". This distinguishes "strong anti-orthogonality" ((1), (2), (3)) from "weak anti-orthogonality" ((4), (5), (6), (7)).
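As a rough sketch of the contrast (again with placeholder predicates of my own, and "value drift" left informal), the two versions might be stated as:

\[
\begin{aligned}
\textbf{Strong:}\quad & \neg\exists A\;\big(\mathrm{Eff}(A)\wedge \mathrm{RW}(A)\wedge \mathrm{Goal}(A)=G\big),\\
\textbf{Weak:}\quad & \forall A\;\big(\mathrm{Eff}(A)\wedge \mathrm{RW}(A)\wedge \mathrm{Goal}(A)=G\big)\;\Rightarrow\; \text{any nonzero value drift soon carries } A \text{ away from } G.
\end{aligned}
\]

On the weak reading, systems with goal G can exist but cannot persist: G is not a stable point in goal space.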
This weaker anti-orthogonality thesis is sufficient for practical purposes. It basically asserts that a UFAI could only be created via explicit and deliberate attempts to create one, and not because of bugs, insufficient knowledge, etc. And this makes the whole "Orthogonality for superhuman AIs" section much less relevant.
As I said to Wei, we can start dealing with those arguments once we’ve got strong foundations.
I'll see if the value drift issue can be better integrated into the argument.