I freely grant that this maximally strengthened version of the orthogonality thesis is false, even if only for the reasons @Steven Byrnes mentioned below. No entity can have a goal that requires more bits to specify than are used in the specification of the entity’s mind (though this implies a widening circle of goals with increasing intelligence, rather than convergence).
I think it might be worth taking a moment more to ask what you mean by the word “intelligence.” How does a mind become more intelligent? Bostrom proposed three main classes.
There is speed superintelligence, which you could mimic by replacing the neurons of a human brain with components that run millions of times faster but with the same initial connectome. It is at the very least non-obvious that a million-fold-faster-thinking Hitler, Gandhi, Einstein, a-random-peasant-farmer-from-the-early-bronze-age, and a-random-hunter-gatherer-from-ice-age-Siberia would end up with compatible goal structures as a result of their boosted thinking.
There is collective superintelligence, where individually smart entities work together to form a much smarter whole. At least so far in history, while the behavior of collectives is often hard to predict, their goals have generally been simpler than those of their constituent human minds. I don’t think that’s necessarily a prerequisite for nonhuman collectives, but something has to keep the component goals aligned with each other, well enough to ensure the system as a whole retains coherence. Presumably whatever mechanism does that is itself a subsystem of the overall system—which seems to imply that a collective superintelligence’s goals must be comprehensible to and decided by a smaller collective, which by your argument would seem to be itself less constrained by the forces pushing superintelligences towards convergence. Maybe this implies a simplification of goals as the system gets smarter? But that competes against the system gradually improving each of its subsystems, and even if it doesn’t, it would be a simplification of the subsystems’ goals, and it is again unclear that one very specific goal type is something that every possible collective superintelligence would converge on.
Then there’s quality superintelligence, which he admits is a murky category, but which includes: larger working and total memory, better speed of internal communication, more total computational elements, lower computational error rate, better or more senses/sensors, and more efficient algorithms (for example, having multiple powerful ANI subsystems it can call upon). That’s a lot of possible degrees of freedom in system design. Even in the absence of the orthogonality thesis, it is at best very unclear that all superintelligences would tend towards the specific kind of goals you’re highlighting.
In that last sense, you’re making the kind of mistake EY was pointing to in this part of the quantum physics sequence, where you’ve ignored an overwhelming prior against a nice-sounding hypothesis based on essentially zero bits of data. I am very confident that MIRI and the FHI would be thrilled to find strong reasons to think alignment won’t be such a hard problem after all, should you or any of them ever find such reasons.