You might also want to look at my argument in the top-level comment here, which engages more directly with Bostrom’s arguments for the orthogonality thesis. In brief, Bostrom says that all intelligence levels are compatible with all goals. I think this is false: some intelligence levels are incompatible with some goals. AI safety is still just as much of a risk either way, since many intelligence levels are compatible with many problematic goals. However, I don’t think Bostrom argues successfully for the orthogonality thesis, and I tried in the OP to illustrate a level of intelligence that is not compatible with any goal.
I don’t think anyone literally believes that “all intelligence levels are compatible with all goals”. For example, an intelligence that is too dumb to understand the concept of “algebraic geometry” cannot have a goal that can only be stated in terms of algebraic geometry. I’m pretty sure Bostrom put in a caveat along those lines...
Note: even so, this objection would imply an increasing range of possible goals as intelligence rises, not convergence.
I freely grant that this maximally strengthened version of the orthogonality thesis is false, even if only for the reasons @Steven Byrnes mentioned below. No entity can have a goal that requires more bits to specify than are used in the specification of the entity’s mind (though this implies a widening circle of goals with increasing intelligence, rather than convergence).
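(A rough way to make that bit-counting claim precise, under a description-length reading that I am supplying here rather than quoting from Bostrom or anyone in this thread: write $K(x)$ for the length of the shortest program that specifies $x$, $M$ for the full specification of the agent’s mind, and $G$ for the goal the mind actually pursues. If $G$ is recoverable from $M$ by some fixed extraction procedure, then

$$K(G) \le K(M) + c$$

for a constant $c$ that does not depend on the particular agent. The bound only loosens as minds get larger and more capable, which is the “widening circle” point above rather than any argument for convergence.)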
I think it might be worth taking a moment more to ask what you mean by the word “intelligence.” How does a mind become more intelligent? Bostrom proposed three main classes.
There is speed superintelligence, which you could mimic by replacing the neurons of a human brain with components that run millions of times faster while keeping the same initial connectome. It is at the very least non-obvious that a million-fold-faster-thinking Hitler, Gandhi, Einstein, random peasant farmer from the early Bronze Age, and random hunter-gatherer from Ice Age Siberia would all end up with compatible goal structures as a result of their boosted thinking.
There is collective superintelligence, where individually smart entities work together to form a much smarter whole. At least so far in history, while the behavior of collectives is often hard to predict, their goals have generally been simpler than those of their constituent human minds. I don’t think that’s necessarily a prerequisite for nonhuman collectives, but something has to keep the component goals aligned with each other well enough that the system as a whole retains coherence. Presumably whatever does that aligning is itself a subset of the overall system, which seems to imply that a collective superintelligence’s goals must be comprehensible to and decided by a smaller collective, one that by your argument would seem to be less constrained by the forces pushing superintelligences towards convergence. Maybe this implies a simplification of goals as the system gets smarter? But that competes against the system gradually improving each of its subsystems, and even if not, it would be a simplification of the subsystems’ goals; it is again unclear that one very specific goal type is something every possible collective superintelligence would converge on.
Then there’s quality superintelligence, which he admits is a murky category, but which includes: larger working and total memory, better speed of internal communication, more total computational elements, lower computational error rate, better or more senses/sensors, and more efficient algorithms (for example, having multiple powerful ANI subsystems it can call upon). That’s a lot of possible degrees of freedom in system design. Even in the absence of the orthogonality thesis, it is at best very unclear that all superintelligences would tend towards the specific kind of goals you’re highlighting.
In that last sense, you’re making the kind of mistake EY was pointing to in this part of the quantum physics sequence: ignoring an overwhelming prior against a nice-sounding hypothesis on the basis of essentially zero bits of data. I am very confident that MIRI and the FHI would be thrilled should you, or any of them, ever find strong reasons to think alignment won’t be such a hard problem after all.
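(Spelling out the “zero bits of data” point in Bayesian terms, as my own gloss rather than EY’s wording: by Bayes’ theorem,

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)},$$

and if the data $D$ is about equally likely whether or not $H$ is true, the posterior just equals the prior. A hypothesis that starts with an overwhelming prior against it stays overwhelmingly improbable until some observation actually discriminates in its favor.)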