This is a fine point to make in a post, but I want to point out that I think you misrepresent the strength of the orthogonality thesis in your opening paragraph. The thesis is about what’s possible: it’s a claim that intelligence and goals need not be confounded, not that they must not or should not be confounded. It was crafted to make clear that AI minds need not be like human minds, where a great many things are tied together in ways such that you can imagine something like the anti-orthogonality thesis (that a smarter AI will naturally be good because it will reason out moral facts for itself, or something similar) holding for humans. It was not meant to propose that AI minds must be a particular way.
In fact, as you note, AI would be easier to align if intelligence and goals were confounded, and perhaps, due to some quirks of current systems, we’ll have an easier time than we expected aligning AI, enough that it doesn’t kill us for at least a few years.
This. “Orthogonality” is just the name the rationality/AI safety community gave to the old philosophical adage that “you cannot derive ought from is”. No matter how smart you are at understanding the world, none of that will make you intrinsically good. Morality isn’t inscribed somewhere in the mathematical structure of spacetime like the speed of light or Planck’s constant. The universe just doesn’t care; and thus no amount of understanding the universe will in itself make you care.
Agreed. Only a very weak form of orthogonality is needed for dangerously unaligned AI to be the default.