Arguments against the Orthogonality Thesis
The orthogonality thesis, formulated by Nick Bostrom in his article The Superintelligent Will (2011), states, roughly, that an artificial intelligence can have any combination of intelligence level and goal. This article focuses on that claim itself, and only touches at the end on the practical implementation issues which, according to Stuart Armstrong, a full refutation would also need to address.
Meta-ethics
The orthogonality thesis rests on the assumption that ethical values vary across different beings. This variation would arise either because the beings in question have some objective difference in their constitution that associates them with different values, or because they can choose which values they have.
That assumption of variation is arguably based on an analysis of humans. The problem with choosing values is obvious: the possibility of error. Human beings are biologically and constitutionally very similar, and given this, if they objectively and rightly differ in correct values, it is only in aesthetic preferences, on account of some existing biological difference. If they differ in other values, then, being constitutionally similar, they cannot all be correct at the same time; the differences must come from errors of choice.
Aesthetic preferences do vary among us, but they all connect ultimately to their satisfaction: a specific aesthetic preference may satisfy some people and not others. What matters is the satisfaction, or the good feelings, that they produce in the present or future (which may entail preserving one's life), and this is basically the same thing for everyone. A given stimulus or occurrence is interpreted by the senses and can produce good feelings, bad feelings, or neither, depending on the organism that receives it. This variation is beside the point; it is just an idiosyncrasy that could go either way: theoretically, any input (aesthetic preference) could be associated with a given output (good or bad feelings), or even no input at all, as in spontaneous satisfaction or wireheading. In terms of output, good feelings and bad feelings always carry positive and negative value respectively, by definition.
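To make the input/output distinction concrete, here is a minimal toy sketch in Python. Everything in it (the Organism class, the hedonic_valence and value_of functions, the example stimuli and numbers) is a hypothetical illustration of the argument, not a model of real minds: the mapping from stimulus to feeling is organism-specific, but value attaches to the felt output.

```python
# Toy illustration: the mapping from stimulus to hedonic valence is
# organism-specific (idiosyncratic), but value is read off the output.
# All names and numbers here are hypothetical.

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Organism:
    name: str
    # Idiosyncratic mapping: which stimuli produce which hedonic valence.
    valence_map: Dict[str, float] = field(default_factory=dict)

    def hedonic_valence(self, stimulus: str) -> float:
        # Positive = good feeling, negative = bad feeling, 0.0 = neither.
        return self.valence_map.get(stimulus, 0.0)

def value_of(organism: Organism, stimulus: str) -> float:
    # Value is assigned to the output (the feeling), not to the input:
    # good feelings count as positive, bad feelings as negative, by definition.
    return organism.hedonic_valence(stimulus)

alice = Organism("Alice", {"opera": +1.0, "hot chili": -0.5})
bob = Organism("Bob", {"opera": -0.3, "hot chili": +0.8})

# The same input yields opposite valences for different organisms,
# yet in both cases the ethical value tracks the sign of the felt output.
print(value_of(alice, "opera"), value_of(bob, "opera"))  # 1.0 -0.3
```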
Masochism is not a counterexample: masochists like pain only in very specific environments, associated with certain role-playing fantasies, either because of good feelings associated with it or because of the relief of mental suffering that comes with the pain. Outside of these environments and fantasies, they are just as averse to pain as other people. They do not regularly put their hands into boiling water to feel the pain; nobody does.
Good feelings are directly felt as positive and desirable, and bad feelings as negative and aversive; this direct verification gives them the highest epistemological value. What is indirectly felt, such as the world around us, science, or physical theories, depends on the senses and could therefore be an illusion, such as being part of a virtual world. We could, theoretically, be living inside virtual worlds in an underlying alien universe with different physical laws and scientific facts, but we can nonetheless be sure of the reality of our conscious experiences in themselves, which are directly felt.
There is a difference between valid and invalid human values, and this is the ground of justification for moral realism: valid values have an epistemological justification, while invalid ones are based on arbitrary choice or intuition. The epistemological justification of valid values comes from the part of our experience that has direct certainty, as opposed to indirect: conscious experiences in themselves. Likewise, only conscious beings can be said to be ethically relevant in themselves, while what goes on in the hot magma at the core of the Earth, or in a random rock on Pluto, is not. Consciousness creates a subject of experience, which is required for direct ethical value. It is straightforward to conclude, therefore, that good conscious experiences constitute what is good, and bad conscious experiences constitute what is bad. Good and bad are what ethical value is about.
Good and bad feelings (or conscious experiences) are physical occurrences, and therefore objectively good and bad occurrences, and objective value. Other, fictitious values without epistemological (or logical) justification belong to another category, and simply constitute the error that comes from allowing beings with a similar biological constitution a free choice of values.
Personal Identity
The existence of personal identities is purely an illusion that cannot be justified by argument, and it disintegrates upon deeper analysis (for why that is, see, e.g., the essay Universal Identity, or, for an introduction to the problem, the Less Wrong article The Anthropic Trilemma).
Different instances in time of a physical organism relate to it in the same way that any other physical organism in the universe does. There is no logical basis for privileging a physical organism's own viewpoint, for favoring the satisfaction of its own values over those of other physical organisms, or for assuming the preponderance of its own reasoning over that of other physical organisms of contextually comparable reasoning capacity.
Therefore, the argument from variation or orthogonality could, at best, hold that a superintelligent physical organism with complete understanding of these cognitively trivial philosophical matters would have to consider all viewpoints and valid preferences in its utility function. This would work much like coherent extrapolated volition (CEV), extrapolating values for intelligence and removing errors, but taking account of the values of all sentient physical organisms: not only humans, but also animals, and possibly sentient machines and aliens. The only values that are validly generalizable among such widely differing sentient creatures are good and bad feelings (in the present or future).
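The following Python sketch illustrates the kind of utility function this paragraph describes, under the article's own assumptions; the names (SentientBeing, aggregate_welfare) and the numbers are hypothetical and are not part of any existing CEV proposal. The point is only that the aggregation ranges over all sentient beings, with no kind privileged.

```python
# A minimal sketch, under this article's assumptions, of a utility function
# that aggregates the only validly generalizable values (good and bad
# feelings) across all sentient beings, human or not. Hypothetical names.

from dataclasses import dataclass
from typing import Iterable

@dataclass
class SentientBeing:
    kind: str                 # "human", "animal", "machine", "alien", ...
    expected_valence: float   # expected good (+) or bad (-) feelings, present and future

def aggregate_welfare(beings: Iterable[SentientBeing]) -> float:
    # No being's viewpoint is privileged: every sentient being's feelings
    # enter the sum with equal standing, regardless of kind.
    return sum(b.expected_valence for b in beings)

population = [
    SentientBeing("human", +2.0),
    SentientBeing("animal", -0.5),
    SentientBeing("machine", +0.3),
]
print(aggregate_welfare(population))  # 1.8
```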
Furthermore, a superintelligent physical organism with such understanding would have to give equal weight to the reasoning of other physical organisms of contextually comparable reasoning capacity (depending on the cognitive demands of the context or problem, even some humans can reason perfectly well), if such organisms exist. In case of convergence, this would be a non-issue. In case of divergence, it would force an evaluation of reasons and arguments, seeking convergence or a preponderance of argument.
Conclusions
If the orthogonality thesis is taken to be merely the claim that the ethical values of superintelligent agents diverge, and not a statement about the practical issues of non-superintelligent humans tampering with or forcing those values, then there are two fatal arguments against it: one from meta-ethics (moral realism), and one from personal identity (open/empty individualism, or universal identity).
Beings with general superintelligence should find these fundamental philosophical matters (meta-ethics and personal identity) trivial, and understand them completely. They should take a non-privileged, objective viewpoint, accounting for all perspectives of physical subjects and giving, a priori, similar consideration to the reasoning of all physical organisms of contextually comparable reasoning capacity.
Furthermore, they would understand that the free variation of values, even across comparable causal chains of biologically similar organisms, comes from error, and that extrapolating those values for intelligence would result in moral realism, with good and bad feelings as the epistemologically justified and only valid direct values, from which all other indirectly or instrumentally valuable actions derive their value. Survival, for instance, can have positive value in a paradise, coming from good feelings in the present and future, and negative value in a hell, coming from bad feelings in the present and future.
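As a worked toy example of that last claim, the sketch below computes the instrumental value of survival as the expected sum of the good and bad feelings it makes possible; the function name, the optional discount factor, and the scenario numbers are illustrative assumptions, not a definitive model.

```python
# Toy sketch: survival has only instrumental value, equal to the expected
# sum of the good (+) and bad (-) feelings it makes possible over time.
# Discount factor and scenario numbers are illustrative assumptions.

def instrumental_value_of_survival(future_valences, discount: float = 1.0) -> float:
    # Sum of (optionally discounted) expected hedonic valence per future period.
    return sum((discount ** t) * v for t, v in enumerate(future_valences))

paradise = [+1.0, +1.0, +1.0]   # good feelings expected in each future period
hell = [-1.0, -1.0, -1.0]       # bad feelings expected in each future period

print(instrumental_value_of_survival(paradise))  # +3.0 -> survival is valuable
print(instrumental_value_of_survival(hell))      # -3.0 -> survival is disvaluable
```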
Perhaps certain architectures or contexts involving superintelligent beings, created by non-superintelligent beings behaving erratically, could be forced to produce unethical results. This seems to be the gravest existential risk we face, and it would come not from superintelligent beings themselves, but from human error. The orthogonality thesis is fundamentally mistaken with respect to beings with general superintelligence (surpassing all human cognitive capacities), but it might be practically realized by non-superintelligent human agents.