How many philosophers accept the orthogonality thesis? Evidence from the PhilPapers survey
The orthogonality thesis and its relation to existing meta-ethical debates
In the field of AI alignment theory, the orthogonality thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of motivation. The reverse thesis, which we may call the heterogonality thesis, asserts that, with enough intelligence, any possible agent would pursue only one set of motivations.
In the field of meta-ethics, moral internalism asserts that any possible agent who holds a moral judgment is motivated to act on this judgment. For example, according to moral internalism, any agent who holds that one ought to donate 10% of one’s income to charity is motivated to do so.
Also in the field of meta-ethics, moral realism asserts that some moral judgments are objectively correct. This is a form of moral cognitivism, which asserts that moral judgments are factual statements that can be objectively correct or incorrect (the anti-realist form of cognitivism is error theory, which asserts that all moral judgments are incorrect).
It’s easy to see that the heterogonality thesis is moral internalism plus moral realism. A moral realist would say that, with enough intelligence, any possible agent would discover objective morality and hold only the one set of moral judgments that is objectively correct. A moral realist who is also a moral internalist would therefore support the heterogonality thesis: with enough intelligence, any possible agent would be motivated to act on only that one set of moral judgments, and thus would pursue only one set of motivations.
This is why, even though the orthogonality thesis is a recent concept known only within the small circle of AI alignment theorists (I even had to coin the term for its negation myself), we can try to estimate how many philosophers accept it.
The PhilPapers survey
It included three questions on meta-ethics: whether one accepts moral realism or moral anti-realism, whether one accepts moral internalism or moral externalism, and whether one accepts moral cognitivism or moral non-cognitivism. (There was also a question on normative ethics, namely whether one accepts virtue ethics, deontology, or consequentialism. It is not relevant to the orthogonality thesis.)
Each question offers multiple options: one for each position, plus an “Other” option for respondents who find the question too unclear to answer, who are agnostic, who are insufficiently familiar with the issue, etc.
Methodology
The methodology is implemented by a bash script, which is available in the appendix. It downloads the answers of the public respondents to the PhilPapers survey, extracts their opinions on meta-ethics, excludes philosophers who picked “Other” options (because we can’t know whether they accept the orthogonality thesis), and then computes the number of philosophers (with a knowable opinion) who accept the orthogonality thesis.
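For concreteness, the script tallies these answers into an intermediate file, pps_meo, with one line per combination of answers, prefixed by the number of respondents who gave that combination (respondents who picked an “Other” option are already excluded at this stage). The counts are shown as placeholders below; only the format comes from the script:

<count> moral anti-realism non-cognitivism externalism
<count> moral realism cognitivism externalism
<count> moral realism cognitivism internalism

Only lines of the last kind (moral realism, cognitivism and internalism together) are counted as rejecting the orthogonality thesis; every other combination is counted as accepting it.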
Results
66% of philosophers (with a knowable opinion) accept the orthogonality thesis. This is about two thirds of philosophers.
Appendix: Source code of the script
#!/bin/bash
# WARNING: This script creates many new files.
# It is highly recommended to be in an empty folder when executing it.
function opinion() {
    # Usage: opinion <file> <question> <answer A> <answer B>
    # Example: opinion 42 "Meta-ethics" "moral realism" "moral anti-realism"
    # Prints <answer A>, <answer B>, or "other", depending on the respondent's answer.
    str="$2: $3 or $4?<\/td><td bgcolor='#......' style='width:250px'>"
    answer=$(grep -o "$str[-A-Za-z/: ]*" "$1" | sed "s/$str//")
    r=other
    if grep "$3" <<< "$answer" > /dev/null; then r=$3; fi
    if grep "$4" <<< "$answer" > /dev/null; then r=$4; fi
    echo "$r"
}
function metaethical_opinions() {
    # Usage: metaethical_opinions <file>
    # Example: metaethical_opinions 42
    # Prints the respondent's three meta-ethical opinions on one line,
    # e.g. "moral realism cognitivism internalism" ("other" marks unknowable answers).
    metaethics=$(opinion "$1" "Meta-ethics" "moral realism" "moral anti-realism")
    mjudgement=$(opinion "$1" "Moral judgment" "cognitivism" "non-cognitivism")
    motivation=$(opinion "$1" "Moral motivation" "internalism" "externalism")
    echo "$metaethics $mjudgement $motivation"
}
if ! [ -e public_respondents.html ]; then
    wget https://philpapers.org/surveys/public_respondents.html
fi
if ! [ -e pps_meo ]; then
    # Tally the meta-ethical opinions of every public respondent, excluding
    # those with at least one "other" answer, into the file pps_meo.
    for profile in $(sed "s/'/\n/g" public_respondents.html | grep "https://philpapers.org/profile/"); do
        id=$(cut -d/ -f5 <<< $profile)
        if ! [ -e $id ]; then
            wget $profile -O $id
        fi
        metaethical_opinions $id
    done | sort | uniq -c | grep -v other | sed 's/^ *//' > pps_meo
fi
# A respondent rejects the orthogonality thesis iff they accept moral realism,
# cognitivism and internalism; everyone else (with a knowable opinion) accepts it.
orthogonalists=$(grep -v "moral realism cognitivism internalism" pps_meo | cut -d' ' -f1 | paste -sd+ | bc)
philosophers=$(cut -d' ' -f1 pps_meo | paste -sd+ | bc)
python3 << EOF
print("{}% of philosophers (with a knowable opinion) accept the orthogonality thesis.".format($orthogonalists/$philosophers * 100))
EOF
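A minimal way to run the script, following the warning at its top (the file name pps_survey.sh below is a hypothetical placeholder; the post does not name the script):

mkdir pps_data
cd pps_data
bash ../pps_survey.sh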
This should not be reported this way. It should be reported as something like 66%. The other digits are not meaningful.
Yes, you’re right, some people raised this in the /r/ControlProblem subreddit. I fixed this.
Given that the phrase “orthogonality thesis” was not coined until 2012, I doubt the usefulness of this data set in determining current philosophical consensus around it.
Yes, this is the whole point of the first part of the article.
Moral realism plus moral internalism does not imply heterogonality. Just because there is an objectively correct morality, does not mean that any sufficiently powerful optimization process would believe that that morality is correct.
Because?
When people say that a morality is “objectively correct”, they generally don’t mean to imply that it is supported by “universally compelling arguments”. What they do mean might be a little hard to parse, and I’m not a moral realist and don’t claim to be able to pass their ITT, but in any case it seems to me that the burden of proof is on the one who claims that their position does imply heterogonality.
I think they do mean that quite a lot of the time, for non-strawman versions of “universally compelling”. I suppose what you are getting at is objectively correct morality existing, in some sense, but being undiscoverable, or cognitively inaccessible.
Sure, probably some of them mean that, but you can’t assume that they all do.
But then that would be covered by “internalism”.
That wouldn’t be covered by “internalism”. Whether any possible agent who holds a moral judgment is motivated to act on this judgment is orthogonal (no pun intended) to whether moral judgments are undiscoverable or cognitively inaccessible.
Arguably, AIs don’t have Omohundroan incentives to discover morality.
Whether it would believe it, and whether it would discover it are rather separate questions.
It can’t believe it if it doesn’t discover it.
It is possible to be told something.
Yes, this is my problem with this theory, but there are much stupider opinions held by some percentage of philosophers.
If only everyone could agree with what they are.
Also, it’s not clear that AI would reject the proposition that if there are objectively correct values, then it should update its value system to them, since humans don’t always.
Let me make sure that I get this right: you look at the survey, measure how many people answered yes to both moral internalism and moral realism, and conclude that everyone else accepts the orthogonality thesis?
If yes, then I don’t think that’s a good approach, for three distinct reasons:
1. You’re assuming philosophers all have internally consistent positions
2. I think you merely have a one-way implication: int∧real⟹het, but not necessarily backwards. It seems possible to reject the orthogonality thesis (and thus accept heterogonality) without believing in both moral realism and moral internalism. But most importantly,
3. Many philosophers probably evaluated moral internalism with respect to humans. Like, I would claim that this is almost universally true for humans, and I probably agree with moral realism, too. Kind of. But I also believe the orthogonality thesis when it comes to AI.
All your objections are correct and important, and I think the correct result may be anywhere from 50% to 80%. That said, I think there’s a reasonable argument that most heterogonalists would consider morality to be the set of motivations from “with enough intelligence, any possible agent would pursue only one set of motivations” (more mathematically, the utility function from “with enough intelligence, any possible agent would pursue only one utility function”).
Can we use “collinearity” instead? It’s an existing word which is the opposite of orthogonality.
I’m not sure it really conveys the relevant idea—it’s too specific an opposite of “orthogonality”. I’m not keen on “heterogonality” either, though; that would be the opposite of “homogonality” if that were a word, but not of “orthogonality”. “Dependence” or “dependency”? (On the grounds that “orthogonality” here really means “independence”.) I think we need a more perspicuous name than that. “The value inevitability thesis” or something like that.
Actually, I’m not very keen on “orthogonality” either because it suggests a very strong kind of independence, where knowing that an agent is highly capable gives us literally no information about its goals—the Arbital page about the orthogonality thesis calls that “strong orthogonality”—and I think usually “orthogonality” in this context has a weaker meaning, saying only that any computationally tractable goal is possible for an intelligent agent. I’d rather have “orthogonality” for the strong thesis, “inevitability” for its opposite, and two other terms for “weak orthogonality” (the negation of inevitability) and “weak inevitability” (the negation of strong orthogonality).
Quoting the specific definitions in the Arbital article for orthogonality, in case people haven’t seen that page (bold added):
I thought about orthodox/heterodox when making the term.
Ah, I see. The trouble is that “ortho-” is being used kinda differently in the two cases.
Ortho- means “straight” or “right”. Orthodoxy is ortho-doxy, right teaching, as opposed to hetero-doxy, different teaching (i.e., different from that of The One True Church, and obviously therefore wrong). But orthogonal is ortho-gonal, right-angled, where of course a “right” angle is traditionally half of a “straight” angle. (Why? Because “right” also means “upright”, so a “right” angle is one like that between something standing upright and the ground it stands on. This applies in Greek as well as English.) I suppose heterogonality could be other-angled-ness, i.e., being at an angle other than a right angle, but that doesn’t feel like a very natural meaning to me somehow.
I don’t think the orthogonality thesis can be defined as ~[moral internalism & moral realism] -- that is, I think there can be and are philosophers who reject moral internalism, moral realism, *and* the orthogonality thesis, making 66% a high estimate.
Nick Land doesn’t strike me as a moral internalist-and-realist (although he has a Twitter and I bet your post will make its way to him somehow), but he doesn’t accept the orthogonality thesis:
This is a form of internalism-and-realism, but it’s not about morality—so it wouldn’t be inconsistent to reject orthogonality and ‘heterogonality’.
I recall someone in the Xenosystems orbit raising the point that humans, continuously since long before our emergence as a distinct species, existed under the maximal possible amount of selection pressure to reproduce, but 1) get weird and 2) frequently don’t reproduce. There are counterarguments that can be made here, of course (AIs can be designed with much more rigor than evolution allows, say), but it’s another possible line of objection to orthogonality that doesn’t involve moral realism.