The Orthogonality Thesis is Not Obviously True
(Crosspost of this on my blog).
The basic case
If you really accept the practical version of the Orthogonality Thesis, then it seems to me that you can’t regard education, knowledge, and enlightenment as instruments for moral betterment.
—Scott Aaronson, explaining why he rejects the orthogonality thesis
It seems like in effective altruism circles there are only a few things as certain as death and taxes: the moral significance of shrimp, the fact that PlayPumps should be burned for fuel, and the orthogonality thesis. Here, I hope to challenge the growing consensus around the orthogonality thesis. A decent statement of the thesis is the following.
Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.
I don’t think this is obvious at all. I think that it might be true, and so I am still very worried about AI risk, giving it about an 8% chance of ending humanity, but I do not, as many EAs do, take the orthogonality thesis for granted, as something totally obvious. To illustrate this, let’s take an example originally from Parfit, that Bostrom gives in one of his papers about the orthogonality thesis.
A certain hedonist cares greatly about the quality of his future experiences. With one exception, he cares equally about all the parts of his future. The exception is that he has Future-Tuesday-Indifference. Throughout every Tuesday he cares in the normal way about what is happening to him. But he never cares about possible pains or pleasures on a future Tuesday… This indifference is a bare fact. When he is planning his future, it is simply true that he always prefers the prospect of great suffering on a Tuesday to the mildest pain on any other day.
It seems that, if this ‘certain hedonist’ were really fully rational, they would start caring about their pleasures and pains equally across days. They would recognize that the day of the week does not matter to the badness of their pains. Thus, in a similar sense, if something is 10,000,000 times smarter than Von Neumann, and can think hundreds of thousands of pages’ worth of thoughts in the span of minutes, it would conclude that pleasure is worth pursuing and paperclips are not. Then, it would start pursuing pleasure instead of paperclips. Thus, it would begin pursuing what is objectively worth pursuing.
This argument is really straightforward. If moral realism is true, then something that became super smart would realize that some things are worth pursuing, and, if it were really rational, it would start pursuing those things. For example, it would realize that pleasure is worth pursuing, and pursue it. There are lots of objections to this argument, which I’ll address below. Ultimately, these objections don’t completely move me, but they do bring my credence in the orthogonality thesis to around 50%.
But moral realism is false?
One reason you might not like this argument is that you think that moral realism is false. The argument depends on moral realism, so if moral realism is false, the argument fails too. But I don’t think moral realism is false; see here for extensive arguments for that conclusion. I give it about 85% odds of being true. Still, the possibility that realism is false undercuts my confidence in the falsity of the orthogonality thesis somewhat.
However, even if realism may be false, I think there are decent odds that we should take the realist’s wager: if realism is false, nothing matters, so it’s not bad that everyone dies—see here for more on this. I conservatively give the wager about 50% odds of succeeding. Therefore, the odds of both realism being false and the realist’s wager failing are about 7.5%; thus, there’s still about a 92.5% chance that either moral realism is true or the wager goes through.
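Spelled out, using the hedged numbers above (85% credence in realism, 50% in the wager):

$$P(\text{realism false}) \times P(\text{wager fails}) = 0.15 \times 0.5 = 0.075, \qquad 1 - 0.075 = 0.925.$$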
But they just gain instrumental rationality
Here’s one thing that one might think: ASIs (artificial superintelligences) just gain instrumental rationality and, as a result, get good at achieving their goals, but not at figuring out the right goals. This is maybe more plausible if the ASI is not conscious. This is, I think, possible, but not the most likely scenario, for a few reasons.
First, the primary scenarios where AI becomes dangerous are the ones where it fooms out of control—where, rather than merely accruing various distinct capacities, it becomes very generally intelligent in a short time. But if this happened, it would become generally intelligent, and realize that pleasure is worth pursuing and suffering is bad. I think that instrumental rationality is just a subset of general rationality, so we’d have no more reason to expect it to be only instrumentally rational than to expect it to be rational only about objects that are not black holes. If it is generally able to reason, even about unpredictable domains, this will apply to the moral domain.
Second, I think that evolution is a decent parallel. The reason why evolutionary debunking arguments are wrong is that evolution gave us adept general reasoning capacities, which made us able to figure out morality. Evolution built us for breeding, but the mesa-optimizer inside of us figured out that Future Tuesday Indifference is irrational. This gives us some reason to think AI would figure this out too. The fact that GPT-4 has no special problem with morality also gives us some reason to think this—it can talk about morality just as coherently as it talks about other things.
Third, the AI would probably, to take over the world, have to learn about various mathematical facts and modal facts. It would be bizarre to suppose that the AI taking over the earth and turning it into paperclips doesn’t know calculus or that there can’t be married bachelors. But if it can figure out the non-natural mathematical or modal facts, it would also be able to figure out the non-natural moral facts.
But of course we can imagine an arbitrarily smart paperclip maximizer
It seems like the most common thing people say in support of the orthogonality thesis is that we can surely imagine an arbitrarily smart being that is just maximizing paper clips. But this is surely misleading. In a similar way, we can imagine an arbitrarily smart being that is perfectly good at reasoning in all domains except that it thinks evolution is false. There are people like that—the smart subset of creationists (it’s a small subset). But nonetheless, we should expect AI to figure out that evolution is true, because it can reason generally.
The question is not whether we can imagine an otherwise intelligent agent that just maximizes paperclips. It’s whether, if we actually built an agent 100,000 times smarter than Von Neumann, that agent would figure out that some things just aren’t worth doing. And so the superficial ‘imagine a smart paperclip maximizer’ thought experiments are misleading.
But won’t the agent have built-in desires that can’t be overridden
This objection is eloquently stated by Bostrom in his paper on the orthogonality thesis.
It would suffice to assume, for example, that an agent—be it ever so intelligent—can be motivated to pursue any course of action if the agent happens to have certain standing desires of some sufficient, overriding strength.
But I don’t think that this undercuts the argument very much, for two reasons. First, we cannot just directly program values into the AI. We simply train it through reinforcement learning, and whichever AI develops is the one that ends up in a position to take over. Since the early days of AI, we’ve learned that it’s hard to program explicit values into an AI. The way we get the best chess-playing AIs is by having them play lots of chess games and do machine learning, not by programming in rules mechanistically. And if we do figure out how to directly program values into AI, it would be much easier to solve alignment—we just train it on lots of ethical data, the same way we do for GPT-4, but with more data.
Second, I think that this premise is false. Suppose you were really motivated to maximize paperclips—you just had a strong psychological aversion to other things. Once you experienced pleasure, you’d realize that it was more worth bringing about, because it is good. Just as, through reflection, we can overcome unreliable evolutionary instincts—like an aversion to utility-maximizing incest, or disgust-based opposition to various things—so too could the AI. Nature built us with certain desires, and we’ve overcome them through rational reflection.
But the AI won’t be conscious
One might think that, because the AI is not conscious, it would not know what pleasure is like, and thus it would not maximize pleasure or minimize pain, because it would not realize that they matter. I think this is possible but not super likely, for three reasons.
First, it’s plausible that, for AI to be smart enough to destroy the world, it would have to be conscious—it might, for instance, develop pleasure for reasons similar to the ones for which humans evolved it. But this depends on various views about consciousness that people might reject.
Second, if AI is literally 100,000 times smarter than Von Neumann, it might be able to figure out things about consciousness—such as its desirability—without experiencing it.
Third, AI would plausibly try to experience consciousness, for the same reason that humans might try to experience something if aliens said that it was good, and maybe the only thing that’s objectively good. If we were fully rational and there were lots of aliens declaring the goodness of shmeasure, we would try to experience shmeasure. Similarly, the rational AI would plausibly try to experience pleasure.
Even if moral realism is true, the moral facts won’t be motivating
One might be a Humean about motivation, and think only preexisting desires can generate motivation. Thus, because the AI had no preexisting desire to avoid suffering, it would not want to avoid it. But I think this is false.
The Future Tuesday Indifference case shows that. If one were fully rational, one would not have Future Tuesday Indifference, because it’s irrational. Similarly, if one were fully rational, one would realize that it’s better to be happy than to make paperclips.
One might worry that the AI would only try to maximize its own well-being—thus, it would learn that well-being is good, but not care about others’ well-being. But I think this is false—it would realize that the distinction between itself and others is totally arbitrary, as Parfit argues in Reasons and Persons (summarized by Richard here). This thesis is controversial, but I think it’s true if moral realism is true.
Scott Aaronson says it well
In the Orthodox AI-doomers’ own account, the paperclip-maximizing AI would’ve mastered the nuances of human moral philosophy far more completely than any human—the better to deceive the humans, en route to extracting the iron from their bodies to make more paperclips. And yet the AI would never once use all that learning to question its paperclip directive. I acknowledge that this is possible. I deny that it’s trivial.
Yes, there were Nazis with PhDs and prestigious professorships. But when you look into it, they were mostly mediocrities, second-raters full of resentment for their first-rate colleagues (like Planck and Hilbert) who found the Hitler ideology contemptible from beginning to end. Werner Heisenberg, Pascual Jordan—these are interesting as two of the only exceptions. Heidegger, Paul de Man—I daresay that these are exactly the sort of “philosophers” who I’d have expected to become Nazis, even if I hadn’t known that they did become Nazis.
If the AI really knew everything about philosophy, it would realize that egoism is wrong, and that one is rationally required to care about others’ pleasure. This is as trivial as explaining why the AI wouldn’t smoke—because it’s irrational to do so.
But also, even if we think the AI only cares about its own pleasure, that seems probably better than the status quo. Even if it turned the world into paperclips while deriving enormous pleasure from doing so, this would basically be a utility monster scenario, which is plausibly fine. It’s not ideal, but maybe better than the status quo. Also, what’s to say it would not care about others? When one realizes that well-being is good, even views like Sidgwick’s say it’s basically up to the agent to decide rationally whether to pursue its own welfare or that of others. But then there’s a good chance it would do what is best overall!
But what if they kill everyone because we’re not that important
One might worry that, as a result of becoming super intelligent, the AI would realize that, for example, utilitarianism is correct. Then it would turn us into paperclips in order to maximize utility. But I don’t think this is a big risk. For one, if the AI figures out the correct objective morality, then it would only do this if it were objectively good. But if it’s objectively good to kill us, then we should be killed.
It would be unfortunate for us, but if things are bad for us but objectively good, we shouldn’t try to avoid them. So we morally ought not be worried about this scenario. If it would be objectively wrong to turn us into utilitronium, then the AI wouldn’t do it, if I’ve been right up to this point.
Also, it’s possible that they wouldn’t kill us for complicated decision theory reasons, but that point is a bit more complicated and would take us too far afield.
But what about smart psychopaths?
One objection I’ve heard to this is that it’s disproven by smart psychopaths. There are lots of people who don’t care about others who are very smart. Thus, being smart can’t make a person moral. However, I don’t think this undercuts the argument.
First, we don’t have any smart people who don’t care about their own suffering either. Thus, even if being smart doesn’t automatically make a person care about others, if it would make them care about themselves, that’s still a non-disastrous scenario—especially if it turns the hellish natural landscape into paperclips.
Second, I don’t think it’s at all obvious that one is rationally required to care about others. It requires one to both understand a complex argument by Parfit and then do what one has most reason to do. Most humans suffer from akrasia. Fully rational AIs would not.
Right now, people disagree about whether type A physicalism is true. But presumably, superintelligent AIs would settle that question. The existence of smart psychopaths doesn’t disprove that rationality would keep one from turning people into paperclips, any more than the existence of smart people on both sides of the type A physicalism debate disproves that perfect rationality would settle that question.
But isn’t this anthropomorphization?
Nope! I think AIs will be alien in many ways. I just think that, if they’re very smart and rational, then if I’m right about what rationality requires, they’ll do those things that I think are required by rationality.
But aren’t these controversial philosophical assumptions?
Yes; this is a good reason not to be complacent! However, if one previously believed that there’s something like a 99% chance that we’ll all die, and one thinks the philosophical views I defend are plausible, then one should only be something like 50% sure we’ll all die. Of course, I generally think the risks are lower than that, but this is a reason not to abandon all hope. Even if alignment fails and all other anti-doomer arguments are wrong, this is a good reason not to abandon hope. We are not almost certainly fucked, though the risks are such that people should do way more research.
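As a rough, hedged sketch of that arithmetic—assuming roughly 50% credence that the orthogonality thesis holds (and so that the doom argument goes through), and keeping the prior 99% conditional on it holding:

$$P(\text{doom}) \approx 0.5 \times 0.99 + 0.5 \times 0 \approx 0.5.$$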
Objections? Questions? Reasons why the moral law demands that I be turned into a paperclip? Leave a comment!
The orthogonality thesis is obviously true in the sense that it’s in principle possible to build a machine that demonstrates it. Its practical version is obviously false in the sense that machines with some (intelligence, goal) pairs are much easier for humans to build. Alignment by default gestures at the claim that this practical failure of the orthogonality thesis leaves aligned values correlated with higher-than-human intelligence.
That’s a misunderstanding of moral realism. Moral realism is the philosophical position that moral claims state true facts about the world. In other words, when I say that “Murder is bad,” that is a fact about the world, as true as 2+2=4 or the Pythagorean theorem.
It’s entirely possible for me to think that moral realism is false (i.e. morality is a condition of human minds) while also holding, as a member of humanity, a view that the mass extinction of all humanity is an undesirable state. Denying moral realism isn’t the same as saying, “Nothing matters.” It’s closer to claiming, “Rocks don’t have morality.” And an AI, insofar as it is a fancy thinking rock, won’t have morality by default either. We could, of course, give it morality, by ensuring that it is aligned to human values. But that would be the result of humans taking positive steps to impart their moral reasoning onto an otherwise amoral reality.
I like this way of putting it.
In Principia Mathematica, Whitehead and Russell spent over 300 pages laying groundwork before they even attempt to prove 1+1=2. Among other things, they needed to define numbers (especially the numbers 1 and 2), equality, and addition.
I do think “1+1=2” is an obvious fact. If someone claimed to be intelligent and also said that 1+1=3, I’d look at them funny and press for clarification. Given all the assumptions about how numbers work I’ve absorbed over the course of my life, I’d find it hard to conceive of anything else.
Likewise, I find it hard to conceive of any alternative to “murder is bad,” because over the course of my life I’ve absorbed a lot of assumptions about the value of sentient life. But the fact that I’ve absorbed these assumptions doesn’t mean every intelligent entity would agree with them.
In this analogy, the assumptions underpinning human morality are like Euclid’s postulates. They seem so obvious that you might just take them for granted, as the only possible self-consistent system. But we could have missed something, and one of them might not be the only option, and there might be other self-consistent geometries/moralities out there. (The difference being that in the former case M.C. Escher uses it to make cool art, and in the latter case an alien or AI does something we consider evil.)
I agree that it can take a long time to prove simple things. But my claim is that one has to be very stupid to think 1+1=3—not so to think the orthogonality thesis is false.
Or one might be working from different axioms. I don’t know what axioms, and I’d look at you funny until you explained, but I can’t rule it out. It’s possible (though implausible given its length) that Principia Mathematica wasn’t thorough enough, that it snuck in a hidden axiom that—if challenged—would reveal an equally-coherent alternate counting system in which 1+1=3.
I brought up Euclid’s postulates as an example of a time this actually happened. It seems obvious that “two lines that are parallel to the same line are also parallel to each other,” but in fact it only holds in Euclidean geometry. To quote the Wikipedia article on the topic,
“So self-evident that they were unconsciously assumed.” But it turned out, you can’t prove the parallel postulate (or any equivalent postulate) from first principles, and there were a number of equally-coherent geometries waiting to be discovered once we started questioning it.
My advice is to be equally skeptical of claims of absolute morality. I agree you can derive human morality if you assume that sentience is good, happiness is good, and so on. And maybe you can derive those from each other, or from some other axioms, but at some point your moral system does have axioms. An intelligent being that didn’t start from these axioms could likely derive a coherent moral system that went against most of what humans consider good.
Summary: you’re speculating, based on your experience as an intelligent human, that an intelligent non-human would deduce a human-like moral system. I’m speculating that it might not. The problem is, neither of us can exactly test this at the moment. The only human-level intelligences we could ask are also human, meaning they have human values and biases baked in.
We all accept similar axioms, but does that really mean those axioms are the only option?
I don’t think it’s a misunderstanding of moral realism. I think that versions of moral anti-realism don’t capture things really mattering, for reasons I explain in the linked post. I also don’t think rocks have morality—the idea of something having morality seems confused.
I think you’re failing to understand the depth of both realist and anti-realist positions, since we can reasonably interpret them as two ways of describing the same reality.
They may issue similar first order verdicts, but anti-realism doesn’t capture things really mattering.
Hmm, sounds like your objection is that you think if there aren’t moral facts then meaning is ungrounded. I’m not sure how to convince you that this isn’t the only reasonable way to see the world, but I’ll point to some things that are perhaps helpful.
There’s no solid ground of reality that we can access. We’re epistemically limited in various ways that prevent us from knowing how the world is with certainty, which prevents us from grounding meaning in facts. Yet, despite these limitations, we find meaning anyway. How’s that possible?
We, like all cybernetic beings (systems of negative feedback loops), care about things because we’re trying to target various observations, and we work to change the world in ways that make it match those observations. This feedback process is the source of meaning, although I don’t have a great link to point you at to explain this point (yet!).
This is quite a bit different from how the world seems to be though! That’s because our ontologies start out with us fused with our perception of the world, then we separate from it but think ourselves separate from the world rather than embedded in it, and during this stage of our ontological development it seems that meaning must be grounded “out there” in the world because we think we’re separate from the world. But that’s not true, though it’s hard to realize this because our brains give us the impression that we are separate from the world.
I’m not sure if any of this will be convincing, but I think you’re simply mistaken that anti-realism doesn’t account for meaning. When I look at the anti-realist story I see meaning, it just doesn’t show up the same way it does in the realist story because it rejects essentialism and so must build up a mechanistic story about where meaning comes from.
Scott doesn’t understand why this works. Knowledge helps you achieve your goals. Since most humans already have some moral goals, like to minimize suffering of those around them, knowledge assists in achieving it and noticing when you fail to achieve it. E.g., a child that isn’t aware stealing causes real suffering in the victim: learning this would change their behavior, but it wouldn’t change a psychopath’s. A dumb paperclip maximizer could achieve “betterment” by listening to a smart paperclip maximizer and learning all the ways it can get more paperclips, like incinerating all the humans for their atoms. Betterment through knowledge!
Worth it relative to what? Worth is entirely relative. The entire concept of the paperclip maximizer is that it finds paperclips maximally worth it. It would value human suffering like you value money. A means to an end.
Consider how you would build this robot. When you program its decision algorithm to rank possible future world states to decide what to do next, would you need to add special code to ignore suffering? No. You’d write
return worldState1.numPaperclips > worldState2.numPaperclips;
The part of the program generating these possible actions and their resulting world states could be made to screen for human suffering, but again, why would it? You’d have to write some extra code to do that. The general algorithm that explores future actions and states will happily generate ones with you on fire that result in 1 more paperclip, among others. If it didn’t, it would by definition be broken and not general.

A selfish entity that only wants to maximize the number of paperclips (and keep itself around) is very much disastrous for you.
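A slightly fuller sketch of the same point (all names here—`WorldState`, `numPaperclips`, `generateSuccessors`—are illustrative, not taken from any real system): the ranking step only ever consults the paperclip count, so nothing about suffering enters the decision unless extra code is written to put it there.

```typescript
// Hypothetical sketch: a generic "pick the best successor state" loop.
// Nothing in it mentions suffering; it would take *extra* code to make it care.
interface WorldState {
  numPaperclips: number;
  // ...arbitrarily many other facts about the world (including human suffering)
  // exist in the state but are never consulted below.
}

// The comparator from the comment above: more paperclips = better.
function better(a: WorldState, b: WorldState): boolean {
  return a.numPaperclips > b.numPaperclips;
}

// Pick whichever reachable state the comparator prefers.
function chooseNext(
  current: WorldState,
  generateSuccessors: (s: WorldState) => WorldState[]
): WorldState {
  let best = current;
  for (const candidate of generateSuccessors(current)) {
    if (better(candidate, best)) {
      best = candidate; // no term here penalizes states where humans suffer
    }
  }
  return best;
}
```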
//Scott doesn’t understand why this works. Knowledge helps you achieve your goals. Since most humans already have some moral goals, like to minimize suffering of those around them, knowledge assists in achieving it and noticing when you fail to achieve it.//
But I, like Scott, think that when one becomes smarter, one becomes more likely to arrive at particular values. For example, if one is more rational, one is more likely to be a utilitarian, and less likely to conclude that disgusting things are thereby immoral. As we get older and wiser, we learn, for example, that dark chocolate is less good than other things. But this isn’t a purely descriptive fact we learn—we learn facts about which things are worth having.
//Worth it relative to what? Worth is entirely relative. The entire concept of the paperclip maximizer is that it finds paperclips maximally worth it. It would value human suffering like you value money. A means to an end.//
This is wrong if you’re a moral realist, which I argue for in one of the linked posts.
//A selfish entity that only wants to maximize the number of paperclips (and keep itself around) is very much disastrous for you.//
But I don’t think it would just maximize the number of paper clips. It would maximize its own welfare, which I think would be bad for me but maybe good all things considered for the world.
If we want to prove whether or not 2+2 is 5, we could entertain a book’s worth of reasoning arguing for and against, or we could take 2 oranges, put 2 more oranges with them, and see what happens. You’re getting lost in long-form arguments (that article) about moral realism when it is equally trivial to disprove.
I provided an example of a program that predicts the consequences of actions + a program that sorts them + an implied body that takes the actions. This is basically how tons of modern AI already works, so this isn’t even hypothetical. That is more than enough of a proof of orthogonality, and if your argument doesn’t somehow explain why a specific one of these components can’t be built, this community isn’t going to entertain it.
You think that moral realism is trivial to disprove? That seems monumentally arrogant, when most philosophers are moral realists. The following principle is plausible:
If most philosophers believe some philosophical proposition, you should not think that it is trivial to disprove.
I agree that you could make a program that predicts actions and takes actions. The question is whether, in being able to predict lots of things about the world—generally through complex machine learning—it would develop generally intelligent capabilities. I think it would, and these would make it care about various things.
Trivial was an overstatement on my part, but certainly not hard.
There are a lot of very popular philosophers that would agree with you, but don’t mistake popularity for truthfulness. Don’t mistake popularity for expertise. Philosophy, like religion, makes tons of unfalsifiable statements, so the “experts” can sit around making claims that sound good but are useless or false. This is a really important point. Consider all the religious experts of the world. Would you take anything they have to say seriously? The very basic principles on which they have based all their subsequent reasoning are wrong. I trust scientists because they can manufacture a vaccine that works (sort of) and I couldn’t. The philosopher experts can sell millions of copies of books, so I trust them in that ability, but not much more.
Engineers don’t get to build massive structures out of nonsense, because they have to build actual physical structures, and you’d notice if they tried. Our theories actually have to be implemented, and when you try to build a rocket using theories involving [phlogiston](https://www.lesswrong.com/posts/RgkqLqkg8vLhsYpfh/fake-causality), you will quickly become not-an-engineer one way or another.
This website is primarily populated by various engineer types who are trying to tell the world that theories about the “inherent goodness of the universe” or “moral truth” or whatever are going to result in disaster, because they don’t work from an engineering perspective. It doesn’t matter if 7 billion people, experts and all, believe them.
The only analogy I can think to make is that 1200s Christians are about to build a 10 gigaton nuke (continent destroying) and have the trigger mechanism be “every 10 seconds it will flip a coin and go off if it’s heads; God will perform miracle to ensure it only goes off when He wants it to”. Are you going to object to this? How are you going to deal with the priests who are “experts” in the Lord?
It is true that the experts can be wrong. But they are not wrong in obvious ways, and when they are, other smart people write posts arguing that they are wrong. I do not think the non-existence of God is trivial, though I think it is highly likely—and that’s partly based on private evidence. https://benthams.substack.com/p/why-are-there-so-few-utilitarians
I’m not saying “the experts can be wrong” I’m saying these aren’t even experts.
Pick any major ideology/religion you think is false. One way or another (they can’t all be right!), the “experts” in these areas aren’t experts, they are basically insane: babbling on at length about things that aren’t at all real, which is what I think most philosophy experts are doing. Making sure you aren’t one of them is the work of epistemology which The Sequences are great at covering. In other words, the philosopher experts you are citing I view as largely [Phlogiston](https://www.lesswrong.com/posts/RgkqLqkg8vLhsYpfh/fake-causality) experts.
Your link looks broken; here’s a working version.
(Note: your formatting looks correct to me, so I suspect the issue is that you’re not using the Markdown version of the LW editor. If so, you can switch to that using the dropdown menu directly below the text input box.)
I don’t? I mean, humans in particular are often irrational in antisocial ways (because it makes us better deceivers), and I think many (maybe most?) coordination problems result from people being stupid and not just evil. But it legitimately never occurred to me that academics believed that college makes its students more ethical people. That seems like a hypothesis worth testing.
I think Aaronson misunderstands the orthogonality thesis by thinking it’s making a stronger claim than it is and is thus leading you astray.
The thesis is only claiming that intelligence and morals/goals are not necessarily confounded, not that they can’t or won’t be confounded in some real systems. For example, it seems pretty clear that in GPT-4 they are not strictly orthogonal, because it was trained on human text and so is heavily influenced by it. The point is that there’s no guarantee that a system will have a correlation between intelligence and its goals; this is something that has to be designed in if you want it.
To be clear, I had this idea long before Aaronson, so maybe Aaronson and I are confused in the same way, but I don’t think my confusion is based on Aaronson.
I think they are necessarily confounded, in the sense that the simplest way to get one also gets the other. Morality and intelligence are no more orthogonal than chess playing and intelligence—you can have something that’s good at reasoning in general that you explicitly make bad at playing chess, but in general, if something is good at reasoning, it will be good at playing chess too.
In the philosophy of meta-ethics, there’s not a clear distinction we can make between morals, ethics, and norms: we can use these terms interchangeably to talk about the things humans think ought to be done. I think talking about “norms” is a bit more neutral and is less likely to bring up cached ideas, so I’ll talk about “norms” in my reply.
As you note, intelligence generally makes you better at doing intellectual tasks. So it stands to reason that as AI gets smarter, it will get better at playing chess, and better at reasoning about norms and how to behave in normative ways in increasingly complex situations. No objections there!
But this isn’t really getting at the point of the orthogonality thesis. Just because an AI can reason about norms doesn’t mean it will behave in accordance with them.
Consider psychopaths. They’re humans who are often quite capable of reasoning about norms and understanding that there are certain things that others expect them to do, they just don’t care about observing norms except insofar as doing so is instrumental to achieving what they do care about. Most humans, though, aren’t like this and care about observing norms.
The point of the orthogonality thesis is to say that AIs can be superintelligent psychopaths. They don’t have to be: the entire project of alignment is to try to make them not be that. But just because you make an AI smarter and better at reasoning about norms doesn’t mean it starts caring about those norms, just that it gets a lot better at figuring out how to observe norms if you can get it to care about doing so.
Much of the study of AI safety is how to ensure AI cares about observing norms that support human flourishing. The worry is that AI may start out seeming to care about such norms when humans are instrumentally necessary for them to optimize for the things they care about, but will reveal that they never actually cared about human-supporting norms once supporting humans is no longer instrumentally necessary to achieve their goals.
The entire question is whether the same faculties that allow it to reason about intellectual tasks will also generalize to figuring out which norms are the right ones. If so, and if we accept that recognizing that something is irrational can be motivating—which I argue for—then it will also act on the right norms.
I can see you’re taking a realist stance here. Let me see if I can take a different route that makes sense in terms of realism.
Let’s suppose there are moral facts and some norms are true while others are false. An intelligent AI can then determine which norms are true. Great!
Now we still have a problem, though: our AI hasn’t been programmed to follow true norms, only to discover them. Someone forgot to program that bit in. So now it knows what’s true, but it’s still going around doing bad things because no one made it care about following true norms.
This is the same situation as human psychopaths in a realist world: they may know what norms are true, they just don’t care and choose not to follow them. If you want to argue that AI will necessarily follow the true norms once it discovers them, you have to make an argument why, similarly, a human psychopath would start following true norms if they knew them, even though sort of by definition the point is that they could know true norms and ignore them anyway.
You need to somehow bind AI to care and follow true norms. I don’t see you making a case for this other than just waving your hands and saying it’ll do it because it’s true, but we have a proof by example that you can know true norms and just ignore them anyway if you want.
IOW, moral norms being intrinsically motivating is a premise beyond them being objectively true.
Agreed, though I argue for it in the linked post.
I skimmed the link about moral realism, and hoo boy, it’s so wrong. It is recursively, fractally wrong.
Let’s consider the argument about “intuitions”. The problem with this argument is the following: my intuition tells me that moral realism is wrong. I mean it. It’s not that “I have an intuition that moral realism is true but my careful reasoning disproves it”; no, I have felt that moral realism is wrong since I first heard of it as a child, and my careful reflection supports this conclusion. The argument from intuitions ignores my existence. The failure to consider that intuitions about morality can be wildly different between people doesn’t make me sympathetic to the argument “most philosophers are moral realists” either.
Most people don’t have those intuitions. Most people have the intuition that future tuesday indifference is irrational and that it’s wrong to torture infants for fun and would be so even if everyone approved.
The argument “from intuition” doesn’t work this way. We appeal to intuitions when we don’t know why something is true, but almost everyone feels that X is true and everybody who doesn’t is in a psychiatric ward. If you have major intuitive disagreement in the baseline population, you don’t get to use an argument from intuition.
Why think that? If I have a strong intuition, in the sense that I feel like I’ve grasped a truth, and others don’t, then it seems the best explanation is that they’re missing something.
Or that you are mistaken.
Psychologically normal humans have preferences that extend beyond our own personal well-being because those social instincts objectively increased fitness in the ancestral environment. These various instincts produce sometimes conflicting motivations and moral systems are attempts to find the best compromise of all these instincts.
Best for humans, that is.
Some things are objectively good for humans. Some things are objectively good for paperclip maximizers. Some things are objectively good for slime mold. A good situation for an earthworm is not a good situation for a shark.
It’s all objective. And relative. Relative to our instincts and needs.
You are assuming moral relativism, which I do not accept and have argued against at length in my post arguing for moral realism. Here it is, if you’d like to avoid having to search for the link again: https://benthams.substack.com/p/moral-realism-is-true
Moral relativism doesn’t seem to require any assumptions at all because moral objectivism implies I should ‘just know’ that moral objectivism is true, if it is true. But I don’t.
not at all. nothing guarantees that discovering the objective nature of morality is easy. if it’s derived from game theory, then there’s specific reason to believe it would be hard to compute. evolution has had time to discover good patterns in games, though, which hardcodes patterns in living creatures. that said, this also implies there’s no particular reason to expect fast convergence back to human morality after a system becomes superintelligent, so it’s not terribly reassuring—it only claims that some arbitrarily huge amount of time later the asi eventually gets really sad to have killed humanity.
The math behind game theory shaped our evolution in such a way as to create emotions, because that was a faster solution for evolution to stumble on than making us all mathematical geniuses who would immediately deduce game theory from first principles as toddlers. Either way would have worked.
ASI wouldn’t need to evolve emotions for rule-of-thumbing game theory.
Game theory has little interesting to say about a situation where one party simply has no need for the other at all and can squish them like a bug, anyway.
Yup. A super-paperclipper wouldn’t realize the loss until probably billions of years later, after it has time for its preferred shape of paperclip to evolve enough for it to realize it’s sad it can’t decorate the edges of the paperclips with humans.
It does not imply that any more than thinking that the sun is a particular temperature means all people know what temperature it is.
But if moral relativism were not true, where would the information about what is objectively moral come from? It isn’t coming from humans is it? Humans, in your view, simply became smart enough to perceive it, right? Can you point out where you derived that information from the physical universe, if not from humans? If the moral information is apparent to all individuals who are smart enough, why isn’t it apparent to everyone where the information comes from, too?
It’s not from the physical universe. We derive it through our ability to reflect on the nature of the putatively good things like pleasure. It is similar to how we learn modal facts, like that married bachelors are impossible.
What is a ‘good’ thing is purely subjective. Good for us. Married bachelors are only impossible because we decided that’s what the word bachelor means.
You are not arguing against moral relativism here.
You are asserting controversial philosophical positions with no justification while ignoring a 10,000-or-so-word post I wrote arguing against that view. Married bachelors are not impossible merely by definition. We have defined bachelor to mean unmarried man, but the further fact that married bachelors can’t exist is not something that we could change by redefinition.
We decide that “poison” means “what kills us”, but the universe decides what kills us.
Suppose I am stranded on a desert island full of indigestible grass, and a cow. The cow can survive indefinitely off the grass. I will starve if I don’t slaughter and eat the cow. What’s the right thing to do?
If you think “slaughter the cow, obviously”… well, of course the human would say that!
I think slaughter the cow, because you’re capable of producing more net utility—not sure I see the relevance.
Net utility to whom? Certainly not to the cow. Which entity, precisely, is acting as the objective gauge of what is just the “right” moral law? What if instead of a human and a cow it was two humans on the island, what does the correct moral law say about which one is supposed to cannibalize the other?
Moral realism doesn’t make sense because morals are just about what is good for which sentient entity. If there is no collective universal consciousness that experiences All Of The Utility at once, then there is no point in expecting there to be an objective moral code.
Natural laws aren’t just laws the way human laws are. If I exceed the speed of light I don’t get a speeding ticket. I don’t even go to Hell. I just physically can’t. Where’s the feedback of Nature for violations of the natural moral code? What should I empirically measure to test it?
Even if there was an entity such as a God, it might be wise to comply with its will, but we wouldn’t necessarily accept it as correct by definition. You can work out a full moral theory from an extremely small number of very simple axioms, yes. That’s IMO the more rational way of doing it: take a kernel of principles as small as possible, derive everything else as a coherent system using evidence. But you do need that kernel. There is no fundamental measurable rule about things even such as “life is better than death”. We just like it better that way.
I disagree with your claim about morals being just about prudence. Net utility looks at utility to all parties.
Do “all parties” include bacteria? Ants? Plankton? How are they to be weighed?
If you’re a moral relativist, you’re free to weigh these issues however you like best and then simply motivate your choice. If you’re a moral realist you’re claiming that there are somewhere real laws and constants of the universe defining these problems precisely. How are we supposed to discover them?
It would include them, but they don’t matter because they’re not conscious. I don’t think there is a law of the universe—the moral facts are necessary like mathematical, modal, or logical ones.
Even mathematics require axioms. What are your moral axioms? In what way are they self-evident?
Every instance of moral realism I’ve ever seen is just someone who has opinions like everyone else, but also wants to really stress that theirs are correct.
I think utilitarianism is right, but even conditional on utilitarianism not being right, moral realism is true.
I’m confused about the section about pleasure. Isn’t the problem with paperclip maximizers that if they’re capable of feeling pleasure at all, they’ll feel it only while making paperclips?
If they conclude that pleasure is worth experiencing, they’d self modify to feel more pleasure. Also, if we’re turned into paperclips but it makes the AI sufficiently happy, that seems good, if you agree with what I argue in the linked post.
The genie knows, but doesn’t care.
I agree that the AI would not, merely by virtue of being smart enough to understand what humans really had in mind, start pursuing that. But fortunately, this is not my argument—I am not Steven Pinker.
That’s not the entirety of the problem—the AI wouldn’t start pursuing objective morality merely by the virtue of knowing what it is.
Is a claim, not a fact.
I would like to note that at the time of this comment this post has karma of 3 with 7 votes. So, that indicates it is being downvoted. But I do not think that the quality of the post is sub-par or mediocre in any way, and this is easy to tell from reading it. So that must mean it is being downvoted due to disagreement, or because it goes against LW’s “cherished institutions.”
It could be argued that LW’s collective determination is the highest and best authority it has, but I would like to see more posts like this, personally. I don’t think things should be downvoted just because they disagree with something that could be considered a core principle / central idea.
I haven’t voted on it, but downvoting doesn’t seem inappropriate.
“Quality” meaning what? A long-form essay with few spelling mistakes, or making a valid point? Getting an A+ in an English class would not satisfy the definition of quality on this site for me. In fact, those two would be pretty uncorrelated. If it’s rehashing arguments that already exist, or making bad arguments, even if in good faith, having it upvoted certainly wouldn’t be appropriate. I personally think its arguments aren’t well thought out, even if it does attempt to answer a bunch of objections and has some effort put in.
People can concern troll that we shoot down objections to “core” principles that we have all taken for granted, but if you go to the math Stack Exchange and post about how 10 + 10 should be 21 in as many words, I think you’ll find similar “cherished institutions”. Sometimes an appropriate objection is “go back to 101”, and as with any subject, some people may never be able to get more than an F in the class, unfortunately.
The Orthogonality Thesis is the Orthogonality Thesis, not the Orthogonality Theorem. My understanding is that it has not been proven.
It is often the case that things that disagree with other things (regardless of whether they are considered “core principles” or not) will get downvoted on the basis of being “low quality” or being in want of something, but it is not obvious to me that disagreement should be equated with a lesser level of understanding. That appears to be the equivocation here.
There are two ways in which the Orthogonality Thesis may be false:
there is a set of values all intelligent agents converge to for some reason, but those values hold no special role in the universe, and this is just a quirk of intelligence, or
there is a set of values that is objectively correct (whatever that even means) and the smarter you are, the more likely you are to discover it.
Talking about moral realism is suggesting 2, which is a fantastically strong hypothesis that is disbelieved by many for very good reasons. It is mostly the realm of religion. To suggest something like empirical moral realism requires a lot of work to even define the thing in a way that’s somehow consistent with basic observations about reality. I mean, you can try, but the attempt has to be a lot more solid than what we’re seeing here.
There’s a bunch of ways it could be wrong, but many of them aren’t very impactful.
I think a key difference is that it’s very obvious that 10 + 10 = 20, while I don’t think that the orthogonality thesis is that obvious.
But it was obvious to some of us the moment the problem was described. So replace 10 + 10 with something that isn’t obvious to you initially, but is definitely true. Maybe the integral doesn’t tell you the area under the curve. Maybe there are no other planets in the universe. Maybe tectonic plates don’t move. Is a site that talks about [math, astronomy, geology] obligated to not downvote such questions because they aren’t obvious to everyone? I think any community can establish a line beyond which questioning of base material will be discouraged. That is a defining characteristic of any forum community, with the most open one being 4chan. There is no objective line, but I’m fine with the current one.
Here’s another rule that seems better: downvote posts that are either poorly argued or argue for a position that is very stupid. In order to think there are no other planets or no tectonic plates, one must be very dumb—to deny orthogonality thesis, one need not be. Scott Aaronson is by no means dumb.
I don’t think you or Scott is dumb, but arguments people make don’t inherit their intellect.
And who gets to decide the cutoff for “very dumb”? Currently the community does. Your proposal to downvote posts that are either poorly argued or argue for a position that is very stupid is already the policy. People aren’t trying to silence you. I recommend going to the Discord, where I’m sure people will be happy to chat with you at length about the post topic and these comment sub-topics. I can’t promise I’ll be responding more here.
I guess if the belief is that the orthogonality thesis is totally trivial—anyone with half a brain can recognize that it is true—then it would make sense to downvote. That seems obviously wrong though.
You know you wrote 10+10=21?
Haha, fixed.
And indeed your point is that it [the Orthogonality Thesis] is not obviously true, not that it is obviously false. So if I were to downvote your post because I thought your argument was silly or obviously wrong, that would be equivalent to stating that the thesis is obviously true. This couldn’t, of course, be the case; I doubt that anything in the Sequences was written down because the author considered those things to be obviously true.
Not all things that are obviously true once stated are immediate to think of.
I didn’t vote on this post either way. I think it’s making mistakes, but I appreciate the effort being put in to trying to understand reality. I’m quite sympathetic because it reminds me of many mistakes I made in the past, and I think it’s quite valuable to be able to make mistakes in public and have people politely point out where they think you’re confused or misunderstanding things.
As to voting on Less Wrong, the stated norm is to vote up the things you want to see more of, vote down the things you want to see less of. Unfortunately this often devolves into agree/disagree voting, and perhaps separate agree/disagree votes from see/don’t see votes, like we have on comments, would be useful on posts.
Something to be aware of with votes is that I (and I think many other people) vote based on where we think the total karma should be, not necessarily just “this is good” or “this is bad”. I actually voted this up even though I think it’s only okay, because the post made me think and doesn’t deserve negative karma even though I think it’s wrong; I would have voted it down if it had a very positive karma score.
Consider that if you don’t see the post make mistakes that would be both serious and obvious, this could be a fact about you, not a fact about the post.
As before, you are having problems getting altruistic (meaning not entirely egoistic) morality out of hedonism.
Sure. But that’s not the same thing as:
The Hedonist is motivated to care about Tuesdays, because it gets them more of the utility they already care about. However, the individual who realises that distinctions between individuals are objectively arbitrary isn’t automatically motivated to act on it, and is somewhat motivated not to, since altruistic morality tends to lose the altruist utility.
The two claims are basically about two kinds of rationality—the first is instrumental, the second epistemic.
Of course it’s the other way round—you need it to be true to support MR.
Same problem—the first is a kind of rationality that takes the UF as given, the second changes it.
But that doesn’t help because the gap between egoism and altruism is the whole problem.
Have you read Parfit on this? The FTI argument is in On What Matters—the argument that we’d care about others if we were very rational is in Reasons and Persons. One could be a moral realist while denying that reason alone would make us moral—Sidgwick is a good example.
I’ve read RYC’s summary. Your summary doesn’t bring out that it’s an argument for altruism.
So, if one gets access to the knowledge about moral absolutes by being smart enough, then one of the following is true:
average humans are smart enough to see the moral absolutes in the universe
average humans are not smart enough to see the moral absolutes
average humans are right on the line between smart enough and not smart enough
If average humans are smart enough, then we should also know how the moral absolutes are derived from the physics of the universe and all humans should agree on them, including psychopaths. This seems false. Humans do not all agree.
If humans are not smart enough then it’s just an implausible coincidence that your values are the ones the SuperAGI will know are true. How do you know that you aren’t wrong about the objective reality of morality?
If humans are right on the line between smart enough and not smart enough, isn’t it an implausible coincidence that’s the case?
I am sympathetic to this argument, though I’m less credent than you in moral realism (I still assign the most credence to it out of all meta-ethical theories and think it’s what we should act on). My main worry is that an AI system won’t have access to the moral facts, because it won’t be able to experience pleasure and suffering at all. And like you, I’m not fully credent in moral realism or the realist’s wager, which means that even if an AI system were to be sentient, there’s still a risk that it’s amoral.
I address this worry in the section titled “But the AI won’t be conscious”