Arguments against the Orthogonality Thesis
The orthogonality thesis (formulated by Nick Bostrom in his article "The Superintelligent Will", 2011) states, basically, that an artificial intelligence can have any combination of intelligence level and goal. This article focuses on that claim itself, and only deals at the end with the practical implementation issues that, according to Stuart Armstrong, would also need to be addressed for a full refutation.
Meta-ethics
The orthogonality thesis is premised on ethical values varying across different beings. This is either because the beings in question have some objective difference in their constitution that associates them with different values, or because they can choose what values they have.
That assumption of variation is arguably based on an analysis of humans. The problem with choosing values is obvious: making errors. Human beings are biologically and constitutionally very similar, and given this, if they objectively and rightfully differ in correct values, it is only in aesthetic preferences, through an existing biological difference. If they differ in other values, then, being constitutionally similar, they cannot all be correct at the same time; the differences must come from errors of choice.
Aesthetic preferences do vary among us, but they all connect ultimately to their satisfaction: a specific aesthetic preference may satisfy only some people and not others. What matters is the satisfaction, or good feelings, that they produce, in the present or future (which may entail life preservation), and this is basically the same thing for everyone. A given stimulus or occurrence is interpreted by the senses and can produce good feelings, bad feelings, or neither, depending on the organism that receives it. This variation is beside the point; it is just an idiosyncrasy that could go either way: theoretically, any input (aesthetic preferences) could be associated with a certain output (good and bad feelings), or even no input at all, as in spontaneous satisfaction or wire-heading. In terms of output, good feelings and bad feelings always receive positive and negative value, respectively, by definition.
Masochism is not a counter-example: masochists like pain only in very specific environments, associated with certain roleplaying fantasies, due to good feelings associated with it, or due to the relief of mental suffering that comes with the pain. Outside of these environments and fantasies, they are just as averse to pain as other people. They don't regularly put their hands into boiling water to feel the pain; nobody does.
Good and bad feelings are directly felt as positive and desirable, or as negative and aversive, and this direct verification gives them the highest epistemological value. What is indirectly felt, such as the world around us, science, or physical theories, depends on the senses and could therefore be an illusion, such as being part of a virtual world. We could, theoretically, be living inside virtual worlds in an underlying alien universe with different physical laws and scientific facts, but we can nonetheless be sure of the reality of our conscious experiences in themselves, which are directly felt.
There is a difference between valid and invalid human values, which is the ground of justification for moral realism: valid values have an epistemological justification, while invalid ones are based on arbitrary choice or intuition. The epistemological justification of valid values occurs by that part of our experiences which has a direct certainty, as opposed to indirect: conscious experiences in themselves. Likewise, only conscious beings can be said to be ethically relevant in themselves, while what goes on in the hot magma at the core of the earth, or in a random rock on Pluto, is not. Consciousness creates a subject of experience, which is required for direct ethical value. It is straightforward to conclude, therefore, that good conscious experiences constitute what is good, and bad conscious experiences constitute what is bad. Good and bad are what ethical value is about.
Good and bad feelings (or conscious experiences) are physical occurrences, and therefore objectively good and bad occurrences, and objective value. Other, fictional values without epistemological (or logical) justification therefore fall into another category: they simply constitute the error that comes from allowing beings with a similar biological constitution free choice of their values.
Personal Identity
The existence of personal identities is purely an illusion that cannot be justified by argument, and it clearly disintegrates upon deeper analysis (for why that is, see, e.g., the essay Universal Identity, or, for an introduction to the problem, the Less Wrong article The Anthropic Trilemma).
Different instances in time of a physical organism relate to it in the same way that any other physical organism in the universe does. There is no logical basis for privileging a physical organism’s own viewpoint, nor the satisfaction of their own values over that of other physical organisms, nor for assuming the preponderance of their own reasoning over those of other physical organisms of contextually comparable reasoning capacity.
Therefore, the argument from variation or orthogonality could, at best, hold that a superintelligent physical organism with complete understanding of these cognitively trivial philosophical matters would have to consider all viewpoints and valid preferences in its utility function, much like coherent extrapolated volition (CEV), extrapolating the values for intelligence and removing errors, but taking into account the values of all sentient physical organisms: not only humans, but also animals, and possibly sentient machines and aliens. The only values that are validly generalizable among such widely differing sentient creatures are good and bad feelings (in the present or future).
Furthermore, a superintelligent physical organism with such understanding would have to give equal weight to the reasoning of other physical organisms of contextually comparable reasoning capacity, if any exist (depending on the cognitive demands of the context or problem, even some humans can reason perfectly well). In case of convergence, this would be a non-issue. In case of divergence, it would force an evaluation of reasons or argumentation, seeking convergence or a preponderance of argument.
Conclusions
Taking the orthogonality thesis to be merely the assumption that the ethical values of superintelligent agents diverge, and not a statement about the issues of practical implementation or of values being tampered with or forced by non-superintelligent humans, there are two fatal arguments against it: one on the side of meta-ethics (moral realism), and one on the side of personal identity (open/empty individualism, or universal identity).
Beings with general superintelligence should find these fundamental philosophical matters trivial (meta-ethics and personal identity), and understand them completely. They should take a non-privileged and objective viewpoint, accounting for all perspectives of physical subjects, and giving (a priori) similar consideration for the reasoning of all physical organisms of contextually comparable reasoning capacity.
Furthermore, they would understand that the free variation of values, even in comparable causal chains of biologically similar organisms, comes from error, and that extrapolating those values for intelligence would result in moral realism, with good and bad feelings as the epistemologically justified and only valid direct values, from which all other indirectly or instrumentally valuable actions derive their indirect value. Survival, for instance, can have positive value in a paradise, coming from good feelings in the present and future, and negative value in a hell, coming from bad feelings in the present and future.
Perhaps certain architectures or contexts involving superintelligent beings, brought about by beings without superintelligence and with erratic behavior, could be forced to produce unethical results. This seems to be the gravest existential risk that we face, and it would not come from superintelligent beings themselves, but from human error. The orthogonality thesis is fundamentally mistaken in relation to beings with general superintelligence (surpassing all human cognitive capacities), but it might be practically realized by non-superintelligent human agents.
Hi Jonatas,
I am having a hard time understanding the argument. In general terms, I take you to be arguing that some kind of additive impartial total hedonistic utilitarianism is true, and would be discovered by, and motivating to, any “generally intelligent” reasoner. Is that right?
My rough guess at your argument, knowing that I am having difficulty following your meaning, is something like this:
1. Pleasure and pain are intrinsically motivating for brains or algorithms organized around them; indeed we use their role in motivation to pick them out in our language.
2. All of the ways in which people's seemingly normative attitudes, preferences, intuitions and desires differ can be reduced to vehicles of pleasure or pain.
3. Intelligent reflection about one's own desires then leads to egoistic hedonism.
4. Our concepts of personal identity are hard to reconcile with physicalism, so intelligent beings would be eliminativist about personal identity, and conclude that egoism is untenable as there is no "ego."
5. In the absence of the "egoistic" component, one is left with non-egoistic hedonism, which turns into total additive utilitarianism as one tries to satisfy some kind of aggregate of all desires, all of which are desires about pleasure/pain (see #2).
However, the piece as it is now is a bit too elliptical for me to follow: you make various points quickly without explaining the arguments for them, which makes it hard for me to be sure what you mean, or the reasons for believing it. I felt most in need of further explanation on the following passages:
There is a complex web of assumptions here, and it’s very hard for me to be clear what you mean, although I have some guesses.
What kind of value do you mean here? Impersonal ethical value? Impact on behavior? Different sorts of pleasurable and painful experience affect motivation and behavior differently, and motivation does not respond to pleasure or pain as such, but to some discounted transformation thereof. E.g. people will accept a pain 1 hour hence in exchange for a reward immediately when they would not take the reverse deal.
Does this apply to other directly felt moral intuitions, like anger or fairness? Later you say that our best theories show that personal identity is an illusion, despite our perception of continued existence over time, and so we would discard it. What distinguishes the two?
The flat assertion that only conscious experiences have value is opposed by the flat assertions of other philosophers that other things are of value. How is it exactly that increased working memory or speed of thought would change this?
How are good and bad feelings physical occurrences in a way that knowledge or health or equality or the existence of other outcomes that people desire are not?
Earlier you privileged pleasure as a value because it is directly experienced. But an organism directly experiences, and is conditioned or reinforced by its own pain or pleasure.
Error in what sense? If desires are mostly learned through reward and anticipations of reward, one can note when the resulting desires do not maximize some metric of personal pleasure or pain (e.g. the desire to be remembered after one dies, or for equality). But why identify with the usual tendency of reinforcement learning rather than the actual attitudes and desires one has?
One could make a similar argument about evolution, claiming that any activity which does not maximize reproductive fitness is a mistake, even if desired or pleasurable. Or if one was created by one’s parents to achieve some particular end, one could say that it is an error relative to that end to pursue some other goal.
So what is the standard of error, and why be moved by it, rather than others?
Hi Carl,
Thank you for a thoughtful comment. I am not used to writing didactically, so forgive my excessive conciseness.
You understood my argument well, in the 5 points, with the detail that I define value as good and bad feelings rather than pleasure, happiness, suffering and pain. The former definition allows for subjective variation and universality, while the latter utilitarian definition is too narrow and anthropocentric, and could be contested on these grounds.
I mean ethical value, but not necessarily impact on behavior or motivation. Indeed, people do accept trades between good and bad feelings, and they can be biased in terms of motivation.
It does not apply in the same way to other moral intuitions, like anger or fairness. The latter are directly felt in some way, and in this sense they are real, but they also have a context relating to the world that is indirectly felt and could be false. Anger, for instance, can be directly felt as a bad feeling, but its causation and subsequent behavioral motivation relate to the outside world, and are at another level of certainty (not as certain). Likewise, it could be said that whatever caused good or bad feelings (such as kissing a woman) is not universal and not as certain as the good feeling itself that it caused in a person, which was directly verified by them. The person doesn't know whether he is inside a Matrix virtual world and whether the woman was really a woman or just computer data, but he knows that the kiss led to directly felt good feelings. The distinction is that one relates to the outside world, and the other relates to the experience itself.
Good question. The goodness and badness of feelings is directly felt as so, and is a datum of highest certainty about the world, while the goodness or badness of these other physical occurrences (which are indirectly felt) is not data, but inferences, which though generally trustworthy, need to be justified eventually by being connected to intrinsic values.
Indeed. However, in acting on the world, an organism has to assume a model of the world which it is going to trust as true, in order to act ethically. In this model of the world, in the world as it appears to us, the organism would consider the nature of personal identity and not privilege its own viewpoint. However, you have a point that, strictly, one's own experiences are more certain than those of others. The difference in this certainty could be thought of as the difference between direct conscious feelings and physical theories. Let's say that the former get ascribed a certainty of 100%, while the latter get 95%. The organism might then assign 5% more value to its own experiences, not fundamentally, but based on the solipsistic hypothesis that other people are zombies, or that they don't really exist.
I meant, in that case, intrinsic values. But what you mention, for instance equality, can be thought of as instrumental values. Instrumental values are taken as heuristics, or in decision theory as patterns of behavior, that usually lead to intrinsic values. Indeed, in order to achieve direct or intrinsic value, the best way tends to be following instrumental values, such as working, learning, increasing longevity… I argue that the validity of these can be examined by the extent to which they lead to direct value, namely good and bad feelings, in a non-personal way.
OK, that is the interpretation I found less convincing. The bare axiomatic normative claim that all the desires and moral intuitions not concerned with pleasure as such are errors with respect to maximization of pleasure isn’t an argument for adopting that standard.
And given the admission that biological creatures can and do want things other than pleasure, have other moral intuitions and motivations, and the knowledge that we can and do make computer programs with preferences defined over some model of their environment that do not route through an equivalent of pleasure and pain, the connection from moral philosophy to empirical prediction is on shakier ground than the purely normative assertions.
But why? You seem to be just giving an axiom without any further basis, that others don’t accept.
Once one is valuing things in a model of the world, why stop at your particular axiom? And people do have reactions of approval to their mental models of an equal society, or a diversity of goods, or perfectionism, which are directly experienced.
You can say that something vaguely like X, which people feel is morally good or obligatory as such, is instrumental in the pursuit of Y. But that doesn't change the pursuit of X, even in conflict with Y.
Carl, for the sake of readability, Less Wrong implements markdown, and in particular the blockquote feature. Place a ">" before a paragraph that is a quote.
The argument for adopting that standard was based on the epistemological primacy of the goodness and badness of good and bad feelings, while other hypothetical intrinsic values could be such only by much less certain inference. But I'd also argue that the nature of how the world is perceived necessitates conscious subjects, and reason that, in the lack of them, or in a universe eternally without consciousness, nothing could possibly matter ethically. Consciousness is therefore given special status, and good and bad relate to it.
Biological creatures indeed have other preferences, but I classify those in the error category, as Eliezer justifies in CEV. Their validity could be argued on a case by case basis, though. Machines could be made unconscious or without the capacity for good and bad feelings; then they would need to infer the existence of these by observing living organisms and their culture (in which case their certainty would be similar to that of their world model), or possibly by being very intelligent and deducing it from scratch (if that is even possible); otherwise they might be morally anti-realist. In the absence of real values, I suppose, they would have no logical reason to act one way or another, considering meta-ethics.
I think that these values need to be justified somehow. I see them as instrumental values, for their tendency to lead to the direct values of good feelings, which take a special status by being directly verified as good. Decision theory and practical ethics are very complex, and sometimes one would take an instrumentally valuable action even to the detriment of a direct value, if the action is expected to yield even more direct value in the future. For instance, one might spend a lot of time learning philosophical topics, even to the detriment of direct pleasure, if one sees it as likely to be important to the world, causing good feelings or preventing bad feelings in an unclear but potentially significant way.
Indeed, there is none. But nor is there any logical basis for not privileging a physical organism’s own viewpoint, and since most organisms evolve/are built to privilege themselves, this is not an argument that will make them change their opinion.
Yes there is: I am not objectively more important, so I should not hold the belief that I am, for the same reasons I should not adopt other arbitrary beliefs for which there is no evidence.
The lesswrongian counterargument seems to be that, while rational agents should not hold objectively indefensible beliefs, desires/values/preferences can be as arbitrary and subjective as you like.
However, that seems to me to assume orthogonality.
According to orthogonality, importance is a subjective belief, not an objective one. If instead you believe that moral values are objective, do you have evidence for this position?
The OT needs that assumption. It is not a free-floating truth, and so should not be argued from as if it were.
It is an open question philosophically. Which means that
1. It is more of a question of arguing for it than of finding empirical evidence.
2. There is no unproblematic default position. You can't argue "no evidence for objectivism, therefore subjectivism", because it is equally the case that there is no evidence for subjectivism. Just arguments on both sides.
Actually, I phrased that poorly—the OT does not need that assumption, it doesn’t use it at all. OT is true for extreme ideas such as AIXI and Godel machines, and if OT is false, then Oracle AI cannot be built. See http://lesswrong.com/lw/cej/general_purpose_intelligence_arguing_the/ for more details.
Yes, it’s a morass and a mess. But OT can be true even if (most variants of) moral realism are true. Though OT is a strong indication that a lot of the intuitions connected with strong moral realism are suspect.
As for burden of proof… Well, the debate is complex (it feels like a diseased debate, but I’m not yet sure about that). But, to simplify a bit, moral realists are asserting that certain moral facts are “objectively true” in a way they haven’t really defined (or if they have defined, their definitions are almost certainly false, such as being universally compelling moral facts for all types of minds).
So I’d say the burden of proof is very clearly on those asserting the existence of these special properties of moral facts. Especially since many people seem much more certain as to what some of the objective moral facts are, than why they are objective (always a bad sign) and often disagree with each other.
If I wanted to steel-man moral realism, I'd argue that the properties of utility functions demonstrate that one cannot have completely unconstrained preferences and still be consistent in ways that feel natural. UDT and other acausal decision theories put certain constraints on how you should interact with your copies, and maybe you can come up with some decent single way of negotiating between different agents with different preferences (we tried this for a long time, but couldn't crack it). Therefore there is some sense in which some classes of preferences are better than others. Then a moral realist could squint really hard and say "hey, we can continue this process, and refine this, and reduce the huge class of possible utilities to a much smaller set".
If importance were objective, then a Clippie could realise that paperclips are unimportant. The OT then comes down to intrinsic moral motivation, i.e. whether a clippie could realise the importance without being moved to act on it.
OT implies the possibility of oracle AI, but the falsehood of OT does not imply the falsehood of oracle AI. If OT is false, then only some, or no, combinations of goals and intelligence are possible. Oracle AI could still fall within the set of limited combinations.
OT does not imply that MR is false, any more than MR implies that OT is false. The intuitiveness of oracle AI does not support the OT, for the reasons given above.
Moral realists are not obviously in need of a definition of objective truth, any more than physicalists are. They may be in need of an epistemology to explain how it is arrived at, its justification.
It is uncontentious that physical and mathematical facts do not compel all minds. Objective truth is not unconditional compulsion.
Moral realists do not have to, and often do not, claim that there is anything special about the truth or justification of their claims; at the least, you have the burden of justifying the claim that moral realists have a special notion of truth.
The fact that some people are more dogmatic about their moral beliefs than proper epistemology would allow is no argument against MR. Dogmatism and confirmation bias are widespread. Much of what has been believed by scientists and rationalists has been wrong. If you had been born 4000 years ago, you would have had little evidence of objective mathematical or physical truth.
Your steel-manning of MR is fair enough. (It would have helped to emphasize that high-level principles, such as "don't annoy people", are more defensible than fine-grained stuff like "don't scrape your fingernails across a blackboard".) It is not as fair as reading and commenting on an actual moral realist. (Lesswrong is Lesswrong.)
You are possibly the first person in the world to think that morality has something to do with your copies. (By definition, you cannot interact with your MWI counterparts.)
Reducing the huge set of possibilities is not so far away from the Guru's CEV; nor is it so far away from utilitarianism. I don't think that either is obviously true, and I don't think either is obviously false. It's an open question.
The argument is that given an Oracle and an entity of limited intelligence that has goal G, we can construct a superintelligent being with goal G by having the limited intelligence ask the Oracle how to achieve G.
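For concreteness, here is a minimal sketch in Python of the construction described above. It is purely illustrative: the names `Oracle`, `plan_for`, and `execute` are hypothetical placeholders, not anyone's actual design. The point is only that the goal G lives entirely in the limited component, while all the capability lives in the oracle.

```python
# Hypothetical sketch of the "limited agent + oracle" construction:
# the limited builder supplies only the goal G; all competence comes
# from the oracle's answers.

class Oracle:
    """Stand-in for a superintelligent question-answerer."""
    def plan_for(self, goal: str) -> list[str]:
        raise NotImplementedError  # hypothetical; no real oracle exists

class LimitedAgent:
    def __init__(self, goal: str, oracle: Oracle):
        self.goal = goal      # goal G, chosen by a limited intelligence
        self.oracle = oracle

    def pursue(self, execute) -> None:
        # Ask the oracle how to achieve G, then carry out its answer.
        for step in self.oracle.plan_for(self.goal):
            execute(step)

# The composite system (LimitedAgent + Oracle) then behaves like a
# superintelligent agent pursuing G, whatever G happens to be.
```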
Negotiating with your copies is the much easier version of negotiating with other people.
But it still might not be possible, in which case the Oracle will not be of help. That scenario only removes difficulties due to limited intelligence on the builder's part.
I don’t have any copies I can interact with, so how can it be easy?
I still don't see the big problem with MR. In other conversations, people have put it to me that MR is impossible because it is impossible to completely satisfy everyone's preferences. It is impossible to completely satisfy everyone's preferences, but that is not something MR requires. It is kind of obvious that morality in general requires compromises and sacrifices, since we see that happening all the time in the real world.
In practical terms, it’s very hard to change the intuitive opinions of people on this, even after many philosophical arguments. Those statements of mine don’t touch the subject. For that the literature should be read, for instance the essay I wrote about it. But if we consider general superintelligences, then they could easily understand it and put it coherently into practice. It seems that this can be naturally expected, except perhaps in practice under some specific cases of human intervention.
Yet, as the eminent philosopher Joss Whedon observed, "Yeah… but [they] don't care!"
Their motivation (or what they care about) should be in line with their rationality. This doesn't happen with humans, because we have evolutionarily selected and primitive motivations coupled with a weak rationality, but it should not happen with much more intelligent and designed (possibly self-modifying) agents. Logically, one should care about what one's rationality tells one to care about.
Since we can’t built superintelligences straight off, we have to build self-improving AIs.
A rational self-improving AI has to be motivated to become more intelligent, rational, and so on.
So rational self-improving AIs won’t have arbitrary motivations. They will be motivated to value rationality in order to become more rational.
Valuing rationality means disvaluing bias and partiality.
Therefore, a highly rational agent would not arbitrarily disregard valid rational arguments (we don’t expect highly rational humans to say “that is a perfectly good argument, but I am going to just ignore it”).
Therefore, a highly rational agent would not arbitrarily disregard valid rational arguments for morality.
Therefore, a highly rational agent would not "just not care". The only possible failure modes are:
1) Non existence of good rational arguments for morality (failure of objective moral cognitivism).
2) Failure of Intrinsic Motivation arising from their conceptual understanding of valid arguments for morality, ie they understand that X is good, that they should do X, and what “should” means, but none of that adds up to a motivation to do X.
I am also having a hard time understanding this argument, but skimming through it I don’t see anything that looks strong enough to defeat the orthogonality thesis, which I see as the claim that it should be possible to design minds in such a way that the part with the utility function is separate from the part which optimizes. This seems to me like a pretty reasonable claim about a certain class of algorithms, and I would expect an argument claiming that such algorithms cannot exist to involve substantially more math than what I see in this argument (namely, no math whatsoever).
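The architectural claim described here can be sketched in a few lines of deliberately generic Python. All names below are illustrative assumptions rather than any real AI design; the sketch only shows that the optimizing part never inspects what the utility function values.

```python
# A toy illustration (not any real AI design) of the claim that the
# "utility function" component can be separated from the "optimizer"
# component, so the same optimizer can serve arbitrary goals.

from typing import Callable, Iterable, TypeVar

State = TypeVar("State")

def optimize(candidates: Iterable[State],
             utility: Callable[[State], float]) -> State:
    # The optimizer never looks inside the utility function; it just
    # returns whichever candidate the utility function scores highest.
    return max(candidates, key=utility)

# Two very different value systems plugged into the same optimizer:
paperclip_count = lambda world: world.get("paperclips", 0)
wellbeing_score = lambda world: world.get("good_feelings", 0) - world.get("bad_feelings", 0)

worlds = [
    {"paperclips": 10, "good_feelings": 1, "bad_feelings": 0},
    {"paperclips": 0, "good_feelings": 9, "bad_feelings": 1},
]

print(optimize(worlds, paperclip_count))   # picks the paperclip-rich world
print(optimize(worlds, wellbeing_score))   # picks the high-wellbeing world
```

Swapping one utility function for the other changes which world is chosen without changing the optimizer at all, which is the separation the comment describes.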
It might be possible to design minds that are orthogonal. That sense of the OT is unable to support the claim that AI research is likely to result in dangerous unfriendliness unintentionally. No disaster follows from the fact that it is possible to make dangerous things.
Indeed the orthogonality thesis in that practical sense is not what this essay is about, as I explain in the first paragraph and concede in the last paragraph. This article addresses the assumed orthogonality between ethics and intelligence, particularly general superintelligence, based on considerations from meta-ethics and personal identity, and argues for convergence.
There seems to be surprisingly little argumentation in favor of this convergence, which is utterly surprising to me, given how clear and straightforward I take it to be, though it requires an understanding of meta-ethics and of personal identity which is rare. Eliezer has, at least in the past, stated that he had doubts regarding both philosophical topics, while I claim to understand them very well. These doubts should merit an examination of the matter I'm presenting.
It appears here from time to time. It tends to be considered a trivial error. (This is unlikely to change.)
I suppose it is unlikely to change, but the level of agreement is completely unreflective of any superior understanding or insight. What is rejected, over and over, is a straw-man version of ethical objectivism. And while lesswrongers are informed by their leaders that academic philosophy is dangerous mind-rot, that is unlikely to change.
An error often feels like a clear and straightforward solution from the inside. Have you read the posts surrounding No Universally Compelling Arguments?
I've read this one, which makes it clear why NUCA is irrelevant: the people who believe in UCA are taking "universal" to mean "all rational minds", not "all minds".
There are a number of versions of the OT floating around. That version has no impact on (u)FAI. We already know it is possible to do dumb and dangerous things. The uFAI argument requires certain failure modes to be inevitable or likely even in the absence of malevolence and incompetence.
No offense: this is not written well enough for me to follow your arguments closely enough to respond to them. But, considering the outside view, what do you think are the chances you have actually proven moral realism in three paragraphs? If you’re trying to convince non-realists they’re wrong then substantially more work is necessary.
I tend to be a very concise writer, assuming a quick understanding from the reader, and I don’t perceive very well what is obvious and what isn’t to people. Thank you for the advice. Please point to specific parts that you would like further explaining or expanding, and I will provide it.
At the point where you called some values “errors” without defining their truth conditions I assumed this wasn’t going to be any good and stopped reading.
Being open to criticism is very important, and the bias to disvalue it should be resisted. Perhaps I defined the truth conditions later on (see below).
“There is a difference between valid and invalid human values, which is the ground of justification for moral realism: valid values have an epistemological justification, while invalid ones are based on arbitrary choice or intuition. The epistemological justification of valid values occurs by that part of our experiences which has a direct certainty, as opposed to indirect: conscious experiences in themselves.”
I find your texts here on ethics incomplete and poor (this one, for instance, shows a lack of understanding of the topic and is naive). I dare you to defend and justify a value that cannot be reduced to good and bad feelings.
See here.
I read that and similar articles. I deliberately didn't say pleasure or happiness, but "reduced to good and bad feelings", including other feelings that might be deemed good, such as love, curiosity, self-esteem, meaningfulness..., and including the present and the future. The part about the future includes any instrumental actions in the present that are taken with the intention of obtaining good feelings in the future, for oneself or for others.
This should cover visiting Costa Rica, having good sex, and helping loved ones succeed, which are the examples given in that essay against the simple example of Nozick’s experience machine. The experience machine is intuitively deemed bad because it precludes acting in order to instrumentally increase good feelings in the future and prevent bad feelings of oneself or others, and because pleasure is not what good feelings are all about. It is a very narrow part of the whole spectrum of good experiences one can have, precluding many others mentioned, and this makes it aversive.
The part about wanting and liking has neurological interest and has been well researched. It is not relevant for this question, because values need not correspond with wanting, they can just correspond with liking. Immediate liking is value, wanting is often mistaken. We want things which are evolutionarily or culturally caused, but that are not good for us. Wanting is like an empty promise, while liking can be empirically and directly verified to be good.
Any valid values reduce to good and bad feelings, for oneself or for others, in the present or in the future. This can be said of survival, learning, working, loving, protecting, sight-seeing, etc.
I say it again, I dare Eliezer (or others) to defend and justify a value that cannot be reduced to good and bad feelings.
I want to know more about the future. I do not expect to make much use of the information, and the tiny good feeling I expect to get when I am proven right is far smaller than the good feelings I could get from other uses of my time. My defence for this value as legitimate is that I am quite capable of rational reasoning and hearing out any and all of your arguments, and yet I am also quite certain that neither you nor others will be able to persuade me to abandon it. No further justification or defence beyond that is necessary or possible, in my opinion.
Is the value found in the conscious experiences, which happen to correlate with the activities mentioned, or are the activities themselves valuable, because we happen to like them? If the former, Jonatas’ point should apply. If the latter, then anything can be a value, you just need to design a mind in order to like it. Am I the only one who is bothered by the fact that we could find value in anything if we follow the procedure outlined above?
How about we play a different "game". Instead of starting with the arbitrary likings evolution has equipped us with, we could just ask what action-guiding principles produce a state of the world which is optimal for conscious beings, as beings with a first-person perspective are the only entities for which states can objectively be good or bad. If we accept this axiom (or if we presuppose, even within error theory, a fundamental meta utility function stating something like "I terminally care about others"), we can reason about ethics in a much more elegant and non-arbitrary way.
I don’t know whether not experiencing joys in Brazil (or whatever activities humans tend to favor) is bad for a being blissed out in the experience machine; at least it doesn’t seem to me! What I do know for sure is that there’s something bad, i.e. worth preventing, in a consciousness-moment that wants its experiential content to be different.
What sorts of actions, both cognitive (inside the computer) and physical, would a robot have to take to make you think it valued some alien thing like maximizing the number of paperclips in the world? Robbing humans to build a paperclip factory, for example, or making a list of plans, ordered by how many paperclips they would make.
Why is it impossible to program a robot that would do this?
For example, “there is no logical basis for privileging a physical organism’s own viewpoint” is true under certain premises, but what goes wrong if we just build a robot that uses its own viewpoint anyways? Which parts of the program fail when it tries to selfishly maximize the number of paperclips?
Or if I were to make a suggestion rather than asking questions: Needs less cognitive language.
Indeed, a robot could be built that makes paperclips or pretty much anything. For instance, a paperclip assembling machine. That’s an issue of practical implementation and not what the essay has been about, as I mention in the first paragraph and concede in the last.
The issue I argued about is that generally superintelligent agents, of their own will, without certain outside pressures from non-superintelligent agents, would understand personal identity and meta-ethics, leading them to converge on the same values and ethics. This is for two reasons: (1) they would need to take a "God's eye view" and value all perspectives besides their own, and (2) they would settle on moral realism, with the same values, namely good and bad feelings, in the present or future.
Well then, let’s increase the problem to where it’s meaningful, and take a look at that. What sort of cognitive and physical actions would make you think a robot is superintelligent? Discovery of new physics, modeling humans so precisely that it can predict us better than we can, making intricate plans that will work flawlessly?
What fails in the program when one tries to build a robot that takes both the paperclip-maximizing actions and superintelligent actions?
For general superintelligence, proving performance in all cognitive areas that surpasses the highest of any humans. This naturally includes philosophy, which is about the most essential type of reasoning.
It could have a narrow superintelligence, like a calculating machine, surpassing human cognitive abilities in some areas but not in others. If it had a general superintelligence, then it would not of its own do paperclip maximization as a goal, because this would be terribly stupid, philosophically.
My hope was to get you to support that claim in an inside-view way. Oh well.
Why it would not do paperclip (or random value) maximization as a goal is explained more at length in the article. There is more than one reason. We’re considering a generally superintelligent agent, assuming above-human philosophical capacity. In terms of personal identity, there is a lack of personal identities, so it would be rational to take an objective, impersonal view, taking account of values and reasonings of relevant different beings. In terms of meta-ethics, there is moral realism and values can be reduced to the quality of conscious experience, so it would have this as its goal. If one takes moral anti-realism to be true, at least for this type of agent we are considering, a lack of real values would be understood as a lack of real goals, and could lead to the tentative goal of seeking more knowledge in order to find a real goal, or having no reason to do anything in particular (this is still susceptible to the considerations from personal identity). I argue against moral anti-realism.
But how do they know what a God's-eye view even is?
A "God's-eye view", as David Pearce says, is an impersonal view, an objective rather than subjective view, a view that does not privilege one personal perspective over another, but takes the universe as a whole as its point of reference. This comes from the argued non-existence of personal identities. To check arguments on this, see this comment.
I find this post to be too low quality to support even itself, let alone stand up against the orthogonality thesis (on which I have no opinion). It needs a complete rewrite at best. Some (rather incomplete) notes are below.
Where do you include environmental and cultural influences?
This does not follow. Maybe you need to give some examples. What do you mean by “correct” and “error” here?
This is a contentious attempt to convert everything to hedons. People have multiple contradictory impulses, desires and motives which shape their actions, often not by “maximizing good feelings”.
Really? Been to the Youtube and other video sites lately?
This sounds like a pronouncement of absolute truth, not a description of one of many competing models. It is not clear that the “epistemological justification” is a good definition of the term “valid”.
This is wrong in so many ways, unless you define reality as “conscious experiences in themselves”, which is rather non-standard. In any case, unless you are a dualist, you can probably agree that your conscious experiences can be virtual as much as anything else.
Again, you use the term "objective" for feelings and conscious experiences, which are not something easily measured or agreed upon to be in any way objective, certainly no more than the "external world".
Uhh, that post sucked as well.
Kinda stopped reading after that, no point really. Please consider learning the material before writing about it next time. Maybe read a Sequence or two, can’t hurt, can it?
You make some good points about the post, but there’s no call for this:
Jonatas happens to be a rather successful philosophy student, who I think is quite well read in related topics, even if this post needs work. He’s also writing in a second language, which makes it harder to be clear.
Sorry, Carl, I was going by the post’s content, since I don’t know the OP personally. I trust your judgement of his skills in general, maybe you can teach him to write better and assess the quality of the output.
I was giving that background about the specific case in support of the general principle of avoiding making insulting statements, both to avoid poisoning the discourse, and because such errors give offense that is rarely outweighed by any pros.
I am not sure what in the statement you quoted you found insulting. That I inferred that he wasn’t well-versed in the material, given the poor quality of the post in question, instead of politely asking if he had, in fact, read the relevant literature?
shminux, what would be a "virtual" conscious experience? I think you'll have a lot of work to do to show how the "raw feels" of conscious experience could exist at some level of computational abstraction. An alternative perspective is that the "program-resistant" phenomenal experiences undergone by our minds disclose the intrinsic nature of the stuff of the world, the signature of basement reality. Dualism? No, Strawsonian physicalism: http://en.wikipedia.org/wiki/Physicalism#Strawsonian_physicalism
Of course, I'm not remotely expecting you to agree here. Rather, I'm just pointing out that there are counterarguments to computational platonism which mean Jonatas' argument can't simply be dismissed.
I don’t understand the difference between Dualism and Strawsonian physicalism. For example, if you adopt Eliezer’s timeless view (of which I’m not a fan), that the universe is written and we are the ink, or something like that, there is no need to talk about “phenomenal experiences undergone by our minds disclose the intrinsic nature of the stuff of the world”, whatever the heck it might mean.
shminux, Strawsonian physicalism may be false; but it is not dualism. Recall the title of Strawson’s controversial essay was “Realistic monism—why physicalism entails panpsychism” (Journal of Consciousness Studies 13 (10-11):3-31 (2006)) For an astute critique of Strawson, perhaps see William Seager’s “The ‘intrinsic nature’ argument for panpsychism” (Journal of Consciousness Studies 13 (10-11):129-145 (2006) http://philpapers.org/rec/SEATIN ) Once again, I’m not asking you to agree here. We just need to be wary of dismissing a philosophical position without understanding the arguments that motivate it.
Indeed, that's a common pitfall, and I'm no stranger to it. So I decided to read the Strawson essay you mentioned. And there I came across this statement:
How is it obvious?
Further on:
So far so good.
The claim in bold is what I don’t get. Either it’s all physics and it can be studied as such, or it’s not, and you need something other than physics to describe “experiential phenomena”, which is dualism, including panpsychism. Maybe there is another alternative I’m missing here?
Anyway, I concede that I know little about philosophy, but this essay seems like an exercise in futility by a person who'd do well to go through some of the required reading on Luke's list instead. For now, I have lost interest in Strawson's confused musings.
shminux, it is indeed not obvious what is obvious. But most mainstream materialists would acknowledge that we have no idea what “breathes fire into the equations and makes there a world for us to describe.” Monistic materialists believe that this “fire” is nonexperiential; monistic idealists / Strawsonian physicalists believe the fire is experiential. Recall that key concepts in theoretical physics, notably a field (superstring, brane, etc), are defined purely mathematically [cf. “Maxwell’s theory is Maxwell’s equations”] What’s in question here is the very nature of the physical.
Now maybe you’d argue instead in favour of some kind of strong emergence; but if so, this puts paid to reductive physicalism and the ontological unity of science.
[ I could go on if you’re interested; but I get the impression your mind is made up(?) ]
Uh, no, I frequent this site because I enjoy learning new things. I don’t mind if my worldview changes in the process. For example, I used to be a naive physical realist before I thought about these issues, and now I’m more of an instrumentalist. Now, is your mind made up? Do you allow for a chance that your epistemology changes as a result of this exchange?
apologies shminux, I hadn’t intended to convey the impression I believed I was more open-minded than you in general; I was just gauging your level of interest here before plunging on. Instrumentalism? Well, certainly the price of adopting a realist interpretation of quantum mechanics is extraordinarily high, namely Everett’s multiverse. The price of also preserving reductive physicalism is high too. But if we do relax this constraint, then the alternatives seem ghastly. Thus David Chalmers explores, inconclusively, Strawsonian physicalism before opting for some kind of naturalistic dualism. To my mind, dualism is a counsel of despair.
I believe I lack context for most of your statements here, since none of them seems to make sense to me.
As for my version of instrumentalism (not generally accepted on this forum), I do not postulate any kind of external/objective reality, and hence do not consider terms like “exist” very useful. I care about models accurately predicting future data inputs based on the past data inputs, without worrying where these inputs come from. In such a framework all QM interpretations making identical predictions are equivalent. I suspect this sounds “ghastly” to you.
While these vary, I don’t see legitimate values that could be affected by them. Could you provide examples of such values?
Imagine that two exact replicas of a person exist in different locations, exactly the same except for an antagonism in one of their values. Both could not be correct at the same time about that value. I mean error in the sense, for example, that Eliezer employs in Coherent Extrapolated Volition: the error that comes from insufficient intelligence in thinking about our values.
Except in the aforementioned sense of error, could you provide examples of legitimate values that don't reduce to good and bad feelings?
I think that the literature about masochism is better evidence than YouTube videos, which could be isolated incidents involving people who are not regularly masochistic. If you have evidence from those sites, I'd like to see it.
Even being virtual, or illusory, they would still be real occurrences, and real illusions, being directly felt. I mean that in the sense of Nick Bostrom's simulation argument.
Perhaps it was not sufficiently explained, but check this introduction on Less Wrong, then, or the comment I made below about it:
http://lesswrong.com/lw/19d/the_anthropic_trilemma/
I read many sequences, understand them well, and assure you that, if this post seems not to make sense, then it is because it was not explained in sufficient length.
The two can’t be perfectly identical if they disagree. You have to additionally assume that the discrepancy is in the parts that reason about their values instead of the values themselves for the conclusion to hold.
What if I changed the causal chain in this example, and instead of having the antagonistic values caused by the identical agents themselves, I had inserted the antagonistic values into their memories myself, while replicating them? I could have picked the antagonistic value from the mind of a different person and put it into one of the replicas, complete with a small reasoning or justification in its memory.
They would both wake up, one with one value in their memory, and another with an antagonistic value. What would it be that would make one of them correct and not the other? Could both values be correct? The issue here is questioning if any values whatsoever can be validly held for similar beings, or if a good justification is needed. In CEV, Eliezer proposed that we can make errors about our values, and that they should be extrapolated for the reasonings we would make if we had higher intelligence.
This essay is… unclear, but it really sounds like you are limiting the definition of ‘intelligence’ to a large but limited set of somewhat human-like intelligences with a native capacity for sociability, which does not include most Yudkowskian FOOMing AIs.
I should have explained things much more at length. The intelligence in that context I use is general superintelligence, being defined as that which surpasses human intelligence in all domains. Why is a native capacity for sociability implied?
I actually read all the way through and found the broad argument quite understandable (although many of the smaller details were confusing). I also found it obviously wrong on many levels. The one I would consider most essential is that you say:
Assuming I understood correctly, you're saying that because continuous personal identity isn't a real thing, there's no reason to favour one conscious being over another. But that doesn't follow at all. Just because the "you" a year from now is only somewhat similar to the "you" now doesn't mean you shouldn't favour him over everyone else (and indeed there are good reasons for doing so). I wrote a longer comment along these lines in response to some doubts in a recent discussion of a post about dissolving personal identity.
And it wouldn’t defeat the OT because you’d still have to prove you couldn’t have a utility function over e.g. causal continuity (note: you can have a utility function over causal continuity).
A certain machine could perhaps be programmed with a utility function over causal continuity, but a privileged stance toward one's own values would not be rational in the absence of a personal identity, in an objective "God's eye view", as David Pearce says. That would call at least for something like coherent extrapolated volition, at least including agents with contextually equivalent reasoning capacity. Note that I use "at least" twice, to accommodate your ethical views. More sensible would be to include not only humans, but all known sentient perspectives, because the ethical value(s) of subjects arguably depend more on sentience than on reasoning capacity.
I argue (in this article) that the you (consciousness) in one second bears little resemblance to the you in the next second.
I also explain why you can’t have partial identity in that paper, and that argues against the position you took (which is similar to that explained by philosopher David Lewis in his paper Survival and Identity).
I recommend reading it, whether you agree with this essay or not. There are two advanced and tenable philosophical positions on this subject: empty individualism, characterized by Derek Parfit in his book "Reasons and Persons", and open individualism, for which there are better arguments, explained in 4 pages in my essay and at greater length in Daniel Kolak's book "I Am You: The Metaphysical Foundations for Global Ethics".
For another interesting take on the subject here on Less Wrong, check Kaj Sotala’s An attempt to dissolve subjective expectation and personal identity.
I read Kaj Sotala’s post, as you may surmise from the fact that I was the one who first linked (to a comment on it) in the grandparent. I also skimmed your article, and it seems equivalent to the idea of considering algorithmic identity or humans as optimizations processes or what-have-you (not sure if there’s a specific term or post on it) that’s pretty mainstream on LW, and with which I at least partially sympathise.
However, this has nothing to do with my objection. Let me rephrase in more general and philosophical terms, I guess. As far as I can tell, somewhere in your post you purport to solve the is-ought problem. However, I do not find that any such solution follows from anything you say.
We seem to be moving from personal identity to ethics. In ethics it is defined that good is what ought to be, and bad is what ought not to be. Ethics is about defining values (what is good and ought to be), and how to cause them.
Good and bad feelings are good and bad as direct data, being direct perceptions, and this quality they have is not an inference. Their good and bad quality is directly accessible by consciousness, as data with the highest epistemic certainty. Being data they are “is”, and being good and bad, under the above definition of ethics, they are “ought” too. This is a special status that only good and bad feelings have, and no other values do.
I’m not convinced by that (specifically that feelings can be sorted into bad and good in a neat way and that we can agree on which ones are more bad/good), however that is still not my point. Sorry, I thought I was being clear, but apparently not.
You claim that a general superintelligence ought to care about all sorts of consciousnesses because it is very very intelligent (and understands what good/bad feelings are and the illusion of personal identities and whatnot). Why? Why wouldn’t it only care about something like the stereotypical example of creating more paperclips?
What is defined as ethically good is, by definition, what ought to be done, at least rationally. Some agents, such as humans, often don't act rationally, due to a conflict of reason with evolutionarily selected motivations, which really have their own evolutionary values in mind (e.g. have as many children as possible), not ours. This shouldn't happen with much more intelligent agents, with stronger rationality (and possibly a capability to self-modify).
Then your argument is circular/tautological. You define a "rational" action as one that "does that which is ethically good", and then you suppose that a superintelligence must be very "rational". However, this is not the conventional usage of "rational" in economics or decision theory (and not on Less Wrong). Also, by this definition, I would not necessarily wish to be "rational", and the problem of making a superintelligence "rational" is exactly as hard as, and basically equivalent to, making it "friendly".
I'm not sure I'm using "rational" in that sense; I could substitute "being rational" with "using reason", "thinking intelligently", "making sense", "being logical", which seems to follow from being generally superintelligent. Ethics is the study of defining what ought to be done and how to achieve it, so it seems to follow from general superintelligence as well. The trickier part seems to be defining ethics. Humans often act with motivations which are not based on formal ethics, but ethics is like a formal elaboration of what one's (or everyone's) motivations and actions ought to be.
Hm, sorry, it’s looking increasingly difficult to reach a consensus on this, so I’m going to bow out after this post.
With that in mind, I'd like to say that what I have in mind when I say "an action is rational" is approximately "this action is the best one for achieving one's goals" (approximately because that ignores practical considerations like the cost of figuring out which action this is exactly). I also personally believe that insofar as ethics is worth talking about at all, it is simply the study of what we socially consider it convenient to term good, not the search for an absolute, universal good, since such a good (almost certainly) does not exist. As such, the claim that you should always act ethically is not very convincing in my worldview (it is basically equivalent to the claim that you should try to benefit society, and, like that claim, it is persuasive to different degrees for different people). Instead, each individual should satisfy her own goals, which may be completely umm… orthogonal… to whatever we decide to use for "ethics". The class of agents that will indeed decide to care about the ethics we like seems like a tiny subset of all potential agents, as well as of all potential superintelligent agents (which is of course just a restatement of the thesis).
Consequently, to me, the idea that we should expect a superintelligence to figure out some absolute ethics (that probably don’t exist) and decide that it should adhere to them looks fanciful.
I see. I think that ethics could be taken as, even individually, the formal definition of one’s goals and how to reach them, although in the orthogonality thesis ethics is taken in a collective level. Since personal identities cannot be sustained by logic, the distinction between individual goals and societal goals becomes trivial, and both are mutually inclusive.
For the question of personal identity, another essay, that was posted on Less Wrong by Eliezer, is here:
http://lesswrong.com/lw/19d/the_anthropic_trilemma/
However, while this essay presents the issue, it admittedly does not solve it, and expresses doubt that it would be solved in this forum. The solution exists in philosophy, though. For example, in the first essay I linked to, in Daniel Kolak’s work “I Am You: The Metaphysical Foundations for Global Ethics”, or also, in a partial form, in Derek Parfit’s work “Reasons and Persons”.
The crux of the disagreement, I think, is in the way we understand the self-assessment of our experience. If consciousness is epiphenomenal or just a different level of description of a purely physical world, this self-assessment is entirely algorithmic and does not disclose anything real about the intrinsic nature of consciousness.
But consciousness is not epiphenomenal, and a purely computational account fails to bridge the explanatory gap. Somehow conscious experience can evaluate itself directly, which still remains a not well understood and peculiar fact about the universe. In addition, as I see it, this needs to be acknowledged to make more progress in understanding both ethics and the relationship between the physical world and consciousness.
Indeed, epiphenomenalism can seemingly be easily disproved by its implication that if it were true, then we wouldn't be able to talk about our consciousness. As I said in the essay, though, consciousness is that of which we can be most certain, by its directly accessible nature, and I would rather think that we are living in a virtual world within a universe with other, alien physical laws than that consciousness itself is not real.
The problem with the orthogonality thesis is not so much that it's wrong as that it is misleading. It's a special case of the idea that we will ultimately be able to create whatever we can imagine (because our brains are VR simulators, and because of Turing completeness). The problem with it is that what we can imagine and what evolution tends to produce are different things. Failing to account for that seems consistent with fear-mongering about the future, a common marketing technique for these kinds of outfits. Sure enough, the paper goes on to talk about sinister dangers.
By the way, I’d like to say a public thanks to the idiot who down-rated my article and then went through every comment I posted in this forum and also rated it negatively, without replying to them either, or probably even reading them. Chimpanzee trophy to you.
And to those who negatively rated this comment: I suppose that you agree with this dirty practice.