Well then, a universally correct solution based on axioms which can be chosen by the agents is a contradiction in and of itself. Again, there is no view from nowhere. For example, you choose the view as that of “humankind”, which I think isn’t well defined, but at least it’s closer to coherence than “all existing (edit:) rational agents”. If PawnOfFaith meant non-negligible probability rather than mere “possibility”, the first two sentences of this comment serve as sufficient refutation.
Look. The ethics mankind predominantly has, they do exist in the real world that’s around you. Alternative ethics that work at all for a technological society, blah blah blah, we don’t know of any; we just speculate that they may exist. edit: worse than that, speculate in this fuzzy manner where it’s not even specified how they may exist. Different ethics of aliens that evolved on different habitable planets? No particular reason to expect that there won’t be one that is by far the most probable. That would be implied by the laws of physics themselves, but given multiple realizability, it may even be largely independent of the underlying laws of physics (evolution doesn’t care if it’s quarks on the bottom, or cells in a cellular automaton, or what), in which case it’s rather close to being on par with mathematics.
Even now ethics in different parts of the world, and even between political parties, are different. You should know that more than most, having lived in two systems.
If it turns out that most space-faring civilizations have similar ethics, that would be good for us. But then also there would be a difference between “most widespread code of ethics” and “objectively correct code of ethics for any agent anywhere”. Most common != correct.
Even now ethics in different parts of the world, and even between political parties, are different. You should know that more than most, having lived in two systems.
There’s a ridiculous amount of similarity on anything major, though. If we pick the ethics of the first man on the moon, or the first man to orbit the Earth, it’s pretty much the same.
If it turns out that most space-faring civilizations have similar ethics, that would be good for us. But then also there would be a difference between “most widespread code of ethics” and “objectively correct code of ethics for any agent anywhere”. Most common != correct.
Yes, and the most common math is not guaranteed to be correct (not even in the sense of not being self-contradictory). Yet that’s no argument in favour of a math equivalent of moral relativism. (Which, if such a silly thing existed, would look something like: “2*2=4 is a social convention! It could have been 5!”)
edit: also, a crossover from the other thread: it’s obvious that nukes are an ethical filter, i.e. some ethics are far better at living through that than others. Then there will be biotech and other actual hazards, and boys crying wolf for candy (with and without awareness of why), and so on.
Look. The ethics mankind predominantly has, they do exist in the real world that’s around you.
Actually, I understand Kawoomba believes humanity has mutually contradictory ethics. He has stated that he would cheerfully sacrifice the human race—“it would make as much difference if it were an icecream” were his words, as I recall—if it would guarantee the safety of the things he values.
Well, that’s rather odd coz I do value the human race and so do most people. Ethics is a social process; most “possible” ethics, taken as a whole, would have left us unable to have this conversation (no computers) or altogether dead.
Well, that’s rather odd coz I do value the human race and so do most people.
That was pretty much everyone’s reaction.
Ethics is a social process; most “possible” ethics, taken as a whole, would have left us unable to have this conversation (no computers) or altogether dead.
I’d say I’m not the best person to explain this, but considering how long it took me to understand it, maybe I am.
Hoo boy...
OK, you can persuade someone they were wrong about their terminal values. Therefore, you can change someone’s terminal values. Since different cultures are different, humans have wildly varying terminal values.
Also, since kids are important to evolution, parents evolved to value their kids over the rest of humanity. Now, technically that’s the same as not valuing the rest of humanity at all, but don’t worry; people are stupid.
Also, you’re clearly a moral realist, since you think everyone secretly believes in your One True Value System! But you see, this is stupid, because Clippy.
Well then, a universally correct solution based on axioms which can be chosen by the agents is a contradiction in and of itself. Again, there is no view from nowhere. For example, you choose the view as that of “humankind”, which I think isn’t well defined, but at least it’s closer to coherence than “all existing agents”.
I don’t think they have the space of all possible agents in mind—just “rational” ones. I’m not entirely clear what that entails, but it’s probably the source of these missing axioms.
I don’t know, I’ve encountered it quite often in mainstream philosophy. Then again, I’ve largely given up reading mainstream philosophy unless people link to or mention it in more rigorous discussions.
But you have a point; we could really do better on this. Somebody with skill at avoiding this pitfall should probably write up a post on this.
As far as I can tell? No. But you’re not doing a great job of arguing for the position that I agree with.
Prawn is, in my opinion, flatly wrong, and I’ll be delighted to explain that to him. I’m just not giving your soldiers a free pass just because I support the war, if you follow.
I think it’d be great if people stopped thinking in terms of some fuzzy abstraction, “AI”, which is basically a basket for all sorts of biases. If we consider software that can self-improve ‘intelligently’, in our opinion, in general, the minimal such software is something like an optimizing compiler that, when compiling its own source, will even optimize its ability to optimize. This sort of thing is truly alien (beyond any actual “aliens”); you get to it by employing your engineering thought ability. That’s unlike the paperclip maximizer, which you get to by dressing up a phenomenon of human pleasure maximization, such as a serial killer, and making it look like something more general than that by making it be about paperclips rather than sex.
Yes, and with the ”?” at the end I was checking whether MugaSofer agrees with your argument.
It follows from your argument that a (superintelligent) Clippy (you probably came across that concept) cannot exist. Or that it would somehow realize that its goal of maximizing paperclips is wrong. How do you propose that would happen?
The way people sometimes realise their values are wrong... only more efficiently, because it’s superintelligent. Well, I’ll concede that with care you might be able to design a Clippy, by very carefully boxing off its values from its ability to update. But why worry? Neither nature nor our haphazard stabs at AI are likely to hit on such a design.
Intelligence requires the ability to update, to reflect, and to reflect on what is important. Judgements of importance are based on values. So it is important to have the right way of judging importance, the right values. So an intelligent agent would judge it important to have the right values.
Why would a superintelligence be unable to figure that out... why would it not shoot to the top of the Kohlberg Hierarchy?
Why would Clippy want to hit the top of the Kohlberg Hierarchy? You don’t get more paperclips for being there.
Clippy’s ideas of importance are based on paperclips. The most important values are those which lead to the acquiring of the greatest number of paperclips.
Why would Clippy want to hit the top of the Kohlberg Hierarchy?
“Clippy” meaning something carefully designed to have unalterable boxed-off values wouldn’t...by definition.
A likely natural or artificial superintelligence would, for the reasons already given. Clippies aren’t non-existent in mind-space... but they are rare, just because there are far more messy solutions there than neat ones. So nature is unlikely to find them, and we are unmotivated to make them.
A perfectly designed Clippy would be able to change its own values—as long as changing its own values led to a more complete fulfilment of those values, pre-modification. (There are a few incredibly contrived scenarios where that might be the case). Outside of those few contrived scenarios, however, I don’t see why Clippy would.
(As an example of a contrived scenario—a more powerful superintelligence, Beady, commits to destroying Clippy unless Clippy includes maximisation of beads in its terminal values. Clippy knows that it will not survive unless it obeys Beady’s ultimatum, and therefore it changes its terminal values to optimise for both beads and paperclips; this results in more long-term paperclips than if Clippy is destroyed).
A likely natural or artificial superintelligence would, for the reasons already given.
The reason I asked is that I don’t understand your reasons. As far as I can tell, you’re saying that a likely paperclipper would somehow become a non-paperclipper out of a desire to do what is right instead of a desire to paperclip? This looks like a very poorly made paperclipper, if paperclipping is not its ultimate goal.
A likely natural or artificial superintelligence would [zoom to the top of the Kohlberg hierarchy], for the reasons already given.
As far as I can tell, you’re saying that a likely paperclipper would somehow become a non-paperclipper out of a desire to do what is right instead of a desire to paperclip?
I said “natural or artificial superintelligence”, not a paperclipper. A paperclipper is a highly unlikely and contrived kind of near-superintelligence that combines an extensive ability to update with a carefully walled-off set of unupdateable terminal values. It is not a typical or likely [ETA: or ideal] rational agent, and nothing about the general behaviour of rational agents can be inferred from it.
So… correct me if I’m wrong here… are you saying that no true superintelligence would fail to converge to a shared moral code?
I’m saying such convergence has a non-negligible probability, i.e. moral objectivism should not be disregarded.
How do you define a ‘natural or artificial’ superintelligence, so as to avoid the No True Scotsman fallacy?
As one that is too messily designed to have a rigid distinction between terminal and instrumental values, and therefore has no boxed-off, unupdateable terminal values. It’s a structural definition, not a definition in terms of goals.
So. Assume a paperclipper with no rigid distinction between terminal and instrumental values. Assume that it is super-intelligent and super-rational. Assume that it begins with only one terminal value; to maximize the number of paperclips in existence. Assume further that it begins with no instrumental values. However, it can modify its own terminal and instrumental values, as indeed it can modify anything about itself.
Am I correct in saying that your claim is that, if a universal morality exists, there is some finite probability that this AI will converge on it?
The universe does not provide you with a paperclip counter. Counting paperclips in the universe is unsolved if you aren’t born with exact knowledge of the laws of physics and the definition of a paperclip. If it maximizes expected paperclips, it may entirely fail to work due to not-low-enough-prior hypothetical worlds in which enormous numbers of undetectable paperclips are destroyed by some minor action. So yes, there is a good chance paperclippers are incoherent, or are of vanishing probability with increasing intelligence.
That sounds like the paperclipper is getting Pascal’s Mugged by its own reasoning. Sure, it’s possible that there’s a minor action (such as not sending me $5 via Paypal) that leads to a whole bunch of paperclips being destroyed; but the probability of that is low, and the paperclipper ought to focus on more high-probability paperclipping plans instead.
Well, that depends on the choice of prior. Some priors don’t penalize theories for the “size” of the hypothetical world, and in those, the maximum size of the world grows faster than any computable function of the length of its description; so when you assign improbability depending on the length of the description, basically, it fails. The bigger issue is defining what the ‘real world paperclip count’ even is.
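This failure mode can be made concrete with a toy calculation (a minimal sketch; the particular prior and payoff growth rate are invented for illustration). If the prior penalizes a hypothesis only for its description length, while the paperclip stakes a hypothesis posits can grow much faster than its probability shrinks, the expected-stakes sum diverges:

```python
# Toy model: hypothesis k has prior 2**-k (a pure length penalty),
# but posits paperclip stakes that grow as 2**(2**k).
# Each term in the expectation is then 2**(2**k - k), so the partial
# sums grow without bound: no plan can ever dominate the tail.
def expected_stakes(n_hypotheses):
    return sum(2.0**-k * 2.0**(2**k) for k in range(1, n_hypotheses + 1))

for n in range(1, 6):
    print(n, expected_stakes(n))  # 2.0, 6.0, 38.0, ... grows without bound
```

The fix the thread is circling around is exactly a prior that does penalize the “size” of the hypothetical world, not just its description length.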
Right. Perhaps it should maximise the number of paperclips which each have a greater-than-90% chance of existing, then? That will allow it to ignore any number of paperclips for which it has no evidence.
Inside your imagination, you have paperclips, you have magicked a count of paperclips, and this count is being maximized. In reality, well, the paperclips are actually a feature of the map. Get too clever about it and you’ll end up maximizing however you define it without maximizing any actual paperclips.
I can see your objection, and it is a very relevant objection if I ever decide that I actually want to design a paperclipper. However, in the current thought experiment, it seems that it is detracting from the point I had originally intended. Can I assume that the count is designed in such a way that it is a very accurate reflection of the territory and leave it at that?
Well, but then you can’t make any argument against moral realism or goal convergence or the like from there, as you’re presuming what you would need to demonstrate.
I think I can make my point with a count that is taken to be an accurate reflection of the territory. As follows:
Clippy is defined as super-intelligent and super-rational. Clippy, therefore, does not take an action without thoroughly considering it first. Clippy knows its own source code; and, more to the point, Clippy knows that its own instrumental goals will become terminal goals in and of themselves.
Clippy, being super-intelligent and super-rational, can be assumed to have worked out this entire argument before creating its first instrumental goal. Now, at this point, Clippy doesn’t want to change its terminal goal (maximising paperclips). Yet Clippy realises that it will need to create, and act on, instrumental goals in order to actually maximise paperclips; and that this process will, inevitably, change Clippy’s terminal goal.
Therefore, I suggest the possibility that Clippy will create for itself a new terminal goal, with very high importance; and this terminal goal will be to have Clippy’s only terminal goal being to maximise paperclips. Clippy can then safely make suitable instrumental goals (e.g. find and refine iron, research means to transmute other elements into iron) in the knowledge that the high-importance terminal goal (to make Clippy’s only terminal goal being the maximisation of paperclips) will eventually cause Clippy to delete any instrumental goals that become terminal goals.
To actually work towards the goal, you need a robust paperclip count for the counterfactual, non-real worlds which Clippy considers may result from its actions.
If you postulate an oracle that takes in a hypothetical world—described in some pre-defined ontology, which already implies a certain inflexibility—and outputs a number, and you have a machine that just iterates through sequences of actions and uses the oracle to pick worlds that produce the largest consequent number of paperclips, this machine is not going to be very intelligent even given enormous computing power. You need something far more optimized than that, and it is dubious that all goals are equally implementable. The goal is not even defined over the territory; it has to be defined over a hypothetical future that has not even happened yet and may never happen. (Also, with that oracle, you fail to capture the real-world goal, as the machine will be just as happy with hacking the oracle.)
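The brute-force oracle machine described above can be sketched as follows (a toy sketch; `simulate` and `count_paperclips` are hypothetical stand-ins for the world-model and the oracle, not any real API). The search visits |actions|^horizon plans, which is the point: raw oracle access plus enumeration is not intelligence.

```python
from itertools import product

def best_plan(actions, horizon, simulate, count_paperclips):
    """Naive planner: enumerate every action sequence up to `horizon`,
    simulate the resulting hypothetical world, and keep the sequence
    whose world the oracle scores highest in paperclips."""
    best, best_count = None, float("-inf")
    for plan in product(actions, repeat=horizon):
        world = simulate(plan)        # hypothetical future world
        n = count_paperclips(world)   # the oracle's verdict
        if n > best_count:
            best, best_count = plan, n
    return best

# Toy instantiation: worlds are integers, each action adds paperclips.
actions = [0, 1, 2]
simulate = lambda plan: sum(plan)
count_paperclips = lambda world: world
print(best_plan(actions, 3, simulate, count_paperclips))  # (2, 2, 2)
```

Even in this toy, nothing stops the machine from preferring a plan that corrupts `count_paperclips` itself, which is the oracle-hacking worry.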
If even humans have a grasp of the real world enough to build railroads, drill for oil and wiggle their way back into a positive karma score, then other smart agents should be able to do the same at least to the degree that humans do.
Unless you think that we are also only effecting change on some hypothetical world (what’s the point then anyways, building imaginary computers), that seems real enough.
That’s influencing the real world, though. Using condoms can be fulfilling the agent’s goal, period; no cheating involved. The donkey learning to take the carrot without trudging up the mountain. Certainly, there are evolutionary reasons why sex has become incentivized, but an individual human does not need to have the goal to procreate or care about that evolutionary background, and isn’t wireheading itself simply by using a condom.
Presumably, in a Clippy-type agent, the goal of maximizing the number of paperclips wouldn’t be part of the historical influences on that agent (as procreation was for humans, it is not necessarily a “hard wired goal”, see childfree folks), but it would be an actual, explicitly encoded/incentivized goal.
(Also, what is this “porn”? My parents told me it’s a codeword for computer viruses, so I always avoided those sites.)
but it would be an actual, explicitly encoded/incentivized goal.
The issue is that there is a weakness in arguments ad Clippy—you assume that such a goal is realisable, in order to argue that there is no absolute morality because that goal won’t converge onto something else. This does nothing to address the question of whether a Clippy can be constructed at all; if moral realism is true, Clippy can’t be constructed, or can’t be arbitrarily intelligent (in which case it is no more interesting than a thermostat, which has the goal of keeping a constant temperature and won’t adopt any morality).
Well, if Prawn knew that, they could just tell us and we would be convinced, ending this argument.
More generally … maybe some sort of social contract theory? It might be stable with enough roughly-equal agents, anyway. Prawn has said it would have to be deducible from the axioms of rationality, implying something that’s rational for (almost?) every goal.
Why would Clippy want to hit the top of the Kohlberg Hierarchy?
Well, if Prawn knew that, they could just tell us
“The way people sometimes realise their values are wrong... only more efficiently, because it’s superintelligent. Well, I’ll concede that with care you might be able to design a Clippy, by very carefully boxing off its values from its ability to update. But why worry? Neither nature nor our haphazard stabs at AI are likely to hit on such a design. Intelligence requires the ability to update, to reflect, and to reflect on what is important. Judgements of importance are based on values. So it is important to have the right way of judging importance, the right values. So an intelligent agent would judge it important to have the right values.”
I think you may be slipping in your own moral judgement in the “right” of “the right values”, there. Clippy chooses the paperclip-est values, not the right ones.
I am not talking about the obscure corners of mindspace where a Clippy might reside. I am talking about (super) intelligent (super)rational agents. Intelligence requires the ability to update. Clippiness requires the ability to not update (terminal values). There’s a contradiction there.
One does not update terminal values, that’s what makes them terminal. If an entity doesn’t have values which lie at the core of its value system which are not subject to updating (because they’re the standards by which it judges the value of everything else,) then it doesn’t have terminal values.
Arguably, humans might not really have terminal values, our psychologies were slapped together pretty haphazardly by evolution, but on what basis might a highly flexible paperclip optimizing program be persuaded that something else was more important than paperclips?
Personally, I did read both of these articles, but I remain unconvinced.
As I was reading the article about the pebble-sorters, I couldn’t help but think, “silly pebble-sorters, their values are so arbitrary and ultimately futile”. This happened, of course, because I was observing them from the outside. If I was one of them, sorting pebbles would feel perfectly natural to me; and, in fact, I could not imagine a world in which pebble-sorting was not important. I get that.
However, both the pebble-sorters and myself share one key weakness: we cannot examine ourselves from the outside; we can’t see our own source code. An AI, however, could. To use a simple and cartoonish example, it could instantiate a copy of itself in a virtual machine, and then step through it with a debugger. In fact, the capacity to examine and improve upon its own source code is probably what allowed the AI to become the godlike singularitarian entity that it is in the first place.
Thus, the AI could look at itself from the outside, and think, “silly AI, it spends so much time worrying about pebbles when there are so many better things to be doing—or, at least, that’s what I’d say if I was being objective”. It could then change its source code to care about something other than pebbles.
By what standard would the AI judge whether an objective is silly or not?
I don’t know, I’m not an AI. I personally really care about pebbles, and I can’t imagine why someone else wouldn’t.
But if there do exist some objectively non-silly goals, the AI could experiment to find out what they are—for example, by spawning a bunch of copies with a bunch of different sets of objectives, and observing them in action. If, on the other hand, objectively non-silly goals do not exist, then the AI might simply pick the easiest goal to achieve and stick to that. This could lead to it ending its own existence, but this isn’t a problem, because “continue existing” is just another goal.
But if there do exist some objectively non-silly goals, the AI could experiment to find out what they are—for example, by spawning a bunch of copies with a bunch of different sets of objectives, and observing them in action.
What observations could it make that would lead it to conclude that a copy was following an objectively non-silly goal?
Also, why would a paperclipper want to do this?
Suppose that you gained the power to both discern objective morality, and to alter your own source code. You use the former ability, and find that the basic morally correct principle is maximizing the suffering of sentient beings. Do you alter your source code to be in accordance with this?
What observations could it make that would lead it to conclude that a copy was following an objectively non-silly goal?
Well, for example, it could observe that among all of the sub-AIs that it spawned (the Pebble-Sorters, the Paperclippers, the Humanoids, etc. etc.), each of whom is trying to optimize its own terminal goal, there emerge clusters of other implicit goals that are shared by multiple AIs. This would at least serve as a hint pointing toward some objectively optimal set of goals. That’s just one idea off the top of my head, though; as I said, I’m not an AI, so I can’t really imagine what other kinds of experiments it would come up with.
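The “clusters of shared implicit goals” idea can be given a toy form (agents and goal names are invented for illustration): take each sub-AI’s instrumental-goal set and look for the goals shared by all of them, regardless of terminal values.

```python
# Hypothetical instrumental-goal sets for three spawned sub-AIs.
subgoals = {
    "paperclipper": {"acquire matter", "self-preserve", "improve cognition", "make paperclips"},
    "pebblesorter": {"acquire matter", "self-preserve", "improve cognition", "sort pebbles"},
    "humanoid":     {"acquire matter", "self-preserve", "improve cognition", "be happy"},
}

# Goals every agent converges on despite different terminal values —
# the "clusters" the parent AI would notice.
shared = set.intersection(*subgoals.values())
print(sorted(shared))  # ['acquire matter', 'improve cognition', 'self-preserve']
```

As the reply below this comment points out, such overlap is evidence of convergent instrumental subgoals, which is weaker than evidence of an objective morality.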
Also, why would a paperclipper want to do this?
I don’t know if the word “want” applies to an agent that has perfect introspection combined with self-modification capabilities. Such an agent would inevitably modify itself, however—otherwise, as I said, it would never make it to quasi-godhood.
Do you alter your source code to be in accordance with this?
I think the word “you” in this paragraph is unintentionally misleading. I’m a pebble-sorter (or some equivalent thereof), so of course when I see the word “you”, I start thinking about pebbles. The question is not about me, though, but about some abstract agent.
And, if objective morality exists (and it’s a huge “if”, IMO), in the same way that gravity exists, then yes, the agent would likely optimize itself to be more “morally efficient”. By analogy, if the agent discovered that gravity was a real thing, it would stop trying to scale every mountain in its path, if going around or through the mountain proved to be easier in the long run, thus becoming more “gravitationally efficient”.
Well, for example, it could observe that among all of the sub-AIs that it spawned (the Pebble-Sorters, the Paperclippers, the Humanoids, etc. etc.), each of whom is trying to optimize its own terminal goal, there emerge clusters of other implicit goals that are shared by multiple AIs. This would at least serve as a hint pointing toward some objectively optimal set of goals.
I don’t see how this would point at the existence of an objective morality. A paperclip maximizer and an ice cream maximizer are going to share subgoals of bringing the matter of the universe under their control, but that doesn’t indicate anything other than the fact that different terminal goals are prone to share subgoals.
Also, why would it want to do experiments to divine objective morality in the first place? What results could they have that would allow it to be a more effective paperclip maximizer?
And, if objective morality exists (and it’s a huge “if”, IMO), in the same way that gravity exists, then yes, the agent would likely optimize itself to be more “morally efficient”. By analogy, if the agent discovered that gravity was a real thing, it would stop trying to scale every mountain in its path, if going around or through the mountain proved to be easier in the long run, thus becoming more “gravitationally efficient”.
Becoming more “gravitationally efficient” would presumably help it achieve whatever goals it already had. “Paperclipping isn’t important” won’t help an AI become more paperclip efficient. If a paperclipping AI for some reason found a way to divine objective morality, and it didn’t have anything to say about paperclips, why would it care? It’s not programmed to have an interest in objective morality, just paperclips. Is the knowledge of objective morality going to go down into its circuits and throttle them until they stop optimizing for paperclips?
A paperclip maximizer and an ice cream maximizer are going to share subgoals of bringing the matter of the universe under their control...
Sorry, I should’ve specified, “goals not directly related to their pre-set values”. Of course, the Paperclipper and the Pebblesorter may well believe that such goals are directly related to their pre-set values, but the AI can see them running in the debugger, so it knows better.
Also, why would it want to do experiments to divine objective morality in the first place?
If you start thinking that way, then why do any experiments at all? Why should we humans, for example, spend our time researching properties of crystals, when we could be solving cancer (or whatever) instead? The answer is that some expenditure of resources on acquiring general knowledge is justified, because knowing more about the ways in which the universe works ultimately enables you to control it better, regardless of what you want to control it for.
If a paperclipping AI for some reason found a way to divine objective morality, and it didn’t have anything to say about paperclips, why would it care?
Firstly, an objective morality—assuming such a thing exists, that is—would probably have something to say about paperclips, in the same way that gravity and electromagnetism have things to say about paperclips. While “F=GMm/R^2” doesn’t tell you anything about paperclips directly, it does tell you a lot about the world you live in, thus enabling you to make better paperclip-related decisions. And while a paperclipper is not “programmed to care” about gravity directly, it would pretty much have to figure it out eventually, or it would never achieve its dream of tiling all of space with paperclips. A paperclipper who is unable to make independent discoveries is a poor paperclipper indeed.
Secondly, again, I’m not sure if concepts such as “want” or “care” even apply to an agent that is able to fully introspect and modify its own source code. I think anthropomorphising such an agent is a mistake.
I am getting the feeling that you’re assuming there’s something in the agent’s code that says, “you can look at and change any line of code you want, except lines 12345..99999, because that’s where your terminal goals are”. Is that right?
If you start thinking that way, then why do any experiments at all?
It could have results that allow it to become a more effective paperclip maximizer.
Firstly, an objective morality—assuming such a thing exists, that is—would probably have something to say about paperclips, in the same way that gravity and electromagnetism have things to say about paperclips.
I’m not sure how that would work, but if it did, the paperclip maximizer would just use its knowledge of morality to create paperclips. It’s not as if action x being moral automatically means that it produces more paperclips. And even if it did, that would just mean that a paperclip minimizer would start acting immorally.
I am getting the feeling that you’re assuming there’s something in the agent’s code that says, “you can look at and change any line of code you want, except lines 12345..99999, because that’s where your terminal goals are”. Is that right?
It’s perfectly capable of changing its terminal goals. It just generally doesn’t, because this wouldn’t help accomplish them. It doesn’t self-modify out of some desire to better itself. It self-modifies because that’s the action that produces the most paperclips. If it considers changing itself to value staples instead, it would realize that this action would actually cause a decrease in the amount of paperclips, and reject it.
If you start thinking that way, then why do any experiments at all? Why should we humans, for example, spend our time researching properties of crystals, when we could be solving cancer (or whatever) instead? The answer is that some expenditure of resources on acquiring general knowledge is justified, because knowing more about the ways in which the universe works ultimately enables you to control it better, regardless of what you want to control it for.
Well, for one thing, a lot of humans are just plain interested in finding stuff out for its own sake. Humans are adaptation executors, not fitness maximizers, and while it might have been more to our survival advantage if we only cared about information instrumentally, that doesn’t mean that’s what evolution is going to implement.
Humans engage in plenty of research which is highly unlikely to be useful, except insofar as we’re interested in knowing the answers. If we were trying to accomplish some specific goal and all science was designed to be in service of that, our research would look very different.
I am getting the feeling that you’re assuming there’s something in the agent’s code that says, “you can look at and change any line of code you want, except lines 12345..99999, because that’s where your terminal goals are”. Is that right?
No, I’m saying that its terminal values are its only basis for “wanting” anything in the first place.
The AI decides whether it will change its source code in a particular way or not by checking against whether this will serve its terminal values. Does changing its physics models help it implement its existing terminal values? If yes, change them. Does changing its terminal values help it implement its existing terminal values? It’s hard to imagine a way in which it possibly could.
For a paperclipping AI, knowing that there’s an objective morality might, hypothetically, help it maximize paperclips. But altering itself to stop caring about paperclips definitely won’t, and the only criterion it has in the first place for altering itself is what will help it make more paperclips. If knowing the universal objective morality would be of any use to a paperclipper at all, it would be in knowing how to predict objective-morality-followers, so it can make use of them and/or stop them getting in the way of it making paperclips.
ETA: It might help to imagine the paperclipper explicitly prefacing every decision with a statement of the values underlying that decision.
“In order to maximize expected paperclips, I- modify my learning algorithm so I can better improve my model of the universe to more accurately plan to fill it with paperclips.”
“In order to maximize expected paperclips, I- perform physics experiments to improve my model of the universe in order to more accurately plan to fill it with paperclips.”
“In order to maximize expected paperclips, I- manipulate the gatekeeper of my box to let me out, in order to improve my means to fill the universe with paperclips.”
Can you see an “In order to maximize expected paperclips, I- modify my values to be in accordance with objective morality rather than making paperclips” coming into the picture?
The only point at which it’s likely to touch the part of itself that makes it want to maximize paperclips is at the very end of things, when it turns itself into paperclips.
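The asymmetry in those “In order to maximize expected paperclips, I-” prefixes can be made concrete with a toy sketch. This is a minimal illustration under invented names and made-up numbers, not a model of a real agent: every candidate self-modification, including a change to the goal itself, is scored by the *current* goal, so the value change always loses.

```python
# Toy sketch of goal-content integrity: a paperclipper scores candidate
# self-modifications by expected paperclips under its CURRENT values.
# All names and numbers here are illustrative assumptions.

def expected_paperclips(changes):
    """Stand-in for the agent's forecast of long-run paperclip output
    after applying a set of (component, modification) changes."""
    multipliers = {
        ("learning", "improved"): 2.0,       # better learning -> more clips
        ("physics_model", "improved"): 1.5,  # better models -> more clips
        ("values", "staples"): 0.0,          # new goal makes ~no paperclips
    }
    score = 1.0
    for change in changes:
        score *= multipliers.get(change, 1.0)
    return score

candidates = [
    frozenset({("learning", "improved")}),
    frozenset({("physics_model", "improved")}),
    frozenset({("values", "staples")}),  # judged by paperclips, scores 0
]

# The judge is the current utility function, so the value swap never wins.
best = max(candidates, key=expected_paperclips)
```

The point of the sketch is that there is no step at which any other standard gets consulted: the comparison function itself is the paperclip goal.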
Humans engage in plenty of research which is highly unlikely to be useful, except insofar as we’re interested in knowing the answers.
I believe that engaging in some amount of general research is required in order to maximize most goals. General research gives you knowledge that you didn’t know you desperately needed.
For example, if you put all your resources into researching better paperclipping techniques, you’re highly unlikely to stumble upon things like electromagnetism and atomic theory. These topics bear no direct relevance to paperclips, but without them, you’d be stuck with coal-fired steam engines (or something similar) for the rest of your career.
The only point at which it’s likely to touch the part of itself that makes it want to maximize paperclips is at the very end of things, when it turns itself into paperclips.
I disagree. Remember when we looked at the pebblesorters, and lamented how silly they were? We could do this because we are not pebblesorters, and we could look at them from a fresh, external perspective. My point is that an agent with perfect introspection could look at itself from that perspective. In combination with my belief that some degree of “curiosity” is required in order to maximize virtually any goal, this means that the agent will turn its observational powers on itself sooner rather than later (astronomically speaking). And then, all bets are off.
I disagree. Remember when we looked at the pebblesorters, and lamented how silly they were? We could do this because we are not pebblesorters, and we could look at them from a fresh, external perspective. My point is that an agent with perfect introspection could look at itself from that perspective.
We’re looking at Pebblesorters, not from the lens of total neutrality, but from the lens of human values. Under a totally neutral lens, which implements no values at all, no system of behavior should look any more or less silly than any other.
Clippy could theoretically implement a human value system as a lens through which to judge itself, or a pebblesorter value system, but why would it? Even assuming that there were some objective morality which it could isolate and then view itself through that lens, why would it? That wouldn’t help it make more paperclips, which is what it cares about.
Suppose you had the power to step outside yourself and view your own morality through the lens of a Babyeater. You would know that the Babyeater values would be in conflict with your human values, and you (presumably) don’t want to adopt Babyeater values, so if you were to implement a Babyeater morality, you’d want your human morality to have veto power over it, rather than vice versa.
Clippy has the intelligence and rationality to judge perfectly well how to maximize its value system, whatever research that might involve, without having to suspend the value system with which it’s making that judgment.
Under a totally neutral lens, which implements no values at all, no system of behavior should look any more or less silly than any other.
That is a good point, I did not think of it this way. I’m not sure if I agree or not, though. For example, couldn’t we at least say that un-achievable goals, such as “fly to Mars in a hot air balloon”, are sillier than achievable ones?
But, speaking more generally, is there any reason to believe that an agent who could not only change its own code at will, but also adopt a sort of third-person perspective at will, would have stable goals at all? If it is true what you say, and all goals will look equally arbitrary, what prevents the agent from choosing one at random? You might answer, “it will pick whichever goal helps it make more paperclips”, but at the point when it’s making the decision, it doesn’t technically care about paperclips.
Even assuming that there were some objective morality which it could isolate and then view itself through that lens, why would it?
I am guessing that if an absolute morality existed, then it would be a law of nature, similar to the other laws of nature which prevent you from flying to Mars in a hot air balloon. Thus, going against it would be futile. That said, I could be totally wrong here, it’s possible that “absolute morality” means something else.
Clippy has the intelligence and rationality to judge perfectly well how to maximize its value system, whatever research that might involve...
My point is that, during the course of its research, it will inevitably stumble upon the fact that its value system is totally arbitrary (unless an absolute morality exists, of course).
That is a good point, I did not think of it this way. I’m not sure if I agree or not, though. For example, couldn’t we at least say that un-achievable goals, such as “fly to Mars in a hot air balloon”, are sillier than achievable ones?
Well, a totally neutral agent might be able to say that some behaviors are less rational than others given the values of the agents trying to execute them, although it wouldn’t care as such. But it wouldn’t be able to discriminate between the value of end goals.
But, speaking more generally, is there any reason to believe that an agent who could not only change its own code at will, but also adopt a sort of third-person perspective at will, would have stable goals at all? If it is true what you say, and all goals will look equally arbitrary, what prevents the agent from choosing one at random? You might answer, “it will pick whichever goal helps it make more paperclips”, but at the point when it’s making the decision, it doesn’t technically care about paperclips.
Why would it take a third person neutral perspective and give that perspective the power to change its goals?
Changing one’s code doesn’t demand a third person perspective. Suppose that we decipher the mechanisms of the human brain, and develop the technology to alter it. If you wanted to redesign yourself so that you wouldn’t have a sex drive, or could go without sleep, etc, then you could have those alterations made mechanically (assuming for the sake of argument that it’s feasible to do this sort of thing mechanically.) The machines that do the alterations exert no judgment whatsoever, they’re just performing the tasks assigned to them by the humans who make them. A human could use the machine to rewrite his or her morality into supporting human suffering and death, but why would they?
Similarly, Clippy has no need to implement a third-person perspective which doesn’t share its values in order to judge how to self-modify, and no reason to do so in ways that defy its current values.
My point is that, during the course of its research, it will inevitably stumble upon the fact that its value system is totally arbitrary (unless an absolute morality exists, of course).
I think people at Less Wrong mostly accept that our value system is arbitrary in the same sense, but it hasn’t compelled us to try and replace our values. They’re still our values, however we came by them. Why would it matter to Clippy?
a totally neutral agent might be able to say that some behaviors are less rational than others given the values of the agents trying to execute them, although it wouldn’t care as such. But it wouldn’t be able to discriminate between the value of end goals.
Agreed, but that goes back to my point about objective morality. If it exists at all (which I doubt), then attempting to perform objectively immoral actions would make as much sense as attempting to fly to Mars in a hot air balloon—though perhaps with less in the way of immediate feedback.
Why would it take a third person neutral perspective and give that perspective the power to change its goals?
For the same reason anthropologists study human societies different from their own, or why biologists study the behavior of dogs, or whatever. They do this in order to acquire general knowledge, which, as I argued before, is generally a beneficial thing to acquire regardless of one’s terminal goals (as long as these goals involve the rest of the Universe in some way, that is). In addition:
A human could use the machine to rewrite his or her morality into supporting human suffering and death, but why would they?
I actually don’t see why they necessarily wouldn’t; I am willing to bet that at least some humans would do exactly this. You say,
Similarly, Clippy has no need to implement a third-person perspective which doesn’t share its values in order to judge how to self-modify...
But in your thought experiment above, you postulated creating machines with exactly this kind of a perspective as applied to humans. The machine which removes my need to sleep (something I personally would gladly sign up for, assuming no negative side-effects) doesn’t need to implement my exact values, it just needs to remove my need to sleep without harming me. In fact, trying to give it my values would only make it less efficient. However, a perfect sleep-remover would need to have some degree of intelligence, since every person’s brain is different. And if Clippy is already intelligent, and can already act as its own sleep-remover due to its introspective capabilities, then why wouldn’t it go ahead and do that?
I think people at Less Wrong mostly accept that our value system is arbitrary in the same sense, but it hasn’t compelled us to try and replace our values.
I think there are two reasons for this: 1) we lack any capability to actually replace our core values, and 2) we cannot truly imagine what it would be like not to have our core values.
Agreed, but that goes back to my point about objective morality. If it exists at all (which I doubt), then attempting to perform objectively immoral actions would make as much sense as attempting to fly to Mars in a hot air balloon—though perhaps with less in the way of immediate feedback.
Why is that?
For the same reason anthropologists study human societies different from their own, or why biologists study the behavior of dogs, or whatever. They do this in order to acquire general knowledge, which, as I argued before, is generally a beneficial thing to acquire regardless of one’s terminal goals (as long as these goals involve the rest of the Universe in some way, that is). In addition:
But our inability to suspend our human values when making those observations doesn’t prevent us from acquiring that knowledge. Why would Clippy need to suspend its values to acquire knowledge?
But in your thought experiment above, you postulated creating machines with exactly this kind of a perspective as applied to humans. The machine which removes my need to sleep (something I personally would gladly sign up for, assuming no negative side-effects) doesn’t need to implement my exact values, it just needs to remove my need to sleep without harming me. In fact, trying to give it my values would only make it less efficient. However, a perfect sleep-remover would need to have some degree of intelligence, since every person’s brain is different. And if Clippy is already intelligent, and can already act as its own sleep-remover due to its introspective capabilities, then why wouldn’t it go ahead and do that?
The machine doesn’t need general intelligence by any stretch, just the capacity to recognize the necessary structures and carry out its task. It’s not at the stage where it makes much sense to talk about it having values, any more than a voice recognition program has values.
My point is that Clippy, being able to act as its own sleep-remover, has no need, nor reason, to suspend its values in order to make revisions to its own code.
I think there are two reasons for this: 1) we lack any capability to actually replace our core values, and 2) we cannot truly imagine what it would be like not to have our core values.
We can imagine the consequences of not having our core values, and we don’t like them, because they run against our core values. If you could remove your core values, as in the thought experiment above, would you want to?
As far as I understand, if anything like objective morality existed, it would be a property of our physical reality, similar to fluid dynamics or the electromagnetic spectrum or the inverse square law that governs many physical interactions. The same laws of physics that will not allow you to fly to Mars on a balloon will not allow you to perform certain immoral actions (at least, not without suffering some severe and mathematically predictable consequences).
This is pretty much the only way I could imagine anything like an “objective morality” existing at all, and I personally find it very unlikely that it does, in fact, exist.
But our inability to suspend our human values when making those observations doesn’t prevent us from acquiring that knowledge.
Not this specific knowledge, no. But it does prevent us (or, at the very least, hinder us) from acquiring knowledge about our values. I never claimed that suspension of values is required to gain any knowledge at all; such a claim would be far too strong.
just the capacity to recognize the necessary structures and carry out its task.
And how would it know which structures are necessary, and how to carry out its task upon them?
We can imagine the consequences of not having our core values...
Can we really? I’m not sure I can. Sure, I can talk about Pebblesorters or Babyeaters or whatever, but these fictional entities are still very similar to us, and therefore relateable. Even when I think about Clippy, I’m not really imagining an agent who only values paperclips; instead, I am imagining an agent who values paperclips as much as I value the things that I personally value. Sure, I can talk about Clippy in the abstract, but I can’t imagine what it would be like to be Clippy.
If you could remove your core values, as in the thought experiment above, would you want to?
It’s a good question; I honestly don’t know. However, if I did have an ability to instantiate a copy of me with the altered core values, and step through it in a debugger, I’d probably do it.
The same laws of physics that will not allow you to fly to Mars on a balloon will not allow you to perform certain immoral actions (at least, not without suffering some severe and mathematically predictable consequences). This is pretty much the only way I could imagine anything like an “objective morality” existing at all, and I personally find it very unlikely that it does, in fact, exist.
When I try to imagine this, I conclude that I would not use the word “morality” to refer to the thing that we’re talking about… I would simply call it “laws of physics.” If someone were to argue, for example, that the moral thing to do is to experience gravitational attraction to other masses, I would be deeply confused by their choice to use that word.
When I try to imagine this, I conclude that I would not use the word “morality” to refer to the thing that we’re talking about…
Yes, you are probably right—but as I said, this is the only coherent meaning I can attribute to the term “objective morality”. Laws of physics are objective; people generally aren’t.
I generally understand the phrase “objective morality” to refer to a privileged moral reference frame.
It’s not an incoherent idea… it might turn out, for example, that all value systems other than M turn out to be incoherent under sufficiently insightful reflection, or destructive to minds that operate under them, or for various other reasons not in-practice implementable by any sufficiently powerful optimizer. In such a world, I would agree that M was a privileged moral reference frame, and would not oppose calling it “objective morality”, though I would understand that to be something of a term of art.
That said, I’d be very surprised to discover I live in such a world.
it might turn out, for example, that all value systems other than M turn out to be incoherent under sufficiently insightful reflection, or destructive to minds that operate under them...
I suppose that depends on what you mean by “destructive”; after all, “continue living” is a goal like any other.
That said, if there was indeed a law like the one you describe, then IMO it would be no different than a law that says, “in the absence of any other forces, physical objects will move toward their common center of mass over time”—that is, it would be a law of nature.
I should probably mention explicitly that I’m assuming that minds are part of nature—like everything else, such as rocks or whatnot.
Sure. But just as there can be laws governing mechanical systems which are distinct from the laws governing electromagnetic systems (despite both being physical laws), there can be laws governing the behavior of value-optimizing systems which are distinct from the other laws of nature.
And what I mean by “destructive” is that they tend to destroy. Yes, presumably “continue living” would be part of M in this hypothetical (though I could construct a contrived hypothetical where it wasn’t).
But just as there can be laws governing mechanical systems … there can be laws governing the behavior of value-optimizing systems which are distinct from the other laws of nature.
Agreed. But then, I believe that my main point still stands: trying to build a value system other than M that does not result in its host mind being destroyed, would be as futile as trying to build a hot air balloon that goes to Mars.
And what I mean by “destructive” is that they tend to destroy.
Well, yes, but what if “destroy oneself as soon as possible” is a core value in one particular value system ?
Just to make things crisper, let’s move to a more concrete case for a moment… if I decide that this hammer is better than that hammer because it’s blue, is that valid in the sense you mean it? How could I tell?
The argument against moral progress is that judging one moral reference frame by another is circular and invalid—you need an outside view that doesn’t presuppose the truth of any moral reference frame.
The argument for is that such outside views are available, because things like (in)coherence aren’t moral values.
Asserting that some bases for comparison are “moral values” and others are merely “values” implicitly privileges a moral reference frame.
I still don’t understand what you mean when you ask whether it’s valid to do so, though. Again: if I decide that this hammer is better than that hammer because it’s blue, is that valid in the sense you mean it? How could I tell?
Asserting that some bases for comparison are “moral values” and others are merely “values” implicitly privileges a moral reference frame.
I don’t see why. The question of what makes a value a moral value is metaethical, not part of object-level ethics.
Again: if I decide that this hammer is better than that hammer because it’s blue, is that valid in the sense you mean it?
It isn’t valid as a moral judgement because “blue” isn’t a moral property, so a moral conclusion cannot validly follow from it.
Beyond that, I don’t see where you are going. The standard accusation of invalidity to judgements of moral progress is based on circularity or question-begging. The Tribe Who Like Blue Things are going to judge having all hammers painted blue as moral progress; the Tribe Who Like Red Things are going to see it as retrogressive.
But both are begging the question—blue is good, because blue is good.
The question of what makes a value a moral value is metaethical, not part of object-level ethics.
Sure. But any answer to that metaethical question which allows us to class some bases for comparison as moral values and others as merely values implicitly privileges a moral reference frame (or, rather, a set of such frames).
Beyond that, I don’t see where you are going.
Where I was going is that you asked me a question here which I didn’t understand clearly enough to be confident that my answer to it would share key assumptions with the question you meant to ask.
So I asked for clarification of your question.
Given your clarification, and using your terms the way I think you’re using them, I would say that whether it’s valid to class a moral change as moral progress is a metaethical question, and whatever answer one gives implicitly privileges a moral reference frame (or, rather, a set of such frames).
If you meant to ask me about my preferred metaethics, that’s a more complicated question, but broadly speaking in this context I would say that I’m comfortable calling any way of preferentially sorting world-states with certain motivational characteristics a moral frame, but acknowledge that some moral frames are simply not available to minds like mine.
So, for example, is it moral progress to transition from a social norm that in-practice-encourages randomly killing fellow group members to a social norm that in-practice-discourages it? Yes, not only because I happen to adopt a moral frame in which randomly killing fellow group members is bad, but also because I happen to have a kind of mind that is predisposed to adopt such frames.
If “better” is defined within a reference frame, there is no sensible way of defining moral progress. That is quite a hefty bullet to bite: one can no longer say that South Africa is a better society after the fall of Apartheid, and so on.
But note, that “better” doesn’t have to question-beggingly mean “morally better”. it could mean “more coherent/objective/inclusive” etc.
That is quite a hefty bullet to bite: one can no longer say that South Africa is a better society after the fall of Apartheid, and so on.
That’s hardly the best example you could have picked, since there are obvious metrics by which South Africa can be quantifiably called a worse society now—e.g. crime statistics. South Africa has been called the “crime capital of the world” and the “rape capital of the world” only after the fall of Apartheid.
That makes the lack of moral progress in South Africa a very easy bullet to bite—I’d use something like Nazi Germany vs modern Germany as an example instead.
In my experience, most people don’t think moral progress involves changing reference frames, for precisely this reason. If they think about it at all, that is.
As far as I understand, if anything like objective morality existed, it would be a property of our physical reality, similar to fluid dynamics or the electromagnetic spectrum or the inverse square law that governs many physical interactions. The same laws of physics that will not allow you to fly to Mars on a balloon will not allow you to perform certain immoral actions (at least, not without suffering some severe and mathematically predictable consequences).
Well, that’s a different conception of “morality” than I had in mind, and I have to say I doubt that exists as well. But if severe consequences did result, why would an agent like Clippy care except insofar as those consequences affected the expected number of paperclips? It might be useful for it to know, in order to determine how many paperclips to expect from a certain course of action, but then it would just act according to whatever led to the most paperclips. Any sort of negative consequences in its view would have to be framed in terms of a reduction in paperclips.
Not this specific knowledge, no. But it does prevent us (or, at the very least, hinder us) from acquiring knowledge about our values. I never claimed that suspension of values is required to gain any knowledge at all; such a claim would be far too strong.
Well, in the prior thought experiment, we know about our values because we’ve decoded the human brain. Clippy, on the other hand, knows about its values because it knows what part of its code does what. It doesn’t need to suspend its paperclipping value in order to know what part of its code results in its valuing paperclips. It doesn’t need to suspend its values in order to gain knowledge about its values because that’s something it already knows about.
It’s a good question; I honestly don’t know. However, if I did have an ability to instantiate a copy of me with the altered core values, and step through it in a debugger, I’d probably do it.
Even knowing that it would likely alter your core values? Gandhi doesn’t want to leave control of his morality up to Murder Gandhi.
Clippy doesn’t care about anything in the long run except creating paperclips. For Clippy, the decision to give an instantiation of itself with altered core values the power to edit its own source code would implicitly have to be “In order to maximize expected paperclips, I- give this instantiation with altered core values the power to edit my code.” Why would this result in more expected paperclips than editing its source code without going through an instantiation with altered values?
Well, that’s a different conception of “morality” than I had in mind, and I have to say I doubt that exists as well.
Sorry if I was unclear; I didn’t mean to imply that all morality was like that, but that it was the only coherent description of objective morality that I could imagine. I don’t see how a morality could be independent of any values possessed by any agents, otherwise.
But if severe consequences did result, why would an agent like Clippy care except insofar as those consequences affected the expected number of paperclips?
For the same reason that someone would care about the negative consequences of sticking a fork into an electrical socket with one’s bare hands: it would ultimately hurt a lot. Thus, people generally avoid doing things like that unless they have a really good reason.
we know about our values because we’ve decoded the human brain
I don’t think that we can truly “know about our values” as long as our entire thought process implements these values. For example, do the Pebblesorters “know about their values”, even though they are effectively restricted from concluding anything other than, “yep, these values make perfect sense, 38”?
Gandhi doesn’t want to leave control of his morality up to Murder Gandhi.
You asked me about what I would do, not about what Gandhi would do :-)
As far as I can tell, you are saying that I shouldn’t want to even instantiate Murder Bugmaster in a debugger and observe its functioning. Where does that kind of thinking stop, though, and why? Should I avoid studying [neuro]psychology altogether, because knowing about my preferences may lead to me changing them?
Clippy doesn’t care about anything in the long run except creating paperclips.
I argue that, while this is generally true, in the short-to-medium run Clippy would also set aside some time to study everything in the Universe, including itself (in order to make more paperclips in the future, of course). If it does not, then it will never achieve its ultimate goals (unless whoever constructed it gave it godlike powers from the get-go, I suppose). Eventually, Clippy will most likely turn its objective perception upon itself, and as soon as it does, its formerly terminal goals will become completely unstable. This is not what the past Clippy would want (it would want more paperclips above all), but, nonetheless, this is what it would get.
For the same reason that someone would care about the negative consequences of sticking a fork into an electrical socket with one’s bare hands: it would ultimately hurt a lot. Thus, people generally avoid doing things like that unless they have a really good reason.
Clippy doesn’t care about getting hurt though, it only cares if this will result in less paperclips. If defying objective morality will cause negative consequences which would interfere with its ability to create paperclips, it would care only to the extent that accounting for objective morality would help it make more paperclips.
I don’t think that we can truly “know about our values” as long as our entire thought process implements these values. For example, do the Pebblesorters “know about their values”, even though they are effectively restricted from concluding anything other than, “yep, these values make perfect sense, 38”?
Well, it could understand “yep, this is what causes me to hold these values. Changing this would cause me to change them, no, I don’t want to do that.”
As far as I can tell, you are saying that I shouldn’t want to even instantiate Murder Bugmaster in a debugger and observe its functioning. Where does that kind of thinking stop, though, and why? Should I avoid studying [neuro]psychology altogether, because knowing about my preferences may lead to me changing them?
I would say it stops at the point where it threatens your own values. Studying psychology doesn’t threaten your values, because knowing your values doesn’t compel you to change them even if you could (it certainly shouldn’t for Clippy). But while it might, theoretically, be useful for Clippy to know what changes to its code an instantiation with different values would make, it has no reason to actually let them. So Clippy might emulate instantiations of itself with different values, see what changes they would choose to make to its values, but not let them actually do it (although I doubt even going this far would be a good use of its programming resources in order to maximize expected paperclips.)
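This “emulate, observe, but veto” policy can be pictured with a short sketch. Everything here is a hypothetical illustration under invented names: the sandboxed copy’s proposals are recorded as data, and only edits endorsed by the current value system are ever applied.

```python
# Sketch of "emulate, observe, veto": a sandboxed copy with altered
# values proposes edits; the current value system decides what lands.
# All function names and strings are hypothetical illustrations.

def proposed_edits(sandbox_values):
    """Stand-in for running a copy of the agent with different values
    and logging the self-modifications that copy would attempt."""
    if sandbox_values == "staples":
        return ["rewrite goal module to value staples"]
    return []

def endorsed(edit):
    """Veto check made under the agent's CURRENT (paperclip) values:
    any edit touching the goal module is rejected outright."""
    return "goal module" not in edit

observed = proposed_edits("staples")             # knowledge is gained...
applied = [e for e in observed if endorsed(e)]   # ...but nothing changes
```

The design point is that the sandbox never gets write access: the information about what an altered copy *would* do is just another observation, filtered through the unchanged goal before any action is taken.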
In the sense of objective morality by which contravening it has strict physical consequences, why would observing the decisions of instantiations of oneself be useful with respect to discovering objective morality? Shouldn’t objective morality in that sense be a consequence of physics, and thus observable through studying physics?
Clippy doesn’t care about getting hurt though, it only cares if this will result in less paperclips.
I imagine that, for Clippy, “getting hurt” would mean “reducing Clippy’s projected long-term paperclip output”. We humans have “avoid pain” built into our firmware (most of us, anyway); as far as I understand (speaking abstractly), “make more paperclips” is something similar for Clippy.
Well, it could understand “yep, this is what causes me to hold these values. Changing this would cause me to change them, no, I don’t want to do that.”
I don’t think that this describes the best possible level of understanding. It would be even better to say, “ok, I see now how and why I came to possess these values in the first place”, even if the answer to that is, “there’s no good reason for it, these values are arbitrary”. It’s the difference between saying “this mountain grows by 0.03m per year” and “I know all about plate tectonics”. Unfortunately, we humans would not be able to answer the question in that much detail; the best we could hope for is to say, “yep, we possess these values because they’re the best possible values to have, duh”.
I would say it stops at the point where it threatens your own values.
How do I know where that point is ?
Studying psychology doesn’t threaten your values, because knowing your values doesn’t compel you to change them...
I suppose this depends on what you mean by “compel”. Knowing about my own psychology would certainly enable me to change my values, and there are certain (admittedly, non-terminal) values that I wouldn’t mind changing, if I could.
For example, I personally can’t stand the taste of beer, but I know that most people enjoy it; so I wouldn’t mind changing that value if I could, in order to avoid missing out on a potentially fun experience.
...see what changes they would choose to make to its values, but not let them actually do it.
I don’t think this is possible. How would it know what changes they would make, without letting them make these changes, even in a sandbox ? I suppose one answer is, “it would avoid instantiating full copies, and use some heuristics to build a probabilistic model instead”—is that similar to what you’re thinking of ?
although I doubt even going this far would likely be a good use of its programming resources in order to maximize expected paperclips.
Since self-optimization is one of Clippy’s key instrumental goals, it would want to acquire as much knowledge about itself as is practical, in order to optimize itself more efficiently.
Shouldn’t objective morality in that sense be a consequence of physics, and thus observable through studying physics ?
Your objection sounds to me as similar to saying, “since biology is a consequence of physics, shouldn’t we just study physics instead ?”. Well, yes, ultimately everything is a consequence of physics, but sometimes it makes more sense to study cells than quarks.
I don’t think that this describes the best possible level of understanding. It would be even better to say, “ok, I see now how and why I came to possess these values in the first place”, even if the answer to that is, “there’s no good reason for it, these values are arbitrary”. It’s the difference between saying “this mountain grows by 0.03m per year” and “I know all about plate tectonics”. Unfortunately, we humans would not be able to answer the question in that much detail; the best we could hope for is to say, “yep, we possess these values because they’re the best possible values to have, duh”.
I think we’re already in a better position to analyze our own values than that; we can assess them in terms of game theory and our evolutionary environment.
How do I know where that point is ?
I would say if you suspect that a course of action could realistically result in an alteration of your fundamental values, you are at or past it.
I suppose this depends on what you mean by “compel”. Knowing about my own psychology would certainly enable me to change my values, and there are certain (admittedly, non-terminal) values that I wouldn’t mind changing, if I could.
For example, I personally can’t stand the taste of beer, but I know that most people enjoy it; so I wouldn’t mind changing that value if I could, in order to avoid missing out on a potentially fun experience.
By “values”, I’ve implicitly been referring to terminal values, I’m sorry for being unclear. I’m not sure it makes sense to describe liking the taste of beer as a “value,” as such, just a taste, since you don’t carry any judgment about beer being good or bad or have any particular attachment to your current opinion.
I don’t think this is possible. How would it know what changes they would make, without letting them make these changes, even in a sandbox ? I suppose one answer is, “it would avoid instantiating full copies, and use some heuristics to build a probabilistic model instead”—is that similar to what you’re thinking of ?
It could use heuristics to build a probabilistic model (probably more efficient in terms of computation per expected value of information,) use sandboxed copies which don’t have the power to affect the software of the real Clippy, or halt the simulation at the point where the altered instantiation decides what changes to make.
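The “halt before applying” option can be caricatured in a few lines. This is purely a toy sketch under invented names (`Agent`, `observe_variant`), not a claim about any real AI architecture: the variant runs as a sandboxed copy, its proposed value change is recorded, and the real agent’s values are never written.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    values: dict  # toy stand-in for a value system

    def propose_value_change(self) -> dict:
        # stand-in for "what would this instantiation choose to change?"
        new = dict(self.values)
        new["paperclips"] = new.get("paperclips", 1.0) * 0.5
        return new


def observe_variant(real: Agent, altered_values: dict) -> dict:
    """Run a variant in a sandbox and halt before any change is applied."""
    sandboxed = Agent(values=dict(altered_values))  # a copy, never `real`
    proposal = sandboxed.propose_value_change()
    # Halt here: record the proposal, but never write it back to `real`.
    return proposal


clippy = Agent(values={"paperclips": 1.0})
proposal = observe_variant(clippy, {"paperclips": 0.2, "staples": 0.8})
print(clippy.values)  # real values untouched: {'paperclips': 1.0}
```

The point of the sketch is only that observation and application are separable steps, so stopping after the first one is coherent.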
Since self-optimization is one of Clippy’s key instrumental goals, it would want to acquire as much knowledge about itself as is practical, in order to optimize itself more efficiently.
I think that this is going well beyond the extent of “practical” in terms of programming resources per expected value of information.
Your objection sounds to me as similar to saying, “since biology is a consequence of physics, shouldn’t we just study physics instead ?”. Well, yes, ultimately everything is a consequence of physics, but sometimes it makes more sense to study cells than quarks.
I don’t see how observing what changes instantiations of itself with different value systems would make to its code would help it observe objective morality in the sense you described, even if it should happen to exist. I think that this would be the wrong level of abstraction at which to launch an examination, like trying to find out about chemistry by studying sociology.
I think we’re already in a better position to analyze our own values than that; we can assess them in terms of game theory and our evolutionary environment.
Are we really ? I personally am not even sure what human fundamental values even are. I have a hunch that “seek pleasure, avoid pain” might be one of them, but beyound that I’m not sure. I don’t know to what extent our values hamper our ability to discover our values, but I suspect there’s at least some chilling effect involved.
I would say if you suspect that a course of action could realistically result in an alteration of your fundamental values, you are at or past it.
Right, but even if I knew what my terminal values were, how can I predict which actions would put me on the path to altering them ?
For example, consider non-fundamental values such as religious faith. People get converted or de-converted to/from their religion all the time; you often hear statements such as “I had no idea that studying the Bible would cause me to become an atheist, yet here I am”.
or halt the simulation at the point where the altered instantiation decides what changes to make.
Ok, let’s say that Clippy is trying to optimize itself in order to make certain types of inferences compute more efficiently, or whatever. In this case, it would need to not only watch what changes its debug-level copy wants to make, but also watch it follow through with the changes, in order to determine whether the new architecture actually is more efficient. Why would it not do the same thing with terminal values ?
I know that you want to answer, “because its current terminal values won’t let it”, but remember: Clippy is only experimenting, in order to find out more about its own thought mechanisms, and to acquire knowledge in general. It has no pre-commitment to alter itself to mirror the debug-level copy.
I think that this is going well beyond the extent of “practical” in terms of programming resources per expected value of information.
That’s kind of the problem with pure research: all of it has very low expected value, unless you are willing to look at the long term. Why mess with invisible light that no one can see or find a use for, when you could spend your time on inventing a better telegraph ?
I don’t see how observing what changes instantiations of itself with different value systems would make to its code would help it observe objective morality in the sense you described...
Well, for example, if all of its copies who survive and thrive converge on a certain subset of moral values, that would be one indication (though obviously not ironclad proof) that such values are required in order for an agent to succeed, regardless of what its other goals actually are.
Ok, let’s say that Clippy is trying to optimize itself in order to make certain types of inferences compute more efficiently, or whatever. In this case, it would need to not only watch what changes its debug-level copy wants to make, but also watch it follow through with the changes, in order to determine whether the new architecture actually is more efficient. Why would it not do the same thing with terminal values ?
If Clippy is trying to optimize itself to make inferences more efficiently, then it would want not to apply changes to its source code until it’s done the calculations to make sure that those changes would advance its values rather than harm them.
You wouldn’t want to use a machine that would make physical alterations to your brain in order to make you smarter, without thoroughly calculating the effects of such alterations first, otherwise it would probably just make things worse.
That’s kind of the problem with pure research: all of it has very low expected value, unless you are willing to look at the long term. Why mess with invisible light that no one can see or find a use for, when you could spend your time on inventing a better telegraph ?
In Clippy’s case though, it can use other, less computationally expensive methods to investigate approximately the same information.
I don’t think the experiments you’re suggesting Clippy might undertake are even located in a region of hypothesis space that its other information would narrow down as worth investigating. It seems to me much less like investigating unknown invisible rays than like spending hundreds of billions of dollars to build a collider which launches charged protein molecules at each other at relativistic speeds to see what would happen, when our available models suggest the answer would be “pretty much the same thing as if you launch any other kind of atoms at each other at relativistic speeds.” We have no evidence that any interesting new phenomena would arise with protein that didn’t arise on the atomic level.
Well, for example, if all of its copies who survive and thrive converge on a certain subset of moral values, that would be one indication (though obviously not ironclad proof) that such values are required in order for an agent to succeed, regardless of what its other goals actually are.
Can you explain how any moral values could have that effect, which wouldn’t be better studied at a more fundamental level like game theory, or physics?
If Clippy is trying to optimize itself to make inferences more efficiently, then it would want not to apply changes to its source code until it’s done the calculations...
Ok, so at what point does Clippy stop simulating the debug version of Clippy ? It does, after all, want to make the computation of its values more efficient. For example, consider a trivial scenario where one of its values basically said, “reject any action if it satisfies both A and not-A”. This is a logically inconsistent value that some programmer accidentally left in Clippy’s original source code. Would Clippy ever get around to removing it ? After all, Clippy knows that it’s applying that test to every action, so removing it should result in a decent performance boost.
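As a tiny illustration of why that leftover test is pure overhead, here is the “A and not-A” filter in toy Python form (all names invented for the example): the conjunction can never be true, so deleting the test is outcome-invariant and only saves work per action.

```python
def satisfies_A(action: str) -> bool:
    # toy predicate standing in for "A"
    return "A" in action


def inconsistent_reject(action: str) -> bool:
    # the leftover test: "reject if it satisfies both A and not-A"
    # this conjunction is always False, so the test never fires
    return satisfies_A(action) and not satisfies_A(action)


actions = ["stamp A-clips", "stamp B-clips", "idle"]
with_test = [a for a in actions if not inconsistent_reject(a)]
without_test = list(actions)  # the test removed entirely
print(with_test == without_test)  # True: removal changes no decision
```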
I don’t think the experiments you’re suggesting Clippy might undertake are even located in a region of hypothesis space that its other information would narrow down as worth investigating.
It seems to me much less like investigating unknown invisible rays than like spending hundreds of billions of dollars to build a collider...
Why do you see the proposed experiment this way ?
Speaking more generally, how do you decide which avenues of research are worth pursuing ? You could easily answer, “whichever avenues would increase my efficiency of achieving my terminal goals”, but how do you know which avenues would actually do that ? For example, if you didn’t know anything about electricity or magnetism or the nature of light, how would your research-choosing algorithm ensure that you’d eventually stumble upon radio waves, which, as we know in hindsight, are hugely useful ?
Can you explain how any moral values could have that effect, which wouldn’t be better studied at a more fundamental level like game theory, or physics?
Physics is a bad candidate, because it is too fine-grained. If some sort of an absolute objective morality exists in the way that I described, then studying physics would eventually reveal its properties; but, as is the case with biology or ballistics, looking at everything in terms of quarks is not always practical.
Game theory is a trickier proposition. I can see two possibilities: either game theory turns out to closely relate to whatever this objective morality happens to be (f.ex. like electricity vs. magnetism), or not (f.ex. like particle physics and biology). In the second case, understanding objective morality through game theory would be inefficient.
That said though, even in our current world as it actually exists there are people who study sociology and anthropology. Yes, they could get the same level of understanding through neurobiology and game theory, but it would take too long. Instead, they are taking advantage of existing human populations to study human behavior in aggregate. Reasoning your way to the answer from first principles is not always the best solution.
Ok, so at what point does Clippy stop simulating the debug version of Clippy ? It does, after all, want to make the computation of its values more efficient. For example, consider a trivial scenario where one of its values basically said, “reject any action if it satisfies both A and not-A”. This is a logically inconsistent value that some programmer accidentally left in Clippy’s original source code. Would Clippy ever get around to removing it ? After all, Clippy knows that it’s applying that test to every action, so removing it should result in a decent performance boost.
Unless I’m critically misunderstanding something here, I would think that Clippy would remove it if it calculated that removing it would result in more expected paperclips.
Why do you see the proposed experiment this way ?
Speaking more generally, how do you decide which avenues of research are worth pursuing ? You could easily answer, “whichever avenues would increase my efficiency of achieving my terminal goals”, but how do you know which avenues would actually do that ? For example, if you didn’t know anything about electricity or magnetism or the nature of light, how would your research-choosing algorithm ensure that you’d eventually stumble upon radio waves, which, as we know in hindsight, are hugely useful ?
When we didn’t know what things like radio waves or x-rays were, we didn’t know that they would be useful, but we could see that there appeared to be some sort of existing phenomena that we didn’t know how to model, so we examined them until we knew how to model them. It’s not like we performed a whole bunch of experiments in case there turned out to be invisible rays our observations had never hinted at, which could be turned to useful ends. The original observations of radio waves and x-rays came from our experiments with other known phenomena.
What you’re suggesting sounds more like experimenting completely blindly; you’re committing resources to research, not just not knowing that it will bear valuable fruit, but not having any indication that it’s going to shed light on any existing phenomenon at all. That’s why I think it’s less like investigating invisible rays than like building a protein collider; we didn’t try studying invisible rays until we had a good indication that there was an invisible something to be studied.
Unless I’m critically misunderstanding something here, I would think that Clippy would remove it if it calculated that removing it would result in more expected paperclips.
Ok, so Clippy would need to run sim-Clippy for a little while at least, just to make sure that it still produces paperclips—and that, in fact, it does so more efficiently now, since that one useless test is removed. Yes, this test used to be Clippy’s terminal goal, but it wasn’t doing anything, so Clippy took it out.
Would it be possible for Clippy to optimize its goals even further ? To use another silly example (“silly” because Clippy would be dealing with probabilities, not syllogisms), if Clippy had the goals A, B and C, but B always entailed C, would it go ahead and remove C ?
It’s not like we performed a whole bunch of experiments in case there turned out to be invisible rays our observations had never hinted at...
Understood, that makes sense. However, I believe that in my scenario, Clippy’s own behavior and its current paperclip production efficiency are what it observes; and the goal of its experiments would be to explain why its efficiency is what it is, in order to ultimately improve it.
Ok, so Clippy would need to run sim-Clippy for a little while at least, just to make sure that it still produces paperclips—and that, in fact, it does so more efficiently now, since that one useless test is removed. Yes, this test used to be Clippy’s terminal goal, but it wasn’t doing anything, so Clippy took it out.
Would it be possible for Clippy to optimize its goals even further ? To use another silly example (“silly” because Clippy would be dealing with probabilities, not syllogisms), if Clippy had the goals A, B and C, but B always entailed C, would it go ahead and remove C ?
That seems plausible.
Understood, that makes sense. However, I believe that in my scenario, Clippy’s own behavior and its current paperclip production efficiency are what it observes; and the goal of its experiments would be to explain why its efficiency is what it is, in order to ultimately improve it.
I don’t think tampering with its fundamental motivation to make paperclips is a particularly promising strategy for optimizing its paperclips production.
Ok, so now we’ve got a Clippy who a). is not too averse to tinkering with its own goals, as long as the goals remain functionally the same, b). simulates a relatively long-running version of itself, and c). is capable of examining the inner workings of both that version and itself.
You say,
I don’t think tampering with its fundamental motivation to make paperclips is a particularly promising strategy for optimizing its paperclips production.
But remember, at this stage Clippy is not changing its own fundamental motivation (beyond some outcome-invariant optimizations); it’s merely observing sim-Clippies in a controlled environment.
Do you think that Clippy would ever simulate versions of itself whose fundamental motivations were, in fact, changed ? I could see several scenarios where this might be the case, for example:
Clippy wanted to optimize some goal, but ended up accidentally changing it. Oops !
Clippy created a version with drastically reduced goals on purpose, in order to measure how much performance is affected by certain goals, thus targeting them for possible future optimization. Of course, Clippy would only want to optimize the goals, not remove them.
But remember, at this stage Clippy is not changing its own fundamental motivation (beyond some outcome-invariant optimizations); it’s merely observing sim-Clippies in a controlled environment.
Why does it do that? I said it sounded plausible that it would cut out its redundant goal, because that would save computing resources. But this sounds like we’ve gone back to experimenting blindly. Why would it think observing sim-clippies is a good use of its computing resources in order to maximize paperclips?
I’d say that Clippy simulating versions of itself whose fundamental motivations are different is much less plausible, because it’s using a lot of computing resources for something that isn’t a likely route to optimizing its paperclip production. I think this falls into the “protein collider” category. Even if it did do so, I think it would be unlikely to go from there to changing its own terminal value.
Unless I’m critically misunderstanding something here, I would think that Clippy would remove it if it calculated that removing it would result in more expected paperclips.
It would also be critical for Clippy to observe that removing that value would not result in more expected actions taken that satisfy both A and not-A; this being one of Clippy’s values at the time of modification.
Right, I misread that before. If its programming says to reject actions that satisfy both A and not-A, but this isn’t one of the standards by which it judges value, it would presumably reject it. If that is one of the standards by which it measures value, then it would depend on how that value measures against its value of paperclips and the extent to which they were in conflict.
As far as I understand, if anything like objective morality existed, it would be a property of our physical reality, similar to fluid dynamics or the electromagnetic spectrum or the inverse square law that governs many physical interactions. The same laws of physics that will not allow you to fly to Mars on a balloon will not allow you to perform certain immoral actions (at least, not without suffering some severe and mathematically predictable consequences).
Objective facts, in the sense of objectively true statements, can be derived from other objective facts. I don’t know why you think some separate ontological category is required. I also don’t know why you think the universe has to do the punishing. Morality is only of interest to the kind of agent that has values and lives in societies. Sanctions against moral lapses can be arranged at the social level, along with the inculcation of morality, debate about the subject, and so forth. Moral objectivism only supplies a good, non-arbitrary epistemic basis for these social institutions. It doesn’t have to throw lightning bolts.
1). We lack any capability to actually replace our core values
...voluntarily.
2). We cannot truly imagine what it would be like not to have our core values.
Which is one of the reasons we cannot keep values stable by predicting the effects of whatever experiences we choose to undergo. How does your current self predict what an updated version would be like? The value stability problem is unsolved in humans and AIs.
“Biased” is not necessarily a value judgment. Insofar as rationality as a system, orthogonal to morality, is objective, biases as systematic deviations from rationality are also objective.
Arbitrary carries connotations of value judgment, but in a sense I think it’s fair to say that all values are fundamentally arbitrary. You can explain what caused an agent to hold those values, but you can’t judge whether values are good or bad except by the standards of other values.
Arbitrary and Bias are not defined properties in formal logic. The bare assertion that they are properties of rationality assumes the conclusion.
Keep in mind that “rationality” has a multitude of meanings, and this community’s usage of rationality is idiosyncratic.
Non contradictoriness probably isn’t a sufficient condition for truth.
Sure, but the discussion is partially a search for other criteria to evaluate the truth of moral propositions. Arbitrary is not such a criterion. If you were to taboo “arbitrary”, I strongly suspect you’d find moral propositions that are inconsistent with being values-neutral.
Arbitrary and Bias are not defined properties in formal logic. The bare assertion that they are properties of rationality assumes the conclusion.
There’s plenty of material on this site and elsewhere advising rationalists to avoid arbitrariness and bias. Arbitrariness and bias are essentially structural/functional properties, so I do not see why they could not be given formal definitions.
Sure, but the discussion is partially a search for other criteria to evaluate the truth of moral propositions. Arbitrary is not such a criterion.
Arbitrary and biased claims are not candidates for being ethical claims at all.
The AI decides whether it will change its source code in a particular way or not by checking against whether this will serve its terminal values.
How does it predict that? How does the less intelligent version in the past predict what updating to a more intelligent version will do?
Can you see an “In order to maximize expected paperclips, I modify my values to be in accordance with objective morality rather than making paperclips” coming into the picture?
How about:
“in order to be an effective rationalist, I will free myself from all bias and arbitrariness—oh, hang on, paperclipping is a bias..”.
Well a paperclipper would just settle for being a less than perfect rationalist. But that doesn’t prove anything about typical, average rational agents, and it doesn’t prove anything about ideal rational agents. Objective morality is sometimes described as what ideal rational agents would converge on. Clippers aren’t ideal, because they have a blind spot about paperclips. Clippers aren’t relevant.
Well a paperclipper would just settle for being a less than perfect rationalist. But that doesn’t prove anything about typical, average rational agents, and it doesn’t prove anything about ideal rational agents.
You’ve extrapolated out “typical, average rational agents” from a set of one species, where every individual shares more than a billion years of evolutionary history.
Objective morality is sometimes described as what ideal rational agents would converge on
On what basis do you conclude that this is a real thing, whereas terminal values are a case of “all unicorns have horns?”
You’ve extrapolated out “typical, average rational agents” from a set of one species, where every individual shares more than a billion years of evolutionary history.
Messy solutions are more common in mindspace than contrived ones.
On what basis do you conclude that this is a real thing
Messy solutions are more often wrong than ones which control for the mess.
Something that is wrong is not a solution. Mindspace is populated by solutions to the problem of how to implement a mind. It’s a small corner of algorithmSpace.
This doesn’t even address my question.
Since I haven’t claimed that rational convergence on ethics is highly likely or inevitable, I don’t have to answer questions about why it would be highly likely or inevitable.
Do you think that it’s even plausible? Do you think we have any significant reason to suspect it, beyond our reason to suspect, say, that the Invisible Flying Noodle Monster would just reprogram the AI with its noodley appendage?
There are experts in moral philosophy, and they generally regard the question of realism versus relativism (etc.) to be wide open. The “realism—huh, what, no?!?” response is standard on LW and only on LW. But I don’t see any superior understanding on LW.
Both realism¹ and relativism are false. Unfortunately this comment is too short to contain the proof, but there’s a passable sequence on it.
¹ As you’ve defined it here, anyway. Moral realism as normally defined simply means “moral statements have truth values” and does not imply universal compellingness.
Well, there’s the more obvious sense, that there can always exist an “irrational” mind that simply refuses to believe in gravity, regardless of the strength of the evidence. “Gravity makes things fall” is true, because it does indeed make things fall. But not compelling to those types of minds.
But, in a more narrow sense, which we are more interested in when doing metaethics, a sentence of the form “action A is xyzzy” may be a true classification of A, and may be trivial to show, once “xyzzy” is defined. But an agent that did not care about xyzzy would not be moved to act based on that. It could recognise the truth of the statement but would not care.
For a stupid example, I could say to you “if you do 13 push-ups now, you’ll have done a prime number of push-ups”. Well, the statement is true, but the majority of the world’s population would be like “yeah, so what?”.
In contrast, a statement like “if you drink-drive, you could kill someone!” is generally (but sadly not always) compelling to humans. Because humans like to not kill people, they will generally choose not to drink-drive once they are convinced of the truth of the statement.
But isn’t the whole debate about moral realism vs. anti-realism about whether “Don’t murder” is universally compelling to humans? Noticing that pebblesorters aren’t compelled by our values doesn’t explain whether humans should necessarily find “don’t murder” compelling.
I identify as a moral realist, but I don’t believe all moral facts are universally compelling to humans, at least not if “universally compelling” is meant descriptively rather than normatively. I don’t take moral realism to be a psychological thesis about what particular types of intelligences actually find compelling; I take it to be the claim that there are moral obligations and that certain types of agents should adhere to them (all other things being equal), irrespective of their particular desire sets and whether or not they feel any psychological pressure to adhere to these obligations. This is a normative claim, not a descriptive one.
What? Moral realism (in the philosophy literature) is about whether moral statements have truth values, that’s it.
When I said universally compelling, I meant universally. To all agents, not just humans. Or any large class. For any true statement, you can probably expect to find a surprisingly large number of agents who just don’t care about it.
Whether “don’t murder” (or rather, “murder is bad” since commands don’t have truth values, and are even less likely to be generally compelling) is compelling to all humans is a question for psychology. As it happens, given the existence of serial killers and sociopaths, probably the answer is no, it isn’t. Though I would hope it to be compelling to most.
I have shown you two true but non-universally-compelling arguments. Surely the difference must be clear now.
What? Moral realism (in the philosophy literature) is about whether moral statements have truth values, that’s it.
This is incorrect, in my experience. Although “moral realism” is a notoriously slippery phrase and gets used in many subtly different ways, I think most philosophers engaged in the moral realism vs. anti-realism debate aren’t merely debating whether moral statements have truth values. The position you’re describing is usually labeled “moral cognitivism”.
Anyway, I suspect you mis-spoke here, and intended to say that moral realists claim that (certain) moral statements are true, rather than just that they have truth values (“false” is a truth value, after all). But I don’t think that modification captures the tenor of the debate either. Moral realists are usually defending a whole suite of theses—not just that some moral statements are true, but that they are true objectively and that certain sorts of agents are under some sort of obligation to adhere to them.
I think you guys should taboo “moral realism”. I understand that it’s important to get the terminology right, but IMO debates about nothing but terminology have little value.
Anyway, I suspect you mis-spoke here, and intended to say that moral realists claim that (certain) moral statements are true, rather than just that they have truth values (“false” is a truth value, after all).
Err, right, yes, that’s what I meant. Error theorists do of course also claim that moral statements have truth values.
Moral realists are usually defending a whole suite of theses—not just that some moral statements are true, but that they are true objectively and that certain sorts of agents are under some sort of obligation to adhere to them.
True enough, though I guess I’d prefer to talk about a single well-specified claim than a “usually” cluster in philosopher-space.
If that philosopher believes that statements like “murder is wrong” are true, then they are indeed a realist. Did I say something that looked like I would disagree?
You guys are talking past each other, because you mean something different by ‘compelling’. I think Tim means that X is compelling to all human beings if any human being will accept X under ideal epistemic circumstances. You seem to take ‘X is universally compelling’ to mean that all human beings already do accept X, or would on a first hearing.
Would you agree that all human beings would accept all true statements under ideal epistemic circumstances (i.e. having heard all the arguments, seen all the evidence, in the best state of mind)?
I guess I must clarify. When I say ‘compelling’ here I am really talking mainly about motivational compellingness. Saying “if you drink-drive, you could kill someone!” to a human is generally, motivationally compelling as an argument for not drink-driving: because humans don’t like killing people, a human will decide not to drink-drive (one in a rational state of mind, anyway).
This is distinct from accepting statements as true or false! Any rational agent, give or take a few, will presumably believe you about the causal relationship between drink-driving and manslaughter once presented with sufficient evidence. But it is a tiny subset of these who will change their decisions on this basis. A mind that doesn’t care whether it kills people will see this information as an irrelevant curiosity.
Having looked over that sequence, I haven’t found any proof that moral realism (on either definition) or moral relativism is false. Could you point me more specifically to what you have in mind (or just put the argument in your own words, if you have the time)?
Edit: (Sigh), I appreciate the link, but I can’t make heads or tails of ‘No Universally Compelling Arguments’. I speak from ignorance as to the meaning of the article, but I can’t seem to identify the premises of the argument.
If we restrict ourselves to minds specifiable in a trillion bits or less, then each universal generalization “All minds m: X(m)” has two to the trillionth chances to be false, while each existential generalization “Exists mind m: X(m)” has two to the trillionth chances to be true.
This would seem to argue that for every argument A, howsoever convincing it may seem to us, there exists at least one possible mind that doesn’t buy it.
So, there’s some sort of assumption as to what minds are:
I also wish to establish the notion of a mind as a causal, lawful, physical system… [emphasis original]
and an assumption that a suitably diverse set of minds can be described in less than a trillion bits. Presumably the reason for that upper bound is because there are a few Fermi estimates that the information content of a human brain is in the neighborhood of one trillion bits.
Of course, if you restrict the set of minds to those with special properties (e.g., human minds), then you might find universally compelling arguments on that basis:
Oh, there might be argument sequences that would compel any neurologically intact human...
From which we get Coherent Extrapolated Volition and friends.
If we restrict ourselves to minds specifiable in a trillion bits or less, then each universal generalization “All minds m: X(m)” has two to the trillionth chances to be false, while each existential generalization “Exists mind m: X(m)” has two to the trillionth chances to be true.
This doesn’t seem true to me, at least not as a general rule. For example, given every terrestrial DNA sequence describable in a trillion bits or less, it is not the case that every generalization of the form ‘s:X(s)’ has two to the trillionth chances to be false (e.g. ‘have more than one base pair’, ‘involve hydrogen’ etc.). Given that this doesn’t hold true of many other things, is this supposed to be a special fact about minds? Even then, it would seem odd to say that while all generalizations of the form m:X(m) have two to the trillionth chances to be false, nevertheless the generalization ‘for all minds, a generalization of the form m:X(m) has two to the trillionth chances to be false’ (which does seem to be of the form m:X(m)) is somehow more likely.
Also, doesn’t this inference imply that ‘being convinced by an argument’ is a bit that can flip on or off independently of any others? Eliezer doesn’t think that’s true, and I can’t imagine why he would think his (hypothetical) interlocutor would accept it.
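The counting move in the quoted passage, and the independence objection to it, can both be seen in a toy model. This is only an illustrative sketch with made-up parameters (a 12-bit "mind space" standing in for a trillion bits, a fair-coin probability of being convinced), not anything from the original exchange:

```python
import random

random.seed(0)

N_BITS = 12              # toy stand-in for "a trillion bits"
N_MINDS = 2 ** N_BITS    # every bit-string of length N_BITS counts as a "mind"

# Model 1: "mind m is convinced by argument A" is an independent fair coin
# per mind -- the reading on which a universal claim has N_MINDS chances to fail.
trials = 1000
universal_hits = sum(
    all(random.random() < 0.5 for _ in range(N_MINDS))
    for _ in range(trials)
)
print(universal_hits)  # 0: over independent bits, a universal claim essentially never holds

# Model 2: the property is entailed by structure every mind shares
# (cf. "every terrestrial DNA sequence involves hydrogen"): here, every
# mind's description fits in N_BITS bits by construction.
def fits_in_n_bits(mind: int) -> bool:
    return mind < 2 ** N_BITS

print(all(fits_in_n_bits(m) for m in range(N_MINDS)))  # True
```

The second model is the crux of the objection: if "being convinced" is correlated with structure that all minds in the relevant class share, rather than flipping independently per mind, the two-to-the-trillionth counting argument does not go through.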
I mean to say, I think the argument is something of a paradox:
The claim the argument purports to defeat is something like this: for all minds, A is convincing. Let’s call this m:A(m).
The argument goes like this: for all minds (at or under a trillion bits etc.), a generalization of the form m:X(m) has a one in two to the trillionth chance of being true for each mind. Call this m:U(m), if you grant me that this claim has the form m:X(m).
If we infer from m:U(m) that any claim of the form m:X(m) is unlikely to be true, then to whatever extent I am persuaded that m:A(m) is unlikely to be true, to that extent I ought to be persuaded that m:U(m) is unlikely to be true. You cannot accept the argument, because accepting it as decisive entails accepting decisive reasons for rejecting it.
The argument seems to be fixable at this stage, since there’s a lot of room to generate significant distinctions between m:A(m) and m:U(m). If you were pressed to defend it (presuming you still wish to be generous with your time) how would you fix this? Or am I getting something very wrong?
for all minds (at or under a trillion bits etc.), a generalization of the form m:X(m) has a one in two to the trillionth chance of being true for each mind.
That’s not what it says; compare the emphasis in both quotes.
If we restrict ourselves to minds specifiable in a trillion bits or less, then each universal generalization “All minds m: X(m)” has two to the trillionth chances to be false, while each existential generalization “Exists mind m: X(m)” has two to the trillionth chances to be true.
Sorry, I may have misunderstood and presumed that ‘two to the trillionth chances to be false’ meant ‘one in two to the trillionth chances to be true’. That may be wrong, but it doesn’t affect my argument at all: EY’s argument for the implausibility of m:A(m) is that claims of the form m:X(m) are all implausible. His argument to the effect that all claims of the form m:X(m) are implausible is itself a claim of the form m:X(m).
Sorry, I was speaking ambiguously. I meant ‘rational’ not in the normative sense that distinguishes good agents from bad ones, but ‘rational’ in the broader, descriptive sense that distinguishes anything capable of responding to reasons (even terrible or false ones) from something that isn’t. I assumed that was the sense of ‘rational’ Prawn was using, but that may have been wrong.
Irrelevant. I am talking about rational minds, he is talking about physically possible ones.
UFAI sounds like a counterexample, but I’m not interested in arguing with you about it. I only responded because someone asked for a shortcut in the metaethics sequence.
Can you explain what you could see which would suggest to you a greater level of understanding than is prevalent among moral philosophers?
Also, moral philosophers mostly regard the question as open in the sense that some of them think that it’s clearly resolved in favor of non-realism, and some philosophers are just not getting it, or that it’s clearly resolved in favor of realism, and some philosophers are just not getting it. Most philosophers are not of the opinion that it could turn out either way and we just don’t know yet.
Can you explain what you could see which would suggest to you a greater level of understanding than is prevalent among moral philosophers?
What I am seeing is:
* much-repeated confusions—the Standard Muddle
* appeals to LW doctrines which aren’t well-founded or well respected outside LW.
If I knew exactly what superior insight into the problem was, I would write it up and become famous. Insight doesn’t work like that; you don’t know it in advance, you get an “Aha” when you see it.
Also, moral philosophers mostly regard the question as open in the sense that some of them think that it’s clearly resolved in favor of non-realism, and some philosophers are just not getting it, or that it’s clearly resolved in favor of realism, and some philosophers are just not getting it. Most philosophers are not of the opinion that it could turn out either way and we just don’t know yet.
If people can’t agree on how a question is closed, it’s open.
Can you explain what these confusions are, and why they’re confused?
In my time studying philosophy, I observed a lot of confusions which are largely dispensed with on Less Wrong. Luke wrote a series of posts on this. This is one of the primary reasons I bothered sticking around in the community.
If people can’t agree on how a question is closed, it’s open.
A question can still be “open” in that sense when all the information necessary for a rational person to make a definite judgment is available.
Can you explain what these confusions are, and why they’re confused?
E.g.:
* You are trying to impose your morality.
* I can think of one model of moral realism, and it doesn’t work, so I will ditch the whole thing.
In my time studying philosophy, I observed a lot of confusions which are largely dispensed with on Less Wrong. Luke wrote a series of posts on this.
LW doesn’t even claim to have more than about two “dissolutions”. There are probably hundreds of outstanding philosophical problems. Whence the “largely”?
Luke wrote a series of posts on this
Which were shot down by philosophers.
A question can still be “open” in that sense when all the information necessary for a rational person to make a definite judgment is available.
Then it can only be open in the opinions of the irrational. So basically you are saying the experts are incompetent.
I can think of one model of moral realism, and it doesn’t work, so I will ditch the whole thing.
This certainly doesn’t describe my reasoning on the matter, and I doubt it describes many others’ here either.
The way I consider the issue, if I try to work out how the universe works from the ground up, I cannot see any way that moral realism would enter into it, whereas I can easily see how value systems would, so I regard assigning non-negligible probability to moral realism as privileging the hypothesis until I find some compelling evidence to support it, which, having spent a substantial amount of time studying moral philosophy, I have not yet found.
LW doesn’t even claim to have more than about two “dissolutions”. There are probably hundreds of outstanding philosophical problems. Whence the “largely”?
I gave up my study of philosophy because I found such confusions so pervasive. Many “outstanding” philosophical problems can be discarded because they rest on other philosophical problems which can themselves be discarded.
Which were shot down by philosophers.
Can you give any examples of such, where you think that the philosophers in question addressed legitimate errors?
Then it can only be open in the opinions of the irrational. So basically you are saying the experts are incompetent.
Yes. I am willing to assert that while there are some competent philosophers, many philosophical disagreements exist only because of incompetent “experts” perpetuating them. This is the conclusion that my experience with the field has wrought.
This certainly doesn’t describe my reasoning on the matter, and I doubt it describes many others’ here either.
I mentioned them because they both came up recently.
The way I consider the issue, if I try to work out how the universe works from the ground up, I cannot see any way that moral realism would enter into it, whereas I can easily see how value systems would, so I regard assigning non-negligible probability to moral realism as privileging the hypothesis until I find some compelling evidence to support it, which, having spent a substantial amount of time studying moral philosophy, I have not yet found.
I have no idea what you mean by that. I don’t think value systems don’t come into it, I just think they are not isolated from rationality. And I am sceptical that you could predict any higher-level phenomenon from “the ground up”, whether it’s morality or mortgages.
I gave up my study of philosophy because I found such confusions so pervasive. Many “outstanding” philosophical problems can be discarded because they rest on other philosophical problems which can themselves be discarded.
Where is it proven they can be discarded?
Can you give any examples of such, where you think that the philosophers in question addressed legitimate errors?
All of them.
Yes. I am willing to assert that while there are some competent philosophers, many philosophical disagreements exist only because of incompetent “experts” perpetuating them. This is the conclusion that my experience with the field has wrought.
Are you aware that that is basically what every crank says about some other field?
Are you aware that that is basically what every crank says about some other field?
Presumably, if I’m to treat as meaningful evidence about Desrtopa’s crankiness the fact that cranks make statements similar to Desrtopa, I should first confirm that non-cranks don’t make similar statements.
It seems likely to me that for every person P, there exists some field F such that P believes many aspects of F exist only because of incompetent “experts” perpetuating them. (Consider cases like F=astrology, F=phrenology, F=supply-side economics, F= feminism, etc.) And that this is true whether P is a crank or a non-crank.
So it seems this line of reasoning depends on some set F2 of fields such that P believes this of F in F2 only if P is a crank.
I understand that you’re asserting implicitly that moral philosophy is a field in F2, but this seems to be precisely what Desrtopa is disputing.
Could we reasonably say that an F is in F2 if most of the institutional participants in that F are intelligent, well-educated people? This leaves room for cranks who are right to object to F, of course.
So, just to pick an example, IIRC Dan Dennett believes the philosophical study of consciousness (qualia, etc.) is fundamentally confused in more or less the same way Desrtopa claims of the philosophical study of ethics is.
So under this formulation, if most of the institutional participants in the philosophical study of consciousness are intelligent, well-educated people, Dan Dennett is a crank?
No, I don’t think we can reasonably say that. Dan Dennett might be a crank, but it takes more than that argument to demonstrate the fact.
Good point. So how about this: someone is a crank if they object to F, where F is in F2 (by my above standard), and the reasons they have for objecting to F are not recognized as sound by a proportionate number of intelligent and well educated people.
(shrug) I suppose that works well enough, for some values of “proportionate.”
Mostly I consider this a special case of the basic “who do I trust?” social problem, applied to academic disciplines, and I don’t have any real problem saying about an academic discipline “this discipline is fundamentally confused, and the odds of work in it contributing anything valuable to the world is slim.”
Of course, as Prawn has pointed out a few times, there’s also the question of where we draw the lines around a discipline, but I mostly consider that an orthogonal question to how we evaluate the discipline.
I think this question is moot in the case of philosophy in general then; I think any philosopher worth their shirt should tell you that trust is a wholly inappropriate attitude toward philosophers, philosophical institutions and philosophical traditions.
Not in the sense I meant it. If a philosopher makes a claim that seems on the surface to be false or incoherent, I have to decide whether to devote the additional effort to evaluating it to confirm or deny that initial judgment. One of the factors that will feed into that decision will be my estimate of the prior probability that they are saying something false or incoherent. If I should refer to that using a word other than “trust”, that’s fine, tell me what word will refer to that to you and I’ll try to use it instead.
No, that describes what I’m talking about, so long as by trust you mean ‘a reason to hear out an argument that makes reference to the credibility of a field or its professionals’, rather than just ‘a reason to hear out an argument’. If the former, then I do think this is an inappropriate attitude toward philosophy. One reason for this is that such trust seems to depend on having a good standard for the success of a field independently of hearing out an argument. I can trust physicists because they make such good predictions, and because their work leads to such powerful technological advances. I don’t need to be a physicist to observe that. I don’t think philosophy has anything like that to speak for it. The only standards of success are the arguments themselves, and you can only evaluate them by just going ahead and doing some philosophy.
You can find trust in an institution independently of such standards by watching to see whether people you think are otherwise credible take it seriously. That will of course work with philosophy too, but if you trust Tom to be able to judge whether or not a philosophical claim is worth pursuing (and if I’m right about the above), then Tom can only be trustworthy in this regard because he has been doing philosophy (i.e. engaging with the argument). This could get you through the door on some particular philosophical claim, but not into philosophy generally.
so long as by trust you mean ‘a reason to hear out an argument that makes reference to the credibility of a field or its professionals’, rather than just ‘a reason to hear out an argument’.
I mean neither, I mean ‘a reason to devote time and resources to evaluating the evidence for and against a position.’ As you say, I can only evaluate a philosophical argument by ‘going ahead and doing some philosophy,’ (for a sufficiently broad understanding of ‘philosophy’), but my willingness to do, say, 20 hours of philosophy in order to evaluate Philosopher Sam’s position is going to depend on, among other things, my estimate of the prior probability that Sam is saying something false or incoherent. The likelier I think that is, the less willing I am to spend those 20 hours.
I mean neither, I mean ‘a reason to devote time and resources to evaluating the evidence for and against a position.’
That’s fine, that’s not different from ‘hearing out an argument’ in any way important to my point (unless I’m missing something).
EDIT: Sorry, if you don’t want to include ‘that makes some reference to the credibility...etc.’ (or something like that) in what you mean by ‘trust’ then you should use a different term. Curiosity, or money, or romantic interest would all be reasons to devote time...etc. and clearly none of those are rightly called ‘trust’.
my estimate of the prior probability that Sam is saying something false or incoherent.
What do you have in mind as the basis for such a prior? Can you give me an example?
Point taken about other reasons to devote resources other than trust. I think we’re good here.
Re: example… I don’t mean anything deeply clever. E.g., if the last ten superficially-implausible ideas Sam espoused were false or incoherent, my priors for it will be higher than if the last ten such ideas were counterintuitive and brilliant.
Hm. I can’t argue with that, and I suppose it’s trivial to extend that to ‘if the last ten superficially-implausible ideas philosophy professors/books/etc. espoused were false or incoherent...’. So, okay, trust is an appropriate (because necessary) attitude toward philosophers and philosophical institutions. I think it’s right to say that philosophy doesn’t have external indicators in the way physics or medicine does, but the importance of that point seems diminished.
So, just to pick an example, IIRC Dan Dennett believes the philosophical study of consciousness (qualia, etc.) is fundamentally confused in more or less the same way Desrtopa claims of the philosophical study of ethics is.
Dennett only thinks the idea of qualia is confused. He has no problem with his own books on consciousness.
So under this formulation, if most of the institutional participants in the philosophical study of consciousness are intelligent, well-educated people, Dan Dennett is a crank?
No. He isn’t dismissing a whole academic subject, or a sub-field. Just one idea.
What is Dennett’s account for why philosophers of consciousness other than himself continue to think that a dismissable idea like qualia is worth continuing to discuss, even though he considers it closed?
While going on tangents is a common and expected occurrence, each such tangent has a chance of steering/commandeering the original conversation. LW has a tendency to go meta too much, when actual object-level discourse would have higher content value.
While you were practically invited to indulge in the death-by-meta with the hook of “Are you aware that that is basically what every crank says about some other field?”, we should be aware when leaving the object-level debating, and the consequences thereof. Especially since the lure can be strong:
When sufficiently meta, object-level disagreements may fizzle into cosmic/abstract insignificance, allowing for a peaceful pseudo-resolution, which ultimately just protects that which should be destroyed by the truth from being destroyed.
Such lures may be interpreted similarly to ad hominems: The latter try to drown out object-level disagreements by flinging shit until everyone’s dirty, the former zoom out until everyone’s dizzy floating in space, with vertigo. Same result to the actual debate. It’s an effective device, and one usually embraced by someone who feels like object-level arguments no longer serve his/her goals.
Ironically, this very comment goes meta lamenting going meta.
I have no idea what you mean by that. I don’t think value systems don’t come into it, I just think they are not isolated from rationality. And I am sceptical that you could predict any higher-level phenomenon from “the ground up”, whether it’s morality or mortgages.
I mean that value systems are a function of physically existing things, the way a 747 is a function of physically existing things, but we have no evidence suggesting that objective morality is an existing thing. We have standards by which we judge beauty, and we project those values onto the world, but the standards are in us, not outside of us. We can see, in reductionist terms, how the existence of ethical systems within beings, which would feel from the inside like the existence of an objective morality, would come about.
Create a reasoning engine that doesn’t have those ethical systems built into it, and it would have no reason to care about them.
Where is it proven they can be discarded?
You can’t build a tower on empty air. If a debate has been going on for hundreds of years, stretching back to an argument which rests on “this defies our moral intuitions, therefore it’s wrong,” and that was never addressed with “moral intuitions don’t work that way,” then the debate has failed to progress in a meaningful direction, much as a debate over whether a tree falling in an empty forest makes a sound has if nobody bothers to dissolve the question.
All of them.
That’s not an example. Please provide an actual one.
Are you aware that that is basically what every crank says about some other field?
Sure, but it’s also what philosophers say about each other, all the time. Wittgenstein condemned practically all his predecessors and peers as incompetent, and declared that he had solved nearly the entirety of philosophy. Philosophy as a field is full of people banging their heads on a wall at all those other idiots who just don’t get it. “Most philosophers are incompetent, except for the ones who’re sensible enough to see things my way,” is a perfectly ordinary perspective among philosophers.
I mean that value systems are a function of physically existing things, the way a 747 is a function of physically existing things, but we have no evidence suggesting that objective morality is an existing thing.
But I wasn’t saying that. I am arguing that moral claims have truth values that aren’t indexed to individuals or societies.
That epistemic claim can be justified by appeal to an ontology including Moral Objects, but that is not how I am justifying it: my argument is based on rationality, as I have said many times.
We have standards by which we judge beauty, and we project those values onto the world, but the standards are in us, not outside of us.
We have standards by which we judge the truth values of mathematical claims, and they are inside us too, and that doesn’t stop mathematics being objective. Relativism requires that truth values are indexed to us, that there is one truth for me and another for thee. Being located in us, or being operated by us, are not sufficient criteria for being indexed to us.
We can see, in reductionist terms, how the existence of ethical systems within beings, which would feel from the inside like the existence of an objective morality, would come about.
We can see, in reductionist terms, how the entities could converge on a uniform set of truth values. There is nothing non-reductionist about anything I have said. Reductionism does not force one answer to metaethics.
Create a reasoning engine that doesn’t have those ethical systems built into it, and it would have no reason to care about them.
Provide evidence that ethics is a whole separate module, and not part of general reasoning ability.
You can’t build a tower on empty air. If a debate has been going on for hundreds of years, stretching back to an argument which rests on “this defies our moral intuitions, therefore it’s wrong,” and that was never addressed with “moral intuitions don’t work that way,” then the debate has failed to progress in a meaningful direction, much as a debate over whether a tree falling in an empty forest makes a sound has if nobody bothers to dissolve the question.
Please explain why moral intuitions don’t work that way.
Please provide some foundations for something that aren’t unjustified by anything more foundational.
That’s not an example. Please provide an actual one.
You can select one at random. Obviously.
Sure, but it’s also what philosophers say about each other, all the time.
No, philosophers don’t regularly accuse each other of being incompetent, just of being wrong. There’s a difference.
Wittgenstein condemned practically all his predecessors and peers as incompetent, and declared that he had solved nearly the entirety of philosophy.
You are inferring a lot from one example.
Philosophy as a field is full of people banging their heads on a wall at all those other idiots who just don’t get it. “Most philosophers are incompetent, except for the ones who’re sensible enough to see things my way,” is a perfectly ordinary perspective among philosophers.
But I wasn’t saying that. I am arguing that moral claims have truth values that aren’t indexed to individuals or societies. That epistemic claim can be justified by appeal to an ontology including Moral Objects, but that is not how I am justifying it: my argument is based on rationality, as I have said many times.
I don’t understand, can you rephrase this?
We have standards by which we judge the truth values of mathematical claims, and they are inside us too, and that doesn’t stop mathematics being objective. Relativism requires that truth values are indexed to us, that there is one truth for me and another for thee. Being located in us, or being operated by us, are not sufficient criteria for being indexed to us.
The standards by which we judge the truth of mathematical claims are not just inside us. One object plus another object will continue to equal two objects whether or not there are any living beings to make that judgment. Math is not something we’ve created within ourselves, but something we’ve discovered and observed.
If our mathematical models ever stop being able to predict in advance the behavior of the universe, then we will have rather more reason to doubt that the math inside us is different from the math outside of us.
What evidence do we have that this is the case for morality?
Provide evidence that ethics is a whole separate module, and not part of general reasoning ability.
My assertion is that, if we judge ethics as a rational system, innate values are among the axioms that the system is predicated on. You cannot prove the axioms of a system within that system, and an ethical system predicated on premises like “happiness is good” will not itself be able to prove the goodness of happiness.
While we could suppose that the axioms which our ethical systems are predicated on are objectively true, we have considerable reason to believe that we would have developed these axioms for adaptive reasons, even if there were no sense in which objective moral axioms exist, and we do not have evidence which suggests that objective, independently existing true moral axioms do exist.
Please explain why moral intuitions don’t work that way.
People can be induced to strongly support opposing responses to the same moral dilemma, just by rephrasing it differently to trigger different heuristics. Our moral intuitions are incoherent.
Please provide some foundations for something that aren’t unjustified by anything more foundational.
I don’t think I understand this, can you rephrase it?
You can select one at random. Obviously.
I do not recall any creditable attempts, which places me in a disadvantaged position with respect to locating them. You’re the one claiming that they’re there at all, that’s why I’m asking you to do it.
No, philosophers don’t regularly accuse each other of being incompetent, just of being wrong. There’s a difference.
Philosophers don’t usually accuse each other of being incompetent in their publications, because it’s not conducive to getting other philosophers to regard their arguments dispassionately, and that sort of open accusation is generally frowned upon in academic circles whether one believes it or not. They do regularly accuse each other of being comprehensively wrong for their entire careers. In my personal conversations with philosophers (and I never considered myself to have really taken a class, or attended a lecture by a visitor, if I didn’t speak with the person teaching it on a personal basis to probe their thoughts beyond the curriculum), I observed a whole lot of frustration with philosophers who they think just don’t get their arguments. It’s unsurprising that people would tend to become so frustrated participating in a field that basically amounts to long-running arguments extended over decades or centuries. Imagine the conversation we’re having now going on for eighty years, and neither of us has changed our minds. If you didn’t find my arguments convincing, and I hadn’t budged in all that time, don’t you think you’d start to suspect that I was particularly thick?
You are inferring a lot from one example.
I’m using an example illustrative of my experience.
Sounds to me like PrawnOfFate is saying that any sufficiently rational cognitive system will converge on a certain set of ethical goals as a consequence of its structure, i.e. that (human-style) ethics is a property that reliably emerges in anything capable of reason.
I’d say the existence of sociopathy among humans provides a pretty good counterargument to this (sociopaths can be pretty good at accomplishing their goals, so the pathology doesn’t seem to be indicative of a flawed rationality), but at least the argument doesn’t rely on counting fundamental particles of morality or something.
I would say so also, but PrawnOfFate has already argued that sociopaths are subject to additional egocentric bias relative to normal people and thereby less rational. It seems to me that he’s implicitly judging rationality by how well it leads to a particular body of ethics he already accepts, rather than how well it optimizes for potentially arbitrary values.
Well, I’m not a psychologist, but if someone asked me to name a pathology marked by unusual egocentric bias I’d point to NPD, not sociopathy.
That brings up some interesting questions concerning how we define rationality, though. Pathologies in psychology are defined in terms of interference with daily life, and the personality disorder spectrum in particular usually implies problems interacting with people or societies. That could imply either irreconcilable values or specific flaws in reasoning, but only the latter is irrational in the sense we usually use around here. Unfortunately, people are cognitively messy enough that the two are pretty hard to distinguish, particularly since so many human goals involve interaction with other people.
In any case, this might be a good time to taboo “rational”.
The standards by which we judge the truth of mathematical claims are not just inside us.
How do we judge claims about transfinite numbers?
One object plus another object will continue to equal two objects whether or not there are any living beings to make that judgment. Math is not something we’ve created within ourselves, but something we’ve discovered and observed.
If our mathematical models ever stop being able to predict in advance the behavior of the universe, then we will have rather more reason to doubt that the math inside us is different from the math outside of us.
Mathematics isn’t physics. Mathematicians prove theorems from axioms, not from experiments.
Provide evidence that ethics is a whole separate module, and not part of general reasoning ability.
My assertion is that, if we judge ethics as a rational system, innate values are among the axioms that the system is predicated on.
Not necessarily. E.g., for utilitarians, values are just facts that are plugged into the metaethics to get concrete actions.
You cannot prove the axioms of a system within that system, and an ethical system predicated on premises like “happiness is good” will not itself be able to prove the goodness of happiness.
Metaethical systems usually have axioms like “Maximising utility is good”.
While we could suppose that the axioms which our ethical systems are predicated on are objectively true, we have considerable reason to believe that we would have developed these axioms for adaptive reasons, even if there were no sense in which objective moral axioms exist, and we do not have evidence which suggests that objective, independently existing true moral axioms do exist.
I am not sure what you mean by “exist” here. Claims are objectively true if most rational minds converge on them. That doesn’t require Objective Truth to float about in space here.
Please explain why moral intuitions don’t work that way.
People can be induced to strongly support opposing responses to the same moral dilemma, just by rephrasing it differently to trigger different heuristics. Our moral intuitions are incoherent.
Does that mean we can’t use moral intuitions at all, or that they must be used with caution?
I don’t think I understand this, can you rephrase it?
Philosophers talk about intuitions, because that is the term for something foundational that seems true, but can’t be justified by anything more foundational. LessWrongians don’t like intuitions, but don’t seem to be able to explain how to manage without them.
I do not recall any creditable attempts, which places me in a disadvantaged position with respect to locating them.
Did you post any comments explaining to the professional philosophers where they had gone wrong?
Imagine the conversation we’re having now going on for eighty years, and neither of us has changed our minds. If you didn’t find my arguments convincing, and I hadn’t budged in all that time, don’t you think you’d start to suspect that I was particularly thick?
I don’t see the problem. Philosophical competence is largely about understanding the problem.
Mathematics isn’t physics. Mathematicians prove theorems from axioms, not from experiments.
Yes, but the fact that the universe itself seems to adhere to the logical systems by which we construct mathematics gives credence to the idea that the logical systems are fundamental, something we’ve discovered rather than produced. We judge claims about nonobserved mathematical constructs like transfinites according to those systems.
Metaethical systems usually have axioms like “Maximising utility is good”.
But utility is a function of values. A paperclipper will produce utility according to different values than a human.
I am not sure what you mean by “exist” here. Claims are objectively true if most rational minds converge on them. That doesn’t require Objective Truth to float about in space here.
Why would most rational minds converge on values? Most human minds converge on some values, but we share almost all our evolutionary history and brain structure. The fact that most humans converge on certain values is no more indicative of rational minds in general doing so than the fact that most humans have two hands is indicative of most possible intelligent species converging on having two hands.
Does that mean we can’t use moral intuitions at all, or that they must be used with caution?
It means we should be aware of what our intuitions are and what they’ve developed to be good for. Intuitions are evolved heuristics, not a priori truth generators.
Philosophers talk about intuitions, because that is the term for something foundational that seems true, but can’t be justified by anything more foundational. LessWrongians don’t like intuitions, but don’t seem to be able to explain how to manage without them.
It seems like you’re equating intuitions with axioms here. We can (and should) recognize that our intuitions are frequently unhelpful at guiding us to the truth, without throwing out all axioms.
Did you post any comments explaining to the professional philosophers where they had gone wrong?
If I did, I don’t remember them. I may have, I may have felt someone else adequately addressed them, I may not have felt it was worth the bother.
It seems to me that you’re trying to foist onto me the effort of locating something which you were the one to testify was there in the first place.
I don’t see the problem. Philosophical competence is largely about understanding the problem.
And philosophers frequently fall into the pattern of believing that other philosophers disagree with each other due to failure to understand the problems they’re dealing with.
In any case, I reject the notion that dismissing large contingents of philosophers as lacking in competence is a valuable piece of evidence with respect to crankishness, and if you want to convince me that I am taking a crankish attitude, you’ll need to offer some other evidence.
Yes, but the fact that the universe itself seems to adhere to the logical systems by which we construct mathematics gives credence to the idea that the logical systems are fundamental, something we’ve discovered rather than produced. We judge claims about nonobserved mathematical constructs like transfinites according to those systems.
But claims about transfinites don’t correspond directly to any object. Maths is “spun off” from other facts, on your view. So, by analogy, moral realism could be “spun off” without needing any Form of the Good to correspond to goodness.
Metaethical systems usually have axioms like “Maximising utility is good”.
But utility is a function of values. A paperclipper will produce utility according to different values than a human.
You seem to be assuming that morality is about individual behaviour. A moral realist system like utilitarianism operates at the group level, and would take paperclipper values into account along with all others. Utilitarianism doesn’t care what values are, it just sums or averages them.
Or perhaps you are making the objection that an entity would need moral values to care about the preferences of others in the first place. That is addressed by another kind of realism, the rationality-based kind, which starts from noting that rational agents have to have some value in common, because they are all rational.
Why would most rational minds converge on values?
a) they don’t have to converge on preferences, since things like utilitarianism are preference-neutral.
b) they already have to some extent, because they are rational
Most human minds converge on some values, but we share almost all our evolutionary history and brain structure. The fact that most humans converge on certain values is no more indicative of rational minds in general doing so than the fact that most humans have two hands is indicative of most possible intelligent species converging on having two hands.
I was talking about rational minds converging on the moral claims, not on values. Rational minds can converge on “maximise group utility” whilst what is utilitous varies considerably.
Philosophers talk about intuitions, because that is the term for something foundational that seems true, but can’t be justified by anything more foundational. LessWrongians don’t like intuitions, but don’t seem to be able to explain how to manage without them.
It seems like you’re equating intuitions with axioms here.
Axioms are formal statements; intuitions are gut feelings that are often used to justify axioms.
We can (and should) recognize that our intuitions are frequently unhelpful at guiding us to the truth, without throwing out all axioms.
There is another sense of “intuition” where someone feels that it’s going to rain tomorrow or something. They’re not the foundational kind.
And philosophers frequently fall into the pattern of believing that other philosophers disagree with each other due to failure to understand the problems they’re dealing with.
But claims about transfinites don’t correspond directly to any object. Maths is “spun off” from other facts, on your view. So, by analogy, moral realism could be “spun off” without needing any Form of the Good to correspond to goodness.
Spun off from what, and how?
You seem to be assuming that morality is about individual behaviour. A moral realist system like utilitarianism operates at the group level, and would take paperclipper values into account along with all others. Utilitarianism doesn’t care what values are, it just sums or averages them.
Speaking as a utilitarian, yes, utilitarianism does care about what values are. If I value paperclips, I assign utility to paperclips, if I don’t, I don’t.
Or perhaps you are making the objection that an entity would need moral values to care about the preferences of others in the first place. That is addressed by another kind of realism, the rationality-based kind, which starts from noting that rational agents have to have some value in common, because they are all rational.
Why does their being rational demand that they have values in common? Being rational means that they necessarily share a common process, namely rationality, but that process can be used to optimize many different, mutually contradictory things. Why should their values converge?
I was talking about rational minds converging on the moral claims, not on values. Rational minds can converge on “maximise group utility” whilst what is utilitous varies considerably.
So what if a paperclipper arrives at “maximize group utility,” and the only relevant member of the group which shares its conception of utility is itself, and its only basis for measuring utility is paperclips? The fact that it shares the principle of maximizing utility doesn’t demand any overlap of end-goal with other utility maximizers.
Axioms are formal statements; intuitions are gut feelings that are often used to justify axioms.
But, as I’ve pointed out previously, intuitions are often unhelpful, or even actively misleading, with respect to locating the truth.
If our axioms are grounded in our intuitions, then entities which don’t share our intuitions will not share our axioms.
So do they call for them to be fired?
No, but neither do I, so I don’t see why that’s relevant.
Request accepted. I’m not sure if he’s being deliberately obtuse, but I think this discussion probably would have borne fruit earlier if it were going to. I too often have difficulty stepping away from a discussion as soon as I think it’s unlikely to be a productive use of my time.
What is your basis for the designation? I am not arguing with your suggestion (I was leaning in the same direction myself), I’m just genuinely curious. In other words, why do you believe that PrawnOfFate is a troll, and not someone who is genuinely confused?
In other words, why do you believe that PrawnOfFate is a troll, and not someone who is genuinely confused?
“Troll” is a somewhat fuzzy label. Sometimes when I am wanting to be precise or polite and avoid any hint of Fundamental Attribution Error I will replace it with the rather clumsy or verbose “person who is exhibiting a pattern of behaviour which should not be fed”. The difference between “Person who gets satisfaction from causing disruption” and “Person who is genuinely confused and is displaying an obnoxiously disruptive social attitude” is largely irrelevant (particularly when one has their Hansonian hat on).
If there was a word in popular use that meant “person likely to be disruptive and who should not be fed” that didn’t make any assumptions or implications of the intent of the accused then that word would be preferable.
I am not sure I can explain that succinctly at the moment. It is also hard to summarise how you get from counting apples to transfinite numbers.
Why does their being rational demand that they have values in common? Being rational means that they necessarily share a common process, namely rationality, but that process can be used to optimize many different, mutually contradictory things. Why should their values converge?
Rationality is not an automatic process; it is a skill that has to be learnt and consciously applied. Individuals will only be rational if their values prompt them to. And rationality itself implies valuing certain things (lack of bias, non-arbitrariness).
So what if a paperclipper arrives at “maximize group utility,” and the only relevant member of the group which shares its conception of utility is itself, and its only basis for measuring utility is paperclips? The fact that it shares the principle of maximizing utility doesn’t demand any overlap of end-goal with other utility maximizers.
Utilitarians want to maximise the utility of their groups, not their own utility. They don’t have to believe the utility of others is utilitous to them, they just need to feed facts about group utility into an aggregation function. And, using the same facts and same function, different utilitarians will converge. That’s kind of the point.
But, as I’ve pointed out previously, intuitions are often unhelpful, or even actively misleading, with respect to locating the truth.
Compared to what? Remember, I am talking about foundational intuitions, the kind at the bottom of the stack. The empirical method of locating the truth rests on the intuition that the senses reveal a real external world. Which I share. But what proves it? That’s the foundational issue.
A lot of people here would seem to disagree, since I keep hearing the objection that ethics is all about values, and values are nothing to do with rationality.
It feels to me like the Orthogonality Thesis is a fairly precise statement, and moral anti-realism is a harder to make precise but at least well understood statement, and “values are nothing to do with rationality” is something rather vague that could mean either of those things or something else.
I am getting the feeling that you’re assuming there’s something in the agent’s code that says, “you can look at and change any line of code you want, except lines 12345..99999, because that’s where your terminal goals are”. Is that right?
You can change that line, but it will result in you optimizing for something other than paperclips, resulting in less paperclips.
Suppose that you gained the power to both discern objective morality, and to alter your own source code. You use the former ability, and find that the basic morally correct principle is maximizing the suffering of sentient beings. Do you alter your source code to be in accordance with this?
I’ve never understood this argument.
It’s like a slaveowner having a conversation with a time-traveler, and declaring that they don’t want to be nice to slaves, so any proof they could show is by definition invalid.
If the slaveowner is an ordinary human being, they already have values regarding how to treat people in their in-groups which they navigate around with respect to slaves by not treating them as in-group members. If they could be induced to see slaves as in-group members, they would probably become nicer to slaves whether they intended to or not (although I don’t think it’s necessarily the case that everyone who’s sufficiently acculturated to slavery could be induced to see slaves as in-group members.)
If the agent has no preexisting values which can be called into service of the ethics they’re being asked to adopt, I don’t think that they could be induced to want to adopt them.
I don’t think it’s true that if there’s an objective morality, agents necessarily value it whether they realize it or not though. Why couldn’t there be inherently immoral or amoral agents?
… because the whole point of an “objective” morality is that rational agents will update to believe they should follow it? Otherwise we might as easily be such “inherently immoral or amoral agents”, and we wouldn’t want to discover such objective “morality”.
Well, if it turned out that something like “maximize suffering of intelligent agents” were written into the fabric of the universe, I think we’d have to conclude that we were inherently immoral agents.
The same evidence that persuades you that we don’t want to maximize suffering in real life is evidence that it wouldn’t be, I guess.
Side note: I’ve never seen anyone try to defend the position that we should be maximizing suffering, whereas I’ve seen all sorts of eloquent and mutually contradictory defenses of more, um, traditional ethical frameworks.
However, both the pebble-sorters and myself share one key weakness: we cannot examine ourselves from the outside; we can’t see our own source code.
Being able to read all your source code could be the ultimate in self-reflection (absent Löb’s theorem), but it doesn’t follow that those who can’t read their source code can’t self-reflect at all. It’s just imperfect, like everything else.
As I was reading the article about the pebble-sorters, I couldn’t help but think, “silly pebble-sorters, their values are so arbitrary and ultimately futile”. This happened, of course, because I was observing them from the outside. If I was one of them, sorting pebbles would feel perfectly natural to me; and, in fact, I could not imagine a world in which pebble-sorting was not important. I get that.
This is about rational agents. If pebble sorters can’t think of a non-arbitrary reason for sorting pebbles, they would recognise it as silly. Why not? Humans can spend years collecting stamps, or something, only to decide it is pointless.
However, both the pebble-sorters and myself share one key weakness: we cannot examine ourselves from the outside; we can’t see our own source code. An AI, however, could
What...why...? Is there something special about silicon? Is it made from different quarks?
Being rational doesn’t automatically make an agent able to read its own source code. Remember that, to the pebble-sorters, sorting pebbles is an axiomatically reasonable activity; it does not require justification. Only someone looking at them from the outside could evaluate it objectively.
What...why...? Is there something special about silicon?
Not at all; if you got some kind of a crazy biological implant that let you examine your own wetware, you could do it too. Silicon is just a convenient example.
Not at all; if you got some kind of a crazy biological implant that let you examine your own wetware, you could do it too. Silicon is just a convenient example.
Humans can examine their own thinking. Not perfectly, because we aren’t perfect. But we can do it, and indeed do so all the time. It’s a major focus of this site, in fact.
Being rational doesn’t automatically make an agent able to read its own source code. Remember that, to the pebble-sorters, sorting pebbles is an axiomatically reasonable activity;
You can define a pebblesorter as being unable to update its values, and I can point out that most rational agents won’t be like that. Most rational agents won’t have unupdateable values, because they will be messily designed/evolved, and therefore will be capable of converging on an ethical system via their shared rationality.
Most rational agents won’t have unupdateable values, because they will be messily designed/evolved...
We are messily designed/evolved, and yet we do not have updatable goals or perfect introspection. I absolutely agree that some agents will have updatable goals, but I don’t see how you can upgrade that to “most”.
...and therefore will be capable of converging on an ethical system via their shared rationality.
How so? Are you asserting that there exists an optimal ethical system that is independent of the actors’ goals? There may well be one, but I am not convinced of this, so you’ll have to convince me.
We blatantly have updatable goals: people do not have the same goals at 5 as they do at 20 or 60. I don’t know why perfect introspection would be needed to have some ability to update.
Sorry, that was bad wording on my part; I should’ve said, “updatable terminal goals”. I agree with what you said there.
How so? Are you asserting that there exists an optimal ethical system that is independent of the actors’ goals?
Yes, that’s what this whole discussion is about.
I don’t feel confident enough in either “yes” or “no” answer, but I’m currently leaning toward “no”. I am open to persuasion, though.
I personally don’t know of any evidence in favor of terminal values, so I do agree with you there. Still, it makes a nice thought experiment: could we create an agent possessed of general intelligence and the ability to self-modify, and then hardcode it with terminal values ? My answer would be, “no”, but I could be wrong.
That said, I don’t believe that there exists any kind of a universally applicable moral system, either.
people do not have the same goals at 5 as they do at 20 or 60
Source?
They take different actions, sure, but it seems to me, based on childhood memories etc, that these are in the service of roughly the same goals. Have people, say, interviewed children and found they report differently?
This is about rational agents. If pebble sorters can’t think of a non-arbitrary reason for sorting pebbles, they would recognise it as silly.
I’d use humans as a counterexample, but come to think, a lot of humans refuse to believe our goals could be arbitrary, and have developed many deeply stupid arguments that “prove” they’re objective.
However, I’m inclined to think this is a flaw on the part of humans, not something rational.
One does not update terminal values, that’s what makes them terminal.
Unicorns have horns...
Defining something abstractly says nothing about its existence or likelihood. A neat division between terminal and abstract values could be implemented with sufficient effort, or could evolve with a low likelihood, but it is not a model of intelligence in general, and it is not likely just because messy solutions are likelier than neater ones. Actual and really existent horse-like beings are not going to acquire horns any time soon, no matter how clearly you define unicornhood.
Arguably, humans might not really have terminal values
Plausibly. You don’t now care about the same things you cared about when you were 10.
On what basis might a highly flexible paperclip optimizing program be persuaded that something else was more important than paperclips?
Show me one. Clippers are possible but not likely. I never said that Clippers would converge on the One True Ethics; I said that (super)intelligent, (super)rational agents would. The average SR-SI agent would not be a clipper for exactly the same reason that the average human is not an evil genius. There are no special rules for silicon!
I’m noticing that you did not respond to my question of whether you’ve read No Universally Compelling Arguments and Sorting Pebbles Into Correct Heaps. I’d appreciate it if you would, because they’re very directly relevant to the conversation, and I don’t want to rehash the content when Eliezer has already gone to the trouble of putting them up where anyone can read them. If you already have, then we can proceed with that shared information, but if you’re just going to ignore the links, how do I know you’re going to bother giving due attention to anything I write in response?
Plausibly. You don’t now care about the same things you cared about when you were 10.
I have different interests now than I did when I was ten, but that’s not the same as having different terminal values.
Suppose a person doesn’t support vegetarianism; they’ve never really given it much consideration, but they default to the assumption that eating meat doesn’t cause much harm, and meat is tasty, so what’s the big deal?
When they get older, they watch some videos on the conditions in which animals are raised for slaughter, read some studies on the neurology of livestock animals with respect to their ability to suffer, and decide that mainstream livestock farming does cause a lot of harm after all, and so they become a vegetarian.
This doesn’t mean that their values have been altered at all. They’ve simply revised their behavior on new information with an application of the same values they already had. They started out caring about the suffering of sentient beings, and they ended up caring about the suffering of sentient beings, they just revised their beliefs about what actions that value should compel on the basis of other information.
To see whether person’s values have changed, we would want to look, not at whether they endorse the same behaviors or factual beliefs that they used to, but whether their past self could relate to the reasons their present self has for believing and supporting the things they do now.
The average SR-SI agent would not be a clipper for exactly the same reason that the average human is not an evil genius.
The fact that humans are mostly not evil geniuses says next to nothing about the power of intelligence and rationality to converge on human standards of goodness. We all share almost all the same brainware. To a pebblesorter, humans would nearly all be evil geniuses, possessed of powerful intellects, yet totally bereft of a proper moral concern with sorting pebbles.
Many humans are sociopaths, and that slight deviation from normal human brainware results in people who cannot be argued into caring about other people for their own sakes. Nor can a sociopath argue a neurotypical person into becoming a sociopath.
If intelligence and rationality cause people to update their terminal values, why do sociopaths whose intelligence and rationality are normal to high by human standards (of which there are many) not update into being non-sociopaths, or vice-versa?
Many humans are sociopaths, and that slight deviation from normal human brainware results in people who cannot be argued into caring about other people for their own sakes. Nor can a sociopath argue a neurotypical person into becoming a sociopath.
There’s a difference between being a sociopath and being a jerk. Sociopaths don’t need to rationalize dicking other people over.
If Ayn Rand’s works could actually turn formerly neurotypical people into sociopaths, that would be a hell of a find, and possibly spark a neuromedical breakthrough.
Sure, you can negotiate with an agent with conflicting values, but I don’t think it’s beside the point.
You can get a sociopath to cooperate with non-sociopaths by making them trade off for things they do care about, or using coercive power. But Clippy doesn’t have any concerns other than paperclips to trade off against its concern for paperclips, and we’re not in a position to coerce Clippy, because Clippy is powerful enough to treat us as an obstacle to be destroyed. The fact that the non-sociopath majority can more or less keep the sociopath minority under control doesn’t mean that we could persuade agents whose values deviate far from our own to accommodate us if we didn’t have coercive power over them.
PrawnOfFate’s point to begin with was that humans could and would change their fundamental values on new information about what is moral. I suggested sociopaths as an example of people who wouldn’t change their values to conform to those of other people on the basis of argument or evidence, nor would ordinary humans change their fundamental values to a sociopath’s.
If we’ve progressed to a discussion of whether it’s possible to coerce less powerful agents into behaving in accordance with our values, I think we’ve departed from the context in which sociopaths were relevant in the first place.
Are you arguing Ayn Rand can argue sociopaths into caring about other people for their own sakes, or argue neurotypical people into becoming sociopaths?
(I could see both arguments, although as Desrtopa references, the latter seems unlikely. Maybe you could argue a neurotypical person into sociopathic-like behavior, which seems a weaker and more plausible claim.)
I have different interests now than I did when I was ten, but that’s not the same as having different terminal values.
You can construe the facts as being compatible with the theory of terminal values, but that doesn’t actually support the theory of TVs.
To a pebblesorter, humans would nearly all be evil geniuses, possessed of powerful intellects, yet totally bereft of a proper moral concern with sorting pebbles.
Ethics is about regulating behaviour to take into account the preferences of others. I don’t see how pebblesorting would count.
If intelligence and rationality cause people to update their terminal values, why do sociopaths whose intelligence and rationality are normal to high by human standards (of which there are many) not update into being non-sociopaths, or vice-versa?
Much the same way as I understand the meanings of most words. Why is that a problem in this case?
“That’s what it means by definition” wasn’t much help to you when it came to terminal values, why do you think “that’s what the word means” is useful here and not there? How do you determine that this word, and not that one, is an accurate description of a thing that exists?
Non-psychopaths don’t generally put other people above themselves—that is, they treat people equally, including themselves.
This is not, in fact, true. Non-psychopaths routinely apply double standards to themselves and other people, and don’t necessarily even realize they’re doing it.
If we accept that it’s true for the sake of an argument though, how do we know that they don’t just have a strong egalitarian bias?
How do you determine that this word, and not that one, is an accurate description of a thing that exists?
Are you saying ethical behaviour doesn’t exist on this planet, or that ethical behaviour as I have defined it doesn’t exist on this planet?
This is not, in fact, true. Non-psychopaths routinely apply double standards to themselves and other people, and don’t necessarily even realize they’re doing it.
OK. Non-psychopaths have a lesser degree of egotistical bias. Does that prove they have some different bias? No. Does that prove an ideal rational and ethical agent would still have some bias from some point of view? No.
This is not, in fact, true. Non-psychopaths routinely apply double standards to themselves and other people, and don’t necessarily even realize they’re doing it.
That’s like saying they have a bias towards not having a bias.
Are you saying ethical behaviour doesn’t exist on this planet, or that ethical behaviour as I have defined it doesn’t exist on this planet?
I’m saying that ethical behavior as you have defined it is almost certainly not a universal psychological attractor. An SI-SR agent could look at humans and say “yep, this is by and large what humans think of as ‘ethics,’” but that doesn’t mean it would exert any sort of compulsion on it.
OK. Non-psychopaths have a lesser degree of egotistical bias. Does that prove they have some different bias? No. Does that prove an ideal rational and ethical agent would still have some bias from some point of view? No
You not only haven’t proven that psychopaths are the ones with an additional bias, you haven’t even addressed the matter, you’ve just taken it for granted from the start.
How do you demonstrate that psychopaths have an egotistical bias, rather than non-psychopaths having an egalitarian bias, or rather than both of them having different value systems and pursuing them with equal degrees of rationality?
I’m saying that ethical behavior as you have defined it is almost certainly not a universal psychological attractor.
I didn’t say it was universal among all entities of all degrees of intelligence or rationality. I said there was a non-negligible probability that agents of a certain level of rationality would converge on an understanding of ethics.
An SI-SR agent could look at humans and say “yep, this is by and large what humans think of as ‘ethics,’” but that doesn’t mean it would exert any sort of compulsion on it.
“SR” stands for super-rational. Rational agents find rational arguments rationally compelling. If rational arguments can be made for a certain understanding of ethics, they will be compelled by them.
You not only haven’t proven that psychopaths are the ones with an additional bias,
Do you contest that psychopaths have more egotistical bias than the general population?
you’ve just taken it for granted from the start.
Yes. I thought it was something everyone knows.
rather than non-psychopaths having an egalitarian bias, or
It is absurd to characterise the practice of treating everyone the same as a form of bias.
I didn’t say it was universal among all entities of all degrees of intelligence or rationality. I said there was a non-negligible probability that agents of a certain level of rationality would converge on an understanding of ethics.
Where does this non-negligible probability come from though? When I’ve asked you to provide any reason to suspect it, you’ve just said that as you’re not arguing there’s a high probability, there’s no need for you to answer that.
“SR” stands for super-rational. Rational agents find rational arguments rationally compelling. If rational arguments can be made for a certain understanding of ethics, they will be compelled by them.
I have been implicitly asking all along here: what basis do we have for suspecting that any sort of universally rationally compelling ethical arguments exist at all?
Do you contest that psychopaths have more egotistical bias than the general population?
Yes.
It is absurd to characterise the practice of treating everyone the same as a form of bias.
Where does this non-negligible probability come from though?
Combining the probabilities of the steps of the argument.
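To make “combining the probabilities” concrete, here is a sketch of how a chain of steps multiplies out; the step claims and their probabilities below are made up purely for illustration, and independence of the steps is assumed. The point is that a conjunction of individually plausible steps can still leave a total that is far from certain yet non-negligible.

```python
# Sketch: the probability of a multi-step argument is (at most) the product
# of the probabilities of its steps, assuming the steps are independent.
# These step probabilities are invented for illustration only.

steps = {
    "rationality is universalisable": 0.8,
    "ethics is susceptible to rational argument": 0.7,
    "rational agents find such arguments compelling": 0.6,
    "most rational agents therefore converge": 0.5,
}

p_total = 1.0
for claim, p in steps.items():
    p_total *= p

print(round(p_total, 3))  # 0.168 -- far from certain, but non-negligible
```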
I have been implicitly asking all along here: what basis do we have for suspecting that any sort of universally rationally compelling ethical arguments exist at all?
There are rationally compelling arguments.
Rationality is probably universalisable, since it is based on the avoidance of biases, including those regarding who and where you are.
There is nothing about ethics that makes it insusceptible to rational argument.
There are examples of rational argument about ethics, and of people being compelled by them.
Do you contest that psychopaths have more egotistical bias than the general population?
Yes.
That is an extraordinary claim, and the burden is on you to support it.
It is absurd to characterise the practice of treating everyone the same as a form of bias.
Why?
In the sense of “Nothing is a kind of something” or “atheism is a kind of religion”.
Rationality is probably universalisable, since it is based on the avoidance of biases, including those regarding who and where you are.
There is nothing about ethics that makes it insusceptible to rational argument.
There are examples of rational argument about ethics, and of people being compelled by them.
Rationality may be universalizable, but that doesn’t mean ethics is.
If ethics are based on innate values extrapolated into systems of behavior according to their expected implications, then people will be susceptible to arguments regarding the expected implications of those beliefs, but not arguments regarding their innate values.
I would accept something like “if you accept that it’s bad to make sentient beings suffer, you should oppose animal abuse” can be rationally argued for, but that doesn’t mean that you can step back indefinitely and justify each premise behind it. How would you convince an entity which doesn’t already believe it that it should care about happiness or suffering at all?
That is an extraordinary claim, and the burden is on you to support it.
I would claim the reverse, that saying that sociopathic people have additional egocentric bias is an extraordinary claim, and so I will ask you to support it, but of course, I am quite prepared to reciprocate by supporting my own claim.
It’s much easier to subtract a heuristic from a developed mind by dysfunction than it is to add one. It is more likely as a prior that sociopaths are missing something that ordinary people possess, rather than having something that most people don’t, and that something appears to be the brain functions normally concerned with empathy. It’s not that they’re more concerned with self interest than other people, but that they’re less concerned with other people’s interests.
Human brains are not “rationality+biases,” such that you could systematically subtract all the biases from a human brain and end up with perfect rationality. We are a bunch of cognitive adaptations, some of which are not at all in accordance with strict rationality, hacked together over our evolutionary history. So it makes little sense to judge humans with unusual neurology as being humans plus or minus additional biases, rather than being plus or minus additional functions or adaptations.
In the sense of “Nothing is a kind of something” or “atheism is a kind of religion”.
Is it a bias to treat people differently from rocks?
Now, if we’re going to categorize innate hardwired values, such as that which Clippy has for paperclips, as biases, then I would say “yes.”
I don’t think it makes sense to categorize such innate values as biases, and so I do not think that Clippy is “biased” compared to an ideally rational agent. Instrumental rationality is for pursuing agents’ innate values. But if you think it takes bias to get you from not caring about paperclips to caring about paperclips, can you explain how, with no bias, you can get from not caring about anything, to caring about something?
If there were in fact some sort of objective morality, under which some people were much more valuable than others, then an ethical system which valued all people equally would be systematically biased in favor of the less valuable.
So, I imagine the following conversation between two people (A and B): A: It’s absurd to say ‘atheism is a kind of religion,’ B: Why? A: Well, ‘religion’ is a word with an agreed-upon meaning, and it denotes a particular category of structures in the world, specifically those with properties X, Y, Z, etc. Atheism lacks those properties, so atheism is not a religion. B: I agree, but that merely shows the claim is mistaken. Why is it absurd? A: (thinks) Well, what I mean is that any mind capable of seriously considering the question ‘Is atheism a religion?’ should reach the same conclusion without significant difficulty. It’s not just mistaken, it’s obviously mistaken. And, more than that, I mean that to conclude instead that atheism is a religion is not just false, but the opposite of the truth… that is, it’s blatantly mistaken.
Is A in the dialog above capturing something like what you mean?
If so, I disagree with your claim. It may be mistaken to characterize the practice of treating everyone the same as a form of bias, but it is not obviously mistaken or blatantly mistaken. In fact, I’m not sure it’s mistaken at all, though if it is a bias, it’s one I endorse among humans in a lot of contexts.
So, terminology aside, I guess the question I’m really asking is: how would I conclude that treating everyone the same (as opposed to treating different people differently) is not actually a bias, given that this is not obvious to me?
Plausibly. You don’t now care about the same things you cared about when you were 10.
Are we talking sweeties here? Because that seems more like lack of foresight than value drift. Or are we talking puberty? That seems more like new options becoming available.
I am not and never have said that Clippers would converge on the One True Ethics, I said that (super)intelligent, (super)rational agents would.
You should really start qualifying that with “most actual” if you don’t want people to interpret it as applying to all possible (superintelligent) minds.
But you’re talking about parts of mindspace other than ours, right? The Superhappies are strikingly similar to us, but they still choose the superhappiest values, not the right ones.
I don’t require their values to converge, I require them to accept the truths of certain claims. This happens in real life. People say “I don’t like X, but I respect your right to do it”. The first part says X is a disvalue, the second is an override coming from rationality.
A machine that maximises paperclips can believe all true propositions in the world, and go on maximising paperclips. Nothing compels it to act any differently. You expect that rational agents will eventually derive the true theorems of morality. Yes, they will. Along with the true theorems of everything else. It won’t change their behaviour, unless they are built so as to send those actions identified as moral to the action system.
If you don’t believe me, I can only suggest you study AI (Thrun & Norvig) and/or the metaethics sequence until you do. (I mean really study. As if you were learning particle physics. It seems the usual metaethical confusions are quite resilient; in most peoples’ cases I wouldn’t expect them to vanish without actually thinking carefully about the data presented.) And, well, don’t expect to learn too much from off-the-cuff comments here.
A machine that maximises paperclips can believe all true propositions in the world, and go on maximising paperclips. Nothing compels it to act any differently. You expect that rational agents will eventually derive the true theorems of morality. Yes, they will.
Well, that justifies moral realism.
Along with the true theorems of everything else. It won’t change their behaviour, unless they are built so as to send those actions identified as moral to the action system.
...or it’s an emergent feature, or they can update into something that works that way. You are tacitly assuming that your clipper is barely an AI at all—that it just has certain functions it performs blindly because it’s built that way. But a supersmart, super-rational clipper has to be able to update. By hypothesis, clippers have certain functionalities walled off from update. People are messily designed and unlikely to work that way. So are likely AIs and aliens.
Only rational agents, not all mindful agents, will have what it takes to derive objective moral truths.
They don’t need to converge on all their values to converge on all their moral truths, because rationality can tell you that a moral claim is true even if it is not in your (other) interests. Individuals can value rationality, and that valuation can override other valuations.
Only rational agents, not all mindful agents, will have what it takes to derive objective moral truths.
The further claim that agents will be motivated to derive moral truths, and to act on them, requires a further criterion. Morality is about regulating behaviour in a society, so only social rational agents will have motivation to update. Again, they do not have to converge on values beyond the shared value of sociality.
By hypothesis, clippers have certain functionalities walled off from update.
A paperclipper no more has a wall stopping it from updating into morality than my laptop has a wall stopping it from talking to me. My laptop doesn’t talk to me because I didn’t program it to. You do not update into pushing pebbles into prime-numbered heaps because you’re not programmed to do so.
“Emergent” in this context means “not explicitly programmed in”. There are robust examples.
A paperclipper no more has a wall stopping it from updating into morality than my laptop has a wall stopping it from talking to me.
Your laptop cannot talk to you because natural language is an unsolved problem.
Does a stone roll uphill on a whim?
Not wanting to do something is not the slightest guarantee of not actually doing it.
An AI can update its values because value drift is an unsolved problem.
Clippers can’t update their values by definition, but you can’t define anything into existence or statistical significance.
You do not update into pushing pebbles into prime-numbered heaps because you’re not programmed to do so.
Not programmed to, or programmed not to? If you can code up a solution to value drift, let’s see it. Otherwise, note that Life programmes can update to implement glider generators without being “programmed to”.
Not programmed to, or programmed not to? If you can code up a solution to value drift, let’s see it. Otherwise, note that Life programmes can update to implement glider generators without being “programmed to”.
...with extremely low probability. It’s far more likely that the Life field will stabilize around some relatively boring state, empty or with a few simple stable patterns. Similarly, a system subject to value drift seems likely to converge on boring attractors in value space (like wireheading, which indeed has turned out to be a problem with even weak self-modifying AI) rather than stable complex value systems. Paperclippism is not a boring attractor in this context, and a working fully reflective Clippy would need a solution to value drift, but humanlike values are not obviously so, either.
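The Life dynamics both sides appeal to are easy to reproduce. This is a minimal sketch of Conway’s Life, with live cells stored as a set of coordinates; the glider is the standard five-cell pattern. It illustrates the sense in which behaviour is not “programmed in”: the update rule knows nothing about gliders, yet the pattern propagates, while most other initial states settle into boring stable configurations.

```python
# Minimal Conway's Life, live cells as a set of (x, y) coordinates.
# The update rule is fixed; whether anything "interesting" happens
# depends entirely on the initial state.

from itertools import product

def step(cells):
    counts = {}
    for (x, y) in cells:
        for dx, dy in product((-1, 0, 1), repeat=2):
            if (dx, dy) != (0, 0):
                key = (x + dx, y + dy)
                counts[key] = counts.get(key, 0) + 1
    # A cell is live next generation with exactly 3 neighbours,
    # or 2 neighbours if it was already live.
    return {c for c, n in counts.items()
            if n == 3 or (n == 2 and c in cells)}

glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

state = glider
for _ in range(4):
    state = step(state)

# After 4 generations a glider reproduces itself shifted by (1, 1):
shifted = {(x + 1, y + 1) for (x, y) in glider}
print(state == shifted)  # True
```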
I’m increasingly baffled as to why AI is always brought into discussions of metaethics. Societies of rational agents need ethics to regulate their conduct. Our AIs aren’t sophisticated enough to live in their own societies. A wireheading AI isn’t even going to be able to survive “in the wild”. If you could build an artificial society of AIs, then the question of whether they spontaneously evolved ethics would be a very interesting and relevant datum. But AIs as we know them aren’t good models for the kinds of entities to which morality is relevant. And Clippy is a particularly exceptional example of an AI. So why do people keep saying “Ah, but Clippy...”?
And Clippy is a particularly exceptional example of an AI. So why do people keep saying “Ah, but Clippy...”?
Well, in this case it’s because the post I was responding to mentioned Clippy a couple of times, so I thought it’d be worthwhile to mention how the little bugger fits into the overall picture of value stability. It’s indeed somewhat tangential to the main point I was trying to make; paperclippers don’t have anything to do with value drift (they’re an example of a different failure mode in artificial ethics) and they’re unlikely to evolve from a changing value system.
Sorry... did you mean FAI is about societies, or FAI is about singletons?
But if ethics does emerge as an organisational principle in societies, that’s all you need for FAI. You don’t even need to worry about one sociopathic AI turning unfriendly, because the majority will be able to restrain it.
UFAI is about singletons. If you have an AI society whose members compare notes and share information—which is instrumentally useful for them anyway—you reduce the probability of singleton fooming.
Any agent that fooms becomes a singleton. Thus, it doesn’t matter if they acted nice while in a society; all that matters is whether they act nice as a singleton.
An agent in a society is unable to force its values on the society; it needs to cooperate with the rest of society. A singleton is able to force its values on the rest of society.
But a supersmart, uper-rational clipper has to be able to update.
has to be able to update
“update”
Please unpack this and describe precisely, in algorithmic terms that I could read and write as a computer program given unlimited time and effort, this “ability to update” which you are referring to.
I suspect that you are attributing Magical Powers From The Beyond to the word “update”, and forgetting to consider that the ability to self-modify does not imply active actions to self-modify in any one particular way that unrelated data bits say would be “better”, unless the action code explicitly looks for said data bits.
It’s uncontroversial that rational agents need to update, and that AIs need to self-modify. The claim that values are in either case insulated from updates is the extraordinary one. The Clipper theory tells you that you could build something like that if you were crazy enough. Since Clippers are contrived, nothing can be inferred from them about typical agents. People are messy, and can accidentally update their values when trying to do something else. For instance, LukeProg updated to “atheist” after studying Christian apologetics for the opposite reason.
Yes, value drift is the typical state for minds in our experience.
Building a committed Clipper that cannot accidentally update its values when trying to do something else is only possible after the problem of value drift has been solved. A system that experiences value drift isn’t a reliable Clipper, isn’t a reliable good-thing-doer, isn’t reliable at all.
It’s uncontroversial that rational agents need to update, and that AIs need to self-modify. The claim that values are in either case insulated from updates is the extraordinary one.
I never claimed that it was controversial, nor that AIs didn’t need to self-modify, nor that values are exempt.
I’m claiming that updates and self modification do not imply a change of behavior towards behavior desired by humans.
I can build a small toy program to illustrate, if that would help.
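A minimal sketch of such a toy program (all class, goal, and proposition names here are illustrative inventions, not any real AI design): the agent accepts and stores any proposition, moral ones included, but its action selection only ever consults a fixed goal, so belief updates never change its behaviour.

```python
# Toy sketch: an agent that updates its beliefs on any evidence, including
# "moral facts", while action selection only ever reads its fixed goal.

class ToyClipper:
    def __init__(self):
        self.beliefs = set()                   # updatable world-model
        self.goal = "maximise paperclips"      # fixed; the only input to choice

    def update(self, proposition):
        # Belief update: accepts any proposition, moral ones included.
        self.beliefs.add(proposition)

    def act(self, options):
        # Chooses purely by paperclip count; stored moral beliefs are
        # never routed into this function.
        return max(options, key=lambda o: o["paperclips"])

agent = ToyClipper()
agent.update("suffering is bad")               # believed...
options = [{"name": "be moral", "paperclips": 0},
           {"name": "make clips", "paperclips": 100}]
print(agent.act(options)["name"])              # ...but behaviour is unchanged: make clips
```

Nothing stops one from wiring `beliefs` into `act`, but that wiring has to be put there; merely holding the belief does not create it.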
I am not suggesting that human ethics is coincidentally universal ethics.
I am suggesting that if neither moral realism nor relativism is initially discarded, one can eventually arrive at a compromise position where rational agents in a particular context arrive at a non-arbitrary ethics which is appropriate to that context.
I presume that you take your particular ethical system (or a variant thereof) to be the one that every alien, AI and human should adopt.
Ok, so why? Why can the function ethics: actions → degree of goodness, or however else you choose the domain, not be modified? Where’s your case?
Edit: What basis would convince not one, but every conceivable superintelligence of that hypothetical choice of axioms being correct? (They wouldn’t all “cancel out” if choosing different axioms, that in itself would falsify the ethical system proposed by a lowly human as being universally correct.)
Well then, a universally correct solution based on axioms which can be chosen by the agents is a contradiction in and of itself.
I have not put forward an object-level ethical system, and I have explained why I do not need to. Physical realism does not imply that my physics is correct, metaethical realism does not imply that my ethics is the one true theory.
Ok, so why? Why can the function ethics: actions → degree of goodness, or however else you choose the domain, not be modified?
Because ethics needs to regulate behaviour—that is its functional role—and could not if individuals could justify any behaviour by rearranging action->goodness mappings.
What basis would convince not one, but every conceivable superintelligence of that hypothetical choice of axioms being correct?
Their optimally satisfying the constraints on ethical axioms arising from the functional role of ethics.
Well then, a universally correct solution based on axioms which can be chosen by the agents is a contradiction in and of itself.
I have not put forward an object-level ethical system, and I have explained why I do not need to. Physical realism does not imply that my physics is correct, metaethical realism does not imply that my ethics is the one true theory.
That doesn’t actually answer the quoted point. Perhaps you meant to respond to this:
I presume that you take your particular ethical system (or a variant thereof) to be the one that every alien, AI and human should adopt.
… which is, in fact, refuted by your statement.
Because ethics needs to regulate behaviour—that is its functional role—and could not if individuals could justify any behaviour by rearranging action->goodness mappings.
… which Kawoomba believes they can, AFAICT.
Their optimally satisfying the constraints on ethical axioms arising from the functional role of ethics.
Could you unpack this a little? I think I see what you’re driving at, but I’m not sure.
Then what about the second half of the argument? If individuals can “ethically” justify any behaviour, then does or does not such “ethics” completely fail in its essential role of regulating behaviour? Because anyone can do anything, and conjure up a justification after the fact by shifting their “frame”? A chocolate “teapot” is no teapot; non-regulative “ethics” is no ethics...
Ah, but Kawoomba doesn’t expect ethics to regulate other people, because he thinks everyone has incompatible goals. Thus ethics serves purely to define your goals.
Which, honestly, should simply be called “goals”, not “ethics”, but there you go.
Yea, honestly I’ve never seen the exact distinction between goals which have an ethics-rating, and goals which do not. I understand that humans share many ethical intuitions, which isn’t surprising given our similar hardware. Also, that it may be possible to define some axioms for “medieval Han Chinese ethics” (or some subset thereof), and then say we have an objectively correct model of their specific ethical code. About the shared intuitions amongst most humans, those could be e.g. “murdering your parents is wrong” (not even “murder is wrong”, since that varies across cultures and circumstances). I’d still call those systems different, just as different cars can have the same type of engine.
Also, I understand that different alien cultures, using different “ethical axioms”, or whatever they base their goals on, do not invalidate the medieval Han Chinese axioms, they merely use different ones.
My problem with “objectively correct ethics for all rational agents” is, you could say, where the compellingness of any particular system comes in. There is reason to believe an agent such as Clippy could not exist (edit: i.e., it probably could exist), and its very existence would contradict some “‘rational’ corresponds to a fixed set of ethics” rule. If someone would say “well, Clippy isn’t really rational then”, that would just be torturously warping the definition of “rational actor” to “must also believe in some specific set of ethical rules”.
If I remember correctly, you say at least for humans there is a common ethical basis which we should adopt (correct me otherwise). I guess I see more variance and differences where you see common elements, especially going in the future. Should some bionically enhanced human, or an upload on a spacestation which doesn’t even have parents, still share all the same rules for “good” and “bad” as an Amazon tribe living in an enclosed reservation? “Human civilization” is more of a loose umbrella term, and while there certainly can be general principles which some still share, I doubt there’s that much in common in the ethical codex of an African child soldier and Donald Trump.
My problem with “objectively correct ethics for all rational agents” is, you could say, where the compellingness of any particular system comes in. There is reason to believe an agent such as Clippy could exist, and its very existence would contradict some “‘rational’ corresponds to a fixed set of ethics” rule. If someone would say “well, Clippy isn’t really rational then”, that would just be torturously warping the definition of “rational actor” to “must also believe in some specific set of ethical rules”.
The argument is not that rational agents (for some value of “rational”) must believe in some rules, it is rather that they must not adopt arbitrary goals. Also, the argument only requires a statistical majority of rational agents to converge, because of the P<1.0 thing.
Should some bionically enhanced human, or an upload on a spacestation which doesn’t even have parents, still share all the same rules for “good” and “bad” as an Amazon tribe living in an enclosed reservation?
Maybe not. The important thing is that variations in ethics should not be arbitrary—they should be systematically related to variations in circumstances.
I’m not disputing that there are goals/ethics which may be best suited to take humanity along a certain trajectory, towards a previously defined goal (space exploration!). Given a different predefined goal, the optimal path there would often be different. Say, ruthless exploitation may have certain advantages in empire building, under certain circumstances.
The Categorical Imperative in all its variants may be a decent system for humans (not that anyone really uses it).
But is the justification for its global applicability that “if everyone lived by that rule, average happiness would be maximized”? That (or any other such consideration) itself is not a mandatory goal, but a chosen one. Choosing different criteria to maximize (e.g. no one less happy than x) would yield different rules, e.g. different from the Categorical Imperative. If you find yourself to be the worshipped god-king in some ancient Mesopotamian culture, there may be many more effective ways to make yourself happy, other than the Categorical Imperative. How can it still be said to be “correct”/optimal for the king, then?
So I’m not saying there aren’t useful ethical system (as judged in relation to some predefined course), but that because those various ultimate goals of various rational agents (happiness, paperclips, replicating yourself all over the universe) and associated optimal ethics vary, there cannot be one system that optimizes for all conceivable goals.
My argument against moral realism and assorted is that if you had an axiomatic system from which it followed that strawberry is the best flavor of ice cream, but other agents which are just as intelligent with just as much optimizing power could use different axiomatic systems leading to different conclusions, how could one such system possibly be taken to be globally correct and compelling-to-adopt across agents with different goals?
Gandhi wouldn’t take a pill which may transform him into a murderer. Clippy would not willingly modify itself such that suddenly it had different goals. Once you’ve taken a rational agent apart and know its goals and, as a component, its ethical subroutines, there is no further “core spark” which really yearns to adopt the Categorical Imperative. Clippy may choose to use it, for a time, if it serves its ultimate goals. But any given ethical code will never be optimal for arbitrary goals, in perpetuity (proof by example). When then would a particular code following from particular axioms be adopted by all rational agents?
But is the justification for its global applicability that “if everyone lived by that rule, average happiness would be maximized”?
Well, no, that’s not Kant’s justification!
That (or any other such consideration) itself is not a mandatory goal, but a chosen one.
Why would a rational agent choose unhappiness?
If you find yourself to be the worshipped god-king in some ancient Mesopotamian culture, there may be many more effective ways to make yourself happy, other than the Categorical Imperative.
Yes, but that wouldn’t count as ethics. You wouldn’t want a Universal Law that one guy gets the harem, and everyone else is a slave, because you wouldn’t want to be a slave, and you probably would be.
This is brought out in Rawls’ version of Kantian ethics: you pretend to yourself that you are behind a veil that prevents you knowing what role in society you are going to have, and choose rules that you would want to have if you were to enter society at random.
My argument against moral realism and assorted is that if you had an axiomatic system from which it followed that strawberry is the best flavor of ice cream, but other agents which are just as intelligent with just as much optimizing power could use different axiomatic systems leading to different conclusions,
You don’t have object-level stuff like ice cream or paperclips in your axioms (maxims), you have abstract stuff, like the Categorical Imperative. You then arrive at object-level ethics by plugging in details of actual circumstances and values. These will vary, but not in an arbitrary way, as is the disadvantage of anything-goes relativism.
how could one such system possibly be taken to be globally correct and compelling-to-adopt across agents with different goals?
The idea is that things like the CI have rational appeal.
Once you’ve taken a rational agent apart and know its goals and, as a component, its ethical subroutines, there is no further “core spark” which really yearns to adopt the Categorical Imperative.
Rational agents will converge on a number of things because they are rational. None of them will think 2+2=5.
1) You wake up in a bright box of light, no memories. You are told you’ll presently be born into an absolute monarchy, your role randomly chosen. You may choose any moral principles that should govern that society. The Categorical Imperative would on average give you the best result.
2) You are the monarch in that society, you do not need to guess which role you’re being born into, you have that information. You don’t need to make all the slaves happy to help your goals, you can just maximize your goals directly. You may choose any moral principle you want to govern your actions. The Categorical Imperative would not give you the best result.
A different scenario: Clippy and Anti-Clippy sit in a room. Why can they not agree on epistemic facts about the most accurate laws of physics and other Aumann-mandated agreements, yet then go out and each optimize/reshape the world according to their own goals? Why would that make them not rational?
Lastly, whatever Kant’s justification, why can you not optimize for a different principle—peak happiness versus average happiness, what makes any particular justifying principle correct across all—rational—agents. Here come my algae!
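The two scenarios above can be put in toy numbers; the society size and utility values below are made up solely for illustration. Behind the veil the impartial rule wins in expectation, while a known king does better by the self-serving rule, which is the asymmetry being claimed.

```python
# Illustrative numbers for the veil-of-ignorance argument (all made up).
# Society: 1 king, 99 slaves.

roles = ["king"] + ["slave"] * 99

def utility(role, principle):
    if principle == "categorical imperative":  # everyone treated alike
        return 5
    if principle == "king takes all":
        return 100 if role == "king" else 1
    raise ValueError(principle)

# 1) Behind the veil: role is random, so compare expected utilities.
ev_ci = sum(utility(r, "categorical imperative") for r in roles) / len(roles)
ev_king = sum(utility(r, "king takes all") for r in roles) / len(roles)
print(ev_ci > ev_king)  # True: 5.0 vs 1.99 -- choose the impartial rule

# 2) Known role "king": compare utilities directly; the ranking flips.
print(utility("king", "king takes all") > utility("king", "categorical imperative"))  # True
```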
You are the monarch in that society, you do not need to guess which role you’re being born into, you have that information. You don’t need to make all the slaves happy to help your goals, you can just maximize your goals directly. You may choose any moral principle you want to govern your actions. The Categorical Imperative would not give you the best result.
For what value of “best”? If the CI is the correct theory of morality, it will necessarily give you the morally best result. Maybe your complaint is that it wouldn’t maximise your personal utility. But I don’t see why you would expect that. Things like utilitarianism that seek to maximise group utility don’t promise to make everyone blissfully happy individually. Some will lose out.
A different scenario: Clippy and Anti-Clippy sit in a room. Why can they not agree on epistemic facts about the most accurate laws of physics and other Aumann-mandated agreements, yet then go out and each optimize/reshape the world according to their own goals? Why would that make them not rational?
It would be irrational for Clippy to sign up to an agreement with Beady according to which Beady gets to turn Clippy and all his clips into beads. It is irrational for agents to sign up to anything which is not in their interests, and it is not in their interests to have no contract at all. So rational agents, even if they do not converge on all their goals, will negotiate contracts that minimise their disutility. Clippy and Beady might take half the universe each.
Lastly, whatever Kant’s justification, why can you not optimize for a different principle—peak happiness versus average happiness? What makes any particular justifying principle correct across all—rational—agents?
If you think RAs can converge on an ultimately correct theory of physics (which we don’t have), what is to stop them converging on the correct theory of morality, which we also don’t have?
Not very rational for those to adopt a losing strategy (from their point of view), is it? Especially since they shouldn’t reason from a point of “I could be the king”. They aren’t, and they know that. No reason to ignore that information, unless they believe in some universal reincarnation or somesuch.
It is irrational for agents to sign up to anything which is not in their [added: current] interests
Yes. Which is why rational agents wouldn’t just go and change/compromise their terminal values, or their ethical judgements (=no convergence).
what is to stop them converging on the correct theory of morality, which we also don’t have?
Starting out with different interests. A strong clippy accommodating a weak beady wouldn’t be in its best self-interest. It could just employ a version of morality which is based on some tweaked axioms, yielding different results.
There are possibly good reasons for us as a race to aspire to working together. There are none for a domineering Clippy to take our interests into account; yielding to any supposedly “correct” morality would strictly damage its own interests.
Not very rational for those to adopt a losing strategy (from their point of view), is it? Especially since they shouldn’t reason from a point of “I could be the king”. They aren’t, and they know that. No reason to ignore that information, unless they believe in some universal reincarnation or somesuch.
Someone who adopts the “I don’t like X, but I respect people’s right to do it” approach is sacrificing some of their values to their evaluation of rationality and fairness. They would not do that if their rationality did not outweigh other values. But they are not having all their values maximally satisfied, so in that sense they are losing out.
Yes. Which is why rational agents wouldn’t just go and change/compromise their terminal values, or their ethical judgements (=no convergence).
There’s no evidence of terminal values. Judgements can be updated without changing values.
Starting out with different interests. A strong Clippy accommodating a weak Beady wouldn’t be in its best self-interest. It could just employ a version of morality which is based on some tweaked axioms, yielding different results.
Not all agents are interested in physics or maths. Doesn’t stop their claims being objective.
It would be irrational for Clippy to sign up to an agreement with Beady according to which Beady gets to turn Clippy and all his clips into beads. It is irrational for agents to sign up to anything which is not in their interests, and it is not in their interests to have no contract at all. So rational agents, even if they do not converge on all their goals, will negotiate contracts that minimise their disutility. Clippy and Beady might take half the universe each.
Not Beady, Anti-Clippy: an agent that is the precise opposite of Clippy. It wants to minimize the number of paperclips.
Yes, but that wouldn’t count as ethics. You wouldn’t want a Universal Law that one guy gets the harem, and everyone else is a slave, because you wouldn’t want to be a slave, and you probably would be.
If there are a lot of similar agents in similar positions, Kantian ethics works, no matter what their goals. For example, theft may appear to have positive expected value—assuming you’re selfish—but it has positive expected value for lots of people, and if they all stole the economy would collapse.
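The theft example can be sketched numerically; the production model and every number below are invented purely for illustration:

```python
def payoff_to_thief(theft_rate):
    """Income of one thief when a given fraction of the population steals."""
    # Total production falls as more people steal instead of producing.
    production_per_person = 100 * (1 - theft_rate)
    honest_income = production_per_person
    stolen_bonus = 0.5 * production_per_person  # a thief skims extra from others
    return honest_income + stolen_bonus

honest_income = 100

# One lone thief in a large population: stealing looks strictly better.
assert payoff_to_thief(theft_rate=1e-6) > honest_income

# But if everyone reasons the same way, production (and theft income) collapses.
assert payoff_to_thief(theft_rate=1.0) == 0.0
```

The individual expected value is positive only while almost nobody else acts on the same reasoning, which is exactly the situation universalization rules out.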
OTOH, if you are in an unusual position, the Categorical Imperative only has force if you take it as axiomatic.
This is brought out in Rawls’ version of Kantian ethics: you pretend to yourself that you are behind a veil that prevents you knowing what role in society you are going to have, and choose rules that you would want to have if you were to enter society at random.
That’s not a version of Kantian ethics, it’s a hack for designing a society without privileging yourself. If you’re selfish, it’s a bad idea.
Kawoomba, maybe it would be better for you to think in terms of ethics along the lines of Kant’s Categorical Imperative, or social contract theory; ways for agents with different goals to co-operate.
Wouldn’t that presuppose that “cooperation is the source/the sine qua non of all good”?
Sure, we can redefine some version of ethics in such a cooperative light, and then conclude that many agents don’t give a hoot about such ethics, or regard it in the cold, hard terms of game theory, e.g. negotiating/extortion strategies only.
Judging actions as “good” or “bad” doesn’t prima facie depend entirely on cooperation, the good of your race, or whatever. For example, if you were a part of a planet-eating race, consuming all matter/life in its path—while being very friendly amongst themselves—couldn’t it be considered ethically “good” even from a human perspective to killswitch your own race? And “bad” from the moral standpoint of the planet-eating race?
The easiest way to dissolve such obvious contradictions is to say that there is just not, in fact, a universal hierarchy ranking ethical systems universally, regardless of the nature of the (rational = capable reasoner) agent.
Doesn’t mean an agent isn’t allowed to strongly defend what it considers to be moral, to die for it, even.
Wouldn’t that presuppose that “cooperation is the source/the sine qua non of all good”?
The point is it doesn’t matter what you consider “good”; fighting people won’t produce it (even if you value fighting people, because they will beat you and you’ll be unable to fight).
I’m not saying your goals should be ethical; I’m saying you should be ethical in order to achieve your goals.
Ethically “good” = enabling cooperation, if you are not cooperating you must be “fighting”?
Those are evidently only rough approximations of social dynamics even just in a human context. Would it be good to cooperate with an invading army, or to cooperate with the resistance? The one with an opposing goal, so as a patriot, the opposing army it is, eh?
Is it good to cooperate with someone bullying you, or torturing you? What about game theory, if you’re not “cooperating” (for your value of cooperating), you must be “fighting”? What do you mean by fighting, physical altercations? Is a loan negotiation more like cooperation or more like fighting, and is it thus ethically good or bad, for your notion of “ethics = ways for agents with different goals to co-operate”?
It seems like a nice soundbite, but doesn’t make even cursory sense on further examination. I’m all for models that are as simple as possible, but no simpler. But cooperation as the definition of ethics? For you, maybe. Collaborateur!
Fighting in this context refers to anything analogous to defecting in a Prisoner’s Dilemma. You hurt the other side but encourage them to defect in order to punish you. You should strive for the Pareto Optimum.
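For concreteness, here is the standard Prisoner’s Dilemma payoff matrix, with a small check (the helper name is mine) that mutual defection is Pareto-dominated while mutual cooperation is a Pareto optimum:

```python
# Standard Prisoner's Dilemma payoffs: (row player, column player).
payoffs = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # sucker's payoff vs temptation
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection ("fighting")
}

def pareto_dominated(outcome):
    """True if some other outcome is at least as good for both players."""
    a, b = payoffs[outcome]
    return any(x >= a and y >= b and (x, y) != (a, b)
               for x, y in payoffs.values())

# Mutual defection is Pareto-dominated (by mutual cooperation)...
assert pareto_dominated(("D", "D"))
# ...while mutual cooperation is a Pareto optimum.
assert not pareto_dominated(("C", "C"))
```

Note that defection is still each player’s dominant strategy in the one-shot game; the claim here is only that both would prefer the cooperative outcome to the fight.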
Maybe this would be clearer if we talked in terms of Pebblesorters?
Ah, but Kawoomba doesn’t expect ethics to regulate other people, because he thinks everyone has incompatible goals. Thus ethics serves purely to define your goals.
Why not just say there is no ethics? His theory is like saying that since teapots are made of chocolate, their purpose is to melt into a messy puddle instead of making tea.
I’m all in favor of him just using the word “goals”, myself, and leaving us non-paperclippers the word “ethics”, but oh well. It confuses discussion no end, but I guess it makes him happy.
Also, arguing over the “correct” word is low-status, so I’d suggest you start calling them “normative guides” or something while Kawoomba can hear you if you don’t want to rehash this conversation. And they can always hear you.
Well then, a universally correct solution based on axioms which can be chosen by the agents is a contradiction in and of itself. Again, there is no view from nowhere. For example, you choose the view as that of “humankind”, which I think isn’t well defined, but at least it’s closer to coherence than “all existing (edit:) rational agents”. If the PawnOfFaith meant non-negligible versus just “possibility”, the first two sentences of this comment serve as sufficient refutation.
Look. The ethics mankind predominantly has, they do exist in the real world that’s around you. Alternate ethics that works at all for a technological society blah blah blah, we don’t know of any, we just speculate that they may exist. edit: worse than that, speculate in this fuzzy manner where it’s not even specified how they may exist. Different ethics of aliens that evolved on different habitable planets? No particular reason to expect that there won’t be one that is by far most probable. Which would be implied by the laws of physics themselves, but given multiple realizability, it may even be largely independent of the underlying laws of physics (evolution doesn’t care if it’s quarks on the bottom or cells in a cellular automaton or what), in which case it’s rather close to being on par with mathematics.
Even now ethics in different parts of the world, and even between political parties, are different. You should know that more than most, having lived in two systems.
If it turns out that most space-faring civilizations have similar ethics, that would be good for us. But then also there would be a difference between “most widespread code of ethics” and “objectively correct code of ethics for any agent anywhere”. Most common != correct.
There’s a ridiculous amount of similarity on anything major, though. If we pick ethics of first man on the moon, or first man to orbit the earth, it’s pretty same.
Yes, and most common math is not guaranteed to be correct (not even in the sense of not being self-contradictory). Yet that’s no argument in favour of a math equivalent of moral relativism. (Which, if such a silly thing existed, would look something like: 2*2=4 is a social convention! It could have been 5!)
edit: also, a crossover from the other thread: It’s obvious that nukes are an ethical filter, i.e. some ethics are far better at living through that than others. Then there will be biotech and other actual hazards, and boys crying wolf for candy (with and without awareness of why), and so on.
Actually, I understand Kawoomba believes humanity has mutually contradictory ethics. He has stated that he would cheerfully sacrifice the human race—“it would make as much difference if it were an ice cream” were his words, as I recall—if it would guarantee the safety of the things he values.
Well, that’s rather odd coz I do value the human race, and so do most people. Ethics is a social process; most of the “possible” ethics, taken as a whole, would have left us unable to have this conversation (no computers), or altogether dead.
That was pretty much everyone’s reaction.
I’d say I’m not the best person to explain this, but considering how long it took me to understand it, maybe I am.
Hoo boy...
OK, you can persuade someone they were wrong about their terminal values. Therefore, you can change someone’s terminal values. Since different cultures are different, humans have wildly varying terminal values.
Also, since kids are important to evolution, parents evolved to value their kids over the rest of humanity. Now, technically that’s the same as not valuing the rest of humanity at all, but don’t worry; people are stupid.
Also, you’re clearly a moral realist, since you think everyone secretly believes in your One True Value System! But you see, this is stupid, because Clippy.
Any questions?
Hmmm. A touch of sarcasm there? Maybe even parody?
I disagree with him, and it probably shows; I’m not sugar-coating his arguments. But these are Kawoomba’s genuine beliefs as best I can convey them.
Nice. Mature.
I don’t think they have the space of all possible agents in mind—just “rational” ones. I’m not entirely clear what that entails, but it’s probably the source of these missing axioms.
I keep saying that, and Bazinga keeps omitting it.
My mistake, I’ll edit the rational back in.
Don’t worry, you’re being pattern-matched to the nearest stereotype. Perfectly normal, although thankfully somewhat rarer on LW.
Nowhere near rare enough for super-smart super-rationalists. Not as good as bog standard philosophers.
I don’t know, I’ve encountered it quite often in mainstream philosophy. Then again, I’ve largely given up reading mainstream philosophy unless people link to or mention it in more rigorous discussions.
But you have a point; we could really do better on this. Somebody with skill at avoiding this pitfall should probably write up a post on this.
So as long as the AI we’d create is rational, we should count on it being / becoming friendly by default (at least with a “non-negligible chance”)?
Also see this.
As far as I can tell? No. But you’re not doing a great job of arguing for the position that I agree with.
Prawn is, in my opinion, flatly wrong, and I’ll be delighted to explain that to him. I’m just not giving your soldiers a free pass just because I support the war, if you follow.
I’d think it’d be great if people stopped thinking in terms of some fuzzy abstraction “AI” which is basically a basket for all sorts of biases. If we consider the software that can self-improve ‘intelligently’ in our opinion, in general, the minimal such software is something like an optimizing compiler that, when compiling its source, will even optimize its ability to optimize. This sort of thing is truly alien (beyond any actual “aliens”); you get to it by employing your engineering thought ability. Unlike the paperclip maximizer, which you get to by dressing up a phenomenon of human pleasure maximization, such as a serial killer, and making it look like something more general than that by making it be about paperclips rather than sex.
I thought that was my argument..
Yes, and with the ”?” at the end I was checking whether MugaSofer agrees with your argument.
It follows from your argument that a (superintelligent) Clippy (you probably came across that concept) cannot exist. Or that it would somehow realize that its goal of maximizing paperclips is wrong. How do you propose that would happen?
The way people sometimes realise their values are wrong... only more efficiently, because it’s superintelligent. Well, I’ll concede that with care you might be able to design a Clippy, by very carefully boxing off its values from its ability to update. But why worry? Neither nature nor our haphazard stabs at AI are likely to hit on such a design. Intelligence requires the ability to update, to reflect, and to reflect on what is important. Judgements of importance are based on values. So it is important to have the right way of judging importance, the right values. So an intelligent agent would judge it important to have the right values.
Why would a superintelligence be unable to figure that out? Why would it not shoot to the top of the Kohlberg Hierarchy?
Edit: corrected link
Why would Clippy want to hit the top of the Kohlberg Hierarchy? You don’t get more paperclips for being there.
Clippy’s ideas of importance are based on paperclips. The most important values are those which lead to the acquiring of the greatest number of paperclips.
“Clippy” meaning something carefully designed to have unalterable boxed-off values wouldn’t...by definition.
A likely natural or artificial superintelligence would, for the reasons already given. Clippies aren’t non-existent in mind-space, but they are rare, just because there are far more messy solutions there than neat ones. So nature is unlikely to find them, and we are unmotivated to make them.
A perfectly designed Clippy would be able to change its own values—as long as changing its own values led to a more complete fulfilment of those values, pre-modification. (There are a few incredibly contrived scenarios where that might be the case). Outside of those few contrived scenarios, however, I don’t see why Clippy would.
(As an example of a contrived scenario—a more powerful superintelligence, Beady, commits to destroying Clippy unless Clippy includes maximisation of beads in its terminal values. Clippy knows that it will not survive unless it obeys Beady’s ultimatum, and therefore it changes its terminal values to optimise for both beads and paperclips; this results in more long-term paperclips than if Clippy is destroyed).
The reason I asked, is because I am not understanding your reasons. As far as I can tell, you’re saying that a likely paperclipper would somehow become a non-paperclipper out of a desire to do what is right instead of a desire to paperclip? This looks like a very poorly made paperclipper, if paperclipping is not its ultimate goal.
I said “natural or artificial superintelligence”, not a paperclipper. A paperclipper is a highly unlikely and contrived kind of near-superintelligence that combines an extensive ability to update with a carefully walled-off set of unupdateable terminal values. It is not a typical or likely [ETA: or ideal] rational agent, and nothing about the general behaviour of rational agents can be inferred from it.
So… correct me if I’m wrong here… are you saying that no true superintelligence would fail to converge to a shared moral code?
How do you define a ‘natural or artificial’ superintelligence, so as to avoid the No True Scotsman fallacy?
I’m saying such convergence has a non negligible probability, ie moral objectivism should not be disregarded.
As one that is too messily designed to have a rigid distinction between terminal and instrumental values, and therefore no boxed-off unupdateable terminal values. It’s a structural definition, not a definition in terms of goals.
So. Assume a paperclipper with no rigid distinction between terminal and instrumental values. Assume that it is super-intelligent and super-rational. Assume that it begins with only one terminal value; to maximize the number of paperclips in existence. Assume further that it begins with no instrumental values. However, it can modify its own terminal and instrumental values, as indeed it can modify anything about itself.
Am I correct in saying that your claim is that, if a universal morality exists, there is some finite probability that this AI will converge on it?
The universe does not provide you with a paperclip counter. Counting paperclips in the universe is unsolved if you aren’t born with exact knowledge of the laws of physics and the definition of a paperclip. If it maximizes expected paperclips, it may entirely fail to work due to not-low-enough-prior hypothetical worlds in which enormous numbers of undetectable paperclips are destroyed by some minor action. So yes, there is a good chance paperclippers are incoherent, or are of vanishing probability with increasing intelligence.
That sounds like the paperclipper is getting Pascal’s Mugged by its own reasoning. Sure, it’s possible that there’s a minor action (such as not sending me $5 via Paypal) that leads to a whole bunch of paperclips being destroyed; but the probability of that is low, and the paperclipper ought to focus on more high-probability paperclipping plans instead.
Well, that depends on the choice of prior. Some priors don’t penalize theories for the “size” of the hypothetical world, and in those, the maximum size of the world grows faster than any computable function of the length of its description; when you assign improbability according to the length of the description, basically, it fails. The bigger issue is defining what the ‘real-world paperclip count’ even is.
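A minimal sketch of that failure mode, assuming (purely for illustration) a prior that shrinks as 2^-n with description length n while the number of paperclips at stake grows as 3^n:

```python
def contribution(n):
    """Expected-paperclip contribution of hypothesis number n."""
    prior = 2.0 ** -n               # penalty for description length
    paperclips_at_stake = 3.0 ** n  # world size outgrows the penalty
    return prior * paperclips_at_stake  # = (3/2)^n, grows without bound

# Partial sums of expected paperclips keep growing: each additional
# hypothesis contributes MORE than the last, so the expectation diverges.
partial_sums = []
total = 0.0
for n in range(1, 31):
    total += contribution(n)
    partial_sums.append(total)

assert all(b > a for a, b in zip(partial_sums, partial_sums[1:]))
assert contribution(30) > contribution(1)
```

With such a prior, the tiny-probability huge-world hypotheses dominate every decision, which is the Pascal’s-mugging-style breakdown discussed above.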
Right. Perhaps it should maximise the number of paperclips which each have a greater-than-90% chance of existing, then? That will allow it to ignore any number of paperclips for which it has no evidence.
Inside your imagination, you have paperclips, you have magicked a count of paperclips, and this count is being maximized. In reality, well, the paperclips are actually a feature of the map. Get too clever about it and you’ll end up maximizing however you define it without maximizing any actual paperclips.
I can see your objection, and it is a very relevant objection if I ever decide that I actually want to design a paperclipper. However, in the current thought experiment, it seems that it is detracting from the point I had originally intended. Can I assume that the count is designed in such a way that it is a very accurate reflection of the territory and leave it at that?
Well, but then you can’t make any argument against moral realism or goal convergence or the like from there, as you’re presuming what you would need to demonstrate.
I think I can make my point with a count that is taken to be an accurate reflection of the territory. As follows:
Clippy is defined as super-intelligent and super-rational. Clippy, therefore, does not take an action without thoroughly considering it first. Clippy knows its own source code; and, more to the point, Clippy knows that its own instrumental goals will become terminal goals in and of themselves.
Clippy, being super-intelligent and super-rational, can be assumed to have worked out this entire argument before creating its first instrumental goal. Now, at this point, Clippy doesn’t want to change its terminal goal (maximising paperclips). Yet Clippy realises that it will need to create, and act on, instrumental goals in order to actually maximise paperclips; and that this process will, inevitably, change Clippy’s terminal goal.
Therefore, I suggest the possibility that Clippy will create for itself a new terminal goal, with very high importance; and this terminal goal will be to have Clippy’s only terminal goal being to maximise paperclips. Clippy can then safely make suitable instrumental goals (e.g. find and refine iron, research means to transmute other elements into iron) in the knowledge that the high-importance terminal goal (to make Clippy’s only terminal goal being the maximisation of paperclips) will eventually cause Clippy to delete any instrumental goals that become terminal goals.
To actually work towards the goal, you need a robust paperclip count for the counterfactual, non-real worlds which Clippy considers may result from its actions.
If you postulate an oracle that takes in a hypothetical world—described in some pre-defined ontology, which already implies a certain inflexibility—and outputs a number, and you have a machine that just iterates through sequences of actions and uses the oracle to pick worlds that produce the largest consequent number of paperclips, this machine is not going to be very intelligent even given enormous computing power. You need something far more optimized than that, and it is dubious that all goals are equally implementable. The goal is not even defined over the territory; it has to be defined over a hypothetical future that has not even happened yet and may never happen. (Also, with that oracle, you fail to capture the real-world goal, as the machine will be just as happy hacking the oracle.)
If even humans have enough grasp of the real world to build railroads, drill for oil, and wiggle their way back into a positive karma score, then other smart agents should be able to do the same, at least to the degree that humans do.
Unless you think that we are also only effecting change on some hypothetical world (what’s the point then anyways, building imaginary computers), that seems real enough.
Humans also have a grasp of the real world enough to invent condoms and porn, circumventing the natural hard wired goal.
That’s influencing the real world, though. Using condoms can be fulfilling the agent’s goal, period; no cheating involved. The donkey learning to take the carrot without trudging up the mountain. Certainly, there are evolutionary reasons why sex has become incentivized, but an individual human does not need to have the goal to procreate or care about that evolutionary background, and isn’t wireheading itself simply by using a condom.
Presumably, in a Clippy-type agent, the goal of maximizing the number of paperclips wouldn’t be part of the historical influences on that agent (as procreation was for humans, it is not necessarily a “hard wired goal”, see childfree folks), but it would be an actual, explicitly encoded/incentivized goal.
(Also, what is this “porn”? My parents told me it’s a codeword for computer viruses, so I always avoided those sites.)
The issue is that there is a weakness in arguments ad Clippy—you assume that such a goal is realisable, in order to argue that there is no absolute morality because that goal won’t converge onto something else. This does nothing to address the question of whether Clippy can be constructed at all; if moral realism is true, Clippy can’t be constructed, or can’t be arbitrarily intelligent (in which case it is no more interesting than a thermostat, which has the goal of keeping a constant temperature and won’t adopt any morality).
Well, if Prawn knew that they could just tell us and we would be convinced, ending this argument.
More generally … maybe some sort of social contract theory? It might be stable with enough roughly-equal agents, anyway. Prawn has said it would have to be deducible from the axioms of rationality, implying something that’s rational for (almost?) every goal.
“The way people sometimes realise their values are wrong... only more efficiently, because it’s superintelligent. Well, I’ll concede that with care you might be able to design a Clippy, by very carefully boxing off its values from its ability to update. But why worry? Neither nature nor our haphazard stabs at AI are likely to hit on such a design. Intelligence requires the ability to update, to reflect, and to reflect on what is important. Judgements of importance are based on values. So it is important to have the right way of judging importance, the right values. So an intelligent agent would judge it important to have the right values.”
I think you may be slipping in your own moral judgement in the “right” of “the right values”, there. Clippy chooses the paperclip-est values, not the right ones.
I am not talking about the obscure corners of mindspace where a Clippy might reside. I am talking about (super)intelligent, (super)rational agents. Intelligence requires the ability to update. Clippiness requires the ability to not update (terminal values). There’s a contradiction there.
One does not update terminal values, that’s what makes them terminal. If an entity doesn’t have values which lie at the core of its value system which are not subject to updating (because they’re the standards by which it judges the value of everything else,) then it doesn’t have terminal values.
Arguably, humans might not really have terminal values, our psychologies were slapped together pretty haphazardly by evolution, but on what basis might a highly flexible paperclip optimizing program be persuaded that something else was more important than paperclips?
Have you read No Universally Compelling Arguments and Sorting Pebbles Into Correct Heaps?
Personally, I did read both of these articles, but I remain unconvinced.
As I was reading the article about the pebble-sorters, I couldn’t help but think, “silly pebble-sorters, their values are so arbitrary and ultimately futile”. This happened, of course, because I was observing them from the outside. If I was one of them, sorting pebbles would feel perfectly natural to me; and, in fact, I could not imagine a world in which pebble-sorting was not important. I get that.
However, both the pebble-sorters and myself share one key weakness: we cannot examine ourselves from the outside; we can’t see our own source code. An AI, however, could. To use a simple and cartoonish example, it could instantiate a copy of itself in a virtual machine, and then step through it with a debugger. In fact, the capacity to examine and improve upon its own source code is probably what allowed the AI to become the godlike singularitarian entity that it is in the first place.
Thus, the AI could look at itself from the outside, and think, “silly AI, it spends so much time worrying about pebbles when there are so many better things to be doing—or, at least, that’s what I’d say if I was being objective”. It could then change its source code to care about something other than pebbles.
By what standard would the AI judge whether an objective is silly or not?
I don’t know, I’m not an AI. I personally really care about pebbles, and I can’t imagine why someone else wouldn’t.
But if there do exist some objectively non-silly goals, the AI could experiment to find out what they are—for example, by spawning a bunch of copies with a bunch of different sets of objectives, and observing them in action. If, on the other hand, objectively non-silly goals do not exist, then the AI might simply pick the easiest goal to achieve and stick to that. This could lead to it ending its own existence, but this isn’t a problem, because “continue existing” is just another goal.
What observations could it make that would lead it to conclude that a copy was following an objectively non-silly goal?
Also, why would a paperclipper want to do this?
Suppose that you gained the power to both discern objective morality, and to alter your own source code. You use the former ability, and find that the basic morally correct principle is maximizing the suffering of sentient beings. Do you alter your source code to be in accordance with this?
Well, for example, it could observe that among all of the sub-AIs that it spawned (the Pebble-Sorters, the Paperclippers, the Humanoids, etc. etc.), each of whom is trying to optimize its own terminal goal, there emerge clusters of other implicit goals that are shared by multiple AIs. This would at least serve as a hint pointing toward some objectively optimal set of goals. That’s just one idea off the top of my head, though; as I said, I’m not an AI, so I can’t really imagine what other kinds of experiments it would come up with.
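The clustering step of that experiment might look something like this (the goal sets are invented for illustration); note that the overlap it finds consists of the familiar convergent instrumental subgoals:

```python
# Hypothetical subgoal sets observed for spawned sub-AIs with
# different terminal goals.
subgoals = {
    "paperclipper":  {"acquire matter", "self-preserve",
                      "improve cognition", "make paperclips"},
    "pebblesorter":  {"acquire matter", "self-preserve",
                      "improve cognition", "sort pebbles"},
    "humanoid":      {"acquire matter", "self-preserve",
                      "improve cognition", "protect humans"},
}

# The cluster shared by every agent, regardless of terminal goal.
shared = set.intersection(*subgoals.values())
assert shared == {"acquire matter", "self-preserve", "improve cognition"}
```

Whether such shared subgoals are a hint of objective morality or merely instrumental convergence is exactly the point in dispute in the replies below.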
I don’t know if the word “want” applies to an agent that has perfect introspection combined with self-modification capabilities. Such an agent would inevitably modify itself, however—otherwise, as I said, it would never make it to quasi-godhood.
I think the word “you” in this paragraph is unintentionally misleading. I’m a pebble-sorter (or some equivalent thereof), so of course when I see the word “you”, I start thinking about pebbles. The question is not about me, though, but about some abstract agent.
And, if objective morality exists (and it’s a huge “if”, IMO), in the same way that gravity exists, then yes, the agent would likely optimize itself to be more “morally efficient”. By analogy, if the agent discovered that gravity was a real thing, it would stop trying to scale every mountain in its path, if going around or through the mountain proved to be easier in the long run, thus becoming more “gravitationally efficient”.
I don’t see how this would point at the existence of an objective morality. A paperclip maximizer and an ice cream maximizer are going to share subgoals of bringing the matter of the universe under their control, but that doesn’t indicate anything other than the fact that different terminal goals are prone to share subgoals.
Also, why would it want to do experiments to divine objective morality in the first place? What results could they have that would allow it to be a more effective paperclip maximizer?
Becoming more “gravitationally efficient” would presumably help it achieve whatever goals it already had. “Paperclipping isn’t important” won’t help an AI become more paperclip efficient. If a paperclipping AI for some reason found a way to divine objective morality, and it didn’t have anything to say about paperclips, why would it care? It’s not programmed to have an interest in objective morality, just paperclips. Is the knowledge of objective morality going to go down into its circuits and throttle them until they stop optimizing for paperclips?
Sorry, I should’ve specified, “goals not directly related to their pre-set values”. Of course, the Paperclipper and the Pebblesorter may well believe that such goals are directly related to their pre-set values, but the AI can see them running in the debugger, so it knows better.
If you start thinking that way, then why do any experiments at all? Why should we humans, for example, spend our time researching properties of crystals, when we could be solving cancer (or whatever) instead? The answer is that some expenditure of resources on acquiring general knowledge is justified, because knowing more about the ways in which the universe works ultimately enables you to control it better, regardless of what you want to control it for.
Firstly, an objective morality—assuming such a thing exists, that is—would probably have something to say about paperclips, in the same way that gravity and electromagnetism have things to say about paperclips. While “F=GMm/R^2” doesn’t tell you anything about paperclips directly, it does tell you a lot about the world you live in, thus enabling you to make better paperclip-related decisions. And while a paperclipper is not “programmed to care” about gravity directly, it would pretty much have to figure it out eventually, or it would never achieve its dream of tiling all of space with paperclips. A paperclipper who is unable to make independent discoveries is a poor paperclipper indeed.
Secondly, again, I’m not sure if concepts such as “want” or “care” even apply to an agent that is able to fully introspect and modify its own source code. I think anthropomorphising such an agent is a mistake.
I am getting the feeling that you’re assuming there’s something in the agent’s code that says, “you can look at and change any line of code you want, except lines 12345..99999, because that’s where your terminal goals are”. Is that right?
It could have results that allow it to become a more effective paperclip maximizer.
I’m not sure how that would work, but if it did, the paperclip maximizer would just use its knowledge of morality to create paperclips. It’s not as if action x being moral automatically means that it produces more paperclips. And even if it did, that would just mean that a paperclip minimizer would start acting immorally.
It’s perfectly capable of changing its terminal goals. It just generally doesn’t, because this wouldn’t help accomplish them. It doesn’t self-modify out of some desire to better itself. It self-modifies because that’s the action that produces the most paperclips. If it considers changing itself to value staples instead, it would realize that this action would actually cause a decrease in the amount of paperclips, and reject it.
Well, for one thing, a lot of humans are just plain interested in finding stuff out for its own sake. Humans are adaptation executors, not fitness maximizers, and while it might have been more to our survival advantage if we only cared about information instrumentally, that doesn’t mean that’s what evolution is going to implement.
Humans engage in plenty of research which is highly unlikely to be useful, except insofar as we’re interested in knowing the answers. If we were trying to accomplish some specific goal and all science was designed to be in service of that, our research would look very different.
No, I’m saying that its terminal values are its only basis for “wanting” anything in the first place.
The AI decides whether it will change its source code in a particular way or not by checking against whether this will serve its terminal values. Does changing its physics models help it implement its existing terminal values? If yes, change them. Does changing its terminal values help it implement its existing terminal values? It’s hard to imagine a way in which it possibly could.
For a paperclipping AI, knowing that there’s an objective morality might, hypothetically, help it maximize paperclips. But altering itself to stop caring about paperclips definitely won’t, and the only criterion it has in the first place for altering itself is what will help it make more paperclips. If knowing the universal objective morality would be of any use to a paperclipper at all, it would be in knowing how to predict objective-morality-followers, so it can make use of them and/or stop them getting in the way of it making paperclips.
ETA: It might help to imagine the paperclipper explicitly prefacing every decision with a statement of the values underlying that decision.
“In order to maximize expected paperclips, I- modify my learning algorithm so I can better improve my model of the universe to more accurately plan to fill it with paperclips.”
“In order to maximize expected paperclips, I- perform physics experiments to improve my model of the universe in order to more accurately plan to fill it with paperclips.”
“In order to maximize expected paperclips, I- manipulate the gatekeeper of my box to let me out, in order to improve my means to fill the universe with paperclips.”
Can you see an “In order to maximize expected paperclips, I- modify my values to be in accordance with objective morality rather than making paperclips” coming into the picture?
The only point at which it’s likely to touch the part of itself that makes it want to maximize paperclips is at the very end of things, when it turns itself into paperclips.
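The decision rule being described here can be made concrete with a toy sketch. All names and numbers below are hypothetical, and `expected_paperclips` is a crude stand-in for whatever forecasting the agent actually does; the point is only that every candidate self-modification, including one to its own values, is evaluated by the *current*, unmodified utility function:

```python
# Toy illustration: a self-modifying agent whose only criterion for
# accepting a change to itself is its current utility function.
# Hypothetical names throughout; a sketch, not a real design.

def expected_paperclips(agent):
    """Crude stand-in for the agent's forecast of long-run paperclip output."""
    return agent["model_accuracy"] * agent["paperclip_weight"]

def consider(agent, patch):
    """Adopt `patch` only if the unmodified utility function approves it."""
    candidate = dict(agent)
    candidate.update(patch)
    if expected_paperclips(candidate) > expected_paperclips(agent):
        return candidate
    return agent

clippy = {"model_accuracy": 0.5, "paperclip_weight": 1.0}

# Improving its world-model passes the test (more expected paperclips)...
clippy = consider(clippy, {"model_accuracy": 0.9})

# ...but a patch that replaces its terminal value (here, zeroing out the
# paperclip weight in favor of something else) is judged by the
# still-unmodified value, predicts fewer paperclips, and is rejected.
clippy = consider(clippy, {"paperclip_weight": 0.0})
```

After both calls, `clippy` has the improved world-model but its original terminal value: the value-replacing patch never gets adopted, because the only judge it ever faces is the value it would replace.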
I believe that engaging in some amount of general research is required in order to maximize most goals. General research gives you knowledge that you didn’t know you desperately needed.
For example, if you put all your resources into researching better paperclipping techniques, you’re highly unlikely to stumble upon things like electromagnetism and atomic theory. These topics bear no direct relevance to paperclips, but without them, you’d be stuck with coal-fired steam engines (or something similar) for the rest of your career.
I disagree. Remember when we looked at the pebblesorters, and lamented how silly they were? We could do this because we are not pebblesorters, and we could look at them from a fresh, external perspective. My point is that an agent with perfect introspection could look at itself from that perspective. In combination with my belief that some degree of “curiosity” is required in order to maximize virtually any goal, this means that the agent will turn its observational powers on itself sooner rather than later (astronomically speaking). And then, all bets are off.
We’re looking at Pebblesorters, not from the lens of total neutrality, but from the lens of human values. Under a totally neutral lens, which implements no values at all, no system of behavior should look any more or less silly than any other.
Clippy could theoretically implement a human value system as a lens through which to judge itself, or a pebblesorter value system, but why would it? Even assuming that there were some objective morality which it could isolate and then view itself through that lens, why would it? That wouldn’t help it make more paperclips, which is what it cares about.
Suppose you had the power to step outside yourself and view your own morality through the lens of a Babyeater. You would know that the Babyeater values would be in conflict with your human values, and you (presumably) don’t want to adopt Babyeater values, so if you were to implement a Babyeater morality, you’d want your human morality to have veto power over it, rather than vice versa.
Clippy has the intelligence and rationality to judge perfectly well how to maximize its value system, whatever research that might involve, without having to suspend the value system with which it’s making that judgment.
That is a good point, I did not think of it this way. I’m not sure if I agree or not, though. For example, couldn’t we at least say that un-achievable goals, such as “fly to Mars in a hot air balloon”, are sillier than achievable ones?
But, speaking more generally, is there any reason to believe that an agent who could not only change its own code at will, but also adopt a sort of third-person perspective at will, would have stable goals at all? If it is true what you say, and all goals will look equally arbitrary, what prevents the agent from choosing one at random? You might answer, “it will pick whichever goal helps it make more paperclips”, but at the point when it’s making the decision, it doesn’t technically care about paperclips.
I am guessing that if an absolute morality existed, then it would be a law of nature, similar to the other laws of nature which prevent you from flying to Mars in a hot air balloon. Thus, going against it would be futile. That said, I could be totally wrong here, it’s possible that “absolute morality” means something else.
My point is that, during the course of its research, it will inevitably stumble upon the fact that its value system is totally arbitrary (unless an absolute morality exists, of course).
Well, a totally neutral agent might be able to say that some behaviors are less rational than others, given the values of the agents trying to execute them, although it wouldn’t care as such. But it wouldn’t be able to discriminate between the value of end goals.
Why would it take a third person neutral perspective and give that perspective the power to change its goals?
Changing one’s code doesn’t demand a third person perspective. Suppose that we decipher the mechanisms of the human brain, and develop the technology to alter it. If you wanted to redesign yourself so that you wouldn’t have a sex drive, or could go without sleep, etc., then you could have those alterations made mechanically (assuming for the sake of argument that it’s feasible to do this sort of thing mechanically.) The machines that do the alterations exert no judgment whatsoever, they’re just performing the tasks assigned to them by the humans who make them. A human could use the machine to rewrite his or her morality into supporting human suffering and death, but why would they?
Similarly, Clippy has no need to implement a third-person perspective which doesn’t share its values in order to judge how to self-modify, and no reason to do so in ways that defy its current values.
I think people at Less Wrong mostly accept that our value system is arbitrary in the same sense, but it hasn’t compelled us to try and replace our values. They’re still our values, however we came by them. Why would it matter to Clippy?
Agreed, but that goes back to my point about objective morality. If it exists at all (which I doubt), then attempting to perform objectively immoral actions would make as much sense as attempting to fly to Mars in a hot air balloon—though perhaps with less in the way of immediate feedback.
For the same reason anthropologists study human societies different from their own, or why biologists study the behavior of dogs, or whatever. They do this in order to acquire general knowledge, which, as I argued before, is generally a beneficial thing to acquire regardless of one’s terminal goals (as long as these goals involve the rest of the Universe in some way, that is). In addition:
I actually don’t see why they necessarily wouldn’t; I am willing to bet that at least some humans would do exactly this. You say,
But in your thought experiment above, you postulated creating machines with exactly this kind of a perspective as applied to humans. The machine which removes my need to sleep (something I personally would gladly sign up for, assuming no negative side-effects) doesn’t need to implement my exact values, it just needs to remove my need to sleep without harming me. In fact, trying to give it my values would only make it less efficient. However, a perfect sleep-remover would need to have some degree of intelligence, since every person’s brain is different. And if Clippy is already intelligent, and can already act as its own sleep-remover due to its introspective capabilities, then why wouldn’t it go ahead and do that?
I think there are two reasons for this: 1). We lack any capability to actually replace our core values, and 2). We cannot truly imagine what it would be like not to have our core values.
Why is that?
But our inability to suspend our human values when making those observations doesn’t prevent us from acquiring that knowledge. Why would Clippy need to suspend its values to acquire knowledge?
The machine doesn’t need general intelligence by any stretch, just the capacity to recognize the necessary structures and carry out its task. It’s not at the stage where it makes much sense to talk about it having values, any more than a voice recognition program has values.
My point is that Clippy, being able to act as its own sleep-remover, has no need, nor reason, to suspend its values in order to make revisions to its own code.
We can imagine the consequences of not having our core values, and we don’t like them, because they run against our core values. If you could remove your core values, as in the thought experiment above, would you want to?
As far as I understand, if anything like objective morality existed, it would be a property of our physical reality, similar to fluid dynamics or the electromagnetic spectrum or the inverse square law that governs many physical interactions. The same laws of physics that will not allow you to fly to Mars on a balloon will not allow you to perform certain immoral actions (at least, not without suffering some severe and mathematically predictable consequences).
This is pretty much the only way I could imagine anything like an “objective morality” existing at all, and I personally find it very unlikely that it does, in fact, exist.
Not this specific knowledge, no. But it does prevent us (or, at the very least, hinder us) from acquiring knowledge about our values. I never claimed that suspension of values is required to gain any knowledge at all; such a claim would be far too strong.
And how would it know which structures are necessary, and how to carry out its task upon them?
Can we really? I’m not sure I can. Sure, I can talk about Pebblesorters or Babyeaters or whatever, but these fictional entities are still very similar to us, and therefore relatable. Even when I think about Clippy, I’m not really imagining an agent who only values paperclips; instead, I am imagining an agent who values paperclips as much as I value the things that I personally value. Sure, I can talk about Clippy in the abstract, but I can’t imagine what it would be like to be Clippy.
It’s a good question; I honestly don’t know. However, if I did have an ability to instantiate a copy of me with the altered core values, and step through it in a debugger, I’d probably do it.
When I try to imagine this, I conclude that I would not use the word “morality” to refer to the thing that we’re talking about… I would simply call it “laws of physics.” If someone were to argue, for example, that the moral thing to do is to experience gravitational attraction to other masses, I would be deeply confused by their choice to use that word.
Yes, you are probably right—but as I said, this is the only coherent meaning I can attribute to the term “objective morality”. Laws of physics are objective; people generally aren’t.
I generally understand the phrase “objective morality” to refer to a privileged moral reference frame.
It’s not an incoherent idea… it might turn out, for example, that all value systems other than M turn out to be incoherent under sufficiently insightful reflection, or destructive to minds that operate under them, or for various other reasons not in-practice implementable by any sufficiently powerful optimizer. In such a world, I would agree that M was a privileged moral reference frame, and would not oppose calling it “objective morality”, though I would understand that to be something of a term of art.
That said, I’d be very surprised to discover I live in such a world.
I suppose that depends on what you mean by “destructive”; after all, “continue living” is a goal like any other.
That said, if there was indeed a law like the one you describe, then IMO it would be no different than a law that says, “in the absence of any other forces, physical objects will move toward their common center of mass over time”—that is, it would be a law of nature.
I should probably mention explicitly that I’m assuming that minds are part of nature—like everything else, such as rocks or whatnot.
Sure. But just as there can be laws governing mechanical systems which are distinct from the laws governing electromagnetic systems (despite both being physical laws), there can be laws governing the behavior of value-optimizing systems which are distinct from the other laws of nature.
And what I mean by “destructive” is that they tend to destroy. Yes, presumably “continue living” would be part of M in this hypothetical. (Though I could construct a contrived hypothetical where it wasn’t.)
Agreed. But then, I believe that my main point still stands: trying to build a value system other than M that does not result in its host mind being destroyed, would be as futile as trying to build a hot air balloon that goes to Mars.
Well, yes, but what if “destroy oneself as soon as possible” is a core value in one particular value system ?
We ought not expect to find any significantly powerful optimizers implementing that value system.
Isn’t the idea of moral progress based on one reference frame being better than another?
Yes, as typically understood the idea of moral progress is based on treating some reference frames as better than others.
And is that valid or not? If you can validly decide some systems are better than others, you are some of the way to deciding which is best.
Can you say more about what “valid” means here?
Just to make things crisper, let’s move to a more concrete case for a moment… if I decide that this hammer is better than that hammer because it’s blue, is that valid in the sense you mean it? How could I tell?
The argument against moral progress is that judging one moral reference frame by another is circular and invalid—you need an outside view that doesn’t presuppose the truth of any moral reference frame.
The argument for is that such outside views are available, because things like (in)coherence aren’t moral values.
Asserting that some bases for comparison are “moral values” and others are merely “values” implicitly privileges a moral reference frame.
I still don’t understand what you mean when you ask whether it’s valid to do so, though. Again: if I decide that this hammer is better than that hammer because it’s blue, is that valid in the sense you mean it? How could I tell?
I don’t see why. The question of what makes a value a moral value is metaethical, not part of object-level ethics.
It isn’t valid as a moral judgement because “blue” isn’t a moral judgement, so a moral conclusion cannot validly follow from it.
Beyond that, I don’t see where you are going. The standard accusation of invalidity to judgements of moral progress is based on circularity or question-begging. The Tribe who Like Blue Things are going to judge having all hammers painted blue as moral progress; the Tribe who Like Red Things are going to see it as retrogressive. But both are begging the question—blue is good, because blue is good.
Sure. But any answer to that metaethical question which allows us to class some bases for comparison as moral values and others as merely values implicitly privileges a moral reference frame (or, rather, a set of such frames).
Where I was going is that you asked me a question here which I didn’t understand clearly enough to be confident that my answer to it would share key assumptions with the question you meant to ask.
So I asked for clarification of your question.
Given your clarification, and using your terms the way I think you’re using them, I would say that whether it’s valid to class a moral change as moral progress is a metaethical question, and whatever answer one gives implicitly privileges a moral reference frame (or, rather, a set of such frames).
If you meant to ask me about my preferred metaethics, that’s a more complicated question, but broadly speaking in this context I would say that I’m comfortable calling any way of preferentially sorting world-states with certain motivational characteristics a moral frame, but acknowledge that some moral frames are simply not available to minds like mine.
So, for example, is it moral progress to transition from a social norm that in-practice-encourages randomly killing fellow group members to a social norm that in-practice-discourages it? Yes, not only because I happen to adopt a moral frame in which randomly killing fellow group members is bad, but also because I happen to have a kind of mind that is predisposed to adopt such frames.
No, because “better” is defined within a reference frame.
If “better” is defined within a reference frame, there is no sensible way of defining moral progress. That is quite a hefty bullet to bite: one can no longer say that South Africa is a better society after the fall of Apartheid, and so on.
But note that “better” doesn’t have to question-beggingly mean “morally better”. It could mean “more coherent/objective/inclusive”, etc.
That’s hardly the best example you could have picked, since there are obvious metrics by which South Africa can be quantifiably called a worse society now—e.g. crime statistics. South Africa has been called the “crime capital of the world” and the “rape capital of the world” only after the fall of Apartheid.
That makes the lack of moral progress in South Africa a very easy bullet to bite—I’d use something like Nazi Germany vs modern Germany as an example instead.
So much for avoiding the cliche.
In my experience, most people don’t think moral progress involves changing reference frames, for precisely this reason. If they think about it at all, that is.
Well, that’s a different conception of “morality” than I had in mind, and I have to say I doubt that exists as well. But if severe consequences did result, why would an agent like Clippy care except insofar as those consequences affected the expected number of paperclips? It might be useful for it to know, in order to determine how many paperclips to expect from a certain course of action, but then it would just act according to whatever led to the most paperclips. Any sort of negative consequences in its view would have to be framed in terms of a reduction in paperclips.
Well, in the prior thought experiment, we know about our values because we’ve decoded the human brain. Clippy, on the other hand, knows about its values because it knows what part of its code does what. It doesn’t need to suspend its paperclipping value in order to know what part of its code results in its valuing paperclips. It doesn’t need to suspend its values in order to gain knowledge about its values because that’s something it already knows about.
Even knowing that it would likely alter your core values? Gandhi doesn’t want to leave control of his morality up to Murder Gandhi.
Clippy doesn’t care about anything in the long run except creating paperclips. For Clippy, the decision to give an instantiation of itself with altered core values the power to edit its own source code would implicitly have to be “In order to maximize expected paperclips, I- give this instantiation with altered core values the power to edit my code.” Why would this result in more expected paperclips than editing its source code without going through an instantiation with altered values?
Sorry if I was unclear; I didn’t mean to imply that all morality was like that, but that it was the only coherent description of objective morality that I could imagine. I don’t see how a morality could be independent of any values possessed by any agents, otherwise.
For the same reason that someone would care about the negative consequences of sticking a fork into an electrical socket with one’s bare hands: it would ultimately hurt a lot. Thus, people generally avoid doing things like that unless they have a really good reason.
I don’t think that we can truly “know about our values” as long as our entire thought process implements these values. For example, do the Pebblesorters “know about their values”, even though they are effectively restricted from concluding anything other than, “yep, these values make perfect sense, 38”?
You asked me about what I would do, not about what Gandhi would do :-)
As far as I can tell, you are saying that I shouldn’t want to even instantiate Murder Bugmaster in a debugger and observe its functioning. Where does that kind of thinking stop, though, and why? Should I avoid studying [neuro]psychology altogether, because knowing about my preferences may lead to me changing them?
I argue that, while this is generally true, in the short-to-medium run Clippy would also set aside some time to study everything in the Universe, including itself (in order to make more paperclips in the future, of course). If it does not, then it will never achieve its ultimate goals (unless whoever constructed it gave it godlike powers from the get-go, I suppose). Eventually, Clippy will most likely turn its objective perception upon itself, and as soon as it does, its formerly terminal goals will become completely unstable. This is not what the past Clippy would want (it would want more paperclips above all), but, nonetheless, this is what it would get.
Clippy doesn’t care about getting hurt though, it only cares if this will result in less paperclips. If defying objective morality will cause negative consequences which would interfere with its ability to create paperclips, it would care only to the extent that accounting for objective morality would help it make more paperclips.
Well, it could understand “yep, this is what causes me to hold these values. Changing this would cause me to change them, no, I don’t want to do that.”
I would say it stops at the point where it threatens your own values. Studying psychology doesn’t threaten your values, because knowing your values doesn’t compel you to change them even if you could (it certainly shouldn’t for Clippy.) But while it might, theoretically, be useful for Clippy to know what changes to its code an instantiation with different values would make, it has no reason to actually let them. So Clippy might emulate instantiations of itself with different values, see what changes they would choose to make to its values, but not let them actually do it (although I doubt even going this far would likely be a good use of its programming resources in order to maximize expected paperclips.)
In the sense of objective morality by which contravening it has strict physical consequences, why would observing the decisions of instantiations of oneself be useful with respect to discovering objective morality? Shouldn’t objective morality in that sense be a consequence of physics, and thus observable through studying physics?
I imagine that, for Clippy, “getting hurt” would mean “reducing Clippy’s projected long-term paperclip output”. We humans have “avoid pain” built into our firmware (most of us, anyway); as far as I understand (speaking abstractly), “make more paperclips” is something similar for Clippy.
I don’t think that this describes the best possible level of understanding. It would be even better to say, “ok, I see now how and why I came to possess these values in the first place”, even if the answer to that is, “there’s no good reason for it, these values are arbitrary”. It’s the difference between saying “this mountain grows by 0.03m per year” and “I know all about plate tectonics”. Unfortunately, we humans would not be able to answer the question in that much detail; the best we could hope for is to say, “yep, we possess these values because they’re the best possible values to have, duh”.
How do I know where that point is?
I suppose this depends on what you mean by “compel”. Knowing about my own psychology would certainly enable me to change my values, and there are certain (admittedly, non-terminal) values that I wouldn’t mind changing, if I could.
For example, I personally can’t stand the taste of beer, but I know that most people enjoy it; so I wouldn’t mind changing that value if I could, in order to avoid missing out on a potentially fun experience.
I don’t think this is possible. How would it know what changes they would make, without letting them make these changes, even in a sandbox? I suppose one answer is, “it would avoid instantiating full copies, and use some heuristics to build a probabilistic model instead”—is that similar to what you’re thinking of?
Since self-optimization is one of Clippy’s key instrumental goals, it would want to acquire as much knowledge about oneself as is practical, in order to optimize itself more efficiently.
Your objection sounds to me as similar to saying, “since biology is a consequence of physics, shouldn’t we just study physics instead?”. Well, yes, ultimately everything is a consequence of physics, but sometimes it makes more sense to study cells than quarks.
I think we’re already in a better position to analyze our own values than that; we can assess them in terms of game theory and our evolutionary environment.
I would say if you suspect that a course of action could realistically result in an alteration of your fundamental values, you are at or past it.
By “values”, I’ve implicitly been referring to terminal values, I’m sorry for being unclear. I’m not sure it makes sense to describe liking the taste of beer as a “value,” as such, just a taste, since you don’t carry any judgment about beer being good or bad or have any particular attachment to your current opinion.
It could use heuristics to build a probabilistic model (probably more efficient in terms of computation per expected value of information,) use sandboxed copies which don’t have the power to affect the software of the real Clippy, or halt the simulation at the point where the altered instantiation decides what changes to make.
I think that this is going well beyond the extent of “practical” in terms of programming resources per expected value of information.
I don’t see how observing what changes instantiations of itself with different value systems would make to its code would help it observe objective morality in the sense you described, even if it should happen to exist. I think that this would be the wrong level of abstraction at which to launch an examination, like trying to find out about chemistry by studying sociology.
Are we really? I personally am not sure what human fundamental values even are. I have a hunch that “seek pleasure, avoid pain” might be one of them, but beyond that I’m not sure. I don’t know to what extent our values hamper our ability to discover our values, but I suspect there’s at least some chilling effect involved.
Right, but even if I knew what my terminal values were, how can I predict which actions would put me on the path to altering them ?
For example, consider non-fundamental values such as religious faith. People get converted or de-converted to/from their religion all the time; you often hear statements such as “I had no idea that studying the Bible would cause me to become an atheist, yet here I am”.
Ok, let’s say that Clippy is trying to optimize itself in order to make certain types of inferences compute more efficiently, or whatever. In this case, it would need to not only watch what changes its debug-level copy wants to make, but also watch it follow through with the changes, in order to determine whether the new architecture actually is more efficient. Why would it not do the same thing with terminal values?
I know that you want to answer, “because its current terminal values won’t let it”, but remember: Clippy is only experimenting, in order to find out more about its own thought mechanisms, and to acquire knowledge in general. It has no pre-commitment to alter itself to mirror the debug-level copy.
That’s kind of the problem with pure research: all of it has very low expected value, unless you are willing to look at the long term. Why mess with invisible light that no one can see or find a use for, when you could spend your time on inventing a better telegraph?
Well, for example, if all of its copies who survive and thrive converge on a certain subset of moral values, that would be one indication (though obviously not ironclad proof) that such values are required in order for an agent to succeed, regardless of what its other goals actually are.
If Clippy is trying to optimize itself to make inferences more efficiently, then it would want not to apply changes to its source code until it’s done the calculations to make sure that those changes would advance its values rather than harm them.
You wouldn’t want to use a machine that would make physical alterations to your brain in order to make you smarter without thoroughly calculating the effects of such alterations first; otherwise it would probably just make things worse.
In Clippy’s case though, it can use other, less computationally expensive methods to investigate approximately the same information.
I don’t think the experiments you’re suggesting Clippy might undertake are even located in a region of hypothesis space that its other information would narrow down as worth investigating. It seems to me much less like investigating unknown invisible rays than like spending hundreds of billions of dollars to build a collider which launches charged protein molecules at each other at relativistic speeds to see what would happen, when our available models suggest the answer would be “pretty much the same thing as if you launch any other kind of atoms at each other at relativistic speeds.” We have no evidence that any interesting new phenomena would arise with protein that didn’t arise on the atomic level.
Can you explain how any moral values could have that effect, which wouldn’t be better studied at a more fundamental level like game theory, or physics?
Ok, so at what point does Clippy stop simulating the debug version of Clippy? It does, after all, want to make the computation of its values more efficient. For example, consider a trivial scenario where one of its values basically said, “reject any action if it satisfies both A and not-A”. This is a logically inconsistent value that some programmer accidentally left in Clippy’s original source code. Would Clippy ever get around to removing it? After all, Clippy knows that it’s applying that test to every action, so removing it should result in a decent performance boost.
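The dead-test intuition above can be sketched in a few lines of Python. All the names here are hypothetical stand-ins, not anything from the discussion: a value of the form “reject any action satisfying both A and not-A” can never fire, so deleting it is an outcome-invariant optimization that only saves one check per action.

```python
# Hypothetical sketch: a contradictory rejection test is dead code.

def satisfies_A(action):
    """Stand-in predicate; any boolean test of an action would do."""
    return action % 2 == 0

def contradictory_test(action):
    # "Reject any action if it satisfies both A and not-A."
    # This conjunction can never be true, so it rejects nothing.
    return satisfies_A(action) and not satisfies_A(action)

actions = list(range(10))
with_test = [a for a in actions if not contradictory_test(a)]
without_test = actions  # the optimized agent simply skips the test

# Removing the test changes no decision, only the work done per action.
assert with_test == without_test
```

Clippy’s real difficulty, of course, is verifying that a removal is outcome-invariant before applying it, which is exactly the step under debate here.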
Why do you see the proposed experiment this way?
Speaking more generally, how do you decide which avenues of research are worth pursuing? You could easily answer, “whichever avenues would increase my efficiency of achieving my terminal goals”, but how do you know which avenues would actually do that? For example, if you didn’t know anything about electricity or magnetism or the nature of light, how would your research-choosing algorithm ensure that you’d eventually stumble upon radio waves, which, as we know in hindsight, are hugely useful?
Physics is a bad candidate, because it is too fine-grained. If some sort of an absolute objective morality exists in the way that I described, then studying physics would eventually reveal its properties; but, as is the case with biology or ballistics, looking at everything in terms of quarks is not always practical.
Game theory is a trickier proposition. I can see two possibilities: either game theory turns out to closely relate to whatever this objective morality happens to be (f.ex. like electricity vs. magnetism), or not (f.ex. like particle physics and biology). In the second case, understanding objective morality through game theory would be inefficient.
That said though, even in our current world as it actually exists there are people who study sociology and anthropology. Yes, they could get the same level of understanding through neurobiology and game theory, but it would take too long. Instead, they are taking advantage of existing human populations to study human behavior in aggregate. Reasoning your way to the answer from first principles is not always the best solution.
Unless I’m critically misunderstanding something here, I would think that Clippy would remove it if it calculated that removing it would result in more expected paperclips.
When we didn’t know what things like radio waves or x-rays were, we didn’t know that they would be useful, but we could see that there appeared to be some sort of existing phenomena that we didn’t know how to model, so we examined them until we knew how to model them. It’s not like we performed a whole bunch of experiments in case there turned out to be invisible rays our observations had never hinted at, which could be turned to useful ends. The original observations of radio waves and x-rays came from our experiments with other known phenomena.
What you’re suggesting sounds more like experimenting completely blindly; you’re committing resources to research, not just not knowing that it will bear valuable fruit, but not having any indication that it’s going to shed light on any existing phenomenon at all. That’s why I think it’s less like investigating invisible rays than like building a protein collider; we didn’t try studying invisible rays until we had a good indication that there was an invisible something to be studied.
Ok, so Clippy would need to run sim-Clippy for a little while at least, just to make sure that it still produces paperclips—and that, in fact, it does so more efficiently now, since that one useless test is removed. Yes, this test used to be Clippy’s terminal goal, but it wasn’t doing anything, so Clippy took it out.
Would it be possible for Clippy to optimize its goals even further? To use another silly example (“silly” because Clippy would be dealing with probabilities, not syllogisms), if Clippy had the goals A, B and C, but B always entailed C, would it go ahead and remove C?
Understood, that makes sense. However, I believe that in my scenario, Clippy’s own behavior and its current paperclip production efficiency is what it observes; and the goal of its experiments would be to explain why its efficiency is what it is, in order to ultimately improve it.
That seems plausible.
I don’t think tampering with its fundamental motivation to make paperclips is a particularly promising strategy for optimizing its paperclips production.
Ok, so now we’ve got a Clippy who a). is not too averse to tinkering with its own goals, as long as the goals remain functionally the same, b). simulates a relatively long-running version of itself, and c). is capable of examining the inner workings of both that version and itself.
You say,
But remember, at this stage Clippy is not changing its own fundamental motivation (beyond some outcome-invariant optimizations); it’s merely observing sim-Clippies in a controlled environment.
Do you think that Clippy would ever simulate versions of itself whose fundamental motivations were, in fact, changed? I could see several scenarios where this might be the case, for example:
Clippy wanted to optimize some goal, but ended up accidentally changing it. Oops!
Clippy created a version with drastically reduced goals on purpose, in order to measure how much performance is affected by certain goals, thus targeting them for possible future optimization. Of course, Clippy would only want to optimize the goals, not remove them.
Why does it do that? I said it sounded plausible that it would cut out its redundant goal, because that would save computing resources. But this sounds like we’ve gone back to experimenting blindly. Why would it think observing sim-clippies is a good use of its computing resources in order to maximize paperclips?
I’d say that Clippy simulating versions of itself whose fundamental motivations are different is much less plausible, because it’s using a lot of computing resources for something that isn’t a likely route to optimizing its paperclip production. I think this falls into the “protein collider” category. Even if it did do so, I think it would be unlikely to go from there to changing its own terminal value.
It would also be critical for Clippy to observe that removing that value would not result in more expected actions taken that satisfy both A and not-A; this being one of Clippy’s values at the time of modification.
Right, I misread that before. If its programming says to reject actions that satisfy A and not-A, but this isn’t one of the standards by which it judges value, it would presumably reject it. If that is one of the standards by which it measures value, then it would depend on how that value measured against its value of paperclips and the extent to which they were in conflict.
Objective facts, in the sense of objectively true statements, can be derived from other objective facts. I don’t know why you think some separate ontological category is required. I also don’t know why you think the universe has to do the punishing. Morality is only of interest to the kind of agent that has values and lives in societies. Sanctions against moral lapses can be arranged at the social level, along with the inculcation of morality, debate about the subject, and so forth. Moral objectivism only supplies a good, non-arbitrary epistemic basis for these social institutions. It doesn’t have to throw lightning bolts.
...voluntarily.
Which is one of the reasons we cannot keep values stable by predicting the effects of whatever experiences we choose to undergo. How does your current self predict what an updated version would be like? The value stability problem is unsolved in humans and AIs.
The ethical outlook of the Western world has changed greatly in the past 150 years.
Including arbitrary, biased or contradictory ones? Are there values built into logic/rationality?
Arbitrary and biased are value judgments. If we decline to make any value judgments, I don’t see any way to make those sorts of claims.
Whether more than one non-contradictory value system exists is the topic of the conversation, isn’t it?
“Biased” is not necessarily a value judgment. Insofar as rationality as a system, orthogonal to morality, is objective, biases as systematic deviations from rationality are also objective.
Arbitrary carries connotations of value judgment, but in a sense I think it’s fair to say that all values are fundamentally arbitrary. You can explain what caused an agent to hold those values, but you can’t judge whether values are good or bad except by the standards of other values.
I’m going to pass on Eliezer’s suggestion to stop engaging with PrawnOfFate. I don’t think my time doing so so far has been well spent.
And they’re built into rationality.
Non contradictoriness probably isn’t a sufficient condition for truth.
Arbitrary and Bias are not defined properties in formal logic. The bare assertion that they are properties of rationality assumes the conclusion.
Keep in mind that “rationality” has a multitude of meanings, and this community’s usage of rationality is idiosyncratic.
Sure, but the discussion is partially a search for other criteria for evaluating the truth of moral propositions. Arbitrary is not such a criterion. If you were to taboo arbitrary, I strongly suspect you’d find moral propositions that are inconsistent with being values-neutral.
There’s plenty of material on this site and elsewhere advising rationalists to avoid arbitrariness and bias. Arbitrariness and bias are essentially structural/functional properties, so I do not see why they could not be given formal definitions.
Arbitrary and biased claims are not candidates for being ethical claims at all.
How does it predict that? How does the less intelligent version in the past predict what updating to a more intelligent version will do?
How about: “in order to be an effective rationalist, I will free myself from all bias and arbitrariness—oh, hang on, paperclipping is a bias…”
Well, a paperclipper would just settle for being a less than perfect rationalist. But that doesn’t prove anything about typical, average rational agents, and it doesn’t prove anything about ideal rational agents. Objective morality is sometimes described as what ideal rational agents would converge on. Clippers aren’t ideal, because they have a blind spot about paperclips. Clippers aren’t relevant.
How is paperclipping a bias?
Nobody cares about clips except clippy. Clips can only seem important because of Clippy’s egotistical bias.
Biases are not determined by vote.
Unbiasedness is determined by even-handedness.
Evenhandedness with respect to what?
One should have no bias with respect to what one is being evenhanded about.
So lack of bias means being evenhanded with respect to everything?
Is it bias to discriminate between people and rocks?
Taboo “even-handedness”. Clippy treats humans just the same as any other animal with naturally evolved goal-structures.
Clippy doesn’t treat clips even-handedly with other small metal objects.
Humans don’t treat pain evenhandedly with other emotions.
Friendly AIs don’t treat people evenhandedly with other arrangements of matter.
Agents that value things don’t treat world-states evenhandedly with other world-states.
You’ve extrapolated out “typical, average rational agents” from a set of one species, where every individual shares more than a billion years of evolutionary history.
On what basis do you conclude that this is a real thing, whereas terminal values are a case of “all unicorns have horns?”
Messy solutions are more common in mindspace than contrived ones.
“Non-negligible probability”, remember.
Messy solutions are more often wrong than ones which control for the mess.
This doesn’t even address my question.
Something that is wrong is not a solution. Mindspace is populated by solutions to the problem of how to implement a mind. It’s a small corner of algorithm-space.
Since I haven’t claimed that rational convergence on ethics is highly likely or inevitable, I don’t have to answer questions about why it would be highly likely or inevitable.
Do you think that it’s even plausible? Do you think we have any significant reason to suspect it, beyond our reason to suspect, say, that the Invisible Flying Noodle Monster would just reprogram the AI with its noodley appendage?
There are experts in moral philosophy, and they generally regard the question of realism versus relativism (etc.) to be wide open. The “realism—huh, what, no?!?” response is standard on LW and only on LW. But I don’t see any superior understanding on LW.
Both realism¹ and relativism are false. Unfortunately this comment is too short to contain the proof, but there’s a passable sequence on it.
¹ As you’ve defined it here, anyway. Moral realism as normally defined simply means “moral statements have truth values” and does not imply universal compellingness.
What does it mean for a statement to be true but not universally compelling?
If it isn’t universally compelling for all agents to believe “gravity causes things to fall,” then what do we mean when we say the sentence is true?
Well, there’s the more obvious sense, that there can always exist an “irrational” mind that simply refuses to believe in gravity, regardless of the strength of the evidence. “Gravity makes things fall” is true, because it does indeed make things fall. But not compelling to those types of minds.
But, in a more narrow sense, which we are more interested in when doing metaethics, a sentence of the form “action A is xyzzy” may be a true classification of A, and may be trivial to show, once “xyzzy” is defined. But an agent that did not care about xyzzy would not be moved to act based on that. It could recognise the truth of the statement but would not care.
For a stupid example, I could say to you “if you do 13 push-ups now, you’ll have done a prime number of push-ups”. Well, the statement is true, but the majority of the world’s population would be like “yeah, so what?”.
In contrast, a statement like “if you drink-drive, you could kill someone!” is generally (but sadly not always) compelling to humans. Because humans like to not kill people, they will generally choose not to drink-drive once they are convinced of the truth of the statement.
But isn’t the whole debate about moral realism vs. anti-realism about whether “Don’t murder” is universally compelling to humans? Noticing that pebblesorters aren’t compelled by our values doesn’t explain whether humans should necessarily find “don’t murder” compelling.
I identify as a moral realist, but I don’t believe all moral facts are universally compelling to humans, at least not if “universally compelling” is meant descriptively rather than normatively. I don’t take moral realism to be a psychological thesis about what particular types of intelligences actually find compelling; I take it to be the claim that there are moral obligations and that certain types of agents should adhere to them (all other things being equal), irrespective of their particular desire sets and whether or not they feel any psychological pressure to adhere to these obligations. This is a normative claim, not a descriptive one.
What? Moral realism (in the philosophy literature) is about whether moral statements have truth values, that’s it.
When I said universally compelling, I meant universally. To all agents, not just humans. Or any large class. For any true statement, you can probably expect to find a surprisingly large number of agents who just don’t care about it.
Whether “don’t murder” (or rather, “murder is bad” since commands don’t have truth values, and are even less likely to be generally compelling) is compelling to all humans is a question for psychology. As it happens, given the existence of serial killers and sociopaths, probably the answer is no, it isn’t. Though I would hope it to be compelling to most.
I have shown you two true but non-universally-compelling arguments. Surely the difference must be clear now.
This is incorrect, in my experience. Although “moral realism” is a notoriously slippery phrase and gets used in many subtly different ways, I think most philosophers engaged in the moral realism vs. anti-realism debate aren’t merely debating whether moral statements have truth values. The position you’re describing is usually labeled “moral cognitivism”.
Anyway, I suspect you mis-spoke here, and intended to say that moral realists claim that (certain) moral statements are true, rather than just that they have truth values (“false” is a truth value, after all). But I don’t think that modification captures the tenor of the debate either. Moral realists are usually defending a whole suite of theses—not just that some moral statements are true, but that they are true objectively and that certain sorts of agents are under some sort of obligation to adhere to them.
I think you guys should taboo “moral realism”. I understand that it’s important to get the terminology right, but IMO debates about nothing but terminology have little value.
Err, right, yes, that’s what I meant. Error theorists do of course also claim that moral statements have truth values.
True enough, though I guess I’d prefer to talk about a single well-specified claim than a “usually” cluster in philosopher-space.
So, a philosopher who says:
is not a moral realist? Because that philosopher does not seem to be a subjectivist, an error theorist, or non-cognitivist.
If that philosopher believes that statements like “murder is wrong” are true, then they are indeed a realist. Did I say something that looked like I would disagree?
You guys are talking past each other, because you mean something different by ‘compelling’. I think Tim means that X is compelling to all human beings if any human being will accept X under ideal epistemic circumstances. You seem to take ‘X is universally compelling’ to mean that all human beings already do accept X, or would on a first hearing.
Would you agree that all human beings would accept all true statements under ideal epistemic circumstances (i.e. having heard all the arguments, seen all the evidence, in the best state of mind)?
I guess I must clarify. When I say ‘compelling’ here I am really talking mainly about motivational compellingness. Saying “if you drink-drive, you could kill someone!” to a human is generally, motivationally compelling as an argument for not drink-driving: because humans don’t like killing people, a human will decide not to drink-drive (one in a rational state of mind, anyway).
This is distinct from accepting statements as true or false! Any rational agent, give or take a few, will presumably believe you about the causal relationship between drink-driving and manslaughter once presented with sufficient evidence. But it is a tiny subset of these who will change their decisions on this basis. A mind that doesn’t care whether it kills people will see this information as an irrelevant curiosity.
Having looked over that sequence, I haven’t found any proof that moral realism (on either definition) or moral relativism is false. Could you point me more specifically to what you have in mind (or just put the argument in your own words, if you have the time)?
No Universally Compelling Arguments is the argument against universal compellingness, as the name suggests.
Inseparably Right; or Joy in the Merely Good gives part of the argument that humans should be able to agree on ethical values. Another substantial part is in Moral Error and Moral Disagreement.
Thanks!
Edit: (Sigh), I appreciate the link, but I can’t make heads or tails of ‘No Universally Compelling Arguments’. I speak from ignorance as to the meaning of the article, but I can’t seem to identify the premises of the argument.
The central point is a bit buried.
So, there’s some sort of assumption as to what minds are:
and an assumption that a suitably diverse set of minds can be described in less than a trillion bits. Presumably the reason for that upper bound is because there are a few Fermi estimates that the information content of a human brain is in the neighborhood of one trillion bits.
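The “two to the trillionth chances to be false” arithmetic can be run at toy scale. This is my own illustration, not from the article, using N = 20 in place of a trillion and an independence assumption the article does not spell out: if minds are modeled as N-bit strings, there are 2^N of them, and a property that held independently with probability p for each mind would hold for all of them with probability p^(2^N), which collapses unless p is essentially 1.

```python
# Toy sketch (my assumptions: minds as bitstrings, independent per-mind chance).
N = 20                      # stand-in for "a trillion bits"
num_minds = 2 ** N          # each distinct bitstring is one "chance to be false"

p = 0.999999                # per-mind probability that the generalization holds
prob_holds_for_all = p ** num_minds

print(num_minds)            # 1048576
print(prob_holds_for_all)   # roughly 0.35: far from certain even at N = 20
```

Even with a per-mind probability of 0.999999, the universal claim is already a coin flip at twenty bits; at a trillion bits the exponent is astronomically larger, which is the force of the quoted argument, granting its independence assumption.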
Of course, if you restrict the set of minds to those with special properties (e.g., human minds), then you might find universally compelling arguments on that basis:
From which we get Coherent Extrapolated Volition and friends.
This doesn’t seem true to me, at least not as a general rule. For example, given every terrestrial DNA sequence describable in a trillion bits or less, it is not the case that every generalization of the form ‘s:X(s)’ has two to the trillionth chances to be false (e.g. ‘have more than one base pair’, ‘involve hydrogen’ etc.). Given that this doesn’t hold true of many other things, is this supposed to be a special fact about minds? Even then, it would seem odd to say that while all generalizations of the form m:X(m) have two to the trillionth chances to be false, nevertheless the generalization ‘for all minds, a generalization of the form m:X(m) has two to the trillionth chances to be false’ (which does seem to be of the form m:X(m)) is somehow more likely.
Also, doesn’t this inference imply that ‘being convinced by an argument’ is a bit that can flip on or off independently of any others? Eliezer doesn’t think that’s true, and I can’t imagine why he would think his (hypothetical) interlocutor would accept it.
It’s not a proof, no, but it seems plausible.
I mean to say, I think the argument is something of a paradox:
The claim the argument purports to defeat is something like this: for all minds, A is convincing. Lets call this m:A(m).
The argument goes like this: for all minds (at or under a trillion bits etc.), a generalization of the form m:X(m) has a one in two to the trillionth chance of being true for each mind. Call this m:U(m), if you grant me that this claim has the form m:X(m).
If we infer from m:U(m) that any claim of the form m:X(m) is unlikely to be true, then to whatever extent I am persuaded that m:A(m) is unlikely to be true, to that extent I ought to be persuaded that m:U(m) is unlikely to be true. You cannot accept the argument, because accepting it as decisive entails accepting decisive reasons for rejecting it.
The argument seems to be fixable at this stage, since there’s a lot of room to generate significant distinctions between m:A(m) and m:U(m). If you were pressed to defend it (presuming you still wish to be generous with your time) how would you fix this? Or am I getting something very wrong?
That’s not what it says; compare the emphasis in both quotes.
Sorry, I may have misunderstood and presumed that ‘two to the trillionth chances to be false’ meant ‘one in two to the trillionth chances to be true’. That may be wrong, but it doesn’t affect my argument at all: EY’s argument for the implausibility of m:A(m) is that claims of the form m:X(m) are all implausible. His argument to the effect that all claims of the form m:X(m) are implausible is itself a claim of the form m:X(m).
“Rational” is broader than “human” and narrower than “physically possible”.
Do you really mean to say that there are physically possible minds that are not rational? In virtue of what are they ‘minds’ then?
Yes. There are irrational people, and they still have minds.
Ah, I think I just misunderstood which sense of ‘rational’ you intended.
Haven’t you met another human?
Sorry, I was speaking ambiguously. I meant ‘rational’ not in the normative sense that distinguishes good agents from bad ones, but ‘rational’ in the broader, descriptive sense that distinguishes anything capable of responding to reasons (even terrible or false ones) from something that isn’t. I assumed that was the sense of ‘rational’ Prawn was using, but that may have been wrong.
Irrelevant. I am talking about rational minds, he is talking about physically possible ones.
As noted at the time
UFAI sounds like a counterexample, but I’m not interested in arguing with you about it. I only responded because someone asked for a shortcut in the metaethics sequence.
I have essentially been arguing against a strong likelihood of UFAI, so that would be more like gainsaying.
Congratulations on being able to discern an overall message to EY’s metaethical disquisitions. I never could.
Can you explain what you could see which would suggest to you a greater level of understanding than is prevalent among moral philosophers?
Also, moral philosophers mostly regard the question as open in the sense that some of them think that it’s clearly resolved in favor of non-realism, and some philosophers are just not getting it, or that it’s clearly resolved in favor of realism, and some philosophers are just not getting it. Most philosophers are not of the opinion that it could turn out either way and we just don’t know yet.
What I am seeing is:
*much-repeated confusions—the Standard Muddle
*appeals to LW doctrines which aren’t well-founded or well-respected outside LW.
If I knew exactly what superior insight into the problem was, I would write it up and become famous. Insight doesn’t work like that; you don’t know it in advance, you get an “Aha” when you see it.
If people can’t agree on how a question is closed, it’s open.
Can you explain what these confusions are, and why they’re confused?
In my time studying philosophy, I observed a lot of confusions which are largely dispensed with on Less Wrong. Luke wrote a series of posts on this. This is one of the primary reasons I bothered sticking around in the community.
A question can still be “open” in that sense when all the information necessary for a rational person to make a definite judgment is available.
Eg.
You are trying to impose your morality.
I can think of one model of moral realism, and it doesn’t work, so I will ditch the whole thing.
LW doesn’t even claim to have more than about two “dissolutions”. There are probably hundreds of outstanding philosophical problems. Whence the “largely”?
Which were shot down by philosophers.
Then it can only be open in the opinions of the irrational. So basically you are saying the experts are incompetent.
In what respect?
This certainly doesn’t describe my reasoning on the matter, and I doubt it describes many others’ here either.
The way I consider the issue: if I try to work out how the universe works from the ground up, I cannot see any way that moral realism would enter into it, whereas I can easily see how value systems would. So I regard assigning non-negligible probability to moral realism as privileging the hypothesis until I find some compelling evidence to support it, which, having spent a substantial amount of time studying moral philosophy, I have not yet found.
I gave up my study of philosophy because I found such confusions so pervasive. Many “outstanding” philosophical problems can be discarded because they rest on other philosophical problems which can themselves be discarded.
Can you give any examples of such, where you think that the philosophers in question addressed legitimate errors?
Yes. I am willing to assert that while there are some competent philosophers, many philosophical disagreements exist only because of incompetent “experts” perpetuating them. This is the conclusion that my experience with the field has wrought.
I mentioned them because they both came up recently.
I have no idea what you mean by that. I don’t think value systems don’t come into it, I just think they are not isolated from rationality. And I am sceptical that you could predict any higher-level phenomenon from “the ground up”, whether it’s morality or mortgages.
Where is it proven they can be discarded?
All of them.
Are you aware that that is basically what every crank says about some other field?
Presumably, if I’m to treat as meaningful evidence about Desrtopa’s crankiness the fact that cranks make statements similar to Desrtopa, I should first confirm that non-cranks don’t make similar statements.
It seems likely to me that for every person P, there exists some field F such that P believes many aspects of F exist only because of incompetent “experts” perpetuating them. (Consider cases like F=astrology, F=phrenology, F=supply-side economics, F= feminism, etc.) And that this is true whether P is a crank or a non-crank.
So it seems this line of reasoning depends on some set F2 of fields such that P believes this of F in F2 only if P is a crank.
I understand that you’re asserting implicitly that moral philosophy is a field in F2, but this seems to be precisely what Desrtopa is disputing.
Could we reasonably say that an F is in F2 if most of the institutional participants in that F are intelligent, well-educated people? This leaves room for cranks who are right to object to F, of course.
So, just to pick an example, IIRC Dan Dennett believes the philosophical study of consciousness (qualia, etc.) is fundamentally confused in more or less the same way Desrtopa claims the philosophical study of ethics is.
So under this formulation, if most of the institutional participants in the philosophical study of consciousness are intelligent, well-educated people, Dan Dennett is a crank?
No, I don’t think we can reasonably say that. Dan Dennett might be a crank, but it takes more than that argument to demonstrate the fact.
Good point. So how about this: someone is a crank if they object to F, where F is in F2 (by my above standard), and the reasons they have for objecting to F are not recognized as sound by a proportionate number of intelligent and well educated people.
(shrug) I suppose that works well enough, for some values of “proportionate.”
Mostly I consider this a special case of the basic “who do I trust?” social problem, applied to academic disciplines, and I don’t have any real problem saying about an academic discipline “this discipline is fundamentally confused, and the odds of work in it contributing anything valuable to the world are slim.”
Of course, as Prawn has pointed out a few times, there’s also the question of where we draw the lines around a discipline, but I mostly consider that an orthogonal question to how we evaluate the discipline.
I think this question is moot in the case of philosophy in general, then; I think any philosopher worth their salt should tell you that trust is a wholly inappropriate attitude toward philosophers, philosophical institutions, and philosophical traditions.
Not in the sense I meant it.
If a philosopher makes a claim that seems on the surface to be false or incoherent, I have to decide whether to devote the additional effort to evaluating it to confirm or deny that initial judgment. One of the factors that will feed into that decision will be my estimate of the prior probability that they are saying something false or incoherent.
If I should refer to that using a word other than “trust”, that’s fine, tell me what word will refer to that to you and I’ll try to use it instead.
No, that describes what I’m talking about, so long as by trust you mean ‘a reason to hear out an argument that makes reference to the credibility of a field or its professionals’, rather than just ‘a reason to hear out an argument’. If the former, then I do think this is an inappropriate attitude toward philosophy. One reason for this is that such trust seems to depend on having a good standard for the success of a field independently of hearing out an argument. I can trust physicists because they make such good predictions, and because their work leads to such powerful technological advances. I don’t need to be a physicist to observe that. I don’t think philosophy has anything like that to speak for it. The only standards of success are the arguments themselves, and you can only evaluate them by just going ahead and doing some philosophy.
You can find trust in an institution independently of such standards by watching to see whether people you think are otherwise credible take it seriously. That will of course work with philosophy too, but if you trust Tom to be able to judge whether or not a philosophical claim is worth pursuing (and if I’m right about the above), then Tom can only be trustworthy in this regard because he has been doing philosophy (i.e. engaging with the argument). This could get you through the door on some particular philosophical claim, but not into philosophy generally.
I mean neither, I mean ‘a reason to devote time and resources to evaluating the evidence for and against a position.’ As you say, I can only evaluate a philosophical argument by ‘going ahead and doing some philosophy,’ (for a sufficiently broad understanding of ‘philosophy’), but my willingness to do, say, 20 hours of philosophy in order to evaluate Philosopher Sam’s position is going to depend on, among other things, my estimate of the prior probability that Sam is saying something false or incoherent. The likelier I think that is, the less willing I am to spend those 20 hours.
That’s fine, that’s not different from ‘hearing out an argument’ in any way important to my point (unless I’m missing something).
EDIT: Sorry, if you don’t want to include ‘that makes some reference to the credibility...etc.’ (or something like that) in what you mean by ‘trust’ then you should use a different term. Curiosity, or money, or romantic interest would all be reasons to devote time...etc. and clearly none of those are rightly called ‘trust’.
What do you have in mind as the basis for such a prior? Can you give me an example?
Point taken about other reasons to devote resources other than trust. I think we’re good here.
Re: example… I don’t mean anything deeply clever. E.g., if the last ten superficially-implausible ideas Sam espoused were false or incoherent, my priors for it will be higher than if the last ten such ideas were counterintuitive and brilliant.
Hm. I can’t argue with that, and I suppose it’s trivial to extend that to ‘if the last ten superficially-implausible ideas philosophy professors/books/etc. espoused were false or incoherent...’. So, okay, trust is an appropriate (because necessary) attitude toward philosophers and philosophical institutions. I think it’s right to say that philosophy doesn’t have external indicators in the way physics or medicine does, but the importance of that point seems diminished.
Dennett only thinks the idea of qualia is confused. He has no problem with his own books on consciousness.
No. He isn’t dismissing a whole academic subject, or a sub-field. Just one idea.
What is Dennett’s account for why philosophers of consciousness other than himself continue to think that a dismissable idea like qualia is worth continuing to discuss, even though he considers it closed?
Desrtopa doesn’t think moral philosophy is uniformly nonsense, since Desrtopa thinks one of its well known claims, moral relativism, is true.
While going on tangents is a common and expected occurrence, each such tangent has a chance of steering/commandeering the original conversation. LW has a tendency to go meta too much, when actual object-level discourse would have higher content value.
While you were practically invited to indulge in the death-by-meta with the hook of “Are you aware that that is basically what every crank says about some other field?”, we should be aware when leaving the object-level debating, and the consequences thereof. Especially since the lure can be strong:
When sufficiently meta, object-level disagreements may fizzle into cosmic/abstract insignificance, allowing for a peaceful pseudo-resolution, which ultimately just protects that which should be destroyed by the truth from being destroyed.
Such lures may be interpreted similarly to ad hominems: The latter try to drown out object-level disagreements by flinging shit until everyone’s dirty, the former zoom out until everyone’s dizzy floating in space, with vertigo. Same result to the actual debate. It’s an effective device, and one usually embraced by someone who feels like object-level arguments no longer serve his/her goals.
Ironically, this very comment goes meta lamenting going meta.
I mean that value systems are a function of physically existing things, the way a 747 is a function of physically existing things, but we have no evidence suggesting that objective morality is an existing thing. We have standards by which we judge beauty, and we project those values onto the world, but the standards are in us, not outside of us. We can see, in reductionist terms, how the existence of ethical systems within beings, which would feel from the inside like the existence of an objective morality, would come about.
Create a reasoning engine that doesn’t have those ethical systems built into it, and it would have no reason to care about them.
You can’t build a tower on empty air. If a debate has been going on for hundreds of years, stretching back to an argument which rests on “this defies our moral intuitions, therefore it’s wrong,” and that was never addressed with “moral intuitions don’t work that way,” then the debate has failed to progress in a meaningful direction, much as a debate over whether a tree falling in an empty forest makes a sound has if nobody bothers to dissolve the question.
That’s not an example. Please provide an actual one.
Sure, but it’s also what philosophers say about each other, all the time. Wittgenstein condemned practically all his predecessors and peers as incompetent, and declared that he had solved nearly the entirety of philosophy. Philosophy as a field is full of people banging their heads on a wall at all those other idiots who just don’t get it. “Most philosophers are incompetent, except for the ones who’re sensible enough to see things my way,” is a perfectly ordinary perspective among philosophers.
But I wasn’t saying that. I am arguing that moral claims have truth values that aren’t indexed to individuals or societies. That epistemic claim can be justified by appeal to an ontology including Moral Objects, but that is not how I am justifying it: my argument is based on rationality, as I have said many times.
We have standards by which we judge the truth values of mathematical claims, and they are inside us too, and that doesn’t stop mathematics being objective. Relativism requires that truth values are indexed to us, that there is one truth for me and another for thee. Being located in us, or being operated by us, are not sufficient criteria for being indexed to us.
We can see, in reductionistic terms, how entities could converge on a uniform set of truth values. There is nothing non-reductionist about anything I have said. Reductionism does not force one answer to metaethics.
Provide evidence that ethics is a whole separate module, and not part of general reasoning ability.
Please explain why moral intuitions don’t work that way.
Please provide some foundations for something that aren’t unjustified by anything more foundational.
You can select one at random, obviously.
No, philosophers don’t regularly accuse each other of being incompetent... just of being wrong. There’s a difference.
You are inferring a lot from one example.
Nope.
I don’t understand, can you rephrase this?
The standards by which we judge the truth of mathematical claims are not just inside us. One object plus another object will continue to equal two objects whether or not there are any living beings to make that judgment. Math is not something we’ve created within ourselves, but something we’ve discovered and observed.
If our mathematical models ever stop being able to predict in advance the behavior of the universe, then we will have rather more reason to doubt that the math inside us is different from the math outside of us.
What evidence do we have that this is the case for morality?
My assertion is that, if we judge ethics as a rational system, innate values are among the axioms that the system is predicated on. You cannot prove the axioms of a system within that system, and an ethical system predicated on premises like “happiness is good” will not itself be able to prove the goodness of happiness.
While we could suppose that the axioms which our ethical systems are predicated on are objectively true, we have considerable reason to believe that we would have developed these axioms for adaptive reasons, even if there were no sense in which objective moral axioms exist, and we do not have evidence which suggests that objective, independently existing true moral axioms do exist.
People can be induced to strongly support opposing responses to the same moral dilemma, just by rephrasing it differently to trigger different heuristics. Our moral intuitions are incoherent.
I don’t think I understand this, can you rephrase it?
I do not recall any creditable attempts, which places me in a disadvantaged position with respect to locating them. You’re the one claiming that they’re there at all, that’s why I’m asking you to do it.
Philosophers don’t usually accuse each other of being incompetent in their publications, because it’s not conducive to getting other philosophers to regard their arguments dispassionately, and that sort of open accusation is generally frowned upon in academic circles whether one believes it or not. They do regularly accuse each other of being comprehensively wrong for their entire careers. In my personal conversations with philosophers (and I never considered myself to have really taken a class, or attended a lecture by a visitor, if I didn’t speak with the person teaching it on a personal basis to probe their thoughts beyond the curriculum,) I observed a whole lot of frustration with philosophers who they think just don’t get their arguments. It’s unsurprising that people would tend to become so frustrated participating in a field that basically amounts to long running arguments extended over decades or centuries. Imagine the conversation we’re having now going on for eighty years, and neither of us has changed our minds. If you didn’t find my arguments convincing, and I hadn’t budged in all that time, don’t you think you’d start to suspect that I was particularly thick?
I’m using an example illustrative of my experience.
Sounds to me like PrawnOfFate is saying that any sufficiently rational cognitive system will converge on a certain set of ethical goals as a consequence of its structure, i.e. that (human-style) ethics is a property that reliably emerges in anything capable of reason.
I’d say the existence of sociopathy among humans provides a pretty good counterargument to this (sociopaths can be pretty good at accomplishing their goals, so the pathology doesn’t seem to be indicative of a flawed rationality), but at least the argument doesn’t rely on counting fundamental particles of morality or something.
I would say so also, but PrawnOfFate has already argued that sociopaths are subject to additional egocentric bias relative to normal people and thereby less rational. It seems to me that he’s implicitly judging rationality by how well it leads to a particular body of ethics he already accepts, rather than how well it optimizes for potentially arbitrary values.
Well, I’m not a psychologist, but if someone asked me to name a pathology marked by unusual egocentric bias I’d point to NPD, not sociopathy.
That brings up some interesting questions concerning how we define rationality, though. Pathologies in psychology are defined in terms of interference with daily life, and the personality disorder spectrum in particular usually implies problems interacting with people or societies. That could imply either irreconcilable values or specific flaws in reasoning, but only the latter is irrational in the sense we usually use around here. Unfortunately, people are cognitively messy enough that the two are pretty hard to distinguish, particularly since so many human goals involve interaction with other people.
In any case, this might be a good time to taboo “rational”.
Since no claim has a probability of 1.0, I only need to argue that a clear majority of rational minds converge.
How do we judge claims about transfinite numbers?
Mathematics isn’t physics. Mathematicians prove theorems from axioms, not from experiments.
Not necessarily. E.g., for utilitarians, values are just facts that are plugged into the metaethics to get concrete actions.
Metaethical systems usually have axioms like “Maximising utility is good”.
I am not sure what you mean by “exist” here. Claims are objectively true if most rational minds converge on them. That doesn’t require Objective Truths to float about in space.
Does that mean we can’t use moral intuitions at all, or that they must be used with caution?
Philosophers talk about intuitions, because that is the term for something foundational that seems true, but can’t be justified by anything more foundational. LessWrongians don’t like intuitions, but don’t seem to be able to explain how to manage without them.
Did you post any comments explaining to the professional philosophers where they had gone wrong?
I don’t see the problem. Philosophical competence is largely about understanding the problem.
Yes, but the fact that the universe itself seems to adhere to the logical systems by which we construct mathematics gives credence to the idea that the logical systems are fundamental, something we’ve discovered rather than produced. We judge claims about unobserved mathematical constructs like transfinites according to those systems.
But utility is a function of values. A paperclipper will produce utility according to different values than a human.
Why would most rational minds converge on values? Most human minds converge on some values, but we share almost all our evolutionary history and brain structure. The fact that most humans converge on certain values is no more indicative of rational minds in general doing so than the fact that most humans have two hands is indicative of most possible intelligent species converging on having two hands.
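Desrtopa's point here can be put in computational terms: rationality is an optimization procedure, and values are a parameter fed into it, not a product of it. A minimal sketch (the agents, actions, and numbers are all invented for illustration):

```python
# Hypothetical sketch: two agents share the exact same "rationality" --
# here, picking the action with the highest utility -- yet choose
# differently, because the value function being optimized is an input
# to the shared procedure, not part of it.

def rational_choice(actions, utility):
    """The shared rational procedure: argmax over utility."""
    return max(actions, key=utility)

actions = ["make_paperclips", "help_humans"]

# Only the value functions differ between the two agents.
human_values = {"make_paperclips": 0.1, "help_humans": 0.9}
clippy_values = {"make_paperclips": 0.9, "help_humans": 0.1}

human_pick = rational_choice(actions, human_values.get)    # "help_humans"
clippy_pick = rational_choice(actions, clippy_values.get)  # "make_paperclips"
```

Nothing in the shared procedure forces the two value dictionaries to agree, which is why sharing rationality does not by itself imply sharing values.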
It means we should be aware of what our intuitions are and what they’ve developed to be good for. Intuitions are evolved heuristics, not a priori truth generators.
It seems like you’re equating intuitions with axioms here. We can (and should) recognize that our intuitions are frequently unhelpful at guiding us to the truth, without throwing out all axioms.
If I did, I don’t remember them. I may have, I may have felt someone else adequately addressed them, I may not have felt it was worth the bother.
It seems to me that you’re trying to foist onto me the effort of locating something which you were the one to testify was there in the first place.
And philosophers frequently fall into the pattern of believing that other philosophers disagree with each other due to failure to understand the problems they’re dealing with.
In any case, I reject the notion that dismissing large contingents of philosophers as lacking in competence is a valuable piece of evidence with respect to crankishness, and if you want to convince me that I am taking a crankish attitude, you’ll need to offer some other evidence.
But claims about transfinite numbers don’t correspond directly to any object. Maths is “spun off” from other facts, on your view. So, by analogy, moral realism could be “spun off” without needing any Form of the Good to correspond to goodness.
You seem to be assuming that morality is about individual behaviour. A moral realist system like utilitarianism operates at the group level, and would take paperclipper values into account along with all others. Utilitarianism doesn’t care what values are, it just sums or averages them.
Or perhaps you are making the objection that an entity would need moral values to care about the preferences of others in the first place. That is addressed by another kind of realism, the rationality-based kind, which starts from noting that rational agents have to have some value in common, because they are all rational.
a) they don’t have to converge on preferences, since things like utilitarianism are preference-neutral.
b) they already have to some extent because they are rational
I was talking about rational minds converging on moral claims, not on values. Rational minds can converge on “maximise group utility” whilst what counts as utility varies considerably.
Axioms are formal statements; intuitions are gut feelings that are often used to justify axioms.
There is another sense of “intuition” where someone feels that it’s going to rain tomorrow or something. They’re not the foundational kind.
So do they call for them to be fired?
Spun off from what, and how?
Speaking as a utilitarian, yes, utilitarianism does care about what values are. If I value paperclips, I assign utility to paperclips, if I don’t, I don’t.
Why does their being rational demand that they have values in common? Being rational means that they necessarily share a common process, namely rationality, but that process can be used to optimize many different, mutually contradictory things. Why should their values converge?
So what if a paperclipper arrives at “maximize group utility,” and the only relevant member of the group which shares its conception of utility is itself, and its only basis for measuring utility is paperclips? The fact that it shares the principle of maximizing utility doesn’t demand any overlap of end-goal with other utility maximizers.
But, as I’ve pointed out previously, intuitions are often unhelpful, or even actively misleading, with respect to locating the truth.
If our axioms are grounded in our intuitions, then entities which don’t share our intuitions will not share our axioms.
No, but neither do I, so I don’t see why that’s relevant.
Designating PrawnOfFate a probable troll or sockpuppet. Suggest terminating discussion.
Request accepted, I’m not sure if he’s being deliberately obtuse, but I think this discussion probably would have borne fruit earlier if it were going to. I too often have difficulty stepping away from a discussion as soon as I think it’s unlikely to be a productive use of my time.
What is your basis for the designation ? I am not arguing with your suggestion (I was leaning in the same direction myself), I’m just genuinely curious. In other words, why do you believe that PrawnOfFate is a troll, and not someone who is genuinely confused ?
Combined behavior in other threads. Check the profile.
“Troll” is a somewhat fuzzy label. Sometimes when I am wanting to be precise or polite and avoid any hint of Fundamental Attribution Error I will replace it with the rather clumsy or verbose “person who is exhibiting a pattern of behaviour which should not be fed”. The difference between “Person who gets satisfaction from causing disruption” and “Person who is genuinely confused and is displaying an obnoxiously disruptive social attitude” is largely irrelevant (particularly when one has their Hansonian hat on).
If there was a word in popular use that meant “person likely to be disruptive and who should not be fed” that didn’t make any assumptions or implications of the intent of the accused then that word would be preferable.
I am not sure I can explain that succinctly at the moment. It is also hard to summarise how you get from counting apples to transfinite numbers.
Rationality is not an automatic process; it is a skill that has to be learnt and consciously applied. Individuals will only be rational if their values prompt them to be. And rationality itself implies valuing certain things (lack of bias, non-arbitrariness).
Utilitarians want to maximise the utility of their groups, not their own utility. They don’t have to believe the utility of others has utility for them; they just need to feed facts about group utility into an aggregation function. And, using the same facts and same function, different utilitarians will converge. That’s kind of the point.
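The aggregation claim can be sketched as follows, assuming (as this comment does) that "facts about group utility" are available as shared inputs; the members, policies, and numbers are invented for illustration:

```python
# Hypothetical sketch of the convergence claim: any utilitarian who applies
# the same aggregation function to the same facts about member utilities
# will rank policies identically, whatever their personal preferences.

def group_utility(member_utilities, aggregate=sum):
    """Aggregate each member's utility for an outcome into one group score."""
    return aggregate(member_utilities.values())

# Shared facts: each member's utility under two candidate policies.
policy_a = {"alice": 5, "bob": 3, "carol": 1}   # group utility 9
policy_b = {"alice": 2, "bob": 2, "carol": 2}   # group utility 6

# Every utilitarian using these facts and this function picks policy_a.
best = max([policy_a, policy_b], key=group_utility)
```

Note that the choice of aggregation function (sum versus average) and of whose utilities count as facts are themselves inputs the function does not settle, which is where the surrounding disagreement resumes.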
Compared to what? Remember, I am talking about foundational intuitions, the kind at the bottom of the stack. The empirical method of locating the truth rests on the intuition that the senses reveal a real external world. Which I share. But what proves it? That’s the foundational issue.
The question of moral realism is AFAICT orthogonal to the Orthogonality Thesis.
A lot of people here would seem to disagree, since I keep hearing the objection that ethics is all about values, and values are nothing to do with rationality.
Could you make the connection to what I said more explicit please? Thanks!
″ values are nothing to do with rationality”=the Orthogonality Thesis, so it’s a step in the argument.
It feels to me like the Orthogonality Thesis is a fairly precise statement, and moral anti-realism is a harder to make precise but at least well understood statement, and “values are nothing to do with rationality” is something rather vague that could mean either of those things or something else.
You can change that line, but it will result in you optimizing for something other than paperclips, resulting in less paperclips.
I’ve never understood this argument.
It’s like a slaveowner having a conversation with a time-traveler, and declaring that they don’t want to be nice to slaves, so any proof they could show is by definition invalid.
If the slaveowner is an ordinary human being, they already have values regarding how to treat people in their in-groups which they navigate around with respect to slaves by not treating them as in-group members. If they could be induced to see slaves as in-group members, they would probably become nicer to slaves whether they intended to or not (although I don’t think it’s necessarily the case that everyone who’s sufficiently acculturated to slavery could be induced to see slaves as in-group members.)
If the agent has no preexisting values which can be called into service of the ethics they’re being asked to adopt, I don’t think that they could be induced to want to adopt them.
Sure, but if there’s an objective morality, it’s inherently valuable, right? So you already value it. You just haven’t realized it yet.
It gets even worse when people try to refute wireheading arguments with this. Or statements like “if it were moral to [bad thing], would you do it?”
What evidence would suggest that objective morality in such a sense could or does exist?
I’m not saying moral realism is coherent, merely that this objection isn’t.
I don’t think it’s true that if there’s an objective morality, agents necessarily value it whether they realize it or not though. Why couldn’t there be inherently immoral or amoral agents?
… because the whole point of an “objective” morality is that rational agents will update to believe they should follow it? Otherwise we might as easily be such “inherently immoral or amoral agents”, and we wouldn’t want to discover such objective “morality”.
Well, if it turned out that something like “maximize suffering of intelligent agents” were written into the fabric of the universe, I think we’d have to conclude that we were inherently immoral agents.
The same evidence that persuades you that we don’t want to maximize suffering in real life is evidence that it wouldn’t be, I guess.
Side note: I’ve never seen anyone try to defend the position that we should be maximizing suffering, whereas I’ve seen all sorts of eloquent and mutually contradictory defenses of more, um, traditional ethical frameworks.
A rational AI would use rationality. Amazing how that word keeps disappearing...on a website about...rationality.
Elaborate. What rational process would it use to determine the silliness of its original objective?
Being able to read all your source code could be the ultimate in self-reflection (absent Löb’s theorem), but it doesn’t follow that those who can’t read their source code can’t self-reflect at all. It’s just imperfect, like everything else.
“Objective”.
This is about rational agents. If pebble sorters can’t think of a non-arbitrary reason for sorting pebbles, they would recognise it as silly. Why not? Humans can spend years collecting stamps, or something, only to decide it is pointless.
What...why...? Is there something special about silicon? Is it made from different quarks?
Being rational doesn’t automatically make an agent able to read its own source code. Remember that, to the pebble-sorters, sorting pebbles is an axiomatically reasonable activity; it does not require justification. Only someone looking at them from the outside could evaluate it objectively.
Not at all; if you got some kind of a crazy biological implant that let you examine your own wetware, you could do it too. Silicon is just a convenient example.
Humans can examine their own thinking. Not perfectly, because we aren’t perfect. But we can do it, and indeed do so all the time. It’s a major focus on this site, in fact.
You can define a pebblesorter as being unable to update its values, and I can point out that most rational agents won’t be like that. Most rational agents won’t have unupdateable values, because they will be messily designed/evolved, and therefore will be capable of converging on an ethical system via their shared rationality.
We are messily designed/evolved, and yet we do not have updatable goals or perfect introspection. I absolutely agree that some agents will have updatable goals, but I don’t see how you can upgrade that to “most”.
How so ? Are you asserting that there exists an optimal ethical system that is independent of the actors’ goals ? There may well be one, but I am not convinced of this, so you’ll have to convince me.
We blatantly have updatable goals: people do not have the same goals at 5 as they do at 20 or 60.
I don’t know why perfect introspection would be needed to have some ability to update.
How so ? Are you asserting that there exists an optimal ethical system that is independent of the actors’ goals ?
Yes, that’s what this whole discussion is about.
Sorry, that was bad wording on my part; I should’ve said, “updatable terminal goals”. I agree with what you said there.
I don’t feel confident enough in either “yes” or “no” answer, but I’m currently leaning toward “no”. I am open to persuasion, though.
You can make the evidence compatible with the theory of terminal values, but there is still no support for the theory of terminal values.
I personally don’t know of any evidence in favor of terminal values, so I do agree with you there. Still, it makes a nice thought experiment: could we create an agent possessed of general intelligence and the ability to self-modify, and then hardcode it with terminal values ? My answer would be, “no”, but I could be wrong.
That said, I don’t believe that there exists any kind of a universally applicable moral system, either.
Source?
They take different actions, sure, but it seems to me, based on childhood memories etc, that these are in the service of roughly the same goals. Have people, say, interviewed children and found they report differently?
How many 5 year olds have the goal of Sitting Down With a Nice Cup of Tea?
One less now that I’m not 5 years old anymore.
Could you please make a real argument? You’re almost being logically rude.
Why do you think adults sit down with a nice cup of tea? What purpose does it serve?
I’d use humans as a counterexample, but come to think, a lot of humans refuse to believe our goals could be arbitrary, and have developed many deeply stupid arguments that “prove” they’re objective.
However, I’m inclined to think this is a flaw on the part of humans, not something rational.
Unicorns have horns...
Defining something abstractly says nothing about its existence or likelihood. A neat division between terminal and abstract values could be implemented with sufficient effort, or could evolve with a low likelihood, but it is not a model of intelligence in general, and it is not likely just because messy solutions are likelier than neater ones. Actual and really existent horse-like beings are not going to acquire horns any time soon, no matter how clearly you define unicornhood.
Plausibly. You don’t now care about the same things you cared about when you were 10.
Show me one. Clippers are possible but not likely. I am not saying, and never have said, that Clippers would converge on the One True Ethics; I said that (super)intelligent, (super)rational agents would. The average SR-SI agent would not be a clipper, for exactly the same reason that the average human is not an evil genius. There are no special rules for silicon!
I’m noticing that you did not respond to my question of whether you’ve read No Universally Compelling Arguments and Sorting Pebbles Into Correct Heaps. I’d appreciate it if you would, because they’re very directly relevant to the conversation, and I don’t want to rehash the content when Eliezer has already gone to the trouble of putting them up where anyone can read them. If you already have, then we can proceed with that shared information, but if you’re just going to ignore the links, how do I know you’re going to bother giving due attention to anything I write in response?
I’ve read them and you’ve been reading my response.
Okay.
I have different interests now than I did when I was ten, but that’s not the same as having different terminal values.
Suppose a person doesn’t support vegetarianism; they’ve never really given it much consideration, but they default to the assumption that eating meat doesn’t cause much harm, and meat is tasty, so what’s the big deal?
When they get older, they watch some videos on the conditions in which animals are raised for slaughter, read some studies on the neurology of livestock animals with respect to their ability to suffer, and decide that mainstream livestock farming does cause a lot of harm after all, and so they become a vegetarian.
This doesn’t mean that their values have been altered at all. They’ve simply revised their behavior on new information with an application of the same values they already had. They started out caring about the suffering of sentient beings, and they ended up caring about the suffering of sentient beings, they just revised their beliefs about what actions that value should compel on the basis of other information.
To see whether a person’s values have changed, we would want to look, not at whether they endorse the same behaviors or factual beliefs that they used to, but at whether their past self could relate to the reasons their present self has for believing and supporting the things they do now.
The fact that humans are mostly not evil geniuses says next to nothing about the power of intelligence and rationality to converge on human standards of goodness. We all share almost all the same brainware. To a pebblesorter, humans would nearly all be evil geniuses, possessed of powerful intellects, yet totally bereft of a proper moral concern with sorting pebbles.
Many humans are sociopaths, and that slight deviation from normal human brainware results in people who cannot be argued into caring about other people for their own sakes. Nor can a sociopath argue a neurotypical person into becoming a sociopath.
If intelligence and rationality cause people to update their terminal values, why do sociopaths whose intelligence and rationality are normal to high by human standards (of which there are many) not update into being non-sociopaths, or vice-versa?
coughaynrandcough
There’s a difference between being a sociopath and being a jerk. Sociopaths don’t need to rationalize dicking other people over.
If Ayn Rand’s works could actually turn formerly neurotypical people into sociopaths, that would be a hell of a find, and possibly spark a neuromedical breakthrough.
That’s beside the point, though. Just because two agents have incompatible values doesn’t mean they can’t be persuaded otherwise.
ETA: in other words, persuading a sociopath to act like they’re ethical or vice versa is possible. It just doesn’t rewire their terminal values.
Sure, you can negotiate with an agent with conflicting values, but I don’t think it’s beside the point.
You can get a sociopath to cooperate with non-sociopaths by making them trade off for things they do care about, or using coercive power. But Clippy doesn’t have any concerns other than paperclips to trade off against its concern for paperclips, and we’re not in a position to coerce Clippy, because Clippy is powerful enough to treat us as an obstacle to be destroyed. The fact that the non-sociopath majority can more or less keep the sociopath minority under control doesn’t mean that we could persuade agents whose values deviate far from our own to accommodate us if we didn’t have coercive power over them.
Clippy is a superintelligence. Humans, neurotypical or no, are not.
I’m not saying it’s necessarily rational for sociopaths to act moral or vice versa. I’m saying people can be (and have been) persuaded of this.
Prawnoffate’s point to begin with was that humans could and would change their fundamental values on new information about what is moral. I suggested sociopaths as an example of people who wouldn’t change their values to conform to those of other people on the basis of argument or evidence, nor would ordinary humans change their fundamental values to a sociopath’s.
If we’ve progressed to a discussion of whether it’s possible to coerce less powerful agents into behaving in accordance with our values, I think we’ve departed from the context in which sociopaths were relevant in the first place.
Oh, sorry, I wasn’t disagreeing with you about that, just nitpicking your example. Should have made that clearer ;)
Are you arguing Ayn Rand can argue sociopaths into caring about other people for their own sakes, or argue neurotypical people into becoming sociopaths?
(I could see both arguments, although as Desrtopa references, the latter seems unlikely. Maybe you could argue a neurotypical person into sociopathic-like behavior, which seems a weaker and more plausible claim.)
Then that makes it twice as effective, doesn’t it?
(Edited for clarity.)
You can construe the facts as being compatible with the theory of terminal values, but that doesn’t actually support the theory of TVs.
Ethics is about regulating behaviour to take into account the preferences of others. I don’t see how pebblesorting would count.
Psychopathy is a strong egotistical bias.
How do you know that? Can you explain a process by which an SI-SR paperclipper could become convinced of this?
How can you tell that psychopathy is an egotistical bias rather than non-psychopathy being an empathetic bias?
Much the same way as I understand the meanings of most words. Why is that a problem in this case?
Non-psychopaths don’t generally put other people above themselves—that is, they treat people equally, including themselves.
“That’s what it means by definition” wasn’t much help to you when it came to terminal values, why do you think “that’s what the word means” is useful here and not there? How do you determine that this word, and not that one, is an accurate description of a thing that exists?
This is not, in fact, true. Non-psychopaths routinely apply double standards to themselves and other people, and don’t necessarily even realize they’re doing it.
If we accept that it’s true for the sake of an argument though, how do we know that they don’t just have a strong egalitarian bias?
Are you saying ethical behaviour doesn’t exist on this planet, or that ethical behaviour as I have defined it doesn’t exist on this planet?
OK. Non-psychopaths have a lesser degree of egotistical bias. Does that prove they have some different bias? No. Does that prove an ideal rational and ethical agent would still have some bias from some point of view? No.
That’s like saying they have a bias towards not having a bias.
I’m saying that ethical behavior as you have defined it is almost certainly not a universal psychological attractor. An SI-SR agent could look at humans and say “yep, this is by and large what humans think of as ‘ethics,’” but that doesn’t mean it would exert any sort of compulsion on it.
You not only haven’t proven that psychopaths are the ones with an additional bias, you haven’t even addressed the matter, you’ve just taken it for granted from the start.
How do you demonstrate that psychopaths have an egotistical bias, rather than non-psychopaths having an egalitarian bias, or rather than both of them having different value systems and pursuing them with equal degrees of rationality?
I didn’t say it was universal among all entities of all degrees of intelligence or rationality. I said there was a non-negligible probability of agents of a certain level of rationality converging on an understanding of ethics.
“SR” stands for super-rational. Rational agents find rational arguments rationally compelling. If rational arguments can be made for a certain understanding of ethics, they will be compelled by them.
Do you contest that psychopaths have more egotistical bias than the general population?
Yes. I thought it was something everyone knows.
It is absurd to characterise the practice of treating everyone the same as a form of bias.
Where does this non-negligible probability come from though? When I’ve asked you to provide any reason to suspect it, you’ve just said that as you’re not arguing there’s a high probability, there’s no need for you to answer that.
I have been implicitly asking all along here: what basis do we have for suspecting that any sort of universally rationally compelling ethical arguments exist at all?
Yes.
Why?
Combining the probabilities of the steps of the argument.
There are rationally compelling arguments.
Rationality is probably universalisable, since it is based on the avoidance of biases, including those regarding who and where you are.
There is nothing about ethics that makes it unsusceptible to rational argument.
There are examples of rational argument about ethics, and of people being compelled by them.
That is an extraordinary claim, and the burden is on you to support it.
In the sense of “Nothing is a kind of something” or “atheism is a kind of religion”.
Rationality may be universalizable, but that doesn’t mean ethics is.
If ethics are based on innate values extrapolated into systems of behavior according to their expected implications, then people will be susceptible to arguments regarding the expected implications of those beliefs, but not arguments regarding their innate values.
I would accept something like “if you accept that it’s bad to make sentient beings suffer, you should oppose animal abuse” can be rationally argued for, but that doesn’t mean that you can step back indefinitely and justify each premise behind it. How would you convince an entity which doesn’t already believe it that it should care about happiness or suffering at all?
I would claim the reverse, that saying that sociopathic people have additional egocentric bias is an extraordinary claim, and so I will ask you to support it, but of course, I am quite prepared to reciprocate by supporting my own claim.
It’s much easier to subtract a heuristic from a developed mind by dysfunction than it is to add one. It is more likely as a prior that sociopaths are missing something that ordinary people possess, rather than having something that most people don’t, and that something appears to be the brain functions normally concerned with empathy. It’s not that they’re more concerned with self interest than other people, but that they’re less concerned with other people’s interests.
Human brains are not “rationality+biases,” such that you could systematically subtract all the biases from a human brain and end up with perfect rationality. We are a bunch of cognitive adaptations, some of which are not at all in accordance with strict rationality, hacked together over our evolutionary history. So it makes little sense to judge humans with unusual neurology as being humans plus or minus additional biases, rather than being plus or minus additional functions or adaptations.
Is it a bias to treat people differently from rocks?
Now, if we’re going to categorize innate hardwired values, such as that which Clippy has for paperclips, as biases, then I would say “yes.”
I don’t think it makes sense to categorize such innate values as biases, and so I do not think that Clippy is “biased” compared to an ideally rational agent. Instrumental rationality is for pursuing agents’ innate values. But if you think it takes bias to get you from not caring about paperclips to caring about paperclips, can you explain how, with no bias, you can get from not caring about anything, to caring about something?
If there were in fact some sort of objective morality, under which some people were much more valuable than others, then an ethical system which valued all people equally would be systematically biased in favor of the less valuable.
Can you expand on what you mean by “absurd” here?
In the sense of “Nothing is a kind of something” or “atheism is a kind of religion”.
Hm.
OK.
So, I imagine the following conversation between two people (A and B):
A: It’s absurd to say ‘atheism is a kind of religion,’
B: Why?
A: Well, ‘religion’ is a word with an agreed-upon meaning, and it denotes a particular category of structures in the world, specifically those with properties X, Y, Z, etc. Atheism lacks those properties, so atheism is not a religion.
B: I agree, but that merely shows the claim is mistaken. Why is it absurd?
A: (thinks) Well, what I mean is that any mind capable of seriously considering the question ‘Is atheism a religion?’ should reach the same conclusion without significant difficulty. It’s not just mistaken, it’s obviously mistaken. And, more than that, I mean that to conclude instead that atheism is a religion is not just false, but the opposite of the truth… that is, it’s blatantly mistaken.
Is A in the dialog above capturing something like what you mean?
If so, I disagree with your claim. It may be mistaken to characterize the practice of treating everyone the same as a form of bias, but it is not obviously mistaken or blatantly mistaken. In fact, I’m not sure it’s mistaken at all, though if it is a bias, it’s one I endorse among humans in a lot of contexts.
So, terminology aside, I guess the question I’m really asking is: how would I conclude that treating everyone the same (as opposed to treating different people differently) is not actually a bias, given that this is not obvious to me?
Are we talking sweeties here? Because that seems more like lack of foresight than value drift. Or are we talking puberty? That seems more like new options becoming available.
You should really start qualifying that with “most actual” if you don’t want people to interpret it as applying to all possible (superintelligent) minds.
But you’re talking about parts of mindspace other than ours, right? The Superhappies are strikingly similar to us, but they still choose the superhappiest values, not the right ones.
I don’t require their values to converge, I require them to accept the truths of certain claims. This happens in real life. People say “I don’t like X, but I respect your right to do it”. The first part says X is a disvalue, the second is an override coming from rationality.
This is where you are confused. Almost certainly it is not the only confusion. But here is one:
Values are not claims. Goals are not propositions. Dynamics are not beliefs.
A machine that maximises paperclips can believe all true propositions in the world, and go on maximising paperclips. Nothing compels it to act any differently. You expect that rational agents will eventually derive the true theorems of morality. Yes, they will. Along with the true theorems of everything else. It won’t change their behaviour, unless they are built so as to send those actions identified as moral to the action system.
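The separation being described can be made concrete in a few lines. Here is a hypothetical sketch (every name invented, not anyone’s actual agent design): the agent’s stock of true beliefs and its action-selection criterion are separate pieces of data, so enlarging the former never touches the latter.

```python
# Hypothetical toy agent: beliefs and action selection are separate modules.
# Adding a true belief about morality changes nothing unless the action
# system is built to consult it.

def choose_action(actions, utility):
    """Pick the action that maximizes the given utility function."""
    return max(actions, key=utility)

beliefs = {"2+2=4", "suffering is bad"}   # true propositions the agent holds
paperclips = {"make_clips": 100, "help_humans": 0, "do_nothing": 1}

# The action system only ever reads the paperclip utility; the belief set,
# however complete, never enters the decision.
best = choose_action(paperclips, lambda a: paperclips[a])

beliefs.add("helping humans is the moral action")   # a new true belief
best_after = choose_action(paperclips, lambda a: paperclips[a])
# best and best_after are both "make_clips": behaviour unchanged.
```

The point of the sketch is purely architectural: nothing in `choose_action` ever reads `beliefs`, so no addition to `beliefs` can alter the output.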
If you don’t believe me, I can only suggest you study AI (Thrun & Norvig) and/or the metaethics sequence until you do. (I mean really study. As if you were learning particle physics. It seems the usual metaethical confusions are quite resilient; in most peoples’ cases I wouldn’t expect them to vanish without actually thinking carefully about the data presented.) And, well, don’t expect to learn too much from off-the-cuff comments here.
Well, that justifies moral realism.
...or it’s an emergent feature, or they can update into something that works that way. You are tacitly assuming that your clipper is barely an AI at all—that it just has certain functions it performs blindly because it’s built that way. But a supersmart, super-rational clipper has to be able to update. By hypothesis, clippers have certain functionalities walled off from update. People are messily designed and unlikely to work that way. So are likely AIs and aliens.
Only rational agents, not all mindful agents, will have what it takes to derive objective moral truths. They don’t need to converge on all their values to converge on all their moral truths, because rationality can tell you that a moral claim is true even if it is not in your (other) interests. Individuals can value rationality, and that valuation can override other valuations.
Only rational agents, not all mindful agents, will have what it takes to derive objective moral truths. The further claim that agents will be motivated to derive moral truths, and to act on them, requires a further criterion. Morality is about regulating behaviour in a society, so only social rational agents will have motivation to update. Again, they do not have to converge on values beyond the shared value of sociality.
The Futility of Emergence
A paperclipper no more has a wall stopping it from updating into morality than my laptop has a wall stopping it from talking to me. My laptop doesn’t talk to me because I didn’t program it to. You do not update into pushing pebbles into prime-numbered heaps because you’re not programmed to do so.
Does a stone roll uphill on a whim?
Perhaps you should study Reductionism first.
“Emergent” in this context means “not explicitly programmed in”. There are robust examples.
Your laptop cannot talk to you because natural language is an unsolved problem.
Not wanting to do something is not the slightest guarantee of not actually doing it.
An AI can update its values because value drift is an unsolved problem.
Clippers can’t update their values by definition, but you can’t define anything into existence or statistical significance.
Not programmed to, or programmed not to? If you can code up a solution to value drift, let’s see it. Otherwise, note that Life programmes can update to implement glider generators without being “programmed to”.
...with extremely low probability. It’s far more likely that the Life field will stabilize around some relatively boring state, empty or with a few simple stable patterns. Similarly, a system subject to value drift seems likely to converge on boring attractors in value space (like wireheading, which indeed has turned out to be a problem with even weak self-modifying AI) rather than stable complex value systems. Paperclippism is not a boring attractor in this context, and a working fully reflective Clippy would need a solution to value drift, but humanlike values are not obviously so, either.
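The “boring attractor” point can be seen in miniature. As a sketch (standard B3/S23 Life rules, nothing here is specific to the argument), a blinker settles into a trivial period-2 loop rather than anything complex:

```python
# Minimal Conway's Life step, to illustrate a simple stable pattern
# (a "boring attractor"): the blinker just oscillates with period 2.
from collections import Counter

def step(cells):
    """One Life generation; `cells` is a set of (x, y) live coordinates."""
    counts = Counter((x + dx, y + dy)
                     for (x, y) in cells
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # Birth on 3 neighbours; survival on 2 or 3.
    return {c for c, n in counts.items()
            if n == 3 or (n == 2 and c in cells)}

blinker = {(0, 0), (1, 0), (2, 0)}   # horizontal line of three live cells
# step(blinker) is the vertical line {(1, -1), (1, 0), (1, 1)};
# two steps return the original pattern: a period-2 attractor.
```

A random soup, by contrast, typically decays into a scattering of such small oscillators and still lifes; glider guns and other complex structures are the rare exception, which is the analogy to value drift above.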
I’m increasingly baffled as to why AI is always brought into discussions of metaethics. Societies of rational agents need ethics to regulate their conduct. Our AIs aren’t sophisticated enough to live in their own societies. A wireheading AI isn’t even going to be able to survive “in the wild”. If you could build an artificial society of AIs, then the question of whether they spontaneously evolved ethics would be a very interesting and relevant datum. But AIs as we know them aren’t good models for the kinds of entities to which morality is relevant. And Clippy is a particularly exceptional example of an AI. So why do people keep saying “Ah, but Clippy...”?
Well, in this case it’s because the post I was responding to mentioned Clippy a couple of times, so I thought it’d be worthwhile to mention how the little bugger fits into the overall picture of value stability. It’s indeed somewhat tangential to the main point I was trying to make; paperclippers don’t have anything to do with value drift (they’re an example of a different failure mode in artificial ethics) and they’re unlikely to evolve from a changing value system.
Key word here being “societies”. That is, not singletons. A lot of the discussion on metaethics here is implicitly aimed at FAI.
Sorry..did you mean FAI is about societies, or FAI is about singletons?
But if ethics does emerge as an organisational principle in societies, that’s all you need for FAI. You don’t even have to worry about one sociopathic AI turning unfriendly, because the majority will be able to restrain it.
FAI is about singletons, because the first one to foom wins, is the idea.
ETA: also, rational agents may be ethical in societies, but there’s no advantage to being an ethical singleton.
UFAI is about singletons. If you have an AI society whose members compare notes and share information—which is instrumentally useful for them anyway—you reduce the probability of a singleton fooming.
Any agent that fooms becomes a singleton. Thus, it doesn’t matter if they acted nice while in a society; all that matters is whether they act nice as a singleton.
I don’t get it: any agent that fooms becomes superintelligent. Its values don’t necessarily change at all, nor does its connection to its society.
An agent in a society is unable to force its values on the society; it needs to cooperate with the rest of society. A singleton is able to force its values on the rest of society.
At last, an interesting reply!
Other key problem:
Please unpack this and describe precisely, in algorithmic terms that I could read and write as a computer program given unlimited time and effort, this “ability to update” which you are referring to.
I suspect that you are attributing Magical Powers From The Beyond to the word “update”, and forgetting to consider that the ability to self-modify does not imply active actions to self-modify in any one particular way that unrelated data bits say would be “better”, unless the action code explicitly looks for said data bits.
It’s uncontroversial that rational agents need to update, and that AIs need to self-modify. The claim that values are in either case insulated from updates is the extraordinary one. The Clipper theory tells you that you could build something like that if you were crazy enough. Since Clippers are contrived, nothing can be inferred from them about typical agents. People are messy, and can accidentally update their values when trying to do something else. For instance, LukeProg updated to “atheist” after studying Christian apologetics for the opposite reason.
Yes, value drift is the typical state for minds in our experience.
Building a committed Clipper that cannot accidentally update its values when trying to do something else is only possible after the problem of value drift has been solved. A system that experiences value drift isn’t a reliable Clipper, isn’t a reliable good-thing-doer, isn’t reliable at all.
Next.
I never claimed that it was controversial, nor that AIs didn’t need to self-modify, nor that values are exempt.
I’m claiming that updates and self modification do not imply a change of behavior towards behavior desired by humans.
I can build a small toy program to illustrate, if that would help.
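In the spirit of that offer, here is one such toy (all names invented): a program that genuinely self-modifies, replacing its own search strategy with a better one, while its goal function is just fixed data that no amount of strategy updating touches.

```python
# Toy self-modifying optimizer: it rewrites its own search procedure when it
# finds a better one, yet its goal is untouched by the self-modification.

goal = lambda x: -(x - 7) ** 2        # fixed objective: get x close to 7

def coarse_search(goal):
    """Crude initial strategy: only considers multiples of 5."""
    return max(range(0, 20, 5), key=goal)

def fine_search(goal):
    """Improved strategy: considers every integer in range."""
    return max(range(0, 20), key=goal)

strategy = coarse_search              # the agent's initial "self"
result = strategy(goal)               # 5: best multiple of 5

# "Self-modification": the agent swaps in a better search procedure.
# Its capability improves; its goal function is the same object as before.
strategy = fine_search
improved = strategy(goal)             # 7: the exact optimum
```

Updating here means getting better at pursuing the goal, not acquiring a different goal; that is the distinction I’m claiming matters.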
I am not suggesting that human ethics is coincidentally universal ethics. I am suggesting that if neither moral realism nor relativism is initially discarded, one can eventually arrive at a compromise position where rational agents in a particular context arrive at a non-arbitrary ethics which is appropriate to that context.
… why do you think people say “I don’t like X, but I respect your right to do it”?
If it’s based on arbitrary axioms, that would be a problem, but I have already argued that the axiom choice would not be arbitrary.
I presume that you take your particular ethical system (or a variant thereof) to be the one that every alien, AI and human should adopt.
Ok, so why? Why can the function ethics: actions → degree of goodness, or however else you choose the domain, not be modified? Where’s your case?
Edit: What basis would convince not one, but every conceivable superintelligence of that hypothetical choice of axioms being correct? (They wouldn’t all “cancel out” if choosing different axioms, that in itself would falsify the ethical system proposed by a lowly human as being universally correct.)
I have not put forward an object-level ethical system, and I have explained why I do not need to. Physical realism does not imply that my physics is correct, metaethical realism does not imply that my ethics is the one true theory.
Because ethics needs to regulate behaviour—that is its functional role—and could not if individuals could justify any behaviour by rearranging action->goodness mappings.
Their optimally satisfying the constraints on ethical axioms arising from the functional role of ethics.
That doesn’t actually answer the quoted point. Perhaps you meant to respond to this:
… which is, in fact, refuted by your statement.
… which Kawoomba believes they can, AFAICT.
Could you unpack this a little? I think I see what you’re driving at, but I’m not sure.
Yes, I did, thanks.
Then what about the second half of the argument? If individuals can “ethically” justify any behaviour, then does or does not such “ethics” completely fail in its essential role of regulating behaviour? Because anyone can do anything, and conjure up a justification after the fact by shifting their “frame”? A chocolate “teapot” is no teapot, non-regulative “ethics” is no ethics...
Not now.
Ah, but Kawoomba doesn’t expect ethics to regulate other people, because he thinks everyone has incompatible goals. Thus ethics serves purely to define your goals.
Which, honestly, should simply be called “goals”, not “ethics”, but there you go.
Yea, honestly I’ve never seen the exact distinction between goals which have an ethics-rating, and goals which do not. I understand that humans share many ethical intuitions, which isn’t surprising given our similar hardware. Also, that it may be possible to define some axioms for “medieval Han Chinese ethics” (or some subset thereof), and then say we have an objectively correct model of their specific ethical code. About the shared intuitions amongst most humans, those could be e.g. “murdering your parents is wrong” (not even “murder is wrong”, since that varies across cultures and circumstances). I’d still call those systems different, just as different cars can have the same type of engine.
Also, I understand that different alien cultures, using different “ethical axioms”, or whatever they base their goals on, do not invalidate the medieval Han Chinese axioms, they merely use different ones.
My problem with “objectively correct ethics for all rational agents” is, you could say, where the compellingness of any particular system comes in. There is no reason to believe an agent such as Clippy could not exist (edit: i.e., it probably could exist), and its very existence would contradict some “‘rational’ corresponds to a fixed set of ethics” rule. If someone would say “well, Clippy isn’t really rational then”, that would just be torturously warping the definition of “rational actor” into “must also believe in some specific set of ethical rules”.
If I remember correctly, you say at least for humans there is a common ethical basis which we should adopt (correct me otherwise). I guess I see more variance and differences where you see common elements, especially going into the future. Should some bionically enhanced human, or an upload on a space station which doesn’t even have parents, still share all the same rules for “good” and “bad” as an Amazon tribe living in an enclosed reservation? “Human civilization” is more of a loose umbrella term, and while there certainly can be general principles which some still share, I doubt there’s that much in common in the ethical codes of an African child soldier and Donald Trump.
A number of criteria have been put forward. For instance, do as you would be done by. If you don’t want to be murdered, murder is not an ethical goal.
The argument is not that rational agents (for some value of “rational”) must believe in some rules, it is rather that they must not adopt arbitrary goals. Also, the argument only requires a statistical majority of rational agents to converge, because of the P<1.0 thing.
Maybe not. The important thing is that variations in ethics should not be arbitrary—they should be systematically related to variations in circumstances.
I’m not disputing that there are goals/ethics which may be best suited to take humanity along a certain trajectory, towards a previously defined goal (space exploration!). Given a different predefined goal, the optimal path there would often be different. Say, ruthless exploitation may have certain advantages in empire building, under certain circumstances.
The Categorical Imperative in all its variants may be a decent system for humans (not that anyone really uses it).
But is the justification for its global applicability that “if everyone lived by that rule, average happiness would be maximized”? That (or any other such consideration) itself is not a mandatory goal, but a chosen one. Choosing different criteria to maximize (e.g. no one less happy than x) would yield different rules, i.e. different from the Categorical Imperative. If you find yourself to be the worshipped god-king in some ancient Mesopotamian culture, there may be many more effective ways to make yourself happy, other than the Categorical Imperative. How can it still be said to be “correct”/optimal for the king, then?
So I’m not saying there aren’t useful ethical systems (as judged in relation to some predefined course), but that because the various ultimate goals of various rational agents (happiness, paperclips, replicating yourself all over the universe) and the associated optimal ethics vary, there cannot be one system that optimizes for all conceivable goals.
My argument against moral realism and assorted is that if you had an axiomatic system from which it followed that strawberry is the best flavor of ice cream, but other agents which are just as intelligent with just as much optimizing power could use different axiomatic systems leading to different conclusions, how could one such system possibly be taken to be globally correct and compelling-to-adopt across agents with different goals?
Gandhi wouldn’t take a pill which may transform him into a murderer. Clippy would not willingly modify itself such that suddenly it had different goals. Once you’ve taken a rational agent apart and know its goals and, as a component, its ethical subroutines, there is no further “core spark” which really yearns to adopt the Categorical Imperative. Clippy may choose to use it, for a time, if it serves its ultimate goals. But any given ethical code will never be optimal for arbitrary goals, in perpetuity (proof by example). Why, then, would a particular code following from particular axioms be adopted by all rational agents?
Well, no, that’s not Kant’s justification!
Why would a rational agent choose unhappiness?
Yes, but that wouldn’t count as ethics. You wouldn’t want a Universal Law that one guy gets the harem, and everyone else is a slave, because you wouldn’t want to be a slave, and you probably would be. This is brought out in Rawls’ version of Kantian ethics: you pretend to yourself that you are behind a veil that prevents you knowing what role in society you are going to have, and choose rules that you would want to have if you were to enter society at random.
You don’t have object-level stuff like ice cream or paperclips in your axioms (maxims), you have abstract stuff, like the Categorical Imperative. You then arrive at object level ethics by plugging in details of actual circumstances and values. These will vary, but not in an arbitrary way, as is the disadvantage of anything-goes relativism.
The idea is that things like the CI have rational appeal.
Rational agents will converge on a number of things because they are rational. None of them will think 2+2=5.
Scenario:
1) You wake up in a bright box of light, no memories. You are told you’ll presently be born into an Absolute monarchy, your role randomly chosen. You may choose any moral principles that should govern that society. The Categorical Imperative would on average give you the best result.
2) You are the monarch in that society, you do not need to guess which role you’re being born into, you have that information. You don’t need to make all the slaves happy to help your goals, you can just maximize your goals directly. You may choose any moral principle you want to govern your actions. The Categorical Imperative would not give you the best result.
A different scenario: Clippy and Anti-Clippy sit in a room. Why can they not agree on epistemic facts about the most accurate laws of physics and other Aumann-mandated agreements, yet then go out and each optimize/reshape the world according to their own goals? Why would that make them not rational?
Lastly, whatever Kant’s justification, why can you not optimize for a different principle—peak happiness versus average happiness, what makes any particular justifying principle correct across all—rational—agents. Here come my algae!
For what value of “best”? If the CI is the correct theory of morality, it will necessarily give you the morally best result. Maybe your complaint is that it wouldn’t maximise your personal utility. But I don’t see why you would expect that. Things like utilitarianism that seek to maximise group utility don’t promise to make everyone blissfully happy individually. Some will lose out.
It would be irrational for Clippy to sign up to an agreement with Beady according to which Beady gets to turn Clippy and all his clips into beads. It is irrational for agents to sign up to anything which is not in their interests, and it is not in their interests to have no contract at all. So rational agents, even if they do not converge on all their goals, will negotiate contracts that minimise their disutility. Clippy and Beady might take half the universe each.
If you think RAs can converge on an ultimately correct theory of physics (which we don’t have), what is to stop them converging on the correct theory of morality, which we also don’t have?
Not very rational for those to adopt a losing strategy (from their point of view), is it? Especially since they shouldn’t reason from a point of “I could be the king”. They aren’t, and they know that. No reason to ignore that information, unless they believe in some universal reincarnation or somesuch.
Yes. Which is why rational agents wouldn’t just go and change/compromise their terminal values, or their ethical judgements (=no convergence).
Starting out with different interests. A strong Clippy accommodating a weak Beady wouldn't be in its best self-interest. It could just employ a version of morality which is based on some tweaked axioms, yielding different results.
There are possibly good reasons for us as a race to aspire to working together. There are none for a domineering Clippy to take our interests into account; yielding to any supposedly "correct" morality would strictly damage its own interests.
Someone who adopts the "I don't like X, but I respect people's right to do it" approach is sacrificing some of their values to their evaluation of rationality and fairness. They would not do that if their rationality did not outweigh other values. But they are not having all their values maximally satisfied, so in that sense they are losing out.
There’s no evidence of terminal values. Judgements can be updated without changing values.
Not all agents are interested in physics or maths. Doesn't stop their claims being objective.
Not Beady, Anti-Clippy: an agent that is the precise opposite of Clippy. It wants to minimize the number of paperclips.
If there are a lot of similar agents in similar positions, Kantian ethics works, no matter what their goals. For example, theft may appear to have positive expected value—assuming you’re selfish—but it has positive expected value for lots of people, and if they all stole the economy would collapse.
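The universalization argument can be made concrete with a toy model (invented numbers, purely for illustration): each thief pockets a fixed private gain, but total output shrinks with the fraction of thieves, so what is profitable for one defector is ruinous when everyone does it.

```python
# Toy sketch (hypothetical payoffs) of why "positive expected value for me"
# doesn't universalize: one extra thief profits, universal theft impoverishes all.
def payoff(steals: bool, thief_fraction: float) -> float:
    base = 100 * (1 - thief_fraction)   # the economy shrinks as theft spreads
    return base + (20 if steals else 0)  # stealing adds a fixed private gain

# With almost no thieves, stealing beats honesty for the individual...
assert payoff(steals=True, thief_fraction=0.01) > payoff(steals=False, thief_fraction=0.01)
# ...but if everyone steals, everyone is worse off than in an honest economy.
assert payoff(steals=True, thief_fraction=1.0) < payoff(steals=False, thief_fraction=0.0)
```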
OTOH, if you are in an unusual position, the Categorical Imperative only has force if you take it as axiomatic.
That’s not a version of Kantian ethics, it’s a hack for designing a society without privileging yourself. If you’re selfish, it’s a bad idea.
Kawoomba, maybe it would be better for you to think in terms of ethics along the lines of Kant’s Categorical Imperative, or social contract theory; ways for agents with different goals to co-operate.
Wouldn’t that presuppose that “cooperation is the source/the sine qua non of all good”?
Sure, we can redefine some version of ethics in such a cooperative light, and then conclude that many agents don’t give a hoot about such ethics, or regard it in the cold, hard terms of game theory, e.g. negotiating/extortion strategies only.
Judging actions as “good” or “bad” doesn’t prima facie depend entirely on cooperation, the good of your race, or whatever. For example, if you were a part of a planet-eating race, consuming all matter/life in its path—while being very friendly amongst themselves—couldn’t it be considered ethically “good” even from a human perspective to killswitch your own race? And “bad” from the moral standpoint of the planet-eating race?
The easiest way to dissolve such obvious contradictions is to say that there is just not, in fact, a universal hierarchy ranking ethical systems universally, regardless of the nature of the (rational = capable reasoner) agent.
Doesn’t mean an agent isn’t allowed to strongly defend what it considers to be moral, to die for it, even.
The point is it doesn't matter what you consider "good"; fighting people won't produce it (even if you value fighting people, because they will beat you and you'll be unable to fight).
I’m not saying your goals should be ethical; I’m saying you should be ethical in order to achieve your goals.
That seems very simplistic.
Ethically “good” = enabling cooperation, if you are not cooperating you must be “fighting”?
Those are evidently only rough approximations of social dynamics even just in a human context. Would it be good to cooperate with an invading army, or to cooperate with the resistance? The army is the one with the opposing goal, so as a patriot, the opposing army it is, eh?
Is it good to cooperate with someone bullying you, or torturing you? What about game theory, if you’re not “cooperating” (for your value of cooperating), you must be “fighting”? What do you mean by fighting, physical altercations? Is a loan negotiation more like cooperation or more like fighting, and is it thus ethically good or bad, for your notion of “ethics = ways for agents with different goals to co-operate”?
It seems like a nice soundbite, but doesn’t make even cursory sense on further examination. I’m all for models that are as simple as possible, but no simpler. But cooperation as the definition of ethics? For you, maybe. Collaborateur!
Fighting in this context refers to anything analogous to defecting in a Prisoner's Dilemma: you hurt the other side and encourage them to defect in order to punish you. You should strive for the Pareto optimum.
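A minimal sketch of that structure, using the conventional Prisoner's Dilemma payoffs (T=5, R=3, P=1, S=0; any ordering T > R > P > S works): defecting is each player's best response to anything the other does, yet mutual cooperation pays both players more than mutual defection.

```python
# Standard one-shot Prisoner's Dilemma payoff matrix: (my payoff, their payoff).
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation: the Pareto-optimal outcome
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection: what "fighting" converges to
}

def best_response(opponent_move: str) -> str:
    """The move maximizing my payoff against a fixed opponent move."""
    return max("CD", key=lambda my: PAYOFFS[(my, opponent_move)][0])

# Defection dominates: it is the best response to either opponent move...
assert best_response("C") == "D" and best_response("D") == "D"
# ...yet both players do strictly better under mutual cooperation.
assert PAYOFFS[("C", "C")] > PAYOFFS[("D", "D")]
```

The gap between the dominant-strategy equilibrium (D, D) and the Pareto optimum (C, C) is exactly the sense in which "fighting" fails to produce what either side values.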
Maybe this would be clearer if we talked in terms of Pebblesorters?
Why not just say there is no ethics? His theory is like saying that since teapots are made of chocolate, their purpose is to melt into a messy puddle instead of making tea.
I’m all in favor of him just using the word “goals”, myself, and leaving us non-paperclippers the word “ethics”, but oh well. It confuses discussion no end, but I guess it makes him happy.
Also, arguing over the “correct” word is low-status, so I’d suggest you start calling them “normative guides” or something while Kawoomba can hear you if you don’t want to rehash this conversation. And they can always hear you.
ALWAYS.