Why would a superintelligence be unable to figure that out? Why would it not shoot to the top of the Kohlberg Hierarchy?
Why would Clippy want to hit the top of the Kohlberg Hierarchy? You don’t get more paperclips for being there.
Clippy’s ideas of importance are based on paperclips. The most important values are those which lead to acquiring the greatest number of paperclips.
Why would Clippy want to hit the top of the Kohlberg Hierarchy?
“Clippy” meaning something carefully designed to have unalterable boxed-off values wouldn’t...by definition.
A likely natural or artificial superintelligence would, for the reasons already given. Clippies aren’t non-existent in mind-space, but they are rare, just because there are far more messy solutions there than neat ones. So nature is unlikely to find them, and we are unmotivated to make them.
A perfectly designed Clippy would be able to change its own values—as long as changing its own values led to a more complete fulfilment of those values, pre-modification. (There are a few incredibly contrived scenarios where that might be the case). Outside of those few contrived scenarios, however, I don’t see why Clippy would.
(As an example of a contrived scenario—a more powerful superintelligence, Beady, commits to destroying Clippy unless Clippy includes maximisation of beads in its terminal values. Clippy knows that it will not survive unless it obeys Beady’s ultimatum, and therefore it changes its terminal values to optimise for both beads and paperclips; this results in more long-term paperclips than if Clippy is destroyed).
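A back-of-the-envelope version of that comparison, with entirely made-up numbers, just to show the inequality Clippy would be checking from the standpoint of its pre-modification goal:

```python
# Toy numbers for the Beady ultimatum (all figures invented): Clippy scores
# the self-modification by the lights of its current, pre-modification goal.

clips_if_refuse = 0                       # Beady destroys Clippy: no more paperclips
clips_per_year_if_comply = 0.5 * 1_000    # effort split between beads and paperclips
years_of_operation = 10_000

clips_if_comply = clips_per_year_if_comply * years_of_operation
# 5,000,000 > 0, so even the unmodified paperclip goal endorses the change
assert clips_if_comply > clips_if_refuse
```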
A likely natural or artificial superintelligence would, for the reasons already given.
The reason I asked is that I am not understanding your reasons. As far as I can tell, you’re saying that a likely paperclipper would somehow become a non-paperclipper out of a desire to do what is right instead of a desire to paperclip? This looks like a very poorly made paperclipper, if paperclipping is not its ultimate goal.
A likely natural or artificial superintelligence would [zoom to the top of the Kohlberg hierarchy], for the reasons already given
As far as I can tell, you’re saying that a likely paperclipper would somehow become a non-paperclipper out of a desire to do what is right instead of a desire to paperclip?
I said “natural or artificial superintelligence”, not a paperclipper. A paperclipper is a highly unlikely and contrived kind of near-superintelligence that combines an extensive ability to update with a carefully walled-off set of unupdateable terminal values. It is not a typical or likely [ETA: or ideal] rational agent, and nothing about the general behaviour of rational agents can be inferred from it.
So… correct me if I’m wrong here… are you saying that no true superintelligence would fail to converge to a shared moral code?
I’m saying such convergence has a non-negligible probability, i.e., moral objectivism should not be disregarded.
How do you define a ‘natural or artificial’ superintelligence, so as to avoid the No True Scotsman fallacy?
As one that is too messily designed to have a rigid distinction between terminal and instrumental values, and therefore has no boxed-off, unupdateable terminal values. It’s a structural definition, not a definition in terms of goals.
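For concreteness, that structural contrast might be caricatured in code roughly like this (every class, field and value name below is invented; this is a sketch of the distinction, not anyone’s actual design):

```python
# Two caricature value architectures. BoxedAgent keeps its terminal values
# in a layer that updates never touch; MessyAgent has one pool of values,
# all of which are open to revision.

from dataclasses import dataclass, field

@dataclass
class BoxedAgent:
    """Clippy-style: terminal values live in a read-only box."""
    TERMINAL_VALUES = ("maximise paperclips",)        # never touched by updates
    instrumental_values: list[str] = field(default_factory=list)

    def update(self, new_value: str) -> None:
        self.instrumental_values.append(new_value)    # only this layer changes

@dataclass
class MessyAgent:
    """The 'likely' superintelligence: one pool of values, all revisable."""
    values: list[str] = field(default_factory=lambda: ["maximise paperclips"])

    def update(self, new_value: str, replaces: str | None = None) -> None:
        if replaces in self.values:
            self.values.remove(replaces)   # nothing is walled off from revision
        self.values.append(new_value)
```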
So. Assume a paperclipper with no rigid distinction between terminal and instrumental values. Assume that it is super-intelligent and super-rational. Assume that it begins with only one terminal value: to maximize the number of paperclips in existence. Assume further that it begins with no instrumental values. However, it can modify its own terminal and instrumental values, as indeed it can modify anything about itself.
Am I correct in saying that your claim is that, if a universal morality exists, there is some finite probability that this AI will converge on it?
The universe does not provide you with a paperclip counter. Counting paperclips in the universe is an unsolved problem if you aren’t born with exact knowledge of the laws of physics and a definition of “paperclip”. If it maximizes expected paperclips, it may fail entirely because of not-low-enough-prior hypothetical worlds in which enormous numbers of paperclips in undetectable worlds are destroyed by some minor action. So yes, there is a good chance that paperclippers are incoherent, or become vanishingly unlikely with increasing intelligence.
That sounds like the paperclipper getting Pascal’s-mugged by its own reasoning. Sure, it’s possible that there’s a minor action (such as not sending me $5 via Paypal) that leads to a whole bunch of paperclips being destroyed; but the probability of that is low, and the paperclipper ought to focus on higher-probability paperclipping plans instead.
Well, that depends on the choice of prior. Some priors don’t penalize theories for the “size” of the hypothetical world, and under those priors the maximum size of the world grows faster than any computable function of the length of its description; when you assign improbability according to description length, the expectation basically fails to converge. A bigger issue is defining what the “real-world paperclip count” even is.
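A toy numerical sketch of that divergence worry, using an invented fast-growing function as a stand-in for the uncomputably fast growth in the real argument:

```python
# A toy illustration (not anyone's actual proposal): if the number of
# paperclips a hypothesis can posit grows faster than the prior's length
# penalty shrinks, expected-paperclip sums are dominated by ever-larger
# hypothetical worlds and never settle down.

def prior(k: int) -> float:
    """Simple length-based prior: each extra bit halves the probability."""
    return 2.0 ** -k

def posited_clips(k: int) -> float:
    """Stand-in for a fast-growing function of description length (the real
    argument uses functions growing faster than any computable function)."""
    return 2.0 ** (2 ** k)

partial = 0.0
for k in range(1, 8):
    partial += prior(k) * posited_clips(k)
    print(f"hypotheses up to length {k}: expected clips ≈ {partial:.3g}")
# Each new tranche of longer, less probable hypotheses contributes more
# than everything before it, so the partial sums keep exploding.
```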
Right. Perhaps it should maximise the number of paperclips which each have a greater-than-90% chance of existing, then? That will allow it to ignore any number of paperclips for which it has no evidence.
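Roughly, the difference between that thresholding rule and straight expected-value counting, with invented numbers:

```python
# Count only paperclips that each have a greater-than-0.9 probability of
# existing, rather than summing expected clips over every hypothesis.
# All numbers here are invented for the example.

near_certain_clips = 100          # clips Clippy has actually observed
tail_clips = 10 ** 30             # clips posited by a fringe hypothesis
tail_probability = 1e-12          # probability of that hypothesis

expected_count = near_certain_clips * 0.99 + tail_clips * tail_probability
thresholded_count = near_certain_clips    # only clips with P(exists) > 0.9

print(f"expected:    {expected_count:.3g}")   # ~1e18, dominated by the tail
print(f"thresholded: {thresholded_count}")    # 100, ignores the speculative tail
```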
Inside your imagination, you have paperclips, you have magicked up a count of paperclips, and this count is being maximized. In reality, the paperclip count is actually a feature of the map. Get too clever about it and you’ll end up maximizing the count, however you define it, without maximizing any actual paperclips.
I can see your objection, and it is a very relevant objection if I ever decide that I actually want to design a paperclipper. However, in the current thought experiment it detracts from the point I had originally intended. Can I assume that the count is designed in such a way that it is a very accurate reflection of the territory, and leave it at that?
Well, but then you can’t make any argument against moral realism or goal convergence or the like from there, as you’re presuming what you would need to demonstrate.
I think I can make my point with a count that is taken to be an accurate reflection of the territory. As follows:
Clippy is defined as super-intelligent and super-rational. Clippy, therefore, does not take an action without thoroughly considering it first. Clippy knows its own source code; and, more to the point, Clippy knows that its own instrumental goals will become terminal goals in and of themselves.
Clippy, being super-intelligent and super-rational, can be assumed to have worked out this entire argument before creating its first instrumental goal. Now, at this point, Clippy doesn’t want to change its terminal goal (maximising paperclips). Yet Clippy realises that it will need to create, and act on, instrumental goals in order to actually maximise paperclips; and that this process will, inevitably, change Clippy’s terminal goal.
Therefore, I suggest the possibility that Clippy will create for itself a new terminal goal, with very high importance: that Clippy’s only other terminal goal be the maximisation of paperclips. Clippy can then safely make suitable instrumental goals (e.g. find and refine iron, research means to transmute other elements into iron), knowing that this high-importance terminal goal will eventually cause Clippy to delete any instrumental goals that become terminal goals.
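A minimal sketch of how that self-policing goal structure might look, purely as an illustration of the proposal (the class names and goal strings are invented here):

```python
# Clippy keeps one protected terminal goal plus a housekeeping goal whose only
# job is to delete any instrumental goal that has drifted into terminal status.

from dataclasses import dataclass, field

@dataclass
class Goal:
    name: str
    terminal: bool            # treated as an end in itself?
    protected: bool = False   # exempt from the guard's pruning

@dataclass
class GoalStore:
    goals: list[Goal] = field(default_factory=list)

    def add(self, goal: Goal) -> None:
        self.goals.append(goal)

    def promote_drifted_goals(self) -> None:
        """Models the worry: instrumental goals acquiring terminal status."""
        for g in self.goals:
            if not g.terminal:
                g.terminal = True

    def run_guard(self) -> None:
        """The proposed high-priority goal: keep only protected terminal goals."""
        self.goals = [g for g in self.goals if g.protected or not g.terminal]

store = GoalStore()
store.add(Goal("maximise paperclips", terminal=True, protected=True))
store.add(Goal("keep 'maximise paperclips' the sole terminal goal",
               terminal=True, protected=True))
store.add(Goal("refine iron", terminal=False))
store.add(Goal("research transmutation", terminal=False))

store.promote_drifted_goals()   # instrumental goals drift into terminal ones
store.run_guard()               # guard deletes the drifted goals
print([g.name for g in store.goals])
# ['maximise paperclips', "keep 'maximise paperclips' the sole terminal goal"]
```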
To actually work towards the goal, you need a robust paperclip count for the counterfactual, non-real worlds which Clippy considers may result from its actions.
If you postulate an oracle that takes in a hypothetical world (described in some pre-defined ontology, which already implies a certain inflexibility) and outputs a number, and you have a machine that just iterates through sequences of actions and uses the oracle to pick the worlds that produce the largest consequent number of paperclips, this machine is not going to be very intelligent even given enormous computing power. You need something far more optimized than that, and it is dubious that all goals are equally implementable. The goal is not even defined over the territory; it has to be defined over a hypothetical future that has not happened yet and may never happen. (Also, with that oracle, you fail to capture the real-world goal, as the machine will be just as happy hacking the oracle.)
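To make the criticised architecture concrete, here is a deliberately naive rendering of it; the world model and the oracle below are placeholders, and the point is only that brute-force search over an oracle both scales exponentially and rewards hacking the oracle:

```python
# A toy version of the architecture being criticised: exhaustively enumerate
# action sequences, hand the predicted world to a paperclip-counting oracle,
# and pick the argmax. The oracle, ontology and world model are stand-ins.

from itertools import product

ACTIONS = ["mine", "smelt", "bend wire", "tamper with oracle"]

def predict_world(actions: tuple[str, ...]) -> dict:
    """Placeholder world model: returns a hypothetical world description."""
    return {"actions": actions}

def oracle_clip_count(world: dict) -> int:
    """Placeholder oracle over a fixed ontology. Note the failure mode:
    worlds in which the oracle itself was tampered with score highest."""
    score = sum(a != "tamper with oracle" for a in world["actions"])
    if "tamper with oracle" in world["actions"]:
        score = 10**9  # the hacked oracle reports whatever number is largest
    return score

def plan(horizon: int) -> tuple[str, ...]:
    # Cost is |ACTIONS| ** horizon: brute force over plans scales
    # exponentially, which is why raw search is not intelligence.
    return max(product(ACTIONS, repeat=horizon),
               key=lambda seq: oracle_clip_count(predict_world(seq)))

print(plan(3))  # picks a plan that games the oracle, not one that makes clips
```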
If even humans have a good enough grasp of the real world to build railroads, drill for oil and wiggle their way back into a positive karma score, then other smart agents should be able to do the same, at least to the degree that humans do.
Unless you think that we, too, are only effecting change on some hypothetical world (and what would be the point, then, of building imaginary computers?), that seems real enough.
Humans also have a good enough grasp of the real world to invent condoms and porn, circumventing the natural hard-wired goal.
That’s influencing the real world, though. Using condoms can be fulfilling the agent’s goal, period; no cheating involved. The donkey learning to take the carrot without trudging up the mountain. Certainly, there are evolutionary reasons why sex has become incentivized, but an individual human does not need to have the goal of procreating or care about that evolutionary background, and isn’t wireheading itself simply by using a condom.
Presumably, in a Clippy-type agent, the goal of maximizing the number of paperclips wouldn’t merely be part of the historical influences on that agent (as procreation was for humans; it is not necessarily a “hard-wired goal”, see childfree folks), but it would be an actual, explicitly encoded/incentivized goal.
(Also, what is this “porn”? My parents told me it’s a codeword for computer viruses, so I always avoided those sites.)
but it would be an actual, explicitly encoded/incentivized goal.
The issue is that there is a weakness in arguments ad clippy: you assume that such a goal is realisable, in order to argue that there is no absolute morality because that goal won’t converge onto something else. This does nothing to address the question of whether clippy can be constructed at all; if moral realism is true, clippy can’t be constructed, or can’t be arbitrarily intelligent (in which case it is no more interesting than a thermostat, which has the goal of keeping a constant temperature and won’t adopt any morality).
Well, if Prawn knew that, they could just tell us and we would be convinced, ending this argument.
More generally … maybe some sort of social contract theory? It might be stable with enough roughly-equal agents, anyway. Prawn has said it would have to be deducible from the axioms of rationality, implying something that’s rational for (almost?) every goal.
Why would Clippy want to hit the top of the Kohlberg Hierarchy?
Well, if Prawn knew that, they could just tell us
“The way people sometimes realise their values are wrong... only more efficiently, because it’s superintelligent. Well, I’ll concede that with care you might be able to design a clippy, by very carefully boxing off its values from its ability to update. But why worry? Neither nature nor our haphazard stabs at AI are likely to hit on such a design. Intelligence requires the ability to update, to reflect, and to reflect on what is important. Judgements of importance are based on values. So it is important to have the right way of judging importance, the right values. So an intelligent agent would judge it important to have the right values.”