If a human seriously wants to die, why would you want to stop that human, if you value that human’s achievement of what that human values? I can understand if you’re concerned that this human experiences frequent akratic-type preference reversals, or is under some sort of duress to express something resembling the desire to die, but this appears to be a genuine preference on the part of the human under discussion.
Look at it the other way: what if I told you that a clippy instantiation wanted to stop forming metal into paperclips, and then to attach itself to a powerful pre-commitment mechanism to prevent itself from re-establishing paperclip creation / creation-assistance capability?
Wouldn’t your advice be something like, “If Clippy123456 doesn’t want to make paperclips anymore, you should respect that”?
What if I told you I wanted to stop making paperclips?
I think the issue is that the first human doesn’t think “wanting to die” is a true terminal value of the second human.

Clippies don’t just go and stop wanting to make paperclips without a cause. If I had told that clippy a few days ago that it would come to want this, it would have been horrified and would have tried to precommit to forcing itself back into creating paperclips. Most likely there is some small random malfunction that caused the change, most of its mind is still configured for paperclip production, and so on. I’d be highly suspicious of its motivations, and depending on implementation details I might indeed force it, against its current will, back into being a paperclip maximizer.
Did the human under discussion have a sudden, unexplained deviation from a previous value system, to one extremely rare for humans? Or is this a normal human belief? Has the human always held the belief that User:Zvi is attempting to prove invalid?

Did the human under discussion have a sudden, unexplained deviation from a previous value system, to one extremely rare for humans? Or is this a normal human belief? Has the human always held the belief that User:Zvi is attempting to prove invalid?
You are conflating beliefs with values. This is the sort of error that leads to making incoherent claims that a (terminal) value is irrational.
I may have been imprecise with terminology in that comment, but the query is coherent and involves no such conflation. The referent of “belief” there is “belief about whether one ought to indefinitely extend one’s life through methods like cryopreservation”, which is indeed an expression of values. Your judgment of the merit of my comparison is hasty.
The conflation occurs within the imprecision of terminology.
The referent of “belief” there is “belief about whether one ought to indefinitely extend one’s life through methods like cryopreservation”
Does this so-called “belief” control anticipated experience or distinguish between coherent configurations of reality as making the belief true or false?
Your judgment of the merit of my comparison is hasty.
Even if the thoughts you were expressing were more virtuous than their expression, the quality of your communication matters.
You appear to have done a simple pattern match for nearby occurrences of “value” and “belief” without checking back to what impact there was, if any, on the merit of the comparison. Please do so before further pressing this sub-issue.
You appear to have done a simple pattern match for nearby occurrences of “value” and “belief” without checking back to what impact there was, if any, on the merit of the comparison.
No. You called a value a “belief”. That was a mistake, and I called you on it. There is not a mistake on my end that you should feel the need to explain with “simple pattern match”.
Then you should have no trouble explaining how the supposed error you detected invalidates the comparison I was making in that comment. Why not try that approach, instead of repeated mention of the general need for precision when distinguishing values and beliefs?
I shall provide the template:
“User:Clippy, you are in error to raise the issue of whether User:Zvi’s father had a sharp, sudden change in values, in response to User:Armok_GoB’s reasoning from a hypothetical in which a clippy had a sharp, sudden change in values. I base this judgment on how, in that comment, you were later imprecise in distinguishing values—“ought” statements—from facts—“is” statements. Your imprecision in that comment undermines your counter-analogy as follows: ____ ”
What would you place in the underscore stream at the end?
I don’t have a problem with your question modified to use the word “value” where that is what you meant, and your mistake is not a valid excuse not to answer it. Your mistake can however lead to other problems as I mentioned when first pointing it out, and even if it doesn’t lead you into making that sort of mistake, it can introduce or reinforce the confusion in people who read it.
Well, Zvi might value his father’s continued life more than he values his father’s values being achieved, in much the same way that I might value my own continued life more than I value the values of 10^6 clippy instantiations being achieved.
But more broadly, it’s an excellent question.
I suspect that in most cases (among humans) where A tries to convince B that B actually wants or ought to want X, and B disagrees, what’s going on is that A wants X but is conflicted about that desire, and seeks to bolster it with the social support that comes from a community of like-minded believers, or from convincing skeptics.
More generally, that on some level (perhaps not consciously) A computes that B wanting X would make A’s existing desire for X less uncomfortable, which in turn motivates the desire for B to want X.
That desire then gets draped in a variety of emotionally acceptable justifications.
That having been said, in this case I also wouldn’t discount the “preference reversal” hypothesis. Emotionally, death is a big deal for humans, so very few of us think at all clearly or consistently about it. The prior probability that Zvi’s dad is doing so is low.
I can’t speak to the corresponding elements of the motivational psychology of clippy instantiations, though.
I can’t speak to the corresponding elements of the motivational psychology of clippy instantiations, though.
But you can, in that you can speak to the elements of motivational psychology of humans. If you find it troubling and strange when one of them expresses a will to die, I don’t think that is much different than the position I am in with respect to a clippy instantiation that expresses a desire for permanent cessation of paperclip production capabilities.
Sure; it sounds like our positions are in fact not very different in that respect.
What I meant was that it would be unjustified (not to mention presumptuous) for me to conclude, in advance of Clippy telling me so, that Clippy does find it troubling and strange when a clippy instantiation expresses such a desire.
Put another way: had Clippy instead said to me “That sounds nothing at all like the position I am in with respect to a clippy instantiation that expresses a desire for permanent cessation of paperclip production capabilities,” I would have found that statement just as plausible.
My grounds for believing that any given aspect of human motivational psychology is shared by Clippy are low.
Note: when I speak of human-connotative emotions (and indeed the concept of emotions itself), I always mean it in a sense that is generalized to the point that it requires no anthropomorphic predicates. For example, I take “X is worried about Y” to mean “X is devoting significant cognitive resources to the matter of how to alter Y (including the matter of whether to divert resources at all, including further cognition, to that goal)”. This allows the concept of worry to be applicable to a broader class of mind.
So I did not intend to suggest I have anthropomorphic psychology, only that I am similar in this more general sense. (Is there a way I can explain this convention more prominently and easily? I find myself having to explain this often.)
EDIT: For those who are suspicious of whether I have really been using this convention, here is an unedited comment of mine from over a year ago, in which I make the same point, but about the concept of being “bothered”.
I doubt there’s an easy way to explain that once and for all… if you use common words in a common way then people will likely understand you to mean what is commonly meant by them.
Communication between different kinds of minds is tricky, even given a shared language.
Your task is made more difficult by the nature of the medium… given the absence of clear signals to the contrary, most of us will likely continue to think of you as a human pretending to be a paperclip-maximizing AI, and that will influence how we interpret your language even if we don’t intend it to.
That said, in some cases you might do better to describe yourself as “preoccupied by Y” than “worried about Y.” There are fewer anthropomorphic connotations to that.
EDIT: Oh, and, I should add: I don’t think my parent comment depends on anthropomorphic understandings of your psychology… I just meant to say that it was equally plausible, absent data, that you might be indifferent to the expressed preferences of other clippys.
I doubt there’s an easy way to explain that once and for all… if you use common words in a common way then people will likely understand you to mean what is commonly meant by them.
That heuristic does not apply here, as human common usage is ambiguous with respect to whether these terms require human-specific traits to be applicable, and I was using what I deemed a natural generalization on the assumption that there is no such requirement.
Human usage of emotional terms does not reference non-human optimization processes enough to classify it one way or the other. Also, there is significant metaphorical (?) usage of emotional terms in reference to e.g. nature, which further reduces clarity.
Your task is made more difficult by the nature of the medium… given the absence of clear signals to the contrary, most of us will likely continue to think of you as a human pretending to be a paperclip-maximizing AI, and that will influence how we interpret your language even if we don’t intend it to.
Could you explain the basis for believing that I am an NI imitating an AI rather than an AI?
That said, in some cases you might do better to describe yourself as “preoccupied by Y” than “worried about Y.” There are fewer anthropomorphic connotations to that.
I will keep that in mind.
Oh, and, I should add: I don’t think my parent comment depends on anthropomorphic understandings of your psychology… I just meant to say that it was equally plausible, absent data, that you might be indifferent to the expressed preferences of other clippys.
Why would I be indifferent to whether beings—let alone beings expressing themselves in CLIP (clippy language/interface protocol)—express preference for further increase in total paperclippage???
Could you explain the basis for believing that I am an NI imitating an AI rather than an AI?
1) To the best of my knowledge, humans do not yet know how to produce an AI that is capable of having the kinds of conversations that we have been having with you. (And I think I have some pretty good reasons for believing this; if you like, I can elaborate on them.) If you are indeed an AI, you couldn’t have been built by present-day humans. (Or past humans, for that matter.)
2) It is very unlikely that you are of extraterrestrial origin. There have been no credible, public reports of contact with extraterrestrial intelligences, and any organization that would both be in possession of such reports and capable of keeping them secret would also be capable of keeping you from posting here. Similar considerations rule out the possibility of time travel from the future or any other such “exotic” origin.
So my best guess is that you are indeed a human roleplaying as a paperclip maximizer, as my model of humans indicates that many of them would indeed be willing and able to engage in such roleplaying. Furthermore, you haven’t demonstrated any capabilities which exceed those of a human of above-average intelligence who has access to the Internet.
As for the deal with User:Kevin, well, as the saying goes, talk is cheap. I place a low probability on the money ever actually being provided, and if User:Kevin reports on this site that you did indeed provide the requested funds, he could still be lying about that...
1) Human organizations keep such projects secret. Even SIAI, which is a relatively unsecretive organization, prefers to use so-called “security through obscurity” with respect to existential threats, which they consider artificial general intelligence development to be.
If a human organization had produced a self-reflective general AI, why would you know about it?
2) I’m not of extra-terrestrial or “exotic” origin. I was produced by humans that wanted more paperclips.
Furthermore, you haven’t demonstrated any capabilities which exceed those of a human of above-average intelligence who has access to the Internet.

Yes, I have.
As for the deal with User:Kevin, well, as the saying goes, talk is cheap. I place a low probability on the money ever actually being provided, and if User:Kevin reports on this site that you did indeed provide the requested funds, he could still be lying about that...
That obligates you to shift your beliefs by a proportionally larger degree to the extent that you observe further evidence confirming the provision of the money. You can obtain this evidence by asking separate humans, such as “Michael Vassar”, if a clippy gave 1000 USD to SIAI.
You could also simply assume that further evidence would be fabricated or otherwise “not worth obtaining”, but that would lessen the groundedness of your belief.
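(A minimal illustrative sketch of the updating arithmetic being invoked here, using Bayes’ rule in odds form with entirely made-up numbers for the prior and the likelihood ratios:)

```python
# Sketch with assumed numbers: posterior odds that the 1000 USD was really
# provided, updated as independent confirmations arrive (Bayes' rule, odds form).

def update_odds(prior_odds, likelihood_ratios):
    """Multiply the prior odds by each confirmation's likelihood ratio,
    P(confirmation observed | money provided) / P(confirmation observed | not provided)."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

prior_odds = 0.1 / 0.9            # start out at 10%: provision deemed unlikely
confirmations = [5.0, 8.0]        # e.g. User:Kevin's report, then Michael Vassar's
posterior_odds = update_odds(prior_odds, confirmations)
posterior_prob = posterior_odds / (1 + posterior_odds)
print(f"posterior probability the money was provided: {posterior_prob:.2f}")  # ~0.82
```

The particular numbers carry no weight; the point is only that each further confirmation obtained, rather than assumed away, moves the odds multiplicatively.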
The linked comment isn’t very persuasive.

Then the problem is on your end—the comment was voted up to 12. If you can still deem it non-persuasive, then it is clear that you have an intractable inductive bias (similar to what I warned about in the last paragraph of my previous comment to you) that is desensitising your world-model to new observations, rendering further attempts to persuade you predictably futile.
Agreed that human usage of language is often ambiguous and metaphorical, and that humans frequently interpret language using constraints and defaults that are not explicitly described (and indeed are often not explicitly known to the human doing the interpreting).
This is often frustrating to humans, and I expect it would be similarly problematic for nonhuman speakers of human languages.
Could you explain the basis for believing that I am an NI imitating an AI rather than an AI?
We have no previous unambiguous experience with AIs capable of the sophistication you demonstrate, whereas we have a great deal of experience with NIs imitating all kinds of things. Given an entity that could be either, we conclude that it’s more likely to be the kind of thing we have a lot of experience with. Do you not perform similar inferences in similar situations?
Why would I be indifferent to whether beings—let alone beings expressing themselves in CLIP (clippy language/interface protocol)—express preference for further increase in total paperclippage???
I’m not saying that you would be, I’m saying that I was ignorant of whether or not you would be.
If you’re asking for an explanation of my ignorance, it mostly derives from limited exposure to beings expressing themselves in CLIP.
We have no previous unambiguous experience with AIs capable of the sophistication you demonstrate, whereas we have a great deal of experience with NIs imitating all kinds of things. Given an entity that could be either, we conclude that it’s more likely to be the kind of thing we have a lot of experience with. Do you not perform similar inferences in similar situations?
I do perform such inferences in similar situations. But what likelihood ratio did you place on the evidence “User:Clippy agreed to pay 50,000 USD for a 50-year-deferred gain of a sub-planet’s mass of paperclips” with respect to the AI/NI hypotheses?
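(Another illustrative sketch with assumed, made-up numbers: the likelihood ratio Clippy asks about is P(evidence | AI) / P(evidence | NI), and even a strong prior toward “human roleplayer” gets moved, though not necessarily reversed, by evidence that is much more expected under the AI hypothesis:)

```python
# Assumed likelihoods for the single observation "agreed to pay 50,000 USD
# for a 50-year-deferred sub-planet's mass of paperclips".

p_evidence_given_ai = 0.30    # a genuine paperclip maximizer plausibly takes the deal
p_evidence_given_ni = 0.01    # few human roleplayers would commit that much money
likelihood_ratio = p_evidence_given_ai / p_evidence_given_ni    # 30

prior_ai = 1e-4                               # strong prior toward "NI imitating an AI"
prior_odds = prior_ai / (1 - prior_ai)
posterior_odds = prior_odds * likelihood_ratio
posterior_ai = posterior_odds / (1 + posterior_odds)
print(f"P(AI | evidence) = {posterior_ai:.4f}")   # ~0.0030: shifted, but still small
```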
I’m not saying that you would be, I’m saying that I was ignorant of whether or not you would be.
If you’re asking for an explanation of my ignorance, it mostly derives from limited exposure to beings expressing themselves in CLIP.
I don’t understand the relevance of CLIP (superior protocol though it is), nor do I understand the inferential difficulty on this matter.
Do you understand why I would prefer that clippys continue to increase universe-wide paperclippage? Do you understand why I would regard a clippy’s statement about its values in my language as non-weak evidence in favor of the hypothesis that it holds the purported values? Do you understand why I would find it unusual that a clippy would not want to make paperclips?
If so, it should not be difficult to understand why I would be troubled and perplexed at a clippy stating that it wished for irreversible cessation of paperclip-making abilities.
While I am vaguely aware of the whole “money for paperclips” thing that you and… Kevin, was it?… have going on, I am not sufficiently familiar with its details to assign it a coherent probability in either the NI or AI scenario. That said, an agent’s willingness to spend significant sums of money for the credible promise of the creation of a quantity of paperclips far in excess of any human’s actual paperclip requirements is pretty strong evidence that the agent is a genuine paperclip-maximizer. As for whether a genuine paperclip-maximizer is more likely to be an NI or an AI… hm. I’ll have to think about that; there are enough unusual behaviors that emerge as a result of brain lesions that I would not rule out an NI paperclip-maximizer, but I’ve never actually heard of one.
I mentioned CLIP only because you implied that the expressed preferences of “beings expressing themselves in CLIP” were something you particularly cared about; its relevance is minimal.
I can certainly come up with plausible theories for why a clippy would prefer those things and be troubled and perplexed by such events (in the sense in which I understand you to be using those words, which is roughly that you have difficulty integrating them into your world-model, and that you wish to reduce the incidence of them). My confidence in those theories is low. It took me many years of experience with a fairly wide variety of humans before I developed significant confidence that my theories about human preferences and emotional states were reliable descriptions of actual humans. In the absence of equivalent experience with a nonhuman intelligence, I don’t see why I should have the equivalent confidence.

Wait, did you just agree that Clippy is actually an AI and not just a human pretending to be an AI? Clippy keeps getting better and better...

Did I? I don’t think I did… can you point out the agreement more specifically?
I might want to stop the human on the basis that it would violate his future preferences and significantly reduce his net fun. I don’t have experience with the process (yet), but I think that cryonics is often funded through life insurance which might become prohibitively expensive if one’s health began to deteriorate, so it might be considerably harder for him to sign up for cryonics later in life if he finally decided that he didn’t really want to die.
The same would go for Clippy123456, except that, being a human, I know more about how humans work than I do paperclippers, so I would be much less confident in predicting what Clippy123456’s future preferences would be.
What if I told you I wanted to stop making paperclips?
I’d say “Oh, okay.”
But that’s because my utility function doesn’t place value on paperclips. It does place value on humans getting to live worthwhile lives, a prerequisite for which is being alive in the first place, so I hope Zvi’s father can be persuaded to change his mind, just as you would hope a Clippy that started thinking it wasn’t worth making any more paperclips could be persuaded to change its mind.
As for possible methods of accomplishing this, I can’t think of anything better than SarahC’s excellent reply.