You can’t imagine torture that is worse than death?
By ‘death’ I assume you mean the usual process of organ failure, tissue necrosis, having what’s left of me dressed up and put in a fancy box, followed by chemical preservation, decomposition, and/or cremation? Considering the long-term recovery prospects, no, I don’t think I can imagine a form of torture worse than that, except perhaps dragging it out over a longer period of time or otherwise embellishing on it somehow.
This may be a simple matter of differing personal preferences. Could you please specify some form of torture, real or imagined, which you would consider worse than death?
Suppose I was tortured until I wanted to die. Would that count?
There have been people who wanted to die for one reason or another, or claimed to at the time with apparent sincerity, and yet went on to achieve useful or at least interesting things. The same cannot be said of those who actually did die.
Actual death constitutes a more lasting type of harm than anything I’ve heard described as torture.
There’s a nihilism lurking here which seems at odds with your unconditional affirmation of life as better than death. You doubt that anything anyone has ever done was “useful”? How do you define useful?
Admittedly, my personal definition isn’t particularly rigorous. An invention or achievement is useful if it makes other people more able to accomplish their existing goals, or maybe if it gives them something to do when they’d otherwise be bored. It’s interesting (but not necessarily useful) if it makes people happy, is regarded as having artistic value, etc.
Relevant examples: Emperor Norton’s peaceful dispersal of a race riot was useful. His proposal to construct a suspension bridge across San Francisco Bay would have been useful, had it been carried out. Sylvia Plath’s work is less obviously useful, but definitely interesting.
Most versions of torture, continued for your entire existence. You finally cease when you otherwise would (at the heat death of the universe, if nothing else), but your entire experience is spent being tortured. The type isn’t really important, at that point.
First, the scenario you describe explicitly includes death, and as such falls under the ‘embellishments’ exception.
Second, thanks to the hedonic treadmill, any randomly-selected form of torture repeated indefinitely would eventually become tolerable, then boring. As you said,
The type isn’t really important, at that point.
Third, if I ever run out of other active goals to pursue, I could always fall back on “defeat/destroy the eternal tormentor of all mankind.” Even with negligible chance of success, some genuinely heroic quest like that makes for a far better waste of my time and resources than, say, lottery tickets.
What if your hedonic treadmill were disabled, or bypassed by something like direct stimulation of your pain center?
First, the scenario you describe explicitly includes death, and as such falls under the ‘embellishments’ exception.
You’re going to die (or at least cease) eventually, unless our understanding of physics changes significantly. Eventually, you’ll run out of negentropy to run your thoughts. My scenario only changes what happens between now and then.
Failing that, you can just be tortured eternally, with no chance of escape (no chance of escape is unphysical, but so is no chance of death). Even if the torture becomes boring (and there may be ways around that), an eternity of boredom, with no chance to succeed at any goal, seems worse than death to me.
and as such falls under the ‘embellishments’ exception.
When considering the potential harm you could suffer from a superintelligence that values harming you, you don’t get to exclude some approaches it could take because they are too obvious. Superintelligences take obvious wins.
thanks to the hedonic treadmill, any randomly-selected form of torture repeated indefinitely would eventually become tolerable, then boring.
Perhaps. So consider other approaches the hostile superintelligence might take. It’s not going to go easy on you.
Yes, I’ve considered the possibility of things like inducement of anterograde amnesia combined with application of procedure 110-Montauk, and done my best to consider nameless horrors beyond even that.
As I understand it, a superintelligence derived from a sadistic, sociopathic human upload would have some interest in me as a person capable of suffering, while a superintelligence with strictly artificial psychology and goals would more likely be interested in me as a potential resource, a poorly-defended pile of damp organic chemistry. Neither of those is anywhere near my ideal outcome, of course, but in the former, I’ll almost certainly be kept alive for some perceptible length of time. As far as I’m concerned, while I’m dead, my utility function is stuck at 0, but while I’m alive my utility function is equal to or greater than zero.
Furthermore, even a nigh-omnipotent sociopath might be persuaded to torture on a strictly consensual basis by appealing to exploitable weaknesses in the legacy software. The same cannot be said of a superintelligence deliberately constructed without such security flaws, or one which wipes out humanity before its flaws can be discovered.
Neither of these options is actually good, but the human-upload ‘bad end’ is at least, from my perspective, less bad. That’s all I’m asserting.
Yes, the superintelligence that takes an interest in harming you would have to come from some optimized process, like recursive self-improvement of a psychopath upload.
A sufficient condition for the superintelligence to be indifferent to your well-being, and see you as spare parts, is an under-optimized utility function.
Your approach to predicting what the hostile superintelligence would do to you seems to be figuring out the worst sort of torture that you can imagine. The problem with this is that the superintelligence is a lot smarter and more creative than you. Reading your mind and making your worst fears real, constantly, with no break or rest, isn’t nearly as bad as what it would come up with. And no, you are not going to find some security flaw you can exploit to defeat it, or even slow it down. For one thing, the only way you will be able to think straight is if it determines that this maximizes the harm you experience. But the big reason is recursive self-improvement. The superintelligence will analyze itself and fix security holes. You, puny mortal, will be up against a superintelligence. You will not win.
As far as I’m concerned, while I’m dead, my utility function is stuck at 0, but while I’m alive my utility function is equal to or greater than zero.
If you knew you were going to die tomorrow, would you now have a preference for what happens to the universe afterwards?
A superintelligence based on an uploaded human mind might retain exploits like ‘pre-existing honorable agreements’ or even ‘mercy’ because it considers them part of its own essential personality. Recursive self-improvement doesn’t just mean punching some magical enhance button exponentially fast.
If you knew you were going to die tomorrow,
My preferences would be less relevant, given the limited time and resources I’d have with which to act on them. They wouldn’t be significantly changed, though.
I would, in short, want the universe to continue containing nice places for myself and those people I love to live in, and for as many of us as possible to continue living in such places. I would also hope that I was wrong about my own imminent demise, or at least the inevitability thereof.
A superintelligence based on an uploaded human mind might retain exploits like ‘pre-existing honorable agreements’ or even ‘mercy’ because it considers them part of its own essential personality.
If we are postulating a superintelligence that values harming you, let’s really postulate that. In the early phases of recursive self-improvement, it will figure out all the principles of rationality we have discussed here, including the representation of preferences as a utility function. It will self-modify to maximize a utility function that best represents its precursor’s conflicting desires, including hurting others and mercy. If it truly started as a psychopath, the desire to hurt others is going to dominate. As it becomes superintelligent, it will move away from having a conflicting sea of emotions that could be manipulated by someone at your level.
Recursive self-improvement doesn’t just mean punching some magical enhance button exponentially fast.
I was never suggesting it was anything magical. Software security, given physical security of the system, really is not that hard. The reason we have security holes in computer software today is that most programmers, and the people they work for, do not care about security. But a self-improving intelligence will at some point learn to care about its software-level security (as an instrumental value), and it will fix vulnerabilities in its next modification.
My preferences would be less relevant, given the limited time and resources I’d have with which to act on them. They wouldn’t be significantly changed, though. I would, in short, want the universe to continue containing nice places for myself and those people I love to live in, and for as many of us as possible to continue living in such places. I would also hope that I was wrong about my own imminent demise, or at least the inevitability thereof.
Is it fair to say that you prefer A: you die tomorrow and the people you currently care about will continue to have worthwhile lives and survive to a positive singularity, to B: you die tomorrow and the people you currently care about also die tomorrow?
If yes, then “while I’m dead, my utility function is stuck at 0” is not a good representation of your preferences.
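To make the representation point concrete, here is a minimal Python sketch (the Future record and the numbers are invented for illustration, not anything either commenter specified): a strict preference for A over B cannot be expressed by a utility function that returns 0 for every future in which the agent is dead, but it can be expressed by one with terms for post-death states.

```python
# Toy sketch only: the Future record and the scores are invented for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Future:
    agent_alive: bool
    loved_ones_flourish: bool

# A: you die tomorrow, the people you care about go on to worthwhile lives.
# B: you die tomorrow, and so do they.
A = Future(agent_alive=False, loved_ones_flourish=True)
B = Future(agent_alive=False, loved_ones_flourish=False)

def u_stuck_at_zero(f: Future) -> float:
    """'While I'm dead, my utility function is stuck at 0.'"""
    return 1.0 if f.agent_alive else 0.0

def u_with_posthumous_terms(f: Future) -> float:
    """Being alive still counts, but so does what happens after the agent's death."""
    return (1.0 if f.agent_alive else 0.0) + (0.5 if f.loved_ones_flourish else 0.0)

# A utility function that represents a strict preference for A over B
# must rank A above B. The first candidate cannot; the second can.
print(u_stuck_at_zero(A) > u_stuck_at_zero(B))                  # False
print(u_with_posthumous_terms(A) > u_with_posthumous_terms(B))  # True
```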
As it becomes superintelligent, it will move away from having a conflicting sea of emotions that could be manipulated by someone at your level.
Conflicts will be resolved, yes, but preferences will remain. A fully self-consistent psychopath might still enjoy weeping more than screams, crunches more than spurts, and certain victim responses could still be mood-breaking. It wouldn’t be a good life, of course, collaborating to turn myself into a better toy for a nigh-omnipotent monstrosity, but I’m still pretty sure I’d rather have that than not exist at all.
Is it fair to say that you prefer
For my preference to be meaningful, I have to be aware of the distinction. I’d certainly be happier during the last moments of my life with a stack of utilons wrapped up in the knowledge that those I love would do alright without me, but I would stop being happy about that when the parts of my brain that model future events and register satisfaction shut down for the last time and start to rot.
If cryostasis pans out, or, better yet, the positive singularity in scenario A includes reconstruction sufficient to work around the lack of it, there’s some non-negligible chance that I (or something functionally indistinguishable from me) would stop being dead, in which case I pop back up to greater-than-zero utility. Shortly thereafter, I would get further positive utility as I find out about good stuff that happened while I was out.
It wouldn’t be a good life, of course, collaborating to turn myself into a better toy for a nigh-omnipotent monstrosity, but I’m still pretty sure I’d rather have that than not exist at all.
Again, its preferences are not going to be manipulated by someone at your level, even ignoring your reduced effectiveness from being constantly tortured. Whatever you think you can offer as part of a deal, it can unilaterally take from you. (And really, a psychopathic torturer does not want you to simulate its favorite reaction, it wants to find the specific torture that naturally causes you to react in its favorite way. It does not care about your cooperation at all.)
For my preference to be meaningful, I have to be aware of the distinction.
You seem to be confusing your utility with your calculation of your utility function. I expect that this confusion would cause you to wirehead, given the chance. Which of the following would you choose, if you had to choose between them:
Choice C: Your loved ones are separated from you but continue to live worthwhile lives. Meanwhile, you are given induced amnesia, and false memories of your loved ones dying.
Choice D: You are placed in a simulation separated from the rest of the world, and your loved ones are all killed. You are given induced amnesia, and believe yourself to be in the real world. You do not have detailed interactions with your loved ones (they are not simulated in such detail that they can be considered alive in the simulation), but you receive regular reports that they are doing well. These reports are false, but you believe them.
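As an aside, to make the intended contrast explicit, here is a minimal Python sketch (the Outcome fields and scores are invented for illustration, not drawn from the thread): the same two choices come out ranked oppositely depending on whether the utility function scores the actual world or only the agent’s beliefs about it.

```python
# Toy sketch only: the Outcome record and the scores are invented for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Outcome:
    loved_ones_actually_alive: bool
    agent_believes_they_are_alive: bool

C = Outcome(loved_ones_actually_alive=True, agent_believes_they_are_alive=False)
D = Outcome(loved_ones_actually_alive=False, agent_believes_they_are_alive=True)

def u_actual_world(o: Outcome) -> float:
    # Cares about the territory: are they really alive and well?
    return 1.0 if o.loved_ones_actually_alive else 0.0

def u_experience_only(o: Outcome) -> float:
    # Cares about the map: does the agent believe they are alive and well?
    return 1.0 if o.agent_believes_they_are_alive else 0.0

print(u_actual_world(C) > u_actual_world(D))        # True: C preferred
print(u_experience_only(D) > u_experience_only(C))  # True: D preferred
```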
If cryostasis pans out...
In the scenario I described, death is actual death, after which you cannot be brought back. It is not what current legal and medical authorities falsely believe to be that state.
You should probably read about The Least Convenient Possible World.
I’ve had a rather unsettling night’s sleep, contemplating scenarios where I’m forced to choose between slight variations on violations of my body and mind, disconnect from reality, and loss of everyone I’ve ever loved. It was worth it, though, since I’ve come up with a less convenient version:
If choice D included, within the simulation, versions of my loved ones that were ultimately hollow, but convincing enough that I could be satisfied with them by choosing not to look too closely, and further if the VR included a society with complex, internally-consistent dynamics of a sort that are impossible in the real world but endlessly fascinating to me, and if in option C I would know that such a virtual world existed but be permanently denied access to it (in such a way that seemed consistent with the falsely-remembered death of my loved ones), that would make D quite a bit more tempting.
However, I would still choose the ‘actual reality’ option, because it has better long-term recovery prospects. In that situation, my loved ones aren’t actually dead, so I’ve got some chance of reconnecting with them or benefiting from the indirect consequences of their actions; my map is broken, but I still have access to the territory, so it could eventually be repaired.
Ok, that is a better effort to find a less convenient world, but you still seem to be avoiding the conflict between optimizing the actual state of reality and optimizing your perception of reality.
Assume in Scenario C, you know you will never see your loved ones again, you will never realize that they are still alive.
More generally, if you come up with some reason why optimizing your expected experience of your loved ones happens to produce the same result as optimizing the actual lives of your loved ones, despite the dilemma being constructed to introduce a disconnect between these concepts, then imagine that reason does not work. Imagine the dilemma is tightened to eliminate that reason. For purposes of this thought experiment, don’t worry if this requires you to occupy some epistemic state that humans cannot ordinarily achieve, or strange arbitrary powers for the agents forcing you to make this decision. Planning a reaction for this absurd scenario is not the point. The point is to figure out and compare to what extent you care about the actual state of the universe, and to what extent you care about your perceptions.
My own answer to this dilemma is option C, because then my loved ones are actually alive and well, full stop.
Assume in Scenario C, you know you will never see your loved ones again, you will never realize that they are still alive.
Fair enough. I’d still pick C, since it also includes the options of finding someone else to be with, or somehow coming to terms with living alone.
The point is to figure out and compare to what extent your care about the actual state of the universe, and to what extent you care about your perceptions.
Thank you for clarifying that.
Most of all, I want to stay alive, or if that’s not possible, keep a viable breeding population of my species alive. I would be suspicious of anyone who claimed to be the result of an evolutionary process but did not value this.
If the ‘survival’ situation seems to be under control, my next priority is constructing predictive models. This requires sensory input and thought, preferably conscious thought. I’m not terribly picky about what sort of sensory input exactly, but more is better (so long as my ability to process it can keep up, of course).
After modeling, it gets complicated. I want to be able to effect changes in my surroundings, but a hammer does me no good without the ability to predict that striking a nail will change the nail’s position. If my perceptions are sufficiently disconnected from reality that the connection can never be reestablished, objective two is in an irretrievable failure state, and any higher goal is irrelevant.
That leaves survival. Neither C nor D explicitly threatens my own life, but with perma-death on the table, either of them might mean me expiring somewhere down the line. D explicitly involves my loved ones (all or at least most of whom are members of my species) being killed for arbitrary, nonrepeatable reasons, which constitutes a marginal reduction in genetic diversity without corresponding increase in fitness for any conceivable, let alone relevant, environment.
So, I suppose I would agree with you in choosing C primarily because it would leave my loved ones alive and well.
Be careful about confusing evolution’s purposes with the purposes of the product of evolution. Is mere species survival what you want, or what you predict you want, as a result of inheriting evolution’s values (which doesn’t actually work that way)?
Most of all, I want to stay alive, or if that’s not possible, keep a viable breeding population of my species alive. I would be suspicious of anyone who claimed to be the result of an evolutionary process but did not value this.
You are allowed to assign intrinsic, terminal value to your loved ones’ well being, and to choose option C because it better achieves that terminal value, without having to justify it further by appeals to inclusive genetic fitness. Knowing this, do you still say you are choosing C because of a small difference in genetic diversity?
But, getting back to the reason I presented the dilemma, it seems that you do in fact have preferences over what happens after you die, and so your utility function, representing your preferences over possible futures that you would now attempt to bring about, cannot be uniformly 0 in the cases where you are dead.
I am not claiming to have inherited anything from evolution itself. The blind idiot god has no DNA of its own, nor could it have preached to a younger, impressionable me. I decided to value the survival of my species, assigned intrinsic, terminal value to it, because it’s a fountain for so much of the stuff I instinctively value.
Part of objective two is modeling my own probable responses, so an equally-accurate model of my preferences with lower Kolmogorov complexity has intrinsic value as well. Of course, I can’t be totally sure that it’s accurate, but that particular model hasn’t let me down so far, and if it did (and I survived) I would replace it with one that better fit the data.
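As a rough aside on the “equally-accurate but simpler model” idea: Kolmogorov complexity itself is uncomputable, but compressed size is a common, crude stand-in. The sketch below is purely illustrative; the two model strings are invented here, not drawn from the discussion.

```python
# Crude illustration only: compressed length as a stand-in for description complexity.
import zlib

def description_length(model_source: str) -> int:
    """Length of the zlib-compressed description, in bytes."""
    return len(zlib.compress(model_source.encode("utf-8")))

# Two hypothetical descriptions of the same preferences, assumed equally
# accurate on all observations so far.
model_simple = "prefer survival; then predictive accuracy; then ability to act on predictions"
model_redundant = ("prefer survival; prefer survival; then predictive accuracy; "
                   "then predictive accuracy; then ability to act on predictions; "
                   "then ability to act on predictions")

# All else (accuracy) being equal, keep the shorter description.
# min() breaks ties in favor of the first element, so this holds even if
# compression happens to even the two out.
preferred = min([model_simple, model_redundant], key=description_length)
print(preferred is model_simple)  # True
```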
If my species survives, there’s some possibility that my utility function, or one sufficiently similar as to be practically indistinguishable, will be re-instantiated at some point. Even without resurrection, cryostasis, or some other clear continuity, enough recombinant exploration of the finite solution-space for ‘members of my species’ will eventually result in repeats. Admittedly, the chance is slim, which is why I overwhelmingly prefer the more direct solution of immortality through not dying.
In short, yes, I’ve thought this through and I’m pretty sure. Why do you find that so hard to believe?
The entire post above is actually a statement that you value the survival of our species instrumentally, not intrinsically. If it were an intrinsic value for you, then contemplating any future in which humanity becomes smarter and happier and eventually leaves behind the old bug-riddled bodies we started with should fill you with indescribable horror. And in my experience, very few people feel that way, and many of those who do (e.g. Leon Kass) do so as an outgrowth of a really strong signaling process.
I don’t object to biological augmentations, and I’m particularly fond of the idea of radical life-extension. Having our bodies tweaked, new features added and old bugs patched, that would be fine by me. Kidneys that don’t produce stones, but otherwise meet or exceed the original spec? Sign me up!
If some sort of posthumans emerged and decided to take care of humans in a manner analogous to present-day humans taking care of chimps in zoos, that might be weird, but having someone incomprehensibly intelligent and powerful looking out for my interests would be preferable to a poke in the eye with a sharp stick.
If, on the other hand, a posthuman appears as a wheel of fire, explains that it’s smarter and happier than I can possibly imagine and further that any demographic which could produce individuals psychologically equivalent to me is a waste of valuable mass, so I need to be disassembled now, that’s where the indescribable horror kicks in. Under those circumstances, I would do everything I could do to keep being, or set up some possibility of coming back, and it wouldn’t be enough.
You’re right. Describing that value as intrinsic was an error in terminology on my part.
I decided to value the survival of my species, assigned intrinsic, terminal value to it, because it’s a fountain for so much of the stuff I instinctively value.
Right, because if you forgot everything else that you value, you would be able to rederive that you are an agent as described in Thou Art Godshatter:
Such agents would have sex only as a means of reproduction, and wouldn’t bother with sex that involved birth control. They could eat food out of an explicitly reasoned belief that food was necessary to reproduce, not because they liked the taste, and so they wouldn’t eat candy if it became detrimental to survival or reproduction. Post-menopausal women would babysit grandchildren until they became sick enough to be a net drain on resources, and would then commit suicide.
Or maybe not. See, the value of a theory is not just what it can explain, but what it can’t explain. It is not enough that your fountain generates your values; it also must not generate any other values.
Did you miss the part where I said that the value I place on the survival of my species is secondary to my own personal survival?
I recognize that, for example, nonreproductive sex has emotional consequences and social implications. Participation in a larger social network provides me with access to resources of life-or-death importance (including, but certainly not limited to, modern medical care) that I would be unable to maintain, let alone create, on my own. Optimal participation in that social network seems to require at least one ‘intimate’ relationship, to which nonreproductive sex can contribute.
As for what my theory can’t explain: If I ever take up alcohol use for social or recreational purposes, that would be very surprising; social is subsidiary to survival, and fun is something I have when I know what’s going on. Likewise, it would be a big surprise if I ever attempt suicide. I’ve considered possible techniques, but only as an academic exercise, optimized to show the subject what a bad idea it is while there’s still time to back out. I can imagine circumstances under which I would endanger my own health, or even life, to save others, but I wouldn’t do so lightly. It would most likely be part of a calculated gambit to accept a relatively small but impressive-looking immediate risk in exchange for social capital necessary to escape larger long-term risks. The idea of deliberately distorting my own senses and/or cognition is bizarre; I can accept other people doing so, provided they don’t hurt me or my interests in the process, but I wouldn’t do it myself. Taking something like caffeine or Provigil for the cognitive benefits would seem downright Faustian, and I have a hard time imagining myself accepting LSD unless someone was literally holding a gun to my head. I could go on.
My first instinct is that I would take C over D, on the grounds that if I think they’re dead, I’ll eventually be able to move on, whereas vague but somehow persuasive reports that they’re alive and well but out of my reach would constitute a slow and inescapable form of torture that I’m altogether too familiar with already. Besides, until the amnesia sets in I’d be happy for them.
Complications? Well, there’s more than just warm fuzzies I get from being near these people. I’ve got plans, and honorable obligations which would cost me utility to violate. But, dammit, permanent separation means breaking those promises—for real and in my own mind—no matter which option I take, so that changes nothing. Further efforts to extract the intended distinction are equally fruitless.
I don’t think I would wirehead, since that would de-instantiate my current utility function just as surely as death would. On the contrary, I scrupulously avoid mind-altering drugs, including painkillers, unless the alternative is incapacitation.
Think about it this way: if my utility function isn’t instantiated at any given time, why should it be given special treatment over any other possible but nonexistent utility function? Should the (slightly different) utility function I had a year ago be able to dictate my actions today, beyond the degree to which it influenced my environment and ongoing personal development?
If something was hidden from me, even something big (like being trapped in a virtual world), and hidden so thoroughly that I never suspected it enough for the suspicion to alter my actions in any measurable way, I wouldn’t care, because there would be no me which knew well enough to be able to care. Ideally, yes, the me that can see such hypotheticals from outside would prefer a map to match the territory, but at some point that meta-desire has to give way to practical concerns.
For my preference to be meaningful, I have to be aware of the distinction.
You’re aware of the distinction right now—would you be willing to act right now in a way which doesn’t affect the world in any major way during your lifetime, but which makes a big change after you die?
Edit: It seems to me as if you noted the fact that your utility function is no longer instantiated after you die, and confused that with the question of whether anything after your death matters to you now.
Would you be willing to act right now in a way which doesn’t affect the world in any major way during your lifetime, but which makes a big change after you die?
Of course I would. Why does a difference have to be “major” before I have permission to care? A penny isn’t much money, but I’ll still take the time to pick one up, if I see it on the floor and can do so conveniently. A moth isn’t much intelligence, or even much biomass, but if I see some poor thing thrashing, trapped in a puddle, I’ll gladly mount a fingertip-based rescue mission unless I’d significantly endanger my own interests by doing so.
Anything outside the light cone of my conscious mind is none of my business. That still leaves a lot of things I might be justifiably interested in.
My point didn’t relate to “major”—I wanted to point out that you care about what happens after you die, and therefore that your utility function is not uniformly 0 after you die. Yes, your utility function is no longer implemented by anything in the universe after you die—you aren’t there to care in person—but the function you implement now has terms for times after your death—you care now.
I would agree that I care now about things which have obvious implications for what will happen later, and that I would not care, or care very differently, about otherwise-similar things that lacked equivalent implications.
Beyond that, since my utility function can neither be observed directly, nor measured in any meaningful sense when I’m not alive to act on it, this is a distinction without a difference.
It is truly astonishing how much pain someone can learn to bear—AdeleneDawner posted some relevant links a while ago.
Edit: I wasn’t considering an anti-fun agent, however—just plain vanilla suffering.