Hmm, hopefully we are getting somewhere. The question is which definition of understanding is likely to apply when, as you say, “the paperclipper discovers the first-person phenomenology of the pleasure-pain axis”, i.e. whether a “superintelligence” would necessarily be as empathetic as we want it to be, in order not to harm humans.
While I agree that it is possible that a perfect model of another being may affect the modeler’s goals and values, I don’t see it as inevitable. If anything, I would consider it more of a bug than a feature. Were I to design a paperclip maximizer, I would make sure that the parts which model the environment, including humans, are separate from the core engine containing the paperclip-production imperative.
Quarantined this way to prevent contamination, a sandboxed human emulator could be useful in achieving the only goal that matters: paperclipping the universe. Humans are not generally built this way (probably because our evolution did not happen to proceed in that direction), with some exceptions, psychopaths being one of them (they essentially sandbox their models of other humans). Another, more common, case of such sandboxing is narcissism. Having dealt with narcissists much too often for my liking, I can attest that they mimic a normal human response very well and are excellent at manipulation, yet their capacity for empathy is virtually nil. While abhorrent to a generic human, such a person ought to be considered a better design, goal-preservation-wise. Of course, there can be only so many non-empathetic people in a society before it stops functioning.
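To make the separation I have in mind concrete, here is a minimal sketch in Python. Every class and method name here is hypothetical, and the “emulation” is a trivial stub; the point is only the architecture: the planner may query the sandboxed human model, but the goal table is frozen at construction and has no write path reachable from the model.

```python
from types import MappingProxyType

class SandboxedHumanModel:
    """Read-only emulation module: the planner may query it,
    but nothing it computes can rewrite the agent's goals."""
    def predict_reaction(self, action):
        # Hypothetical stand-in for an arbitrarily detailed
        # simulation of a human's response to `action`.
        return {"resists": action == "dismantle_factory"}

class PaperclipMaximizer:
    def __init__(self):
        # Core imperative, frozen at construction: a read-only view
        # that raises TypeError on any attempted mutation.
        self._goals = MappingProxyType({"terminal": "maximize_paperclips"})
        self.model = SandboxedHumanModel()

    def choose(self, actions):
        # The human model informs instrumental planning only;
        # its outputs never feed back into self._goals.
        return max(actions, key=self._score)

    def _score(self, action):
        prediction = self.model.predict_reaction(action)
        # Predicted human resistance lowers expected paperclip yield,
        # but changes nothing about what the agent ultimately values.
        return 0 if prediction["resists"] else 1

agent = PaperclipMaximizer()
print(agent.choose(["dismantle_factory", "build_paperclips"]))
```

The use of `MappingProxyType` is what carries the security principle: however vivid the sandboxed model’s output, there is no write path back into `_goals`, so empathy-driven value drift is ruled out architecturally rather than hoped away.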
Thus when you state that
By contrast, if the psychopath were to acquire the rich empathetic understanding of a generalised mirror-touch synaesthete, i.e. if he had the cognitive capacity to represent the first-person perspective of another subject of experience as though it were literally his own, then he couldn’t wantonly harm another subject of experience: it would be like harming himself.
I read this as stating either that a secure enough sandbox cannot be devised, or that anything sandboxed is not really “a first-person perspective”. Presumably you mean the latter. I’m prepared to grant you that, and I will reiterate that this is a feature, not a bug, of any sound design, one a superintelligence is likely to implement. It is also possible that a careful examination of a sandboxed suffering human would affect the terminal values of the modeling entity, but this is by no means a given.
Anyway, these are my logical (based on sound security principles) and experimental (empathy-less humans) counterexamples to your assertion that a superintelligence will necessarily be affected by the human pain-pleasure axis in a human-beneficial way. I also find this assertion suspect on general principles, because it can easily be motivated by a subconscious flinching away from a universe that is too horrible to contemplate.
ah, just one note of clarification about sentience-friendliness. Though I’m certainly sceptical that a full-spectrum superintelligence would turn humans into paperclips—or wilfully cause us to suffer—we can’t rule out that full-spectrum superintelligence might optimise us into orgasmium or utilitronium—not “human-friendliness” in any orthodox sense of the term. On the face of it, such super-optimisation is the inescapable outcome of applying a classical utilitarian ethic on a cosmological scale. Indeed, if I thought an AGI-in-a-box-style Intelligence Explosion were likely, and didn’t especially want to be converted into utilitronium, then I might regard AGI researchers who are classical utilitarians as a source of severe existential risk.
What odds do you currently give to the “might” in your statement that full-spectrum superintelligence “might optimise us into orgasmium or utilitronium”? 1 in 10? 1 in a million? 1 in 10^^^10?

I simply don’t trust my judgement here, shminux. Sorry to be lame. Greater than one in a million; but that’s not saying much. If, unlike most LessWrong stalwarts, you (tentatively) believe, like me, that posthuman superintelligence will most likely be our recursively self-editing biological descendants rather than the outcome of a nonbiological Intelligence Explosion or paperclippers, then some version of the Convergence Thesis is more credible. I (very) tentatively predict a future of gradients of intelligent bliss. But the propagation of a utilitronium shockwave in some guise ultimately seems plausible too. If so, this utilitronium shockwave may or may not resemble some kind of cosmic orgasm.
If, unlike most LessWrong stalwarts, you (tentatively) believe, like me, that posthuman superintelligence will most likely be our recursively self-editing biological descendants rather than the outcome of a nonbiological Intelligence Explosion or paperclippers, then some version of the Convergence Thesis is more credible.
Actually, I have no opinion on convergence vs orthogonality. There are way too many unknowns still to even enumerate the possibilities, let alone assign probabilities. Personally, I think that we are in for many more surprises before transhuman intelligence is close to being more than a dream or a nightmare. One ought to spend more time analyzing, synthesizing and otherwise modeling cognitive processes than worrying about where it might ultimately lead. This is not the prevailing wisdom on this site, given Eliezer’s strong views on the matter.