I’m aware that I’m some three years late on this, but I can’t help but disagree with you here. I’m all for having default safeguards on our Really Powerful Optimization Process to prevent people from condemning themselves to eternal hell or committing suicide on a whim—maybe something along the lines of the doctor’s Do No Harm. You could phrase it, if not formalize it, something like:
‘If a decision will predictably lead to consequences horrifying to those it affects, the system will refuse to help—if you wish to self-destruct, you must do so with your own strength.’
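If you wanted to gesture at a formalization, the skeleton might look something like the toy Python sketch below. Every name in it is made up, and the prediction step is the entire hard problem, deliberately left as a stub:

```python
# Toy sketch of the default 'Do No Harm' safeguard (all names hypothetical).
# The system only withholds its *assistance*; it does not police what a
# client does with their own strength.

from dataclasses import dataclass


@dataclass
class Request:
    client: str
    description: str
    affected_parties: list[str]


def predictably_horrifies(request: Request) -> bool:
    """Stand-in for the hard part: would carrying out this request
    predictably lead to consequences horrifying to those it affects?
    Assumed to be answered by the optimization process itself."""
    raise NotImplementedError  # deliberately unspecified


def respond(request: Request) -> str:
    if predictably_horrifies(request):
        # Refuse to help; the client remains free to act unaided.
        return "refused: predictably horrifying to those it affects"
    return "assistance granted"
```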
Beyond that, though—the advantage to Libertarianism is that you can implement any other system you want in it. If you want the AI to remove low-value choices from your environment, you are welcome to instruct it to do that. If you want it to prohibit you, in the future, from subtler methods of self-destruction, you are free to do that too. These options were available to Phaethon in the series, and he refused to take them—because, frankly, the protagonist of those novels was an idiot.
It is certainly true that human choice is a fragile and whimsical thing—but that doesn’t mean that it has no value. If people are made unhappy by the default behavior of the system, they have the choice to change it, at least where it affects them. It feels wrong to me for you to suggest that we prohibit people from judging otherwise, forever, as part of the design for our superintelligences. Our lives have always been lived on the precipice of disaster, and we have always been given choices that limit those risks.
Just to make sure I understand the system you’re proposing: suppose there’s a Do No Harm rule like the one you describe, I tell the AI to give me the option of “subtler methods of self-destruction”, and the AI predicts that giving me that option is likely to lead to consequences that horrify someone it affects (or some more formal version of that condition).
In that case, the AI refuses to give me that option. Right?
If so, can you clarify how that is different from the OP’s proposed behavior in this case?
I should have clarified: I meant horrifying in a pretty extreme sense. Like, telling the machine to torture you forever, or destroy you completely, or remove your sense of boredom.
Just doing something that, say, alienates all your friends wouldn’t qualify. Neither would losing all your money, if money is still a thing that makes sense. I also meant to leave permitted all the things you CAN do with your own strength but probably shouldn’t: building a machine to torture your upload forever wouldn’t be disallowed by default, though you might want to instruct the system to stop you anyway (sketched below).
I meant the ‘Do No Harm’ rule as a bare-minimum safeguard against producing a system with net negative utility because a small minority manage to put themselves into infinitely negative-utility situations, not as a general-class ‘the system knows what is best’ measure, which is what it sounded to me like EY was proposing. In his defense, in the context of strong AI this is probably a discussion of what the CEV of humanity might wisely end up choosing, but I still don’t like it.
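To make the narrower reading concrete, here is the same sort of toy sketch with an explicit, and entirely made-up, list of extreme cases, plus the opt-in restrictions a client can impose on themselves. Where exactly to draw that list is the part I haven’t pinned down:

```python
# Toy sketch of the narrower reading (the category list is invented purely
# for illustration; drawing it correctly is the open question).

EXTREME_HARMS = {
    "eternal torture",
    "complete destruction",
    "removal of boredom",
}


class Client:
    def __init__(self, name: str) -> None:
        self.name = name
        self.self_imposed_bans: set[str] = set()  # opt-in restrictions

    def prohibit(self, category: str) -> None:
        """Voluntarily bind your future self against a subtler
        method of self-destruction."""
        self.self_imposed_bans.add(category)


def system_will_assist(client: Client, category: str) -> bool:
    # Default safeguard: never help with the extreme cases.
    if category in EXTREME_HARMS:
        return False
    # Restrictions the client chose for themselves are honored too,
    # but only for that client.
    if category in client.self_imposed_bans:
        return False
    # Everything else (alienating your friends, losing all your money)
    # stays available.
    return True
```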
I don’t know that I agree with the OP’s proposed basis for distinction, but I at least have a reasonable feel for what it would preclude. (I would even agree that, given clients substantially like modern-day humans, precluding that stuff is reasonably ethical. That said, the notion that a system on the scale the OP is discussing would have clients substantially like modern-day humans and relate to them in a fashion substantially like the fictional example given strikes me as incomprehensibly absurd.)
I don’t quite understand the basis for distinction you’re suggesting instead. I mean, I understand the specific examples you’re listing for exclusion, of course (eternal torture, lack of boredom, complete destruction), but not what they have in common or how I might determine whether, for example, choosing to be eternally alienated from friendship should be allowed or disallowed. Is that sufficiently horrifying? How could one tell?
I do understand that you don’t mean the system to prevent, say, my complete self-destruction as long as I can build the tools to destroy myself without the system’s assistance. The OP might agree with you about that, I’m not exactly sure. I suspect I disagree, personally, though I admit it’s a tricky enough question that a lot depends on how I frame it.