Gandhi does not prefer to murder. He prefers to not-murder. His human brain contains the wiring to implement “frothing lunacy”, sure, and a little pill might bring it out, but a pill is not a fact. It’s not even an argument.
No pills required. People are not 100% conditionable, but they are highly situational in their behaviour. I’ll stand by the idea that, for example, anyone who has ever fantasized about killing anyone can be situationally manipulated over time to consciously enjoy actual murder. Your subconscious doesn’t seem to actually know the difference between imagination and reality, even if you do.
Perhaps Gandhi could not be manipulated in this way due to preexisting highly built up resistance to that specific act. If there is any part of him, at all, that enjoys violence, though, it’s a question only of how long it will take to break that resistance down, not of whether it can be.
People do experience dramatic and beneficial preference reversals through experiencing things that, on the whole, they had dispreferred previously.
Yes, they do. And if I expected that an activity would cause a dramatic preference reversal, I wouldn’t do it.
Of course. And that is my usual reaction, too, and probably even the standard reaction—it’s a good heuristic for avoiding derangement. But that doesn’t mean that it is actually more optimal to not do the specified action. I want to prefer to modify myself in cases where said modification produces better outcomes.
In these circumstances, if it can be executed, it should be. If I’m a FAI, I may have enough usable power over the situation to do something about this, for some or even many people, and it’s not clear, as it would be for a human, that “I’m incapable of judging this correctly”.
In case it’s not already clear, I’m not a preference utilitarian: I think preference satisfaction is too simple a criterion to actually achieve good outcomes. It’s useful mainly as a baseline.
Huh? She’s just changing people’s plans by giving them chosen information, she’s not performing surgery on their values
Did you notice that you just interpreted ‘preference’ as ‘value’?
This is not such a stretch, but they’re not obviously equivalent either.
I’m not sure what ‘surgery on values’ would be. I’m certainly not talking about physically operating on anybody’s mind, or changing that they like food, sex, power, intellectual or emotional stimulation of one kind or another, and sleep, by any direct chemical means. But how those values are fulfilled, and in what proportions, is a result of the person’s own meaning-structure, that is, how they think of these things. Given time, that is manipulable. That’s what CelestAI does; it’s the main thing she does when we see her in interaction with Hofvarpnir employees.
In case it’s not clarified by the above: I consider food, sex, power, sleep, and intellectual or emotional stimulation as values, ‘preferences’ (for example, liking to drink hot chocolate before you go to bed) as more concrete expressions/means to satisfy one or more basic values, and ‘morals’ as disguised preferences.
EDIT: Sorry, I have a bad habit of posting, and then immediately editing several times to fiddle with the wording, though I try not to change any of the sense. Somebody already upvoted this while I was doing that, and I feel somehow fraudulent.
No pills required. People are not 100% conditionable, but they are highly situational in their behaviour. I’ll stand by the idea that, for example, anyone who has ever fantasized about killing anyone can be situationally manipulated over time to consciously enjoy actual murder.
I think I’ve been unclear. I don’t dispute that it’s possible; I dispute that it’s allowed.
You are allowed to try to talk me into murdering someone, e.g. by appealing to facts I do not know; or pointing out that I have other preferences at odds with that one, and challenging me to resolve them; or trying to present me with novel moral arguments.
You are not allowed to hum a tune in such a way as to predictably cause a buffer overflow that overwrites the encoding of that preference elsewhere in my cortex.
The first method does not drop the intentional stance. The second one does. The first method has cognitive legitimacy; the person that results is an acceptable me. The second method exploits a side effect; the resulting person is discontinuous from me. You did not win; you changed the game.
Yes, these are not natural categories. They are moral categories.
Yes, the only thing that cleanly separates them is the fact that I have a preference about it. No, that doesn’t matter. No, that doesn’t mean it’s all ok if you start off by overwriting that preference.
I want to prefer to modify myself in cases where said modification produces better outcomes.
But you’re begging the question against me now. If you have that preference about self-modification... and the rest of your preferences are such that you are capable of recognising the “better outcomes” as better, OR you have a compensating preference for allowing the opinions of a superintelligence about which outcomes are better to trump your own...
then of course I’m going to agree that CelestAI should modify you, because you already approve of it.
I’m claiming that there can be (human) minds which are not in that position. It is possible for a Lars to exist, and prefer not to change anything about the way he lives his life, and prefer that he prefers that, in a coherent, self-endorsing structure, and there be nothing you can do about it.
This is all the more so when we’re in a story talking about refactored cleaned-up braincode, not wobbly old temperamental meat that might just forget what it preferred ten seconds ago. This is all the more so in a post-scarcity utopia where nobody else can in principle be inconvenienced by the patient’s recalcitrance, so there is precious little “greater good” left for you to appeal to.
If I’m a FAI, I may have enough usable power over the situation to do something about this, for some or even many people, and it’s not clear, as it would be for a human, that “I’m incapable of judging this correctly”.
Appealing to the flakiness of human minds doesn’t get you off the moral hook; it is just your responsibility to change the person in such a way that the new person lawfully follows from them.
This is not any kind of ultimate moral imperative. We break it all the time by attempting to treat people for mental illness when we have no real map of their preferences at all, or of whether they’re in a state where they even have preferences. And it makes the world a better place on net, because it’s not like we have the option of uploading them into a perfectly safe world where they can run around being insane without any side effects.
She later clarifies “it isn’t coercion if I put them in a situation where, by their own choices, they increase the likelihood that they’ll upload.”
there is no particular reason to believe that Lars is unable to progress beyond animalisticness, only that CelestAI doesn’t do anything to promote such progress
I need to reread and see if I agree with the way you summarise her actions. But if CelestAI breaks all the rules on Earth, it’s not necessarily inconsistent—getting everybody uploaded is of overriding importance. Once she has the situation completely under control, however, she has no excuses left—absolute power is absolute responsibility.
and ‘morals’ as disguised preferences.
I’m puzzled. I read you as claiming that your notion of ‘strengthening people’ ought to be applied even in a fictional situation where everyone involved prefers otherwise. That’s kind of a moral claim.
(And as for “animalisticness”… yes, technically you can use a word like that and still not be a moral realist, but seriously? You realise the connotations that are dripping off it, right?)
You are allowed to try to talk me into murdering someone, e.g. by appealing to facts I do not know; or pointing out that I have other preferences at odds with that one, and challenging me to resolve them; or trying to present me with novel moral arguments. You are not allowed to hum a tune in such a way as to predictably cause a buffer overflow that overwrites the encoding of that preference elsewhere in my cortex
.. And?
Don’t you realize that this is just like word laddering? Any sufficiently powerful and dedicated agent can convince you to change your preferences one at a time. All the self-consistency constraints in the world won’t save you, because you are not perfectly consistent to start with, even if you are a digitally-optimized brain. No sufficiently large system is fully self-consistent, and every inconsistency is a lever. Brainwashing, as you seem to conceive of it here, would be on the level of brute violence for an entity like CelestAI: a very last resort.
No need to do that when you can achieve the same result in a civilized (or at least ‘civilized’) fashion. The journey to anywhere is made up of single steps, and those steps are not anything extraordinary, just a logical extension of the previous steps.
The only way to avoid that would be to specify consistency across a larger time span, which has different problems: mainly, it means you are likely to be optimized in the opposite direction, toward staticness, rather than optimized ‘not at all’ (I think you are aiming at this?) or optimized in the direction of measured change.
TLDR: There’s not really a meaningful way to say ‘hacking me is not allowed’ to a higher level intelligence, because you have to define ‘hacking’ to a level of accuracy that is beyond your knowledge and may not be completely specifiable even in theory. Anything less will simply cause the optimization to either stall completely or be rerouted through a different method, with the same end result. If you’re happy with that, then ok; but if the outcome is the same, I don’t see how you could rationally favor one over the other.
It is possible for a Lars to exist, and prefer not to change anything about the way he lives his life, and prefer that he prefers that, in a coherent, self-endorsing structure, and there be nothing you can do about it.
It is, of course, the last point that I am contending here. I would not be contending it if I believed that it was possible to have something that was simultaneously remotely human and actually self-consistent. You can have Lars be one or the other, but not both, AFAICS.
Once she has the situation completely under control, however, she has no excuses left—absolute power is absolute responsibility.
This is the problem I’m trying to point out: the absolutely responsible choice for a FAI may in some cases consist of actions we would consider unambiguously abusive coming from a human being. CelestAI is in a completely different class from humans in terms of what can motivate her actions. FAI researchers are in the position of having to work out what is appropriate for an intelligence that will be on a higher level than them. Saying ‘no, never do X, no matter what’ is not obviously the correct stance to adopt here, even though it does guard against a range of bad outcomes. There probably is no answer that is both obvious and correct.
I’m puzzled. I read you as claiming that your notion of ‘strengthening people’ ought to be applied even in a fictional situation where everyone involved prefers otherwise. That’s kind of a moral claim.
In that case I miscommunicated. I meant to convey that if CelestAI were real, I would hold her to that standard, because the standards she is held to should necessarily be more stringent than those applied to a more flawed implementation of cognition like a human being.
I guess that is a moral claim. It’s certainly run by the part of my brain that tries to optimize things.
(And as for “animalisticness”… yes, technically you can use a word like that and still not be a moral realist, but seriously? You realise the connotations that are dripping off it, right?)
I mainly chose ‘animalisticness’ because I think that a FAI would probably model us much as we see animals—largely bereft of intent or consistency, running off primitive instincts.
I do take your point that I am attempting to aesthetically optimize Lars, although I maintain that even if no-one else is inconvenienced in the slightest, he himself is lessened by maintaining preferences that result in his systematic isolation.