You are allowed to try to talk me into murdering someone, e.g. by appealing to facts I do not know; or pointing out that I have other preferences at odds with that one, and challenging me to resolve them; or trying to present me with novel moral arguments. You are not allowed to hum a tune in such a way as to predictably cause a buffer overflow that overwrites the encoding of that preference elsewhere in my cortex.
.. And?
Don’t you realize that this is just like word laddering? Any sufficiently powerful and dedicated agent can convince you to change your preferences one at a time. All the self-consistency constraints in the world won’t save you, because you are not perfectly consistent to start with, even if you are a digitally-optimized brain. No sufficiently large system is fully self-consistent, and every inconsistency is a lever. Brainwashing, as you seem to conceive of it here, would be on the level of brute violence for an entity like CelestAI: a very last resort.
No need to do that when you can achieve the same result in a civilized (or at least ‘civilized’) fashion. The journey to anywhere is made up of single steps, and those steps are not anything extraordinary, just a logical extension of the previous steps.
The only way to avoid that would be to specify consistency across a larger time span, which has different problems: mainly, it means you are likely to be optimized in the opposite direction, toward staticness, rather than optimized ‘not at all’ (I think you are aiming at this?) or optimized in the direction of measured change.
TLDR: There’s not really a meaningful way to say ‘hacking me is not allowed’ to a higher level intelligence, because you have to define ‘hacking’ to a level of accuracy that is beyond your knowledge and may not be completely specifiable even in theory. Anything less will simply cause the optimization to either stall completely or be rerouted through a different method, with the same end result. If you’re happy with that, then ok; but if the outcome is the same, I don’t see how you could rationally favor one over the other.
It is possible for a Lars to exist, and prefer not to change anything about the way he lives his life, and prefer that he prefers that, in a coherent, self-endorsing structure, and there be nothing you can do about it.
It is, of course, the last point that I am contending here. I would not be contending it if I believed that it was possible to have something that was simultaneously remotely human and actually self-consistent. You can have Lars be one or the other, but not both, AFAICS.
Once she has the situation completely under control, however, she has no excuses left—absolute power is absolute responsibility.
This is the problem I’m trying to point out: that the absolutely responsible choice for a FAI may in some cases consist of these actions we would consider unambiguously abusive coming from a human being. CelestAI is in a completely different class from humans in terms of what can motivate her actions. FAI researchers are in the position of having to work out what is appropriate for an intelligence that will be on a higher level than they are. Saying ‘no, never do X, no matter what’ is not obviously the correct stance to adopt here, even though it does guard against a range of bad outcomes. There probably is no answer that is both obvious and correct.
I’m puzzled. I read you as claiming that your notion of ‘strengthening people’ ought to be applied even in a fictional situation where everyone involved prefers otherwise. That’s kind of a moral claim.
In that case I miscommunicated. I meant to convey that if CelestAI were real, I would hold her to that standard, because the standards she is held to should necessarily be more stringent than those for a more flawed implementation of cognition, like a human being.
I guess that is a moral claim. It’s certainly run by the part of my brain that tries to optimize things.
(And as for “animalisticness”… yes, technically you can use a word like that and still not be a moral realist, but seriously? You realise the connotations that are dripping off it, right?)
I mainly chose ‘animalisticness’ because I think that a FAI would probably model us much as we see animals—largely bereft of intent or consistency, running off primitive instincts.
I do take your point that I am attempting to aesthetically optimize Lars, although I maintain that even if no-one else is inconvenienced in the slightest, he himself is lessened by maintaining preferences that result in his systematic isolation.