Humans acquire morality as part of their development. Three-year-olds have a different, more selfish morality than older folks. There’s no reason in principle why a three-year-old who was “more the person he wished he was” would necessarily be a moral adult...
CEV does not mean considering the preferences of an agent who is “more moral”. There is no such thing. Morality is not a scalar quantity. I certainly hope the implementation would end up favoring the sort of morals I like strongly enough that calculating the CEV of a three-year-old would yield an output similar to that of an adult, but it seems like a bad idea to count on the implementation being that robust.
Consider the following three target-definitions for a superhuman optimizer:
a) one patterned on the current preferences of a typical three-year-old
b) one patterned on the current preferences of a typical thirty-year-old
c) one that is actually safe to implement (aka “Friendly”)
I understand you to be saying that the gulf between A and C is enormous, and I quite agree. I have not the foggiest beginnings of a clue how one might go about building a system that reliably gets from A to C and am not at all convinced it’s possible.
I would say that the gulf between B and C is similarly enormous, and I’m equally ignorant of how to build a system that spans it. But this whole discussion (and all discussions of CEV-based FAI) presumes that this gulf is spannable in practice. If we can span the B-C gulf, I take that as strong evidence that we can span the A-C gulf.
Put differently: to talk seriously about implementing an FAI based on the CEV of thirty-year-olds, but at the same time dismiss the idea of doing so based on the CEV of three-year-olds, seems roughly analogous to seriously setting out to build a device that lets me teleport from Boston to Denver without occupying the intervening space, but dismissing the idea of building one that goes from Boston to San Francisco as a laughable fantasy because, as everyone knows, San Francisco is further away than Denver.
That’s why I said I don’t understand what you think the extractor is doing. I can see where, if I had a specific theory of how a teleporter operates, I might confidently say that it can span 2k miles but not 3k miles, arbitrary as that sounds in the absence of such a theory. Similarly, if I had a specific theory of how a CEV-extractor operates, I might confidently say it can work safely on a 30-year-old mind but not a 3-year-old. It’s only in the absence of such a theory that such a claim is arbitrary.
It seems likely to me that the CEV of the 30-year-old would be friendly and the CEV of the three-year-old would not be, but, as you say, at this point it’s hard to say much for sure.
(nods) That follows from what you’ve said earlier.
I suspect we have very different understandings of how similar the 30-year-old’s desires are to their volition.
Perhaps one way of getting at that difference is this: how likely do you consider it that the CEV of a 30-year-old would be something that, if expressed in a form that the 30-year-old can understand (say, for example, the opportunity to visit a simulated world for a year that is constrained by that CEV), would be relatively unsurprising to that 30-year-old… something that would elicit “Oh, cool, yeah, this is more or less what I had in mind” rather than “Holy Fucking Mother of God what kind of an insane world IS this?!?”?
For my own part, I consider the latter orders of magnitude more likely.
I conclude that I do not understand what you think the CEV-extractor is doing.
I’m pretty uncertain.