There is no clear bright line determining who is or is not a fundamentalist Christian.
There is no clear bright line determining what is or isn’t a clear bright line.
I agree that the line separating “human” from “non-human” is much clearer and brighter than that separating “fundamentalist Christian” from “non-fundamentalist Christian”, and I further agree that for minds like mine the difference between those two lines is very important. Something with a mind like mine can work with the first distinction much more easily than with the second.
So what?
A mind like mine doesn’t stand a chance of extrapolating a coherent volition from the contents of a group of target minds. Whatever X (the system that actually performs the extrapolation) is, it isn’t a mind like mine.
If we don’t have such an X available, then it doesn’t matter what defining characteristic we use to determine the target group for CEV extrapolation, because we can’t extrapolate CEV from them anyway.
If we do have such an X available, then it doesn’t matter what lines are clear and bright enough for minds like mine to reliably work with; what matters is what lines are clear and bright enough for systems like X to reliably work with.
Right now, there pretty much is a clear bright line determining who is or is not human. And that clear bright line encompasses everyone we would possibly want to cooperate with.
I have confidence < .1 that either one of us can articulate a specification of who is human that doesn’t either include or exclude some system whose inclusion or exclusion someone covered by that specification would contest.
I also have confidence < .1 that, using any definition of “human” you care to specify, the universe contains no nonhuman systems I would possibly want to cooperate with.
Your advisory board suggestion ignores the fact that we have to be able to cooperate prior to the invention of CEV deducers.
Sure, but so does your “include all humans” suggestion. We’re both assuming that there’s some way the AI-development team can convincingly commit to a policy P such that other people’s decisions to cooperate will plausibly be based on the belief that P will actually be implemented when the time comes; we are neither of us specifying how that is actually supposed to work. Merely saying “I’ll include all of humanity” isn’t good enough to ensure cooperation if nobody believes me.
I have confidence that, given a mechanism for getting from someone saying “I’ll include all of humanity” to everyone cooperating, I can work out a way to use the same mechanism to get from someone saying “I’ll include the Advisory Board, which includes anyone with enough power that I care whether they cooperate or not” to everyone I care about cooperating.
And you’re not describing a process for how the advisory board is decided either.
I said: “Then anyone with enough political clout to get in my way, I add to the Advisory Board.” That seems to me as well-defined a process as “I decide to include every human being.”
Different advisory boards may produce different groups of enfranchised minds.
Certainly.
So your suggestion doesn’t resolve the problem.
Can you say again which problem you’re referring to here? I’ve lost track.
In fact, I don’t see how putting a group of minds on the advisory board is any different than just making them the input to the CEV.
Absolutely agreed.
Consider the implications of that, though.
Suppose you have a CEV-extractor and we’re the only two people in the world, just for simplicity. You can either point the CEV-extractor at yourself, or at both of us. If you genuinely want me included, then it doesn’t matter which you choose; the result will be the same. Conversely, if the result is different, that’s evidence that you don’t genuinely want me included, even if you think you do.
Knowing that, why would you choose to point the CEV-extractor at both of us?
One reason for doing so might be that you’d precommitted to doing so (or some UDT equivalent), so as to secure my cooperation. Of course, if you can secure my cooperation without such a precommitment (say, by claiming you would point it at both of us), that’s even better.
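The inclusion-invariance argument above can be made concrete with a toy model. This is a hypothetical sketch: the averaging "extractor" and the caring weights are illustrative assumptions, not a claim about how a real CEV-extractor would work.

```python
# Toy model: an agent's "volition" blends its own preferences with those of
# anyone it genuinely cares about. A stand-in "extractor" simply averages
# the volitions of the agents it is pointed at.

def volition(own_prefs, other_prefs, caring_weight):
    """Blend own preferences with another agent's, by how much one genuinely cares."""
    return [(1 - caring_weight) * a + caring_weight * b
            for a, b in zip(own_prefs, other_prefs)]

def extract(volitions):
    """Stand-in 'CEV-extractor': averages the volitions it is pointed at."""
    n = len(volitions)
    return [sum(vals) / n for vals in zip(*volitions)]

you_prefs = [1.0, 0.0]
me_prefs = [0.0, 1.0]

# If you genuinely weight my preferences on a par with yours...
you_v = volition(you_prefs, me_prefs, caring_weight=0.5)
my_v = volition(me_prefs, you_prefs, caring_weight=0.5)
# ...then pointing the extractor at you alone or at both of us is the same:
assert extract([you_v]) == extract([you_v, my_v])

# Whereas if you only think you care (weight 0.2, say), the results differ,
# which is the "evidence that you don't genuinely want me included":
you_v2 = volition(you_prefs, me_prefs, caring_weight=0.2)
my_v2 = volition(me_prefs, you_prefs, caring_weight=0.2)
assert extract([you_v2]) != extract([you_v2, my_v2])
```

Under these assumptions the two targets coincide exactly when your extrapolated volition already internalizes mine, which is the point of the argument.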
Complicated or ambiguous schemes take more time to explain, get more attention, and risk folks spending time trying to gerrymander their way in instead of contributing to FAI.
I think any solution other than “enfranchise humanity” is a potential PR disaster.
Keep in mind that not everyone is that smart, and there are some folks who would make a fuss about disenfranchisement of others even if they themselves were enfranchised (and therefore, by definition, those they were making a fuss about would be enfranchised if they thought it was a good idea).
I agree there are potential ambiguity problems with drawing the line at humans, but I think the potential problems are bigger with other schemes.
Sure, but so does your “include all humans” suggestion. We’re both assuming that there’s some way the AI-development team can convincingly commit to a policy P such that other people’s decisions to cooperate will plausibly be based on the belief that P will actually be implemented when the time comes; we are neither of us specifying how that is actually supposed to work. Merely saying “I’ll include all of humanity” isn’t good enough to ensure cooperation if nobody believes me.
I agree there are potential problems with credibility, but that seems like a separate argument.
I have confidence that, given a mechanism for getting from someone saying “I’ll include all of humanity” to everyone cooperating, I can work out a way to use the same mechanism to get from someone saying “I’ll include the Advisory Board, which includes anyone with enough power that I care whether they cooperate or not” to everyone I care about cooperating.
It’s not all or nothing. The more inclusive the enfranchisement, the more cooperation there will be in general.
I said: “Then anyone with enough political clout to get in my way, I add to the Advisory Board.” That seems to me as well-defined a process as “I decide to include every human being.”
With that scheme, you’re incentivizing folks to prove they have enough political clout to get in your way.
Moreover, humans aren’t perfect reasoning systems. Your way of determining enfranchisement sounds a lot more adversarial than mine, which would affect the tone of the effort in a big and undesirable way.
Why do you think that the right to vote in democratic countries is as clearly determined as it is? Restricting voting rights to those of a certain IQ or higher would be a politically unfeasible PR nightmare.
One reason for doing so might be that you’d precommitted to doing so (or some UDT equivalent), so as to secure my cooperation. Of course, if you can secure my cooperation without such a precommitment (say, by claiming you would point it at both of us), that’s even better.
Again, this is a different argument about why people cooperate instead of defect. To a large degree, evolution hardwired us to cooperate, especially when others are trying to cooperate with us.
I agree that if the FAI project seems to be staffed with a lot of untrustworthy, selfish backstabbers, we should cast a suspicious eye on it regardless of what they say about their project.
Ultimately it probably doesn’t matter much what their broadcasted intention towards the enfranchisement of those outside their group is, since things will largely come down to what their actual intentions are.
It’s not all or nothing. The more inclusive the enfranchisement, the more cooperation there will be in general.
That’s not clear to me.
Suppose the Blues and the Greens are political opponents. If I credibly commit to pointing my CEV-extractor at all the Blues, I gain the support of most Blues and the opposition of most Greens. If I say “at all Blues and Greens” instead, I gain the support of some of the Greens, but I lose the support of some of the Blues, who won’t want any part of a utopia patterned even partially on hateful Green ideologies.
This is almost undoubtedly foolish of the Blues, but I nevertheless expect it. As you say, people aren’t all that smart.
The question is, is the support I gain from the Greens by including them worth the support I lose from the Blues by including the Greens? Of course it depends. That said, the strong support of a sufficiently powerful small group is often more valuable than the weak support of a more powerful larger group, so I’m not nearly as convinced as you sound that saying “we’ll incorporate the values of both you and your hated enemies!” will get more net support than picking a side and saying “we’ll incorporate your values and not those of your hated enemies.”
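The net-support calculus above can be sketched with a toy calculation; every population and support number here is invented purely for illustration, not a prediction.

```python
# Toy net-support arithmetic for the Blues/Greens point: strong support from
# a partisan base versus weak support from a broader coalition.

def net_support(strong_supporters, weak_supporters,
                strong_weight=1.0, weak_weight=0.3):
    """Weight strong support more heavily than weak support."""
    return strong_weight * strong_supporters + weak_weight * weak_supporters

blues, greens = 1000, 1000

# "Include only Blues": most Blues support strongly; Greens oppose.
only_blues = net_support(strong_supporters=0.8 * blues, weak_supporters=0)

# "Include everyone": some of each side support, but only weakly, since each
# dislikes a utopia patterned even partially on the other's values.
everyone = net_support(strong_supporters=0,
                       weak_supporters=0.6 * (blues + greens))

# Under these (invented) numbers, picking a side nets more support:
assert only_blues > everyone
```

Whether picking a side actually wins depends entirely on the weights and fractions, which is the "of course it depends" in the text.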
With that scheme, you’re incentivizing folks to prove they have enough political clout to get in your way.
Sure, that’s true. Heck, they don’t have to prove it; if they give me enough evidence to consider it plausible, I’ll include ’em. So what?
Moreover, humans aren’t perfect reasoning systems. Your way of determining enfranchisement sounds a lot more adversarial than mine, which would affect the tone of the effort in a big and undesirable way.
I think you underestimate how threatening egalitarianism sounds to a lot of people, many of whom have a lot of power. Cf. including those hateful Greens, above. That said, I suspect there are probably ways to spin your “include everyone” idea in such a way that even the egalitarianism-haters will not oppose it too strongly. But I also suspect there are ways to spin my “don’t include everyone” idea in such a way that even the egalitarianism-lovers will not oppose it too strongly.
Why do you think that the right to vote in democratic countries is as clearly determined as it is?
Because many people believe it represents power. That’s also why it’s not significantly more clearly determined. It’s also why that right is not universal.
Restricting voting rights to those of a certain IQ or higher would be a politically unfeasible PR nightmare.
Sure, I agree. Nor would I recommend announcing that we’re restricting the advisory board to people of a certain IQ or higher, for analogous reasons. (Also it would be a silly thing to do, but that’s beside the point, we’re talking about sales and not implementation here.) I’m not sure why you bring it up. I also wouldn’t recommend (in my country) announcing restrictions based on skin color, income, religious affiliation, or a wide variety of other things.
On the other hand, in my country, we successfully exclude people below a certain age from voting, and I correspondingly expect announcing restrictions based on age to not be too big a deal. Mostly this is because young people have minimal political clout. (And as you say, this incentivizes young people to prove they have political clout, and sometimes they even try, but mostly nobody cares because in fact they don’t.)
Conversely, extending voting rights to everyone regardless of age would be a politically unfeasible PR nightmare, and I would not recommend announcing that we’re including everyone regardless of age (which I assume you would recommend, since 2-year-olds are human beings by many people’s bright line test), for similar reasons.
(Somewhat tangentially: extending CEV inclusion, or voting rights, to everyone regardless of age would force us as a matter of logic to either establish a cutoff at birth or not establish a cutoff at birth. Either way we’d then have stepped in the pile of cow manure that is U.S. abortion politics, where the only winning move is not to play. What counts as a human being simply isn’t as politically uncontroversial a question as you’re making it sound.)
Again, this is a different argument about why people cooperate instead of defect.
Sorry, you’ve lost me. Can you clarify what the different arguments you refer to here are, and why the difference between them matters in this context?
Ultimately it probably doesn’t matter much what their broadcasted intention towards the enfranchisement of those outside their group is, since things will largely come down to what their actual intentions are.
Once they succeed in building a CEV-extractor and a CEV-implementor, then yeah, their broadcast intentions probably don’t matter much. Until then, they can matter a lot.
What do you see as the factors holding back people from cooperating with modern analogues of FAI projects? Do you think those modern analogues could derive improved cooperation by broadcasting a specific enfranchisement policy?
As a practical matter, it looks to me like the majority of wealthy, intelligent, rational modern folks an FAI project might want to cooperate with lean towards egalitarianism and humanism, not Blues-versus-Greens-style sectarianism.
If you don’t think someone has enough political clout to bother with, they’ll be incentivized to prove you wrong. Even if you’re right most of the time, you’ll be giving yourself trouble.
I agree that very young humans are a potential difficult gray area. One possible solution is to simulate their growth into adults before computing their CEV. Presumably the age up to which their growth should be simulated is not as controversial as who should be included.
Sorry, you’ve lost me. Can you clarify what the different arguments you refer to here are, and why the difference between them matters in this context?
FAI team trustworthiness is a different subject than optimal enfranchisement structure.
What do you see as the factors holding back people from cooperating with modern analogues of FAI projects?
I’m not sure what those modern analogues are, but in general here are a few factors I see preventing people from cooperating on projects where both mutual cooperation and unilateral cooperation would be beneficial:
Simple error in calculating the expected value of cooperating.
Perceiving more value in obtaining higher status within my group by defending my group’s wrong beliefs about the project’s value than in defecting from my group by cooperating in the project.
Perceiving more value in continuing to defend my previously articulated position against the project (e.g., in being seen as consistent or as capable of discharging earlier commitments) than in changing my position and cooperating in the project.
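The first factor, and the way the status factors distort it, can be sketched as a toy expected-value comparison. The payoff numbers are invented for illustration only.

```python
# Toy decision model for the cooperation factors listed above: the project's
# value discounted by its success probability, against in-group status payoffs.

def ev_cooperate(project_value, p_success, status_loss):
    """EV of cooperating: expected share of project value minus in-group status cost."""
    return p_success * project_value - status_loss

def ev_defect(status_gain):
    """EV of defecting: status gained by defending the group's position."""
    return status_gain

# An agent who estimates the project's prospects correctly cooperates...
assert ev_cooperate(project_value=100, p_success=0.2, status_loss=5) > \
    ev_defect(status_gain=10)

# ...but underestimating p_success (the first factor) or overweighting
# in-group status (the second and third factors) flips the decision:
assert ev_cooperate(project_value=100, p_success=0.05, status_loss=5) < \
    ev_defect(status_gain=10)
```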
Why do you ask?
Do you think those modern analogues could derive improved cooperation by broadcasting a specific enfranchisement policy?
I suspect that would be an easier question to answer with anything other than “it depends” if I had a specific example to consider. In general, I expect that it depends on who is motivated to support the project now to what degree, and the specific enfranchisement policy under discussion, and what value they perceive in that policy.
As a practical matter, it looks to me like the majority of wealthy, intelligent, rational modern folks an FAI project might want to cooperate with lean towards egalitarianism and humanism, not Blues-versus-Greens-style sectarianism.
Sure, that’s probably true, at least for some values of “lean towards” (there’s a lot to be said here about actual support and signaled support but I’m not sure it matters). And it will likely remain true for as long as the FAI project in question only cares about the cooperation of wealthy, intelligent, rational modern folks, which they are well advised to continue doing for as long as FAI isn’t a subject of particular interest to anyone else, and to stop doing as soon as possible thereafter.
If you don’t think someone has enough political clout to bother with, they’ll be incentivized to prove you wrong. Even if you’re right most of the time, you’ll be giving yourself trouble.
(shrug) Sure, there’s some nonzero expected cost to the brief window between when they start proving their influence and I concede and include them.
One possible solution is to simulate their growth into adults before computing their CEV. Presumably the age at which their growth should be simulated up to is not as controversial as who should be included.
Can you clarify what the relevant difference is between including a too-young person in the target for a CEV-extractor, vs. pointing a growth-simulator at the too-young-person and including the resulting simulated person in the target for a CEV-extractor?
FAI team trustworthiness is a different subject than optimal enfranchisement structure.
It was mainly rhetorical; I tend to think that what holds back today’s FAI efforts is lack of rationality and inability of folks to take highly abstract arguments seriously.
Can you clarify what the relevant difference is between including a too-young person in the target for a CEV-extractor, vs. pointing a growth-simulator at the too-young-person and including the resulting simulated person in the target for a CEV-extractor?
Potentially bad things that could happen from implementing the CEV of a two-year-old.

I conclude that I do not understand what you think the CEV-extractor is doing.
Humans acquire morality as part of their development. Three-year-olds have a different, more selfish morality than older folks. There’s no reason in principle why a three-year-old who was “more the person he wished he was” would necessarily be a moral adult...
CEV does not mean considering the preferences of an agent who is “more moral”. There is no such thing. Morality is not a scalar quantity. I certainly hope the implementation would end up favoring the sort of morals I like enough to calculate the CEV of a three-year-old and get an output similar to that of an adult, but it seems like a bad idea to count on the implementation being that robust.
Consider the following three target-definitions for a superhuman optimizer:
a) one patterned on the current preferences of a typical three-year-old
b) one patterned on the current preferences of a typical thirty-year-old
c) one that is actually safe to implement (aka “Friendly”)
I understand you to be saying that the gulf between A and C is enormous, and I quite agree. I have not the foggiest beginnings of a clue how one might go about building a system that reliably gets from A to C and am not at all convinced it’s possible.
I would say that the gulf between B and C is similarly enormous, and I’m equally ignorant of how to build a system that spans it. But this whole discussion (and all discussions of CEV-based FAI) presumes that this gulf is spannable in practice. If we can span the B-C gulf, I take that as strong evidence indicating that we can span the A-C gulf.
Put differently: to talk seriously about implementing an FAI based on the CEV of thirty-year-olds, but at the same time dismiss the idea of doing so based on the CEV of three-year-olds, seems roughly analogous to seriously setting out to build a device that lets me teleport from Boston to Denver without occupying the intervening space, but dismissing the idea of building one that goes from Boston to San Francisco as a laughable fantasy because, as everyone knows, San Francisco is further away than Denver.
That’s why I said I don’t understand what you think the extractor is doing. I can see where, if I had a specific theory of how a teleporter operates, I might confidently say that it can span 2k miles but not 3k miles, arbitrary as that sounds in the absence of such a theory. Similarly, if I had a specific theory of how a CEV-extractor operates, I might confidently say it can work safely on a 30-year-old mind but not a 3-year-old. It’s only in the absence of such a theory that such a claim is arbitrary.
It seems likely to me that the CEV of the 30-year-old would be friendly and the CEV of the three-year-old would not be, but as you say at this point it’s hard to say much for sure.
(nods) That follows from what you’ve said earlier.
I suspect we have very different understandings of how similar the 30-year-old’s desires are to their volition.
Perhaps one way of getting at that difference is this: how likely do you consider it that the CEV of a 30-year-old would be something that, if expressed in a form that 30-year-old can understand (say, for example, the opportunity to visit a simulated world for a year that is constrained by that CEV), would be relatively unsurprising to that 30-year-old… something that would elicit “Oh, cool, yeah, this is more or less what I had in mind” rather than “Holy Fucking Mother of God what kind of an insane world IS this?!?”?
For my own part, I consider the latter orders of magnitude more likely.
There is no clear bright line determining what is or isn’t a clear bright line.
I agree that the line separating “human” from “non-human” is much clearer and brighter than that separating “fundamentalist Christian” from “non-fundamentalist Christian”, and I further agree that for minds like mine the difference between those two lines is very important. Something with a mind like mine can work with the first distinction much more easily than with the second.
So what?
A mind like mine doesn’t stand a chance of extrapolating a coherent volition from the contents of a group of target minds. Whatever X is, it isn’t a mind like mine.
If we don’t have such an X available, then it doesn’t matter what defining characteristic we use to determine the target group for CEV extrapolation, because we can’t extrapolate CEV from them anyway.
If we do have such an X available, then it doesn’t matter what lines are clear and bright enough for minds like mine to reliably work with; what matters is what lines are clear and bright enough for systems like X to reliably work with.
I have confidence < .1 that either one of us can articulate a specification determining who is human that doesn’t either include or exclude some system that someone included in that specification would contest the inclusion/exclusion of.
I also have confidence < .1 that, using any definition of “human” you care to specify, the universe contains no nonhuman systems I would possibly want to cooperate with.
Sure, but so does your “include all humans” suggestion. We’re both assuming that there’s some way the AI-development team can convincingly commit to a policy P such that other people’s decisions to cooperate will plausibly be based on the belief that P will actually be implemented when the time comes; we are neither of us specifying how that is actually supposed to work. Merely saying “I’ll include all of humanity” isn’t good enough to ensure cooperation if nobody believes me.
I have confidence that, given a mechanism for getting from someone saying “I’ll include all of humanity” to everyone cooperating, I can work out a way to use the same mechanism to get from someone saying “I’ll include the Advisory Board, which includes anyone with enough power that I care whether they cooperate or not” to everyone I care about cooperating.
I said: “Then anyone with enough political clout to get in my way, I add to the Advisory Board.” That seems to me as well-defined a process as “I decide to include every human being.”
Certainly.
Can you say again which problem you’re referring to here? I’ve lost track.
Absolutely agreed.
Consider the implications of that, though.
Suppose you have a CEV-extractor and we’re the only two people in the world, just for simplicity.
You can either point the CEV-extractor at yourself, or at both of us.
If you genuinely want me included, then it doesn’t matter which you choose; the result will be the same.
Conversely, if the result is different, that’s evidence that you don’t genuinely want me included, even if you think you do.
Knowing that, why would you choose to point the CEV-extractor at both of us?
One reason for doing so might be that you’d precommitted to doing so (or some UDT equivalent), so as to secure my cooperation. Of course, if you can secure my cooperation without such a precommitment (say, by claiming you would point it at both of us), that’s even better.
Complicated or ambiguous schemes take more time to explain, get more attention, and risk folks spending time trying to gerrymander their way in instead of contributing to FAI.
I think any solution other than “enfranchise humanity” is a potential PR disaster.
Keep in mind that not everyone is that smart, and there are some folks who would make a fuss about disenfranchisement of others even if they themselves were enfranchised (and therefore, by definition, those they were making a fuss about would be enfranchised if they thought it was a good idea).
I agree there are potential ambiguity problems with drawing the line at humans, but I think the potential problems are bigger with other schemes.
I agree there are potential problems with credibility, but that seems like a separate argument.
It’s not all or nothing. The more inclusive the enfranchisement, the more cooperation there will be in general.
With that scheme, you’re incentivizing folks to prove they have enough political clout to get in your way.
Moreover, humans aren’t perfect reasoning systems. Your way of determining enfranchisement sounds a lot more adversarial than mine, which would affect the tone of the effort in a big and undesirable way.
Why do you think that the right to vote in democratic countries is as clearly determined as it is? Restricting voting rights to those of a certain IQ or higher would be a politically unfeasible PR nightmare.
Again, this is a different argument about why people cooperate instead of defect. To a large degree, evolution hardwired us to cooperate, especially when others are trying to cooperate with us.
I agree that if the FAI project seems to be staffed with a lot of untrustworthy, selfish backstabbers, we should cast a suspicious eye on it regardless of what they say about their project.
Ultimately it probably doesn’t matter much what their broadcasted intention towards the enfranchisement of those outside their group is, since things will largely come down to what their actual intentions are.
That’s not clear to me.
Suppose the Blues and the Greens are political opponents. If I credibly commit to pointing my CEV-extractor at all the Blues, I gain the support of most Blues and the opposition of most Greens. If I say “at all Blues and Greens” instead, I gain the support of some of the Greens, but I lose the support of some of the Blues, who won’t want any part of a utopia patterned even partially on hateful Green ideologies.
This is almost undoubtedly foolish of the Blues, but I nevertheless expect it. As you say, people aren’t all that smart.
The question is, is the support I gain from the Greens by including them worth the support I lose from the Blues by including the Greens? Of course it depends. That said, the strong support of a sufficiently powerful small group is often more valuable than the weak support of a more powerful larger group, so I’m not nearly as convinced as you sound that saying “we’ll incorporate the values of both you and your hated enemies!” will get more net support than picking a side and saying “we’ll incorporate your values and not those of your hated enemies.”
Sure, that’s true.
Heck, they don’t have to prove it; if they give me enough evidence to consider it plausible, I’ll include ’em.
So what?
I think you underestimate how threatening egalitarianism sounds to a lot of people, many of whom have a lot of power. Cf including those hateful Greens, above. That said, I suspect there’s probably ways to spin your “include everyone” idea in such a way that even the egalitarianism-haters will not oppose it too strongly. But I also suspect there’s ways to spin my “don’t include everyone” idea in such a way that even the egalitarianism-lovers will not oppose it too strongly.
Because many people believe it represents power. That’s also why it’s not significantly more clearly determined. It’s also why that right is not universal.
Sure, I agree. Nor would I recommend announcing that we’re restricting the advisory board to people of a certain IQ or higher, for analogous reasons. (Also it would be a silly thing to do, but that’s beside the point, we’re talking about sales and not implementation here.) I’m not sure why you bring it up. I also wouldn’t recommend (in my country) announcing restrictions based on skin color, income, religious affiliation, or a wide variety of other things.
On the other hand, in my country, we successfully exclude people below a certain age from voting, and I correspondingly expect announcing restrictions based on age to not be too big a deal. Mostly this is because young people have minimal political clout. (And as you say, this incentivizes young people to prove they have political clout, and sometimes they even try, but mostly nobody cares because in fact they don’t.)
Conversely, extending voting rights to everyone regardless of age would be a politically unfeasible PR nightmare, and I would not recommend announcing that we’re including everyone regardless of age (which I assume you would recommend, since 2-year-olds are human beings by many people’s bright line test), for similar reasons.
(Somewhat tangentially: extending CEV inclusion, or voting rights, to everyone regardless of age would force us as a matter of logic to either establish a cutoff at birth or not establish a cutoff at birth. Either way we’d then have stepped in the pile of cow manure that is U.S. abortion politics, where the only winning move is not to play. What counts as a human being simply isn’t as politically uncontroversial a question as you’re making it sound.)
Sorry, you’ve lost me. Can you clarify what the different arguments you refer to here are, and why the difference between them matters in this context?
Once they succeed in building a CEV-extractor and a CEV-implementor, then yeah, their broadcast intentions probably don’t matter much. Until then, they can matter a lot.
What you see as the factors holding back people from cooperating with modern analogues of FAI projects? Do you think those modern analogues could derive improved cooperation through broadcasting specific enfranchisement policy?
As a practical matter, it looks to me like the majority of wealthy, intelligent, rational modern folks an FAI project might want to cooperate with lean towards egalitarianism and humanism, not blues versus greens type sectarianism.
If you don’t think someone has enough political clout to bother with, they’ll be incentivized to prove you wrong. Even if you’re right most of the time, you’ll be giving yourself trouble.
I agree that very young humans are a potential difficult gray area. One possible solution is to simulate their growth into adults before computing their CEV. Presumably the age at which their growth should be simulated up to is not as controversial as who should be included.
FAI team trustworthiness is a different subject than optimal enfranchisement structure.
I’m not sure what those modern analogues are, but in general here are a few factors I see preventing people from cooperating on projects where both mutual cooperation and unilateral cooperation would be beneficial:
Simple error in calculating the expected value of cooperating.
Perceiving more value in obtaining higher status within my group by defending my group’s wrong beliefs about the project’s value than in defecting from my group by cooperating in the project
Perceiving more value in continuing to defend my previously articulated position against the project (e.g., in being seen as consistent or as capable of discharging earlier commitments) than in changing my position and cooperating in the project
Why do you ask?
I suspect that would be an easier question to answer with anything other than “it depends” if I had a specific example to consider. In general, I expect that it depends on who is motivated to support the project now to what degree, and the specific enfranchisement policy under discussion, and what value they perceive in that policy.
Sure, that’s probably true, at least for some values of “lean towards” (there’s a lot to be said here about actual support and signaled support but I’m not sure it matters). And it will likely remain true for as long as the FAI project in question only cares about the cooperation of wealthy, intelligent, rational modern folks, which they are well advised to continuing to do for as long as FAI isn’t a subject of particular interest to anyone else, and to stop doing as soon as possible thereafter.
(shrug) Sure, there’s some nonzero expected cost to the brief window between when they start proving their influence and when I concede and include them.
Can you clarify what the relevant difference is between including a too-young person in the target for a CEV-extractor, vs. pointing a growth-simulator at the too-young-person and including the resulting simulated person in the target for a CEV-extractor?
I agree with this, certainly.
It was mainly rhetorical; I tend to think that what holds back today’s FAI efforts is lack of rationality and inability of folks to take highly abstract arguments seriously.
Potentially bad things that could happen from implementing the CEV of a two-year-old.
I conclude that I do not understand what you think the CEV-extractor is doing.
Humans acquire morality as part of their development. Three-year-olds have a different, more selfish morality than older folks. There’s no reason in principle why a three-year-old who was “more the person he wished he was” would necessarily be a moral adult...
CEV does not mean considering the preferences of an agent who is “more moral”. There is no such thing. Morality is not a scalar quantity. I certainly hope the implementation would end up favoring the sort of morals I like enough to calculate the CEV of a three-year-old and get an output similar to that of an adult, but it seems like a bad idea to count on the implementation being that robust.
Consider the following three target-definitions for a superhuman optimizer:
a) one patterned on the current preferences of a typical three-year-old
b) one patterned on the current preferences of a typical thirty-year old
c) one that is actually safe to implement (aka “Friendly”)
I understand you to be saying that the gulf between A and C is enormous, and I quite agree. I have not the foggiest beginnings of a clue how one might go about building a system that reliably gets from A to C and am not at all convinced it’s possible.
I would say that the gulf between B and C is similarly enormous, and I’m equally ignorant of how to build a system that spans it. But this whole discussion (and all discussions of CEV-based FAI) presumes that this gulf is spannable in practice. If we can span the B-C gulf, I take that as strong evidence that we can span the A-C gulf as well.
Put differently: to talk seriously about implementing an FAI based on the CEV of thirty-year-olds, but at the same time dismiss the idea of doing so based on the CEV of three-year-olds, seems roughly analogous to seriously setting out to build a device that lets me teleport from Boston to Denver without occupying the intervening space, but dismissing the idea of building one that goes from Boston to San Francisco as a laughable fantasy because, as everyone knows, San Francisco is further away than Denver.
That’s why I said I don’t understand what you think the extractor is doing. I can see where, if I had a specific theory of how a teleporter operates, I might confidently say that it can span 2k miles but not 3k miles, arbitrary as that sounds in the absence of such a theory. Similarly, if I had a specific theory of how a CEV-extractor operates, I might confidently say it can work safely on a 30-year-old mind but not a 3-year-old. It’s only in the absence of such a theory that such a claim is arbitrary.
It seems likely to me that the CEV of the 30-year-old would be friendly and the CEV of the three-year-old would not be, but as you say at this point it’s hard to say much for sure.
(nods) That follows from what you’ve said earlier.
I suspect we have very different understandings of how similar the 30-year-old’s desires are to their volition.
Perhaps one way of getting at that difference is thus: how likely do you consider it that the CEV of a 30-year-old would be something that, if expressed in a form that 30-year-old can understand (say, for example, the opportunity to visit a simulated world for a year that is constrained by that CEV), would be relatively unsurprising to that 30-year-old… something that would elicit “Oh, cool, yeah, this is more or less what I had in mind” rather than “Holy Fucking Mother of God what kind of an insane world IS this?!?”?
For my own part, I consider the latter orders of magnitude more likely.
I’m pretty uncertain.