If the SIAI engineers figure out how to construct friendly super-AI, why would they care about making it respect the values of anyone but themselves? What incentive do they have to program an AI that is friendly to humanity, and not just to themselves? What’s stopping LukeProg from appointing himself king of the universe?
You know what they say the modern version of Pascal’s Wager is? Sucking up to as many Transhumanists as possible, just in case one of them turns into God. -- Julie from Crystal Nights by Greg Egan
This is basically what I was asking before. Now, it seems to me highly unlikely that SIAI is playing that game, but I still want a better answer than “Trust us to not be supervillains”.
Lots of incorrect answers in other replies to this one. The real answer is that, from Luke’s perspective, creating Luke-friendly AI and becoming king of the universe isn’t much better than creating regular friendly AI and getting the same share of the universe as any other human. Because it turns out, after the first thousand galaxies’ worth of resources and trillion trillion millennia of lifespan, you hit such diminishing returns that having another seven-billion times as many resources isn’t a big deal.
This isn’t true for every value: he might assign value to certain things not existing (say, powerful people besides himself) that other people want to exist. And that last factor of seven billion is worth something. But these are tiny differences in value, utterly dwarfed by the reduced AI-creation success rate that would happen if the programmers got into a flamewar over who should be king.
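A toy calculation makes the diminishing-returns point concrete. The sketch below is illustration only: the saturating utility function and every number in it are made up, and the point is just that once utility flattens out, a seven-billion-fold difference in resources is worth far less than even a tiny difference in the probability of getting any friendly AI at all.

```python
# Toy model of the diminishing-returns argument above. Everything here is
# invented for illustration: the utility function, the resource numbers,
# and the success probabilities.

def utility(resources, satiation=1e3):
    """Saturating utility: approaches 1 once resources far exceed `satiation`."""
    return resources / (resources + satiation)

PER_PERSON_SHARE = 1e10                  # hypothetical resources per person under plain FAI
WHOLE_UNIVERSE = PER_PERSON_SHARE * 7e9  # what a "king of the universe" would control

u_equal_share = utility(PER_PERSON_SHARE)  # ~0.9999999
u_king = utility(WHOLE_UNIVERSE)           # ~0.999999999999999...

print(u_king - u_equal_share)              # ~1e-07: being king adds almost nothing

# Compare that to losing even a sliver of success probability because the
# programmers fought over who should be king:
p_cooperate, p_fight = 0.50, 0.499
print(p_cooperate * u_equal_share)         # ~0.49999995
print(p_fight * u_king)                    # ~0.499, i.e. worse in expectation
```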
The good guys do not write an AI which values a bag of things that the programmers think are good ideas, like libertarianism or socialism or making people happy or whatever. There were multiple Overcoming Bias sequences about this one point, like the Fake Utility Function sequence and the sequence on metaethics. It is dealt with at length in the document Coherent Extrapolated Volition. It is the first thing, the last thing, and the middle thing that I say about Friendly AI.
...
The good guys do not directly impress their personal values onto a Friendly AI.
I understand CEV. What I don’t understand is why the programmers would ask the AI for humanity’s CEV, rather than just their own CEV.
The only (sane) reason is for signalling—it’s hard to create FAI when someone else is trying to stop you. Given a choice, however, your own CEV is strictly superior. If you actually do want humanity’s CEV, then your own CEV will be equivalent to it. But if you just think you want humanity’s CEV and it turns out that, for example, humanity’s CEV gets dominated by jerks in a way you didn’t expect, then your own CEV will end up better than humanity’s CEV… even from a purely altruistic perspective.
Yeah, I’ve wondered this for a while without getting any closer to an understanding.
It seems that everything that some human “really wants” (and therefore could potentially be included in the CEV target definition) is either something that, if I was sufficiently well-informed about it, I would want for that human (in which case my CEV, properly unpacked by a superintelligence, includes it for them) or is something that, no matter how well informed I was, I would not want for that human (in which case it’s not at all clear that I ought to endorse implementing it).
If CEV-humanity makes any sense at all (which I’m not sure it does), it seems that CEV-arbitrary-subset-of-humanity leads to results that are just as good by the standards of anyone whose standards are worth respecting.
My working answer is therefore that it’s valuable to signal the willingness to include everyone (so nobody feels left out), and one effective way to signal that willingness consistently and compellingly is to precommit to actually doing it.
Sure. For example, if I want other people’s volition to be implemented, that is sufficient to justify altruism. (Not necessary, but sufficient.)
But that doesn’t justify directing an AI to look at other people’s volition to determine its target directly… as has been said elsewhere, I can simply direct an AI to look at my volition, and the extrapolation process will naturally (if CEV works at all) take other people’s volition into account.
I think it would be significantly easier to make FAI than LukeFriendly AI: for the latter, you need to do most of the work involved in the former, but also work out how to get the AI to find you (and not accidentally be friendly to someone else).
If it turns out that there’s a lot of coherence in human values, FAI will resemble LukeFriendlyAI quite closely anyway.
I think it would be significantly easier to make FAI than LukeFriendly AI
Massively backwards! Creating an FAI (presumably ‘friendly to humanity’) requires an AI that can somehow harvest and aggregate preferences over humans in general, but a LukeFriendly AI just needs to scan one brain.
If FAI is HumanityFriendly rather than LukeFriendly, you have to work out how to get the AI to find humanity and not accidentally optimize for the extrapolated volition of some other group. It seems easier to me to establish parameters for “finding” Luke than for “finding” humanity.
Of course an arbitrarily chosen human’s values are more similar to the aggregated values of humanity as a whole than humanity’s values are similar to an arbitrarily chosen point in value-space. Value-space is big.
I don’t see how my point depends on that, though. Your argument here claims that “FAI” is easier than “LukeFriendlyAI” because LFAI requires an additional step of defining the target, and FAI doesn’t require that step. I’m pointing out that FAI does require that step. In fact, target definition for “humanity” is a more difficult problem than target definition for “Luke”.
I find it much more likely that it’s the other way around; making one for a single brain that already has a utility function seems much easier than finding a good compromise between billions. Especially if a plan of the form “upload me, then perform this specific type of enhancement to enable me to safely continue self-improving” turns out to be safe enough.
Game theory. If different groups compete in building a “friendly” AI that respects only their personal coherent extrapolated volition (extrapolated sensible desires), then cooperation is no longer an option because the other teams have become “the enemy”. I have a value system that is substantially different from Eliezer’s. I don’t want a friendly AI that is created in some researcher’s personal image (except, of course, if it’s created based on my ideals). This means that we have to sabotage each other’s work to prevent the other researchers from getting to friendly AI first. This is because the moment somebody reaches “friendly” AI the game is over and all parties except for one lose. And if we get uFAI everybody loses.
That’s a real problem though. If different factions in friendly AI research have to destructively compete with each other, then the probability of unfriendly AI will increase. That’s real bad. From a game theory perspective all FAI researchers agree that any version of FAI is preferable to uFAI, and yet they’re working towards a future where uFAI is becoming more and more likely! Luckily, if the FAI researchers take the coherent extrapolated volition of all of humanity the problem disappears. All FAI researchers can work towards a common goal that will fairly represent all of humanity, not some specific researcher’s version of “FAI”. It also removes the problem of different morals/values. Some people believe that we should look at total utility, other people believe we should consider only average utility. Some people believe abstract values matter, some people believe consequences of actions matter most. Here too the solution of an AI that looks at a representative set of all human values is the solution that all people can agree on as most “fair”. Cooperation beats defection.
If Luke were to attempt to create a LukeFriendlyAI he knows he’s defecting from the game theoretical optimal strategy and thereby increasing the probability of a world with uFAI. If Luke is aware of this and chooses to continue on that course anyway then he’s just become another uFAI researcher who actively participates in the destruction of the human species (to put it dramatically).
We can’t force all AI programmers to focus on the FAI route. We can try to raise the sanity waterline and try to explain to AI researchers that the optimal (game theoretically speaking) strategy is the one we ought to pursue because it’s most likely to lead to a fair FAI based on all of our human values. We just have to cooperate, despite differences in beliefs and moral values. CEV is the way to accomplish that because it doesn’t privilege the AI researchers who write the code.
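For what it’s worth, the cooperate-versus-race argument above can be put into a back-of-the-envelope expected-value form. The numbers below are invented; the only structural assumptions are that a humanity-wide CEV is nearly as good for each researcher as their personal utopia, and that racing teams sabotaging each other makes uFAI more likely.

```python
# Toy expected-value sketch of "cooperation beats defection" for FAI teams.
# All utilities and probabilities are invented for illustration.

U_MY_FAI = 1.00       # AI locked to my personal values
U_SHARED_CEV = 0.95   # humanity-wide CEV (close to what I want anyway)
U_RIVAL_FAI = 0.60    # some other researcher's personal utopia
U_UFAI = 0.00         # unfriendly AI: everyone loses

def expected_value(p_mine, p_rival, p_shared, p_ufai):
    return (p_mine * U_MY_FAI + p_rival * U_RIVAL_FAI
            + p_shared * U_SHARED_CEV + p_ufai * U_UFAI)

# Racing: teams sabotage each other, so uFAI becomes the most likely outcome.
race = expected_value(p_mine=0.10, p_rival=0.20, p_shared=0.0, p_ufai=0.70)

# Cooperating on one humanity-wide CEV project: no sabotage, better overall odds.
cooperate = expected_value(p_mine=0.0, p_rival=0.0, p_shared=0.60, p_ufai=0.40)

print(race, cooperate)  # 0.22 vs 0.57: giving up "king of the universe" still wins
```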
Game theory only helps us if it’s impossible to deceive others. If one is able to engage in deception, the dominant strategy becomes to pretend to support CEV FAI while actually working on your own personal God in a jar. AI development in particular seems an especially susceptible domain for deception. The creation of a working AI is a one-time event; it’s not like most stable games in nature, which allow one to detect defections over hundreds of iterations. The creation of a working AI (FAI or uFAI) is so complicated that it’s impossible for others to check whether any given researcher is defecting or not.
Our best hope then is for the AI project to be so big it cannot be controlled by a single entity, and definitely not by a single person. If it only takes one guy in a basement getting lucky to make an AI go FOOM, we’re doomed. If it takes ten thousand researchers collaborating on the biggest group coding project ever, we’re probably safe. This is why doing work on CEV is so important: so we can have that piece of the puzzle already built when the rest of AI research catches up and is ready to go FOOM.
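The verification point can be made concrete with one more invented-numbers sketch: when nobody can audit what a researcher is actually building, covert defection has the higher expected payoff, and a large transparent project is what flips that.

```python
# Toy sketch of why unverifiable one-shot development rewards covert defection.
# Payoffs and detection probabilities are invented for illustration.

U_MY_FAI, U_SHARED_CEV, U_CAUGHT = 1.00, 0.95, 0.10  # payoffs to a would-be defector

def covert_defection_value(p_detect):
    """Pretend to support the shared CEV project while building your own AI."""
    return (1 - p_detect) * U_MY_FAI + p_detect * U_CAUGHT

print(covert_defection_value(p_detect=0.01))  # 0.991 > 0.95: lone basement coder defects
print(covert_defection_value(p_detect=0.90))  # 0.19  < 0.95: ten thousand auditors, cooperate
```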
As I understand the terminology, AI that only respects some humans’ preferences is uFAI by definition. Thus:
a friendly AI that is created in some researcher’s personal image
is actually unFriendly, as Eliezer uses the term. Thus, the researcher you describe is already an “uFAI researcher”.
It also removes the problem of different morals/values. Some people believe that we should look at total utility, other people believe we should consider only average utility. Some people believe abstract values matter, some people believe consequences of actions matter most. Here too the solution of an AI that looks at a representative set of all human values is the solution that all people can agree on as most “fair”.
What do you mean by “representative set of all human values”? Is there any reason to think that the resulting moral theory would be acceptable to implement on everyone?
[a “friendly” AI] is actually unFriendly, as Eliezer uses the term
Absolutely. I used “friendly” AI (with scare quotes) to denote it’s not really FAI, but I don’t know if there’s a better term for it. It’s not the same as uFAI because Eliezer’s personal utopia is not likely to be valueless by my standards, whereas a generic uFAI is terrible from any human point of view (paperclip universe, etc).
I guess it just doesn’t bother me that uFAI includes both indifferent AI and malicious AI. I honestly think that indifferent AI is much more likely than malicious (Clippy is malicious, but awfully unlikely), but that’s not good for humanity’s future either.
Right now, and for the foreseeable future, SIAI doesn’t have the funds to actually create FAI. All they’re doing is creating a theory for friendliness, which can be used when someone else has the technology to create AI. And of course, nobody else is going to use the code if it focuses on SIAI.
SIAI doesn’t have the funds to actually create FAI
Funds are not a relevant issue for this particular achievement at present time. It’s not yet possible to create a FAI even given all the money in the world; a pharaoh can’t build a modern computer. (Funds can help with moving the time when (and if) that becomes possible closer, improving the chances that it happens this side of an existential catastrophe.)
Yeah, I was assuming that they were able to create FAI for the sake of responding to the grandparent post. If they weren’t, then there wouldn’t be any trouble with SIAI making AI only friendly to themselves to begin with.
The theory for friendliness is completely separate from the theory of AI. So, assuming they complete one does not mean that they complete the other. Furthermore, for something as big as AI/FAI, the computing power required is likely to be huge, which makes it unlikely that a small company like SIAI will be able to create it.
Though I suppose it might be possible if they were able to get large enough loans; I don’t have the technical knowledge to say how much computing power is needed or how much that would cost.
The theory for friendliness is completely separate from the theory of AI.
??? Maybe I’m being stupid, but I suspect it’s fairly hard to fully and utterly solve the friendliness problem without, by the end of doing so, AT LEAST solving many of the tricky AI problems in general.
Now that I understand your question better, here’s my answer:
Let’s say the engineers decide to make the AI respect only their values. But if they were the sort of people who were likely to do that, no one would donate money to them. They could offer to make the AI respect the values of themselves and their donors, but that would alienate everyone else and make the lives of themselves and their donors difficult. The species boundary between humans and other living beings is a natural place to stop expanding the circle of enfranchised agents.
This seems to depend on the implicit assumption that their donors (and everyone else powerful enough to make their lives difficult) don’t mind having the values of third parties respected.
If some do mind, then there’s probably some optimally pragmatic balancing point short of all humans.
It seems to me all these questions arise for “include everyone” as well. Somewhere along the line someone is going to suggest “don’t include fundamentalist Christians”, for example, and if I’m committed to the kind of democratic decision process you imply, then we now need to have a vote, or at least decide whether we have a vote, etc. etc, all of that bureaucratic overhead.
Of course, that might not be necessary; I could just unilaterally override that suggestion, mandate “No, we include everyone!”, and if I have enough clout to make that stick, then it sticks, with no bureaucratic overhead. Yay! This seems to more or less be what you have in mind.
It’s just that the same goes for “Include everyone except fundamentalist Christians.”
In any case, I don’t see how any of this cumbersome democratic machinery makes any sense in this scenario. Actually working out CEV implies the existence of something, call it X, that is capable of extrapolating a coherent volition from the state of a group of minds. What’s the point of voting, appeals, etc. when that technology is available? X itself is a better solution to the same problem.
Which implies that it’s possible to identify a smaller group of minds as the Advisory Board and say to X “Work out the Advisory Board’s CEV with respect to whose minds should be included as input to a general-purpose optimizer’s target definition, then work out the CEV of those minds with respect to the desired state of the world.” Then anyone with enough political clout to get in my way, I add to the Advisory Board, thereby ensuring that their values get taken into consideration (including their values regarding whose values get included).
That includes folks who think everyone should get an equal say, folks who think that every human should get an equal say, folks who think that everyone with more than a certain threshold level of intelligence and moral capacity get a say, folks who think that everyone who agrees with them get a say, etc., etc. X works all of that out, and spits out a spec on the other side for who actually gets a say and to what degree, which it then takes as input to the actual CEV-extrapolating process.
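In pseudocode, the two-stage proposal reads something like the sketch below. The function `extrapolate_cev` is a purely hypothetical stand-in for X; the sketch only makes the composition explicit, it implements nothing.

```python
def extrapolate_cev(minds, question):
    """Hypothetical black box: the coherent extrapolated volition of `minds`
    with respect to `question`. Standing in for X; not implementable here."""
    raise NotImplementedError("this is the entire hard problem")

def two_stage_target(advisory_board):
    # Stage 1: the board's CEV settles *whose* minds feed the optimizer's target,
    # and with what weight.
    enfranchised = extrapolate_cev(
        advisory_board,
        question="which minds should be included in the target definition, and how weighted?")
    # Stage 2: the CEV of that (possibly much larger) set settles what the world should be.
    return extrapolate_cev(enfranchised, question="what is the desired state of the world?")

# Anyone with enough political clout to get in the way simply gets added to
# `advisory_board`, so their preferences about enfranchisement are counted too.
```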
This seems kind of absurd to me, but no more so than the idea that X can work out humanity’s CEV at all. If I’m granting that premise for the sake of argument, everything else seems to follow.
It’s just that the same goes for “Include everyone except fundamentalist Christians.”
There is no clear bright line determining who is or is not a fundamentalist Christian. Right now, there pretty much is a clear bright line determining who is or is not human. And that clear bright line encompasses everyone we would possibly want to cooperate with.
Your advisory board suggestion ignores the fact that we have to be able to cooperate prior to the invention of CEV deducers.
And you’re not describing a process for how the advisory board is decided either. Different advisory boards may produce different groups of enfranchised minds. So your suggestion doesn’t resolve the problem.
In fact, I don’t see how putting a group of minds on the advisory board is any different than just making them the input to the CEV. If a person’s CEV is that someone’s mind should contribute to the optimizer’s target, that will be their CEV regardless of whether it’s measured in an advisory board context or not.
There is no clear bright line determining who is or is not a fundamentalist Christian.
There is no clear bright line determining what is or isn’t a clear bright line.
I agree that the line separating “human” from “non-human” is much clearer and brighter than that separating “fundamentalist Christian” from “non-fundamentalist Christian”, and I further agree that for minds like mine the difference between those two lines is very important. Something with a mind like mine can work with the first distinction much more easily than with the second.
So what?
A mind like mine doesn’t stand a chance of extrapolating a coherent volition from the contents of a group of target minds. Whatever X is, it isn’t a mind like mine.
If we don’t have such an X available, then it doesn’t matter what defining characteristic we use to determine the target group for CEV extrapolation, because we can’t extrapolate CEV from them anyway.
If we do have such an X available, then it doesn’t matter what lines are clear and bright enough for minds like mine to reliably work with; what matters is what lines are clear and bright enough for systems like X to reliably work with.
Right now, there pretty much is a clear bright line determining who is or is not human. And that clear bright line encompasses everyone we would possibly want to cooperate with.
I have confidence < .1 that either one of us can articulate a specification determining who is human that doesn’t either include or exclude some system that someone included in that specification would contest the inclusion/exclusion of.
I also have confidence < .1 that, using any definition of “human” you care to specify, the universe contains no nonhuman systems I would possibly want to cooperate with.
Your advisory board suggestion ignores the fact that we have to be able to cooperate prior to the invention of CEV deducers.
Sure, but so does your “include all humans” suggestion. We’re both assuming that there’s some way the AI-development team can convincingly commit to a policy P such that other people’s decisions to cooperate will plausibly be based on the belief that P will actually be implemented when the time comes; we are neither of us specifying how that is actually supposed to work. Merely saying “I’ll include all of humanity” isn’t good enough to ensure cooperation if nobody believes me.
I have confidence that, given a mechanism for getting from someone saying “I’ll include all of humanity” to everyone cooperating, I can work out a way to use the same mechanism to get from someone saying “I’ll include the Advisory Board, which includes anyone with enough power that I care whether they cooperate or not” to everyone I care about cooperating.
And you’re not describing a process for how the advisory board is decided either.
I said: “Then anyone with enough political clout to get in my way, I add to the Advisory Board.” That seems to me as well-defined a process as “I decide to include every human being.”
Different advisory boards may produce different groups of enfranchised minds.
Certainly.
So your suggestion doesn’t resolve the problem.
Can you say again which problem you’re referring to here? I’ve lost track.
In fact, I don’t see how putting a group of minds on the advisory board is any different than just making them the input to the CEV.
Absolutely agreed.
Consider the implications of that, though.
Suppose you have a CEV-extractor and we’re the only two people in the world, just for simplicity. You can either point the CEV-extractor at yourself, or at both of us. If you genuinely want me included, then it doesn’t matter which you choose; the result will be the same. Conversely, if the result is different, that’s evidence that you don’t genuinely want me included, even if you think you do.
Knowing that, why would you choose to point the CEV-extractor at both of us?
One reason for doing so might be that you’d precommitted to doing so (or some UDT equivalent), so as to secure my cooperation. Of course, if you can secure my cooperation without such a precommitment (say, by claiming you would point it at both of us), that’s even better.
Complicated or ambiguous schemes take more time to explain, get more attention, and risk folks spending time trying to gerrymander their way in instead of contributing to FAI.
I think any solution other than “enfranchise humanity” is a potential PR disaster.
Keep in mind that not everyone is that smart, and there are some folks who would make a fuss about disenfranchisement of others even if they themselves were enfranchised (and therefore, by definition, those they were making a fuss about would be enfranchised if they thought it was a good idea).
I agree there are potential ambiguity problems with drawing the line at humans, but I think the potential problems are bigger with other schemes.
Sure, but so does your “include all humans” suggestion. We’re both assuming that there’s some way the AI-development team can convincingly commit to a policy P such that other people’s decisions to cooperate will plausibly be based on the belief that P will actually be implemented when the time comes; we are neither of us specifying how that is actually supposed to work. Merely saying “I’ll include all of humanity” isn’t good enough to ensure cooperation if nobody believes me.
I agree there are potential problems with credibility, but that seems like a separate argument.
I have confidence that, given a mechanism for getting from someone saying “I’ll include all of humanity” to everyone cooperating, I can work out a way to use the same mechanism to get from someone saying “I’ll include the Advisory Board, which includes anyone with enough power that I care whether they cooperate or not” to everyone I care about cooperating.
It’s not all or nothing. The more inclusive the enfranchisement, the more cooperation there will be in general.
I said: “Then anyone with enough political clout to get in my way, I add to the Advisory Board.” That seems to me as well-defined a process as “I decide to include every human being.”
With that scheme, you’re incentivizing folks to prove they have enough political clout to get in your way.
Moreover, humans aren’t perfect reasoning systems. Your way of determining enfranchisement sounds a lot more adversarial than mine, which would affect the tone of the effort in a big and undesirable way.
Why do you think that the right to vote in democratic countries is as clearly determined as it is? Restricting voting rights to those of a certain IQ or higher would be a politically unfeasible PR nightmare.
One reason for doing so might be that you’d precommitted to doing so (or some UDT equivalent), so as to secure my cooperation. Of course, if you can secure my cooperation without such a precommitment (say, by claiming you would point it at both of us), that’s even better.
Again, this is a different argument about why people cooperate instead of defect. To a large degree, evolution hardwired us to cooperate, especially when others are trying to cooperate with us.
I agree that if the FAI project seems to be staffed with a lot of untrustworthy, selfish backstabbers, we should cast a suspicious eye on it regardless of what they say about their project.
Ultimately it probably doesn’t matter much what their broadcasted intention towards the enfranchisement of those outside their group is, since things will largely come down to what their actual intentions are.
It’s not all or nothing. The more inclusive the enfranchisement, the more cooperation there will be in general.
That’s not clear to me.
Suppose the Blues and the Greens are political opponents. If I credibly commit to pointing my CEV-extractor at all the Blues, I gain the support of most Blues and the opposition of most Greens. If I say “at all Blues and Greens” instead, I gain the support of some of the Greens, but I lose the support of some of the Blues, who won’t want any part of a utopia patterned even partially on hateful Green ideologies.
This is almost undoubtedly foolish of the Blues, but I nevertheless expect it. As you say, people aren’t all that smart.
The question is, is the support I gain from the Greens by including them worth the support I lose from the Blues by including the Greens? Of course it depends. That said, the strong support of a sufficiently powerful small group is often more valuable than the weak support of a more powerful larger group, so I’m not nearly as convinced as you sound that saying “we’ll incorporate the values of both you and your hated enemies!” will get more net support than picking a side and saying “we’ll incorporate your values and not those of your hated enemies.”
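A tiny bookkeeping example (all numbers invented) of the trade-off being weighed here: whether the Green support gained by inclusion outweighs the Blue support lost, and how weighting the strong support of a small powerful group changes the answer.

```python
# Toy support arithmetic for the Blues/Greens trade-off; every number is invented.

def net_support(blues, greens, p_blue, p_green, blue_weight=1.0, green_weight=1.0):
    return blues * p_blue * blue_weight + greens * p_green * green_weight

BLUES, GREENS = 1_000, 1_000

# Blues only: most Blues on board, Greens opposed (counted as zero support here).
blues_only = net_support(BLUES, GREENS, p_blue=0.8, p_green=0.0)

# Everyone: some Greens won over, but some Blues defect over including hated Greens.
everyone = net_support(BLUES, GREENS, p_blue=0.5, p_green=0.5)

print(blues_only, everyone)  # 800 vs 1000: with these numbers, inclusion wins

# If Blue support is worth more (a small but powerful faction), exclusion can win:
blues_only_weighted = net_support(BLUES, GREENS, p_blue=0.8, p_green=0.0, blue_weight=3.0)
everyone_weighted = net_support(BLUES, GREENS, p_blue=0.5, p_green=0.5, blue_weight=3.0)
print(blues_only_weighted, everyone_weighted)  # 2400 vs 2000: now "Blues only" wins
```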
With that scheme, you’re incentivizing folks to prove they have enough political clout to get in your way.
Sure, that’s true. Heck, they don’t have to prove it; if they give me enough evidence to consider it plausible, I’ll include ’em. So what?
Moreover, humans aren’t perfect reasoning systems. Your way of determining enfranchisement sounds a lot more adversarial than mine, which would affect the tone of the effort in a big and undesirable way.
I think you underestimate how threatening egalitarianism sounds to a lot of people, many of whom have a lot of power. Cf including those hateful Greens, above. That said, I suspect there’s probably ways to spin your “include everyone” idea in such a way that even the egalitarianism-haters will not oppose it too strongly. But I also suspect there’s ways to spin my “don’t include everyone” idea in such a way that even the egalitarianism-lovers will not oppose it too strongly.
Why do you think that the right to vote in democratic countries is as clearly determined as it is?
Because many people believe it represents power. That’s also why it’s not significantly more clearly determined. It’s also why that right is not universal.
Restricting voting rights to those of a certain IQ or higher would be a politically unfeasible PR nightmare.
Sure, I agree. Nor would I recommend announcing that we’re restricting the advisory board to people of a certain IQ or higher, for analogous reasons. (Also it would be a silly thing to do, but that’s beside the point, we’re talking about sales and not implementation here.) I’m not sure why you bring it up. I also wouldn’t recommend (in my country) announcing restrictions based on skin color, income, religious affiliation, or a wide variety of other things.
On the other hand, in my country, we successfully exclude people below a certain age from voting, and I correspondingly expect announcing restrictions based on age to not be too big a deal. Mostly this is because young people have minimal political clout. (And as you say, this incentivizes young people to prove they have political clout, and sometimes they even try, but mostly nobody cares because in fact they don’t.)
Conversely, extending voting rights to everyone regardless of age would be a politically unfeasible PR nightmare, and I would not recommend announcing that we’re including everyone regardless of age (which I assume you would recommend, since 2-year-olds are human beings by many people’s bright line test), for similar reasons.
(Somewhat tangentially: extending CEV inclusion, or voting rights, to everyone regardless of age would force us as a matter of logic to either establish a cutoff at birth or not establish a cutoff at birth. Either way we’d then have stepped in the pile of cow manure that is U.S. abortion politics, where the only winning move is not to play. What counts as a human being simply isn’t as politically uncontroversial a question as you’re making it sound.)
Again, this is a different argument about why people cooperate instead of defect.
Sorry, you’ve lost me. Can you clarify what the different arguments you refer to here are, and why the difference between them matters in this context?
Ultimately it probably doesn’t matter much what their broadcasted intention towards the enfranchisement of those outside their group is, since things will largely come down to what their actual intentions are.
Once they succeed in building a CEV-extractor and a CEV-implementor, then yeah, their broadcast intentions probably don’t matter much. Until then, they can matter a lot.
What do you see as the factors holding back people from cooperating with modern analogues of FAI projects? Do you think those modern analogues could derive improved cooperation through broadcasting specific enfranchisement policy?
As a practical matter, it looks to me like the majority of wealthy, intelligent, rational modern folks an FAI project might want to cooperate with lean towards egalitarianism and humanism, not blues versus greens type sectarianism.
If you don’t think someone has enough political clout to bother with, they’ll be incentivized to prove you wrong. Even if you’re right most of the time, you’ll be giving yourself trouble.
I agree that very young humans are a potential difficult gray area. One possible solution is to simulate their growth into adults before computing their CEV. Presumably the age at which their growth should be simulated up to is not as controversial as who should be included.
Sorry, you’ve lost me. Can you clarify what the different arguments you refer to here are, and why the difference between them matters in this context?
FAI team trustworthiness is a different subject than optimal enfranchisement structure.
What do you see as the factors holding back people from cooperating with modern analogues of FAI projects?
I’m not sure what those modern analogues are, but in general here are a few factors I see preventing people from cooperating on projects where both mutual cooperation and unilateral cooperation would be beneficial:
Simple error in calculating the expected value of cooperating.
Perceiving more value in obtaining higher status within my group by defending my group’s wrong beliefs about the project’s value than in defecting from my group by cooperating in the project
Perceiving more value in continuing to defend my previously articulated position against the project (e.g., in being seen as consistent or as capable of discharging earlier commitments) than in changing my position and cooperating in the project
Why do you ask?
Do you think those modern analogues could derive improved cooperation through broadcasting specific enfranchisement policy?
I suspect that would be an easier question to answer with anything other than “it depends” if I had a specific example to consider. In general, I expect that it depends on who is motivated to support the project now to what degree, and the specific enfranchisement policy under discussion, and what value they perceive in that policy.
As a practical matter, it looks to me like the majority of wealthy, intelligent, rational modern folks an FAI project might want to cooperate with lean towards egalitarianism and humanism, not blues versus greens type sectarianism.
Sure, that’s probably true, at least for some values of “lean towards” (there’s a lot to be said here about actual support and signaled support but I’m not sure it matters). And it will likely remain true for as long as the FAI project in question only cares about the cooperation of wealthy, intelligent, rational modern folks, which they are well advised to continue doing for as long as FAI isn’t a subject of particular interest to anyone else, and to stop doing as soon as possible thereafter.
If you don’t think someone has enough political clout to bother with, they’ll be incentivized to prove you wrong. Even if you’re right most of the time, you’ll be giving yourself trouble.
(shrug) Sure, there’s some nonzero expected cost to the brief window between when they start proving their influence and I concede and include them.
One possible solution is to simulate their growth into adults before computing their CEV. Presumably the age at which their growth should be simulated up to is not as controversial as who should be included.
Can you clarify what the relevant difference is between including a too-young person in the target for a CEV-extractor, vs. pointing a growth-simulator at the too-young-person and including the resulting simulated person in the target for a CEV-extractor?
FAI team trustworthiness is a different subject than optimal enfranchisement structure.
It was mainly rhetorical; I tend to think that what holds back today’s FAI efforts is lack of rationality and inability of folks to take highly abstract arguments seriously.
Can you clarify what the relevant difference is between including a too-young person in the target for a CEV-extractor, vs. pointing a growth-simulator at the too-young-person and including the resulting simulated person in the target for a CEV-extractor?
Potentially bad things that could happen from implementing the CEV of a two-year-old.
Humans acquire morality as part of their development. Three-year-olds have a different, more selfish morality than older folks. There’s no reason in principle why a three-year-old who was “more the person he wished he was” would necessarily be a moral adult...
CEV does not mean considering the preferences of an agent who is “more moral”. There is no such thing. Morality is not a scalar quantity. I certainly hope the implementation would end up favoring the sort of morals I like, enough that calculating the CEV of a three-year-old would give an output similar to that of an adult, but it seems like a bad idea to count on the implementation being that robust.
Consider the following three target-definitions for a superhuman optimizer:
a) one patterned on the current preferences of a typical three-year-old
b) one patterned on the current preferences of a typical thirty-year-old
c) one that is actually safe to implement (aka “Friendly”)
I understand you to be saying that the gulf between A and C is enormous, and I quite agree. I have not the foggiest beginnings of a clue how one might go about building a system that reliably gets from A to C and am not at all convinced it’s possible.
I would say that the gulf between B and C is similarly enormous, and I’m equally ignorant of how to build a system that spans it. But this whole discussion (and all discussions of CEV-based FAI) presumes that this gulf is spannable in practice. If we can span the B-C gulf, I take that as strong evidence indicating that we can span the A-C gulf.
Put differently: to talk seriously about implementing an FAI based on the CEV of thirty-year-olds, but at the same time dismiss the idea of doing so based on the CEV of three-year-olds, seems roughly analogous to seriously setting out to build a device that lets me teleport from Boston to Denver without occupying the intervening space, but dismissing the idea of building one that goes from Boston to San Francisco as a laughable fantasy because, as everyone knows, San Francisco is further away than Denver.
That’s why I said I don’t understand what you think the extractor is doing. I can see where, if I had a specific theory of how a teleporter operates, I might confidently say that it can span 2k miles but not 3k miles, arbitrary as that sounds in the absence of such a theory. Similarly, if I had a specific theory of how a CEV-extractor operates, I might confidently say it can work safely on a 30-year-old mind but not a 3-year-old. It’s only in the absence of such a theory that such a claim is arbitrary.
It seems likely to me that the CEV of the 30-year-old would be friendly and the CEV of the three-year-old would not be, but as you say at this point it’s hard to say much for sure.
(nods) That follows from what you’ve said earlier.
I suspect we have very different understandings of how similar the 30-year-old’s desires are to their volition.
Perhaps one way of getting at that difference is thus: how likely do you consider it that the CEV of a 30-year-old would be something that, if expressed in a form that 30-year-old can understand (say, for example, the opportunity to visit a simulated world for a year that is constrained by that CEV), would be relatively unsurprising to that 30-year-old… something that would elicit “Oh, cool, yeah, this is more or less what I had in mind” rather than “Holy Fucking Mother of God what kind of an insane world IS this?!?”?
For my own part, I consider the latter orders of magnitude more likely.
There is no clear bright line determining who is or is not a fundamentalist Christian. Right now, there pretty much is a clear bright line determining who is or is not human.
Is there? What about unborn babies? What about IVF fetuses? People in comas? Cryo-preserved bodies? Sufficiently-detailed brain scans?
Short answer is that they’re nice people, and they understand that power corrupts, so they can’t even rationalize wanting to be king of the universe for altruistic reasons.
Also, a post-Singularity future will probably (hopefully) be absolutely fantastic for everyone, so it doesn’t matter whether you selfishly get the AI to prefer you or not.
What’s stopping LukeProg from appointing himself king of the universe?
Not an answer, but a solution:
:-p
Personal abhorrence at the thought, and lack of AI programming abilities. :)
(But, your question deserves a more serious answer than this.)
Too late—Eliezer and Will Newsome are already dual kings of the universe. They balance each other’s reigns in a Yin/Yang kind of way.
I still want a better answer than “Trust us to not be supervillains”.
Serious or not, it seems correct. There might be some advanced game theory that says otherwise, but it only applies to those who know the game theory.
http://lesswrong.com/lw/wp/what_i_think_if_not_why/
The rest of your question has the same answer as “why is anyone altruist to begin with”, I think.
Is this question any different from the question of why there are altruists?
Creating an FAI (presumably ‘friendly to humanity’) requires an AI that can somehow harvest and aggregate preferences over humans in general, but a LukeFriendly AI just needs to scan one brain.
Scanning is unlikely to be the bottleneck for a GAI, and it seems most of the difficulty with CEV is from the Extrapolation part, not the Coherence.
It doesn’t matter how easy the parts may be, scanning, extrapolating and cohering all of humanity is harder than scanning and extrapolating Luke.
Not if Luke’s values contain pointers to all those other humans.
It seems easier to me to establish parameters for “finding” Luke than for “finding” humanity.
Yes, it depends on whether you think Luke is more different from humanity than humanity is from StuffWeCareNotOf
We just have to cooperate, despite differences in beliefs and moral values. CEV is the way to accomplish that because it doesn’t privilege the AI researchers who write the code.
This doesn’t apply to all of humanity, just to AI researchers good enough to pose a threat.
Right now, and for the foreseeable future, SIAI doesn’t have the funds to actually create FAI.
If they have all the theory and coded it and whatnot, where is the cost coming from?
If some do mind, then there’s probably some optimally pragmatic balancing point short of all humans.
Probably, but defining that balancing point would mean a lot of bureaucratic overhead to determine who to exclude or include.
Can you expand on what you mean by “bureaucratic” here?
Are people going to vote on whether someone should be included? Is there an appeals process? Are all decisions final?
OK, thanks.
It seems to me all these questions arise for “include everyone” as well. Somewhere along the line someone is going to suggest “don’t include fundamentalist Christians”, for example, and if I’m committed to the kind of democratic decision process you imply, then we now need to have a vote, or at least decide whether we have a vote, etc. etc, all of that bureaucratic overhead.
Of course, that might not be necessary; I could just unilaterally override that suggestion, mandate “No, we include everyone!”, and if I have enough clout to make that stick, then it sticks, with no bureaucratic overhead. Yay! This seems to more or less be what you have in mind.
It’s just that the same goes for “Include everyone except fundamentalist Christians.”
In any case, I don’t see how any of this cumbersome democratic machinery makes any sense in this scenario. Actually working out CEV implies the existence of something, call it X, that is capable of extrapolating a coherent volition from the state of a group of minds. What’s the point of voting, appeals, etc. when that technology is available? X itself is a better solution to the same problem.
Which implies that it’s possible to identify a smaller group of minds as the Advisory Board and say to X “Work out the Advisory Board’s CEV with respect to whose minds should be included as input to a general-purpose optimizer’s target definition, then work out the CEV of those minds with respect to the desired state of the world.”
Then anyone with enough political clout to get in my way, I add to the Advisory Board, thereby ensuring that their values get taken into consideration (including their values regarding whose values get included).
That includes folks who think everyone should get an equal say, folks who think that every human should get an equal say, folks who think that everyone with more than a certain threshold level of intelligence and moral capacity get a say, folks who think that everyone who agrees with them get a say, etc., etc. X works all of that out, and spits out a spec on the other side for who actually gets a say and to what degree, which it then takes as input to the actual CEV-extrapolating process.
This seems kind of absurd to me, but no more so than the idea that X can work out humanity’s CEV at all. If I’m granting that premise for the sake of argument, everything else seems to follow.
There is no clear bright line determining who is or is not a fundamentalist Christian. Right now, there pretty much is a clear bright line determining who is or is not human. And that clear bright line encompasses everyone we would possibly want to cooperate with.
Your advisory board suggestion ignores the fact that we have to be able to cooperate prior to the invention of CEV deducers.
And you’re not describing a process for how the advisory board is decided either. Different advisory boards may produce different groups of enfranchised minds. So your suggestion doesn’t resolve the problem.
In fact, I don’t see how putting a group of minds on the advisory board is any different than just making them the input to the CEV. If a person’s CEV is that someone’s mind should contribute to the optimizer’s target, that will be their CEV regardless of whether it’s measured in an advisory board context or not.
There is no clear bright line determining what is or isn’t a clear bright line.
I agree that the line separating “human” from “non-human” is much clearer and brighter than that separating “fundamentalist Christian” from “non-fundamentalist Christian”, and I further agree that for minds like mine the difference between those two lines is very important. Something with a mind like mine can work with the first distinction much more easily than with the second.
So what?
A mind like mine doesn’t stand a chance of extrapolating a coherent volition from the contents of a group of target minds. Whatever X is, it isn’t a mind like mine.
If we don’t have such an X available, then it doesn’t matter what defining characteristic we use to determine the target group for CEV extrapolation, because we can’t extrapolate CEV from them anyway.
If we do have such an X available, then it doesn’t matter what lines are clear and bright enough for minds like mine to reliably work with; what matters is what lines are clear and bright enough for systems like X to reliably work with.
I have confidence < .1 that either one of us can articulate a specification determining who is human that doesn’t either include or exclude some system that someone included in that specification would contest the inclusion/exclusion of.
I also have confidence < .1 that, using any definition of “human” you care to specify, the universe contains no nonhuman systems I would possibly want to cooperate with.
Sure, but so does your “include all humans” suggestion. We’re both assuming that there’s some way the AI-development team can convincingly commit to a policy P such that other people’s decisions to cooperate will plausibly be based on the belief that P will actually be implemented when the time comes; we are neither of us specifying how that is actually supposed to work. Merely saying “I’ll include all of humanity” isn’t good enough to ensure cooperation if nobody believes me.
I have confidence that, given a mechanism for getting from someone saying “I’ll include all of humanity” to everyone cooperating, I can work out a way to use the same mechanism to get from someone saying “I’ll include the Advisory Board, which includes anyone with enough power that I care whether they cooperate or not” to everyone I care about cooperating.
I said: “Then anyone with enough political clout to get in my way, I add to the Advisory Board.” That seems to me as well-defined a process as “I decide to include every human being.”
Certainly.
Can you say again which problem you’re referring to here? I’ve lost track.
Absolutely agreed.
Consider the implications of that, though.
Suppose you have a CEV-extractor and we’re the only two people in the world, just for simplicity.
You can either point the CEV-extractor at yourself, or at both of us.
If you genuinely want me included, then it doesn’t matter which you choose; the result will be the same.
Conversely, if the result is different, that’s evidence that you don’t genuinely want me included, even if you think you do.
Knowing that, why would you choose to point the CEV-extractor at both of us?
One reason for doing so might be that you’d precommitted to doing so (or some UDT equivalent), so as to secure my cooperation. Of course, if you can secure my cooperation without such a precommitment (say, by claiming you would point it at both of us), that’s even better.
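Here is a toy illustration (again hypothetical, with invented names) of the consistency test implied above: if A's extrapolated volition genuinely includes B's interests, widening the target set should not change the output, and a difference between the two runs is evidence that it doesn't.

```python
def inclusion_is_genuine(extractor, a, b):
    """`extractor` plays the role of the CEV-extractor; `a` and `b` are minds."""
    narrow = extractor([a])       # extractor pointed only at A
    wide = extractor([a, b])      # extractor pointed at both A and B
    return narrow == wide         # identical outputs ~ A genuinely wants B included
```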
Complicated or ambiguous schemes take more time to explain, get more attention, and risk folks spending time trying to gerrymander their way in instead of contributing to FAI.
I think any solution other than “enfranchise humanity” is a potential PR disaster.
Keep in mind that not everyone is that smart, and there are some folks who would make a fuss about disenfranchisement of others even if they themselves were enfranchised (and therefore, by definition, those they were making a fuss about would be enfranchised if they thought it was a good idea).
I agree there are potential ambiguity problems with drawing the line at humans, but I think the potential problems are bigger with other schemes.
I agree there are potential problems with credibility, but that seems like a separate argument.
It’s not all or nothing. The more inclusive the enfranchisement, the more cooperation there will be in general.
With that scheme, you’re incentivizing folks to prove they have enough political clout to get in your way.
Moreover, humans aren’t perfect reasoning systems. Your way of determining enfranchisement sounds a lot more adversarial than mine, which would affect the tone of the effort in a big and undesirable way.
Why do you think that the right to vote in democratic countries is as clearly determined as it is? Restricting voting rights to those of a certain IQ or higher would be a politically unfeasible PR nightmare.
Again, this is a different argument about why people cooperate instead of defect. To a large degree, evolution hardwired us to cooperate, especially when others are trying to cooperate with us.
I agree that if the FAI project seems to be staffed with a lot of untrustworthy, selfish backstabbers, we should cast a suspicious eye on it regardless of what they say about their project.
Ultimately it probably doesn’t matter much what their broadcasted intention towards the enfranchisement of those outside their group is, since things will largely come down to what their actual intentions are.
That’s not clear to me.
Suppose the Blues and the Greens are political opponents. If I credibly commit to pointing my CEV-extractor at all the Blues, I gain the support of most Blues and the opposition of most Greens. If I say “at all Blues and Greens” instead, I gain the support of some of the Greens, but I lose the support of some of the Blues, who won’t want any part of a utopia patterned even partially on hateful Green ideologies.
This is almost undoubtedly foolish of the Blues, but I nevertheless expect it. As you say, people aren’t all that smart.
The question is, is the support I gain from the Greens by including them worth the support I lose from the Blues by including the Greens? Of course it depends. That said, the strong support of a sufficiently powerful small group is often more valuable than the weak support of a more powerful larger group, so I’m not nearly as convinced as you sound that saying “we’ll incorporate the values of both you and your hated enemies!” will get more net support than picking a side and saying “we’ll incorporate your values and not those of your hated enemies.”
Sure, that’s true.
Heck, they don’t have to prove it; if they give me enough evidence to consider it plausible, I’ll include ’em.
So what?
I think you underestimate how threatening egalitarianism sounds to a lot of people, many of whom have a lot of power. Cf. including those hateful Greens, above. That said, I suspect there’s probably ways to spin your “include everyone” idea in such a way that even the egalitarianism-haters will not oppose it too strongly. But I also suspect there’s ways to spin my “don’t include everyone” idea in such a way that even the egalitarianism-lovers will not oppose it too strongly.
Because many people believe it represents power. That’s also why it’s not significantly more clearly determined. It’s also why that right is not universal.
Sure, I agree. Nor would I recommend announcing that we’re restricting the advisory board to people of a certain IQ or higher, for analogous reasons. (Also it would be a silly thing to do, but that’s beside the point, we’re talking about sales and not implementation here.) I’m not sure why you bring it up. I also wouldn’t recommend (in my country) announcing restrictions based on skin color, income, religious affiliation, or a wide variety of other things.
On the other hand, in my country, we successfully exclude people below a certain age from voting, and I correspondingly expect announcing restrictions based on age to not be too big a deal. Mostly this is because young people have minimal political clout. (And as you say, this incentivizes young people to prove they have political clout, and sometimes they even try, but mostly nobody cares because in fact they don’t.)
Conversely, extending voting rights to everyone regardless of age would be a politically unfeasible PR nightmare, and I would not recommend announcing that we’re including everyone regardless of age (which I assume you would recommend, since 2-year-olds are human beings by many people’s bright line test), for similar reasons.
(Somewhat tangentially: extending CEV inclusion, or voting rights, to everyone regardless of age would force us as a matter of logic to either establish a cutoff at birth or not establish a cutoff at birth. Either way we’d then have stepped in the pile of cow manure that is U.S. abortion politics, where the only winning move is not to play. What counts as a human being simply isn’t as politically uncontroversial a question as you’re making it sound.)
Sorry, you’ve lost me. Can you clarify what the different arguments you refer to here are, and why the difference between them matters in this context?
Once they succeed in building a CEV-extractor and a CEV-implementor, then yeah, their broadcast intentions probably don’t matter much. Until then, they can matter a lot.
What do you see as the factors holding people back from cooperating with modern analogues of FAI projects? Do you think those modern analogues could derive improved cooperation from broadcasting a specific enfranchisement policy?
As a practical matter, it looks to me like the majority of wealthy, intelligent, rational modern folks an FAI project might want to cooperate with lean towards egalitarianism and humanism, not blues versus greens type sectarianism.
If you don’t think someone has enough political clout to bother with, they’ll be incentivized to prove you wrong. Even if you’re right most of the time, you’ll be giving yourself trouble.
I agree that very young humans are a potential difficult gray area. One possible solution is to simulate their growth into adults before computing their CEV. Presumably the age at which their growth should be simulated up to is not as controversial as who should be included.
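A hedged sketch of that “grow them up first” proposal, with every name invented for illustration: very young minds are pre-processed by a (hypothetical) maturation simulator before being handed to the CEV-extractor, and the cutoff age is itself a free parameter, as noted above.

```python
MATURITY_AGE = 30  # the cutoff is a free parameter, presumably less controversial than enfranchisement

def prepare_minds(minds, simulate_growth, age_of):
    """Replace each mind younger than the cutoff with its simulated adult version."""
    prepared = []
    for mind in minds:
        if age_of(mind) < MATURITY_AGE:
            # Simulate development up to the cutoff, then use the simulated adult.
            mind = simulate_growth(mind, target_age=MATURITY_AGE)
        prepared.append(mind)
    return prepared
```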
FAI team trustworthiness is a different subject than optimal enfranchisement structure.
I’m not sure what those modern analogues are, but in general here are a few factors I see preventing people from cooperating on projects where both mutual cooperation and unilateral cooperation would be beneficial:
Simple error in calculating the expected value of cooperating.
Perceiving more value in obtaining higher status within my group by defending my group’s wrong beliefs about the project’s value than in defecting from my group by cooperating in the project.
Perceiving more value in continuing to defend my previously articulated position against the project (e.g., in being seen as consistent or as capable of discharging earlier commitments) than in changing my position and cooperating in the project.
Why do you ask?
I suspect that would be an easier question to answer with anything other than “it depends” if I had a specific example to consider. In general, I expect that it depends on who is motivated to support the project now to what degree, and the specific enfranchisement policy under discussion, and what value they perceive in that policy.
Sure, that’s probably true, at least for some values of “lean towards” (there’s a lot to be said here about actual support and signaled support but I’m not sure it matters). And it will likely remain true for as long as the FAI project in question only cares about the cooperation of wealthy, intelligent, rational modern folks, which they are well advised to continue doing for as long as FAI isn’t a subject of particular interest to anyone else, and to stop doing as soon as possible thereafter.
(shrug) Sure, there’s some nonzero expected cost to the brief window between when they start proving their influence and I concede and include them.
Can you clarify what the relevant difference is between including a too-young person in the target for a CEV-extractor, vs. pointing a growth-simulator at the too-young-person and including the resulting simulated person in the target for a CEV-extractor?
I agree with this, certainly.
It was mainly rhetorical; I tend to think that what holds back today’s FAI efforts is lack of rationality and inability of folks to take highly abstract arguments seriously.
Potentially bad things that could happen from implementing the CEV of a two-year-old.
I conclude that I do not understand what you think the CEV-extractor is doing.
Humans acquire morality as part of their development. Three-year-olds have a different, more selfish morality than older folks. There’s no reason in principle why a three-year-old who was “more the person he wished he was” would necessarily be a moral adult...
CEV does not mean considering the preferences of an agent who is “more moral”. There is no such thing. Morality is not a scalar quantity. I certainly hope the implementation would end up favoring the sort of morals I like robustly enough that calculating the CEV of a three-year-old would give an output similar to that of an adult, but it seems like a bad idea to count on the implementation being that robust.
Consider the following three target-definitions for a superhuman optimizer:
a) one patterned on the current preferences of a typical three-year-old
b) one patterned on the current preferences of a typical thirty-year old
c) one that is actually safe to implement (aka “Friendly”)
I understand you to be saying that the gulf between A and C is enormous, and I quite agree. I have not the foggiest beginnings of a clue how one might go about building a system that reliably gets from A to C and am not at all convinced it’s possible.
I would say that the gulf between B and C is similarly enormous, and I’m equally ignorant of how to build a system that spans it. But this whole discussion (and all discussions of CEV-based FAI) presumes that this gulf is spannable in practice. If we can span the B-C gulf, I take that as strong evidence indicating that we can span the A-C gulf.
Put differently: to talk seriously about implementing an FAI based on the CEV of thirty-year-olds, but at the same time dismiss the idea of doing so based on the CEV of three-year-olds, seems roughly analogous to seriously setting out to build a device that lets me teleport from Boston to Denver without occupying the intervening space, but dismissing the idea of building one that goes from Boston to San Francisco as a laughable fantasy because, as everyone knows, San Francisco is further away than Denver.
That’s why I said I don’t understand what you think the extractor is doing. I can see where, if I had a specific theory of how a teleporter operates, I might confidently say that it can span 2k miles but not 3k miles, arbitrary as that sounds in the absence of such a theory. Similarly, if I had a specific theory of how a CEV-extractor operates, I might confidently say it can work safely on a 30-year-old mind but not a 3-year-old. It’s only in the absence of such a theory that such a claim is arbitrary.
It seems likely to me that the CEV of the 30-year-old would be friendly and the CEV of the three-year-old would not be, but as you say at this point it’s hard to say much for sure.
(nods) That follows from what you’ve said earlier.
I suspect we have very different understandings of how similar the 30-year-old’s desires are to their volition.
Perhaps one way of getting at that difference is thus: how likely do you consider it that the CEV of a 30-year-old would be something that, if expressed in a form that 30-year-old can understand (say, for example, the opportunity to visit a simulated world for a year that is constrained by that CEV), would be relatively unsurprising to that 30-year-old… something that would elicit “Oh, cool, yeah, this is more or less what I had in mind” rather than “Holy Fucking Mother of God what kind of an insane world IS this?!?”?
For my own part, I consider the latter orders of magnitude more likely.
I’m pretty uncertain.
Is there? What about unborn babies? What about IVF fetuses? People in comas? Cryo-preserved bodies? Sufficiently-detailed brain scans?
Short answer is that they’re nice people, and they understand that power corrupts, so they can’t even rationalize wanting to be king of the universe for altruistic reasons.
Also, a post-Singularity future will probably (hopefully) be absolutely fantastic for everyone, so it doesn’t matter whether you selfishly get the AI to prefer you or not.
I for one welcome our new singularitarian overlords!
I should hope most intelligent people realize they just want to be king of their own sensory inputs.
That’s not actually the consensus here at LW: most people would rather not be delusional.
It was tongue in cheek. I realize that the Cypher complex (from The Matrix) is not common. I also think the rest of you are insane.