Well, my own proposed plan is also a contingent modification. The strongest possible claim of CEV can be said to be:
There is a unique X, such that for all living people P, CEV(P) = X.
Assuming there is no such X, there could still be a plausible claim:
Y is not empty, where Y = Intersection{over all living people P} of CEV(P).
And then AI would do well if it optimizes for Y while interfering the least with other things (whatever this means). This way, whatever “evolving” will happen due to AI’s influence is at least agreed upon by everyone(’s CEV).
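One way to make the two claims concrete is a toy model in which each person's extrapolated volition is just a finite set of endorsed outcomes. That representation, the names, and the value labels below are all assumptions made up for illustration; nothing in the CEV proposal says volitions look like this. A minimal Python sketch:

```python
from functools import reduce


def shared_core(volitions):
    """Return the outcomes endorsed by every person's (toy) extrapolated volition."""
    sets = [frozenset(v) for v in volitions.values()]
    if not sets:
        return frozenset()
    # Y = intersection over all people P of their endorsed outcomes
    return reduce(frozenset.intersection, sets)


# Hypothetical people and value labels, invented for this example.
volitions = {
    "alice": {"no_involuntary_death", "non_interference", "transhumanism"},
    "bob": {"no_involuntary_death", "non_interference", "tradition"},
    "carol": {"no_involuntary_death", "tradition"},
}

# Weaker claim: Y is not empty.
y = shared_core(volitions)

# Strongest claim: everyone's set is the same X.
strongest_claim_holds = len({frozenset(v) for v in volitions.values()}) == 1

print(sorted(y))
print(strongest_claim_holds)
```

In this made-up data the strongest claim fails while Y is non-empty, but Y shrinks with every person added who endorses less of it.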
I can buy, tentatively, that most people might one day agree on a very few things. If that’s what you mean by Y, fine, but it restricts the FAI to doing almost nothing. I’d much rather build a FAI that implemented more values shared by fewer people (as long as those people include myself). I expect so would most people, including the ones hypothetically building the FAI; otherwise they’d expect not to benefit much from building it, since it would find very little consensus to implement! So the first team to successfully build FAI+CEV will choose to launch it as CEV(themselves) rather than CEV(humanity).
For instance, I think CEV, if it even exists, will include nothing of real interest because people just wouldn’t agree on common goals. In such a situation, my personal CEV—or that of a few people who do agree on at least some things—would not want to include CEV. So your belief implies that CEV exists and is nontrivial. As I’ve asked before in this thread, why do you think so?
Oh, I had some evidence, but I Minimum Viable Commented. I thought it was obvious once pointed out. Illusion of transparency.
We care about what happens to humanity. We want things to go well for us. If CEV works at all, it will capture that in some way.
Even if CEV(rest of humanity) turns out to be mostly derived from radical islam, I think there would be terms in CEV(Lesswrong) for respecting that. There would also be terms for people not stoning each other to death and such. I think those (respect for CEV and good life by our standards) would only come into conflict when CEV has basically failed.
You seem to be claiming that CEV will in fact fail, which I think is a different issue. My claim is that if CEV is a useful thing, you don’t have to run it on everyone (or even a representative sample) to make it work.
It depends on what you call CEV “working” or “failing”.
One strategy (which seems to me to be implied by the original CEV doc) is to extrapolate everyone’s personal volition, then compare and merge them to create the group’s overall CEV. Where enough people agree, choose what they agree on (factoring in how sure they are, and how important this is to them). Where too many people disagree, do nothing, or be indifferent on the outcome of this question, or ask the programmers. Is this what you have in mind?
The big issue here is how much consensus is enough. Let’s run with concrete examples:
If CEV requires too much consensus, it may not help us become immortal because a foolish “deathist” minority believes death is good for people.
If CEV is satisfied by too little consensus, then 99% of the people may build a consensus to kill the other 1% for fun and profit, and the CEV would not object.
You may well have both kinds of problems at the same time (with different questions).
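To see how a single consensus rule can have both problems at once, here is a rough sketch. It assumes, purely for illustration, that each question reduces to one number (the fraction of extrapolated volitions in favor) and that the AI acts when that fraction meets a fixed threshold. The function name and the percentages are hypothetical.

```python
def cev_decision(support, threshold):
    """Act on a proposal only if the supporting fraction meets the threshold."""
    if not 0.0 <= support <= 1.0:
        raise ValueError("support must be a fraction between 0 and 1")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    return "act" if support >= threshold else "do nothing"


# Made-up support levels for the two examples in the text.
proposals = {
    "end involuntary death": 0.90,       # a 10% "deathist" minority objects
    "kill the other 1% for profit": 0.99,  # the 99% agree among themselves
}

for threshold in (1.0, 0.95, 0.50):
    for name, support in proposals.items():
        print(threshold, name, "->", cev_decision(support, threshold))
```

With these numbers, a threshold of 0.95 blocks the first proposal and permits the second, so it is too strict and too lax at the same time. Requiring unanimity blocks both and a bare majority permits both, which suggests the fix (if there is one) has to come from something other than tuning a single number.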
It all depends on how you define required consensus—and that definition can’t itself come from CEV, because it’s required for the first iteration of CEV to run. It could be allowed to evolve via CEV, but you still need to start somewhere and such evolution strikes me as dangerous—if you precommit to CEV and then it evolves into “too little” or “too much” consensus and ends up doing nothing or prohibiting nothing, the whole CEV project fails. Which may well be a worse outcome from our perspective than starting with (or hardcoding) a different, less “correct” consensus requirement.
So the matter is not just what each person or group’s CEV is, but how you combine them via consensus. If, as you suggest, we use the CEV of a small homogenous group instead of all of humanity, it seems clear to me that the consensus would be greater (all else being equal), and so the requirements for consensus are more likely to be satisfied, and so CEV will have a higher chance of working.
Contrariwise, if we use the CEV of all humanity, it will have a term derived from me and you for not stoning people. And it will also have a term derived from some radical Islamists for stoning people. And it will have to resolve the contradiction, and if there’s not enough consensus among humanity’s individual CEVs to do so, the CEV algorithm will “fail”.
If CEV requires too much consensus, it may not help us become immortal because a foolish “deathist” minority believes death is good for people.
If CEV is satisfied by too little consensus, then 99% of the people may build a consensus to kill the other 1% for fun and profit, and the CEV would not object.
These risks exist. However, I think it is very likely in our case that there will be strong consensus for values that reduce the problem a bit. Non-interference, for one, is much less controversial than transhumanism, but would allow transhumanism for those who accept it.
I don’t think CEV works with explicit entities that can interact and decide to kill each other. I understand that it is much more abstract than that. Also probably all blind, and all implemented through the singleton AI, so it would be very unlikely that everyone’s EV happens to name, say, bob smith as the lulzcow.
It all depends on how you define required consensus—and that definition can’t itself come from CEV, because it’s required for the first iteration of CEV to run. It could be allowed to evolve via CEV, but you still need to start somewhere and such evolution strikes me as dangerous—if you precommit to CEV and then it evolves into “too little” or “too much” consensus and ends up doing nothing or prohibiting nothing, the whole CEV project fails. Which may well be a worse outcome from our perspective than starting with (or hardcoding) a different, less “correct” consensus requirement.
This is a serious issue with (at least my understanding of) CEV. How to even get CEV done (presumably with an AI) without turning everyone into computronium or whatever seems hard.
So the matter is not just what each person or group’s CEV is, but how you combine them via consensus. If, as you suggest, we use the CEV of a small homogenous group instead of all of humanity, it seems clear to me that the consensus would be greater (all else being equal), and so the requirements for consensus are more likely to be satisfied, and so CEV will have a higher chance of working.
This is one reason why I think doing the CEV of just the AI team or whoever is the best approach. We have strong reason to suspect that the eventual result will respect everyone, and bootstrapping from a small group (or even just one person) seems much more reliable and safer.
it will have a term derived from me and you for not stoning people. And it will also have a term derived from some radical Islamists for stoning people.
I think that statement is too strong. Keep in mind that it’s extrapolated volition. I doubt the islamists’ values are reflectively consistent. Weaken it to the possibility of there being multiple attractors in EV-space, some of which are bad, and I agree. Infectious memeplexes that can survive CEV scare the crap out of me.
These risks exist. However, I think it is very likely in our case that there will be strong consensus for values that reduce the problem a bit. Non-interference, for one, is much less controversial than transhumanism, but would allow transhumanism for those who accept it.
Why do you think this is “very likely”?
Today there are many people in the world (gross estimate: tens of percents of world population) who don’t believe in noninterference. True believers of several major faiths (most Christian sects, mainstream Islam) desire enforced religious conversion of others, either as a commandment of their faith (for its own sake) or for the metaphysical benefit of those others (to save them from hell). Many people “believe” (if that is the right word) in the subjugation of certain minorities, or of women, children, etc. which involves interference of various kinds. Many people experience future shock which prompts them to want laws that would stop others from self-modifying in certain ways (some including transhumanism).
Why do you think it very likely these people’s CEV will contradict their current values and beliefs? Please consider that:
We emphatically don’t know the outcome of CEV. If we were sure that it would have any property X, we could hardcode X into the algorithm and make the CEV’s task that much easier. Anything you think is very likely for CEV to decide, you should be proportionally willing for me to hardcode into my algorithm, constraining the possible results of CEV.
In these examples, you expect other people’s extrapolated values to come to match your actual values. This seems on the outside view like a human bias. Do you expect an equal amount of your important, present-day values to be contradicted and disallowed by humanity’s CEV? Can you think of probable examples?
This is one reason why I think doing the CEV of just the AI team or whoever is the best approach. We have strong reason to suspect that the eventual result will respect everyone, and bootstrapping from a small group (or even just one person) seems much more reliable and safer.
I agree completely: doing the CEV of a small trusted team, who moreover are likely to hold non-extrapolated views similar to ours (e.g. they won’t be radical Islamists), would be much better than CEV(humanity); much more reliable and safe.
But you contradict yourself a little. If you really believed CEV(team) looked a lot like CEV(humanity), you would have no reason to consider it safer. If you (correctly) think it’s safer, that must be because you fear CEV(humanity) will contain some pretty repugnant conclusions that CEV(team) won’t.
From this I understand that while you think CEV(team) would have a term for “respecting” the rest of humanity, that respect would be a lot weaker than the equal (and possibly majority-voting-based) rights granted them by CEV(humanity).
I think that statement is too strong. Keep in mind that it’s extrapolated volition. I doubt the islamists’ values are reflectively consistent. Weaken it to the possibility of there being multiple attractors in EV-space, some of which are bad, and I agree. Infectious memeplexes that can survive CEV scare the crap out of me.
I doubt any one human’s values are reflectively consistent. At the very least, every human’s values contradict one another in the sense that they compete among themselves for the human’s resources, and the human in different moods and at different points in time prefers to spend on different values.
Because infectious memeplexes scare me too, I don’t want anyone to build CEV (or rather, to run a singleton AI that would implement it) - I would much prefer CEV or better CEV or better yet, a non-CEV process which more directly relies on my and other people’s non-extrapolated preferences.
or better yet, a non-CEV process which more directly relies on my and other people’s non-extrapolated preferences.
A possibly related question: suppose you were about to go off on an expedition in a spaceship that would take you away from Earth for thirty years, and the ship is being stocked with food. Suppose further that, because of an insane bureaucratic process, you have only two choices: either (a) you get to choose what food to stock right now, with no time for nutritional research, or (b) food is stocked according to an expert analysis of your body’s nutritional needs, with no input from you. What outcome would you anticipate from each of those choices?
Suppose a hundred arbitrarily selected people were also being sent on similar missions on similar spaceships, and your decision of A or B applied to them as well (either they get to choose their food, or an expert chooses food for them). What outcome would you anticipate from each choice?
I think you meant to add that the expert really understands nutrition, beyond the knowledge of our best nutrition specialists today, which is unreliable and contradictory and sparse.
With that assumption I would choose to rely on the expert, and would expect far fewer nutritional problems on average for other people who relied on the expert vs. choosing themselves.
The difference between this and CEV is that “what nutritional/metabolic/physiological outcome is good for you” is an objective, pretty well constrained question. There are individual preferences—in enjoyment of food, and in the resulting body-state—but among people hypothetically fully understanding the human body, there will be relatively little disagreement, and the great majority should not suffer much from good choices that don’t quite match their personal preferences.
CEV, on the other hand, includes both preferences about objective matters like the above but also many entirely or mostly subjective choices (in the same way that most choices of value are a-rational). Also, people are likely to agree to not interfere in what others eat because they don’t often care about it, but people do care about many other behaviors of others (like torturing simulated intelligences, or giving false testimony, or making counterfeit money) and that would be reflected in CEV.
ETA: so in response to your question, I agree that on many subjects I trust experts / CEV more than myself. My preferred response to that, though, is not to build a FAI enforcing CEV, but to build a FAI that allows direct personal choice in areas where it’s possible to recover from mistakes, but also provides the expert opinion as an oracle advice service.
Perfect knowledge is wonderful, sure, but was not key to my point.
Given two processes for making some decision, if process P1 is more reliable than process P2, then P1 will get me better results. That’s true even if P1 is imperfect. That’s true even if P2 is “ask my own brain and do what it tells me.” All that is required is that P1 is more reliable than P2.
It follows that when choosing between two processes to implement my values, if I can ask one question, I should ask which process is more reliable. I should not ask which process is perfect, nor ask which process resides in my brain.
ETA: I endorse providing expert opinion, even though that deprives people of the experience of figuring it all out for themselves… agreed that far. But I also endorse providing reliable infrastructure, even though that deprives people of the experience of building all the infrastructure themselves, and I endorse implementing reliable decision matrices, even though that deprives people of the experience of making all the decisions themselves.
There’s no reason you have to choose just once, a single process to answer all kinds of questions. Different processes better fit different domains. Expert opinion best fits well-understood, factual, objective, non-politicized, amoral questions. Noninterference best fits matters where people are likely to want to interfere in others’ decisions and there is no pre-CEV consensus on whether such intervention is permissible.
The problem with making decisions for others isn’t that it deprives them of the experience of making decisions, but that it can influence or force them into decisions that are wrong in some sense of the word.
(shrug) Letting others make decisions for themselves can also influence or force them into decisions that are wrong in some sense of the word. If that’s really the problem, then letting people make their own decisions doesn’t solve it. The solution to that problem is letting whatever process is best at avoiding wrong answers make the decision.
And, sure, there might be different processes for different questions. But there’s no a priori reason to believe that any of those processes reside in my brain.
Letting others make decisions for themselves can also influence or force them into decisions that are wrong in some sense of the word.
True. Nonintervention only works if you care about it more than about anything people might do due to it. Which is why a system of constraints that is given to the AI and is not CEV-derived can’t be just nonintervention, it has to include other principles as well and be a complete ethical system.
And, sure, there might be different processes for different questions. But there’s no a priori reason to believe that any of those processes reside in my brain.
I’m always open to suggestions of new processes. I just don’t like the specific process of CEV, which happens not to reside in my brain, but that’s not why I dislike it.
At the beginning of this thread you seemed to be saying that your current preferences (which are, of course, the product of a computation that resides in your brain) were the best determiner of what to optimize the environment for. If you aren’t saying that, but merely saying that there’s something specific about CEV that makes it an even worse choice, well, OK. I mean, I’m puzzled by that simply because there doesn’t seem to be anything specific about CEV that one could object to in that way, but I don’t have much to say about that; it was the idea that the output of your current algorithms is somehow more reliable than the output of some other set of algorithms implemented on a different substrate that I was challenging.
Sounds like a good place to end this thread, then.
I’m puzzled by that simply because there doesn’t seem to be anything specific about CEV that one could object to in that way
Really? What about the “some people are Jerks” objection? That’s kind of a big deal. We even got Eliezer to tentatively acknowledge the theoretical possibility that that could be objectionable at one point.
(nods) Yeah, I was sloppy. I was referring to the mechanism for extrapolating a coherent volition from a given target, rather than the specification of the target (e.g., “all of humanity”) or other aspects of the CEV proposal, but I wasn’t at all clear about that. Point taken, and agreed that there are some aspects of the proposal (e.g. target specification) that are specific enough to object to.
Tangentially, I consider the “some people are jerks” objection very confused. But then, I mostly conclude that if such a mechanism can exist at all, the properties of people are about as relevant to its output as the properties of states or political parties. More thoughts along those lines here.
I was referring to the mechanism for extrapolating a coherent volition from a given target
It really is hard to find a fault with that part!
Tangentially, I consider the “some people are jerks” objection very confused.
I don’t understand. If the CEV is taken over a group that consists of yourself and ten agents with values that differ irreconcilably from yours, then we can expect that CEV to be fairly abhorrent to you. That is, roughly speaking, a risk you take when you replace your own preferences with preferences calculated off a group that you don’t fully understand or have strong reason to trust.
That CEV(group) would also be strictly inferior to CEV(you), which would implicitly incorporate the extrapolated preferences of the other ten agents to precisely the degree that you would want it to do so.
I agree that if there exists a group G of agents A1..An with irreconcilably heterogenous values, a given agent A should strictly prefer CEV(A) to CEV(G). If Dave is an agent in this model, then Dave should prefer CEV(Dave) to CEV(group), for the reasons you suggest. Absolutely agreed.
What I question is the assumption that in this model Dave is better represented as an agent and not a group. In fact, I find that assumption unlikely, as I noted above. (Ditto wedrifid, or any other person.)
If Dave is a group, then CEV(Dave) is potentially problematic for the same reason that CEV(group) is problematic… every agent composing Dave should prefer that most of Dave not be included in the target definition. Indeed, if group contains Dave and Dave contains an agent A1, it isn’t even clear that A1 should prefer CEV(Dave) to CEV(group)… while CEV(Dave) cannot be more heterogenous than CEV(group), it might turn out that a larger fraction (by whatever measure the volition-extrapolator cares about) of group supports A1’s values than the fraction of Dave that supports them.
If the above describes the actual situation, then whether Dave is a jerk or not (or wedrifid is, or whoever) is no more relevant to the output of the volition-extrapolation mechanism than whether New Jersey is a jerk, or whether the Green Party is a jerk… all of these entities are just more-or-less-transient aggregates of agents, and the proper level of analysis is the agent.
If Dave is a group, then CEV(Dave) is potentially problematic for the same reason that CEV(group) is problematic… every agent composing Dave should prefer that most of Dave not be included in the target definition. Indeed, if group contains Dave and Dave contains an agent A1, it isn’t even clear that A1 should prefer CEV(Dave) to CEV(group)… while CEV(Dave) cannot be more heterogenous than CEV(group), it might turn out that a larger fraction (by whatever measure the volition-extrapolator cares about) of group supports A1’s values than the fraction of Dave that supports them.
This is related to why I’m a bit uncomfortable accepting the sometimes expressed assertion “CEV only applies to a group, if you are doing it to an individual it’s just Extrapolated Volition”. The “make it stop being incoherent!” part applies just as much to conflicting and inconsistent values within a messily implemented individual as it does to differences between people.
If the above describes the actual situation, then whether Dave is a jerk or not (or wedrifid is, or whoever) is no more relevant to the output of the volition-extrapolation mechanism than whether New Jersey is a jerk, or whether the Green Party is a jerk… all of these entities are just more-or-less-transient aggregates of agents, and the proper level of analysis is the agent.
Taking this “it’s all agents and subagents and meta-agents” outlook, the remaining difference is one of arbitration. That is, speaking as wedrifid I have already implicitly decided which elements of the lump of matter sitting on this chair are endorsed as ‘me’ and so included in the gold standard, CEV(wedrifid). While it may be the case that my amygdala can be considered an agent that is more similar to your amygdala than to the values I represent in abstract ideals, adding the amygdala-agent of another constitutes corrupting the CEV with some discrete measure of “Jerkiness”.
It’s not clear to me that Dave has actually given its endorsement to any particular coalition in a particularly consistent or coherent fashion; it seems to many of me that what Dave endorses and even how Dave thinks of itself and its environment is a moderately variable thing that depends on what’s going on and how it strengthens, weakens, and inspires and inhibits alliances among us. Further, it seems to many of me that this is not at all unique to Dave; it’s kind of the human condition, though we generally don’t acknowledge it (either to others or to ourself) for very good social reasons which I ignore here at our peril.
That said, I don’t mean to challenge here your assertion that wedrifid is an exception; I don’t know you that well, and it’s certainly possible.
And I would certainly agree that this is a matter of degree; there are some things that are pretty consistently endorsed by whatever coalition happens to be speaking as Dave at any given moment, if only because none of us want to accept the penalties associated with repudiating previous commitments made by earlier ruling coalitions, since that would damage our credibility when we wish to make such commitments ourselves.
Of course, that sort of thing only lasts for as long as the benefits of preserving credibility are perceived to exceed the benefits of defecting. Introduce a large enough prize and alliances crumble. Still, it works pretty well in quotidian circumstances, if not necessarily during crises.
Even there, though, this is often honored in the breach rather than the observance. Many ruling coalitions, while not explicitly repudiating earlier commitments, don’t actually follow through on them either. But there’s a certain amount of tolerance of that sort of thing built into the framework, which can be invoked by conventional means… “I forgot”, “I got distracted”, “I experienced akrasia”, and so forth.
So of course there’s also a lot of gaming of that tolerance that goes on. Social dynamics are complicated. And, again, change the payoff matrix and the games change.
All of which is to say, even if my various component parts were to agree on such a gold standard CEV(dave), and commit to an alliance to consistently and coherently enforce that standard regardless of what coalition happens to be speaking for Dave at the time, it is not at all clear to me that this alliance would survive the destabilizing effects of seriously contemplating the possibility of various components having their values implemented on a global scale. We may have an uneasy alliance here inside Dave’s brain, but it really doesn’t take that much to convince one of us to betray that alliance if the stakes get high enough.
By way of analogy, it may be coherent to assert that the U.S. can “speak as” a single entity through the appointing of a Federal government, a President, and so forth. But if the U.S. agreed to become part of a single sovereign world government, it’s not impossible that the situation that prompted this decision would also prompt Montana to secede from the Union. Or, if the world became sufficiently interconnected that a global economic marketplace became an increasingly powerful organizing force, it’s not impossible that parts of New York might find greater common cause with parts of Tokyo than with the rest of the U.S. Or various other scenarios along those lines. At which point, even if the U.S. Federal government goes on saying the same things it has always said, it’s no longer entirely clear that it really is speaking for Montana or New York.
In a not-really-all-that-similar-but-it’s-the-best-I-can-do-without-getting-a-lot-more-formal way, it’s not clear to me that when it comes time to flip the switch, the current Dave Coalition continues to speak for us.
At best, I think it follows that just like the existence of people who are Jerks suggests that I should prefer CEV(Dave) to CEV(humanity), the existence of Dave-agents who are Jerks suggests that I should prefer CEV(subset-of-Dave) to CEV(Dave).
But frankly, I think that’s way too simplistic, because no given subset-of-Dave that lacks internal conflict is rich enough for any possible ruling coalition to be comfortable letting it grab the brass ring like that. Again, quotidian alliances rarely survive a sudden raising of the stakes.
Mostly, I think what really follows from all this is that the arbitration process that occurs within my brain cannot be meaningfully separated from the arbitration process that occurs within other structures that include/overlap my brain, and therefore if we want to talk about a volition-extrapolation process at all we have to bite the bullet and accept that the target of that process is either too simple to be considered a human being, or includes inconsistent values (aka Jerks). Excluding the Jerks and including a human being just isn’t a well-defined option.
Of course, Solzhenitsyn said it a lot more poetically (and in fewer words).
Yes, I was talking about shortcomings of CEV, and did not mean to imply that my current preferences were better than any third option. They aren’t even strictly better than CEV; I just think they are better overall if I can’t mix the two.
It just seems likely, based on my understanding of what people like and approve of.
Strict non-interference is unlikely to end up in CEV, because there are many cases where interventions are the right thing to do. I just meant it as a proof that there are less controversial principles that will block a lot of bullshit. Not as a speculation of something that will actually end up in CEV.
religion, bigots, conservatism.
Why do you think it very likely these people’s CEV will contradict their current values and beliefs? Please consider that:
These values are based on false beliefs, inconsistent memes, and fear. None of those things will survive CEV. “If we knew more, thought faster, grew closer together, etc”.
We emphatically don’t know the outcome of CEV. If we were sure that it would have any property X, we could hardcode X into the algorithm and make the CEV’s task that much easier. Anything you think is very likely for CEV to decide, you should be proportionally willing for me to hardcode into my algorithm, constraining the possible results of CEV.
That would take a whole hell of a lot of certainty. I have nowhere near that level of confidence in anything I believe.
In these examples, you expect other people’s extrapolated values to come to match your actual values. This seems on the outside view like a human bias. Do you expect an equal amount of your important, present-day values to be contradicted and disallowed by humanity’s CEV? Can you think of probable examples?
I think CEV will end up more like transhumanism than like islam. (which means I mostly accept transhumanism). I think I’m too far outside the morally-certain-but-ignorant-human reference class to make outside view judgements on this.
Not an equal amount, but many of my current values will be contradicted in CEV. I can only analogize to physics: I accept relativity, but expect it to be wrong. (I think my current beliefs are the closest approximation to CEV that I know of).
Likely candidates? That’s like asking “which of your beliefs are false”. All I can say is which are most uncertain. I can’t say which way they will go. I am uncertain about optimal romantic organization (monogamy, polyamory, ???). I am uncertain of the moral value of closed simulations. I am uncertain about moral value of things like duplicating people, or making causally-identical models. I am quite certain that existing lives have high value. I am unsure about lives that don’t yet exist.
But you contradict yourself a little. If you really believed CEV(team) looked a lot like CEV(humanity), you would have no reason to consider it safer. If you (correctly) think it’s safer, that must be because you fear CEV(humanity) will contain some pretty repugnant conclusions that CEV(team) won’t.
Not quite. Let’s imagine two bootstrap scenarios: some neo-enlightenment transhumanists, and some religious nuts. Even just the non-extrapolated values of the transhumanists will produce a friendly-enough AI that can (and will want to) safely research better value-extrapolation methods. Bootstrapping it with islam will get you an angry punishing god that may or may not care about extrapolating further. Running the final, ideal CEV process with either seed should produce the same good value set, but we may not have the final ideal CEV process, and having a dangerous genie running the process may not do safe things if you start it with the wrong seed.
I doubt any one human’s values are reflectively consistent. At the very least, every human’s values contradict one another in the sense that they compete among themselves for the human’s resources, and the human in different moods and at different points in time prefers to spend on different values.
Sorry, I made that too specific. I didn’t mean to imply that only the islamists are inconsistent. Just meant them as an obvious example.
a non-CEV process which more directly relies on my and other people’s non-extrapolated preferences.
This is what I think would be good as a seed value system so that the FAI can go and block x-risk and stop death and such without having to philosophize too much first. But I’d want the CEV philosophising to be done eventually (ASAP, actually).
Strict non-interference is unlikely to end up in CEV, because there are many cases where interventions are the right thing to do.
Right according to whose values? The problem is precisely that people disagree pre-extrapolation about when it’s right to interfere, and therefore we fear their individual volitions will disagree even post extrapolation. I and some other people have a value of noninterference in certain matters that is very important to us. I would rather hardcode that value than let CEV of humanity decide on it.
I think CEV will end up more like transhumanism than like islam.
Again why? CEV is very much underspecified. To me, the idea that our values and ideals will preferentially turn out to be the ones all humans would embrace “if they were smarter etc” looks like mere wishful thinking. Values are arational and vary widely. If you specify a procedure (CEV) whereby they converge to a compatible set which also happens to resemble our actual values today, then it should be possible to give different algorithms (which you can call CEV or not, it doesn’t matter) which converge on other value-sets.
In the end, as the Confessor said, “you have judged: what else is there?” I have judged, and where I am certain enough about my judgement I would rather that other people’s CEV not override me.
Other than that I agree with you about using a non-CEV seed etc. I just don’t think we should later let CEV decide anything it likes without the seed explicitly constraining it.
CEV’s. Where by an unqualified “CEV” I take nyan to be referring to CEV (“the Coherently Extrapolated Values of Humanity”). I assume he also means it as a normative assertion of the slightly-less-extrapolated kind that means something like “all properly behaving people of my tribe would agree and if they don’t we may need to beat them with sticks until they do.”
The problem is precisely that people disagree pre-extrapolation about [when it’s right to interfere], and therefore we fear their individual volitions will disagree even post extrapolation.
And the bracketed condition is generalisable to all sorts of things—including those preferences that we haven’t even considered the possibility of significant disagreement about. Partially replacing one’s own preferences with preferences that are not one’s own is one of the most dangerous things it is possible to do. Not something to do lightly or take for granted as implicitly ‘right’.
I and some other people have a value of noninterference in certain matters that is very important to us. I would rather hardcode that value than let CEV of humanity decide on it.
I note that any assertion that “intervention is strictly not the wrong thing to do” that is not qualified necessarily implies a preference for the worst things that could possibly happen in an FAI-free world happening than a single disqualified intervention. That means, for example, that rather than a minimalist intervention you think the ‘right’ behavior for the FAI is to allow everyone on the planet to be zapped by The Pacifier and constantly raped by pedophiles until they are 10 whereupon they are forced to watch repeats of the first season of Big Brother until they reach 20 and are zapped again and the process is repeated until the heat death of the universe. That’s pretty bad but certainly not the worst thing that could happen. It is fairly trivially not “right” to not let that happen if you can easily stop it.
Note indicating partial compatibility of positions: There can be reasons to advocate the implementation of ethical injunctions in a created GAI, but this still doesn’t allow us to say that non-intervention in a given extreme circumstance is ‘right’.
Partially replacing one’s own preferences with preferences that are not one’s own is one of the most dangerous things it is possible to do. Not something to do lightly or take for granted as implicitly ‘right’.
That’s exactly what I think. And especially if you precommit to the values output by a certain process before the process is actually performed, and can’t undo it later.
I note that any assertion that “intervention is strictly not the wrong thing to do” that is not qualified [...]
I’m certainly not advocating absolute unqualified non-intervention. I wrote “a value of noninterference in certain matters”. Certainly the AI should interfere to e.g. offer help just before something happens which the AI thinks the person would not want to happen to them (the AI is qualified to make such decisions if it can calculate CEV). In such a situation the AI would explain matters and offer aid and advice, but ultimate deciding power might still lie with the person, depending on the circumstances.
Nonintervention doesn’t just mean non-intervention by the AI, it means nonintervention by one person with another. If someone makes a request for the AI to prevent another person from doing something to them, then again in at least some (most? all?) circumstances the AI should interfere to do so; that is actually upholding the principle of uninterference.
Gah. I had to read through many paragraphs of drivel plot and then all I came up with was “a device that zaps people, making them into babies, but that is reversible”. You should have just said so. (Not that the idea makes sense on any level). Anyway, my above comment applies; people would not want it done to them and so would request the AI to prevent it.
I’m certainly not advocating absolute unqualified non-intervention. I wrote “a value of noninterference in certain matters”. Certainly the AI should interfere to e.g. offer help just before something happens which the AI thinks the person would not want to happen to them (the AI is qualified to make such decisions if it can calculate CEV). In such a situation the AI would explain matters and offer aid and advice, but ultimate deciding power might still lie with the person, depending on the circumstances.
Nonintervention doesn’t just mean non-intervention by the AI, it means nonintervention by one person with another. If someone makes a request for the AI to prevent another person from doing something to them, then again in at least some (most? all?) circumstances the AI should interfere to do so; that is actually upholding the principle of uninterference.
I like both these caveats. The scenario becomes something far more similar to what a CEV could plausibly be without the artificial hack. Horror stories become much harder to construct.
Off the top of my head, one potential remaining weakness is the inability to prevent a rival, less crippled AGI from taking over without interfering pre-emptively with an individual who is not themselves interfering with anyone. Getting absolute power requires intervention (or universally compliant subjects). Not getting absolute power means something else can get it and outcomes are undefined.
That’s a good point. The AI’s ability to not interfere is constrained by its need to monitor everything that’s going on. Not just to detect someone building a rival AI, but to detect simpler cases like someone torturing a simulated person, or even just a normal flesh and bone child who wasn’t there nine months earlier. To detect people who get themselves into trouble without yet realizing it, or who are going to attack other people nonconsensually, and give these people help before something bad actually happens to them, all requires monitoring.
And while a technologically advanced AI might monitor using tools we humans couldn’t even detect today, to advanced posthumans every possible tool might be painfully obvious. E.g. if you have to expose everything your megaton-of-computronium brain calculates to the AI, because that’s enough to simulate all the humans alive in 2012 in enough detail that they would count as persons to the AI. But to the asteroid-sized brain this means the AI is literally aware of all its thoughts: it has zero privacy.
It does appear that universal surveillance is the cost of universally binding promises (you won’t be tortured no matter where you go and what you do in AI-controlled space). To reduce costs and increase trust, the AI should be transparent to everyone itself, and should be publicly and verifiably committed to being a perfectly honest and neutral party that never reveals the secrets and private information it monitors to anyone.
I’d like to note that all of this also applies to any FAI singleton that implements some policies that we today consider morally required—like making sure no-one is torturing simulated people or raising their baby wrong. If there’s no generally acceptable FAI behavior that doesn’t include surveillance, then all else is equal and I still prefer my AI to a pure CEV implementation.
And while a technologically advanced AI might monitor using tools we humans couldn’t even detect today, to advanced posthumans every possible tool might be painfully obvious. E.g. if you have to expose everything your megaton-of-computronium brain calculates to the AI, because that’s enough to simulate all the humans alive in 2012 in enough detail that they would count as persons to the AI. But to the asteroid-sized brain this means the AI is literally aware of all its thoughts: it has zero privacy.
It would seem that the FAI should only require you to expose the complete state of your brain at a point in time where it can reliably predict or prove that you are ‘safe’, using the kind of reasoning we often assume as a matter of course when describing UDT decision problems. Such an FAI would have information about what you are thinking (and in particular a great big class of what it knows you are not thinking), but not necessarily detailed knowledge of what you are thinking specifically.
For improved privacy the inspection could be done by a spawned robot AI programmed to self destruct after analyzing you and returning nothing but a boolean safety indicator back to the FAI.
Prediction has some disadvantages compared to constant observation:
Some physical systems are hard to model well with simplification; even for the AI it might be necessary to use simulations composed of amounts of matter proportional to the thing simulated. If about one half of all matter has to be given over to the AI, instead of being used to create more people and things, that is a significant loss of opportunity. (Maybe the AI should tax people in simulation-resources, and those who opt in to surveillance have much lower taxes :-)
Simulations naturally have a rising risk of divergence over time. The AI is not literally Omega. It will have to come in and take periodical snapshots of everyone’s state to correct the simulations.
Simulations have a chance of being wrong. However small the chance, if the potential result is someone building a UFAI challenger, it might be unacceptable to take that chance.
OTOH, surveillance might be much cheaper (I don’t know for sure) and also allows destroying the evidence close to the site of observation once it is analyzed, preserving a measure of privacy.
One strategy (which seems to me to be implied by the original CEV doc) is to extrapolate everyone’s personal volition, then compare and merge them to create the group’s overall CEV.
I vaguely remember something in that doc suggesting that part of the extrapolation process involves working out the expected results of individuals interacting. More poetically, “what we would want if we grew together more.” That suggests that this isn’t quite what the original doc meant to imply, or at least that it’s not uniquely what the doc meant to imply, although I may simply be misremembering.
More generally, all the hard work is being done here by whatever assumptions are built into the “extrapolation”.
Had grown up farther together: A model of humankind’s coherent extrapolated volition should not extrapolate the person you’d become if you made your decisions alone in a padded cell. Part of our predictable existence is that we predictably interact with other people. A dynamic for CEV must take a shot at extrapolating human interactions, not just so that the extrapolation is closer to reality, but so that the extrapolation can encapsulate memetic and social forces contributing to niceness.
Our CEV may judge some memetic dynamics as not worth extrapolating—not search out the most appealing trash-talk TV show.
Social interaction is probably intractable for real-world prediction, but no more so than individual volition. That is why I speak of predictable extrapolations, and of calculating the spread.
I don’t mean to contradict that. So consider my interpretation to be something like: build (“extrapolate”) each person’s CEV, which includes that person’s interactions with other people, but doesn’t directly value them except insofar as that person values them; then somehow merge the individual CEVs to get the group CEV.
After all (I reason) you want the following nice property for CEV. Suppose that CEV meets CEV, e.g. separate AIs implementing those CEVs meet. If they don’t embody inimical values, they will try to negotiate and compromise. We would like the result of those negotiations to look very much like CEV. One easy way to do this is to say CEV is built on “merging” all the way from the bottom up.
More generally, all the hard work is being done here by whatever assumptions are built into the “extrapolation”.
Certainly. All discussion of CEV starts with “assume there can exist a process that produces an outcome matching the following description, and assume we can and do build it, and assume that all the under-specification of this description is improved in the way we would wish it improved if we were better at wishing”.
I basically agree with all of this, except that I think you’re saying “CEV is built on “merging” all the way from the bottom up” but you aren’t really arguing for doing that.
Perhaps one important underlying question here is whether people’s values ever change contingent on their experiences.
If not, that is, if my values are exactly the same as they were when I first began to exist (whenever that was), then perhaps something like what you describe makes sense. A process for working out what those values are and extrapolating my volition based on them would be difficult to build, but is coherent in principle. In fact, many such processes could exist, and they would converge on a single output specification for my individual CEV. And then, and only then, we could begin the process of “merging.”
This strikes me as pretty unlikely, but I suppose it’s possible.
OTOH, if my values are contingent on experience—that is, if human brains experience value drift—then it’s not clear that those various processes’ outputs would converge. Volition-extrapolation process 1, which includes one model of my interaction with my environment, gets Dave-CEV-1. VEP2, which includes a different model, gets Dave-CEV-2. And so forth. And there simply is no fact of the matter as to which is the “correct” Dave-CEV; they are all ways that I might turn out; to the extent that any of them reflect “what I really want” they all reflect “what I really want”, and I “really want” various distinct and potentially-inconsistent things.
In the latter case, in order to obtain something we call CEV(Dave), we need a process of “merging” the outputs of these various computations. How we do this is of course unclear, but my point is that saying “we work out individual CEVs and merge them” as though the merge step came second is importantly wrong. Merging is required to get an individual CEV in the first place.
So, yes, I agree, it’s a fine idea to have CEV built on merging all the way from the bottom up. But to understand what the “bottom” really is is to give up on the idea that my unique individual identity is the “bottom.” Whatever it is that CEV is extrapolating and merging, it isn’t people, it’s subsets of people. “Dave’s values” are no more preserved by the process than “New Jersey’s values” or “America’s values” are.
That’s a very good point. People not only change over long periods of time; during small intervals of time we can also model a person’s values as belonging to competing and sometimes negotiating agents. So you’re right, merging isn’t secondary or dispensable (not that I suggested doing away with it entirely), although we might want different merging dynamics sometimes for sub-person fragments vs. for whole-person EVs.
Sure, the specifics of the aggregation process will depend on the nature of the monads to be aggregated.
And, yes, while we frequently model people (including ourselves) as unique coherent consistent agents, and it’s useful to do so for planning and for social purposes, there’s no clear reason to believe we’re any such thing, and I’m inclined to doubt it. This also informs the preserving-identity-across-substrates conversation we’re having elsethread.
Where relevant—or at least when I’m reminded of it—I do model myself as a collection of smaller agents. But I still call that collection “I”, even though it’s not unique, coherent, or consistent. That my identity may be a group-identity doesn’t seem to modify any of my conclusions about identity, given that to date the group has always resided together in a single brain.
For my own part, I find that attending to the fact that I am a non-unique, incoherent, and inconsistent collection of disparate agents significantly reduces how seriously I take concerns that some process might fail to properly capture the mysterious essence of “I”, leading to my putative duplicate going off and having fun in a virtual Utopia while “I” remains in a cancer-ridden body.
I would gladly be uploaded rather than die if there were no alternative. I would still pay extra for a process that slowly replaced my brain cells etc. one by one leaving me conscious and single-instanced the whole while.
I would be fine with FAI removing existential risks and not doing any other thing until everybody(’s CEV) agrees on it. (I assume here that removing existential risks is one such thing.) And an FAI team that credibly precommitted to implementing CEV instead of CEV would probably get more resources and would finish first.
I would be fine with FAI removing existential risks and not doing any other thing until everybody(’s CEV) agrees on it.
So what makes you think everybody’s CEV would eventually agree on anything more?
A FAI that never does anything except prevent existential risk—which, in a narrow interpretation, means it doesn’t stop half of humanity from murdering the other half—isn’t a future worth fighting for IMO. We can do so much better. (At least, we can if we’re speculating about building a FAI to execute any well-defined plan we can come up with.)
(I assume here that removing existential risks is one such thing.)
I’m not even sure of that. There are people who believe religiously that End Times must come when everyone must die, and some of them want to hurry that along by actually killing people. And the meaning of “existential risk” is up for grabs anyway—does it preclude evolution into non-humans, leaving no members of original human species in existence? Does it preclude the death of everyone alive today, if some humans are always alive?
Sure, it’s unlikely or it might look like a contrived example to you. But are you really willing to precommit the future light cone, the single shot at creating an FAI (singleton), to whatever CEV might happen to be, without actually knowing what CEV produces and having an abort switch? That’s one of the defining points of CEV: that you can’t know it correctly in advance, or you would just program it directly as a set of goals instead of building a CEV-calculating machine.
And an FAI team that credibly precommitted to implementing CEV instead of CEV would probably get more resources and would finish first.
This seems wrong. A FAI team that precommitted to implementing CEV would definitely get the most funds. Even a team that precommitted to CEV might get more funds than CEV, because people like myself would reason that the team’s values are closer to my own than humanity’s average, plus they have a better chance of actually agreeing on more things.
A FAI that never does anything except prevent existential risk—which, in a narrow interpretation, means it doesn’t stop half of humanity from murdering the other half—isn’t a future worth fighting for IMO. We can do so much better.
No one said you have to stop with that first FAI. You can try building another. The first FAI won’t oppose it (non-interference). Or, better yet, you can try talking to the other half of the humans.
There are people who believe religiously that End Times must come
Yes, but we assume they are factually wrong, and so their CEV would fix this.
A FAI team that precommitted to implementing CEV would definitely get the most funds. Even a team that precommitted to CEV might get more funds than CEV, because people like myself would reason that the team’s values are closer to my own than humanity’s average, plus they have a better chance of actually agreeing on more things.
Not bloody likely. I’m going to oppose your team, discourage your funders, and bomb your headquarters—because we have different moral opinions, right here, and if the differences turn out to be fundamental, and you build your FAI, then parts of my value will be forever unfulfilled.
You, on the other hand, may safely support my team, because you can be sure to like whatever my FAI will do, and regarding the rest, it won’t interfere.
No one said you have to stop with that first FAI. You can try building another. The first FAI won’t oppose it (non-interference).
No. Any FAI (ETA: or other AGI) has to be a singleton to last for long. Otherwise I can build a uFAI that might replace it.
Suppose your AI only does a few things that everyone agrees on, but otherwise “doesn’t interfere”. Then I can build another AI, which implements values people don’t agree on. Your AI must either interfere, or be resigned to not being very relevant in determining the future.
Will it only interfere if a consensus of humanity allows it to do so? Will it not stop a majority from murdering a minority? Then it’s at best a nice-to-have, but most likely useless. After people successfully build one AGI, they will quickly reuse the knowledge to build more. The first AGI that does not favor inaction will become a singleton, destroying the other AIs and preventing future new AIs, to safeguard its utility function. This is unavoidable. With truly powerful AGI, preventing new AIs from gaining power is the only stable solution.
Or, better yet, you can try talking to the other half of the humans.
Yeah, that’s worked really well for all of human history so far.
Yes, but we assume they are factually wrong, and so their CEV would fix this.
First, they may not be factually wrong about the events they predict in the real world (like everyone dying), just wrong about the supernatural parts. (Especially if they’re themselves working to bring these events to pass.) IOW, this may not be a factual belief to be corrected, but a desired-by-them future that others like me and you would wish to prevent.
Second, you agreed the CEV of groups of people may contain very few things that they really agree on, so you can’t even assume they’ll have a nontrivial CEV at all, let alone that it will “fix” values you happen to disagree with.
Not bloody likely. I’m going to oppose your team, discourage your funders, and bomb your headquarters—because we have different moral opinions, right here, and if the differences turn out to be fundamental, and you build your FAI, then parts of my value will be forever unfulfilled.
You, on the other hand, may safely support my team, because you can be sure to like whatever my FAI will do, and regarding the rest, it won’t interfere.
I have no idea what your FAI will do, because even if you make no mistakes in building it, you yourself don’t know ahead of time what the CEV will work out to. If you did, you’d just plug those values into the AI directly instead of calculating the CEV. So I’ll want to bomb you anyway, if that increases my chances of being the first to build a FAI. Our morals are indeed different, and since there are no objectively distinguished morals, the difference goes both ways.
Of course I will dedicate my resources to first bombing people who are building even more inimical AIs. But if I somehow knew you and I were the only ones in the race, I’d politely ask you to join me or desist or be stopped by force.
As long as we’re discussing bombing, consider that the SIAI isn’t and won’t be in a position to bomb anyone. OTOH, if and when nation-states and militaries realize AGI is a real-world threat, they will go to war with each trying to prevent anyone else from building an AGI first. It’s the ultimate winner-take-all arms race.
This is going to happen, it might be happening already if enough politicians and generals had the beliefs of Eliezer about AGI, and it will happen (or not) regardless of anyone’s attempts to build any kind of Friendliness theory. Furthermore, a state military planning to build an AGI singleton won’t stop to think for long before wiping your civilian, unprotected FAI theory research center off the map. Either you go underground or you cooperate with a powerful player (the state on whose territory you live, presumably). Or maybe states and militaries won’t wise up in time, and some private concern really will build the first AGI. Which may be better or worse depending on what they build.
Eventually, unless the whole world is bombed back into pre-computer-age tech, someone very probably will build an AGI of some kind. The SIAI idea is (possibly) to invent Friendliness theory and publish it widely, so that whoever builds that AGI, if they want it to be Friendly (at least to themselves!), they will have a relatively cheap and safe implementation to use. But for someone actually trying to build an AGI, two obvious rules are:
Absolute secrecy, or you get bombed right away.
Do absolutely whatever it takes to successfully launch as early as possible, and make your AI a singleton controlled by yourself or by nobody—regardless of your and the AI’s values.
Will it only interfere if a consensus of humanity allows it to do so? Will it not stop a majority from murdering a minority?
If the majority and the minority are so fundamentally different that their killing each other is not forbidden by the universal human CEV, then no. On what moral grounds would it do the prevention?
The first AGI that does not favor inaction will become a singleton, destroying the other AIs and preventing future new AIs
Until everybody agrees that this new AGI is not good after all. Then the original AGI will interfere and dismantle the new one (the original is still the first and the strongest).
you can’t even assume they’ll have a nontrivial CEV at all, let alone that it will “fix” values you happen to disagree with.
But I can be sure that CEV fixes values that are based on false factual beliefs—this is a part of the definition of CEV.
I have no idea what your FAI will do
But you can be sure that it is something about which you (and everybody) would agree, either directly or if you were more intelligent and knew more.
there are no objectively distinguished morals
But there may be a partial ordering between morals, such that X<Y iff all “interfering” actions (whatever this means) that are allowed by X are also allowed by Y. Then if A1 and A2 are two agents, we may easily have:
[assuming Endorses(A, X) implies FAI does not perform any non-interfering action disagreeable for A]
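To make the ordering itself concrete, here is a minimal sketch. It assumes (my simplification, not part of the definition above) that a morality can be modelled as the set of "interfering" actions it allows, so that X is below Y exactly when X's set is contained in Y's. The action names and the three example moralities are hypothetical, and the sketch deliberately says nothing about the Endorses(...) clauses.

```python
# Hypothetical illustration: a "morality" is modelled as the set of
# interfering actions it allows. All names here are made up.

def at_most_as_interfering(x: frozenset, y: frozenset) -> bool:
    """True iff every interfering action allowed by x is also allowed by y."""
    return x <= y  # subset test on frozensets


minimal = frozenset({"block_existential_risk"})
libertarian = minimal | {"stop_nonconsensual_violence"}
paternalist = minimal | {"fix_false_beliefs"}

# The minimal morality sits below both of the others...
print(at_most_as_interfering(minimal, libertarian))      # True
print(at_most_as_interfering(minimal, paternalist))      # True
# ...but the other two are incomparable, which is why the order is only partial.
print(at_most_as_interfering(libertarian, paternalist))  # False
print(at_most_as_interfering(paternalist, libertarian))  # False
```

Under this toy model, two agents whose moralities are incomparable can still both sit above a common, less interfering one.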
if and when nation-states and militaries realize AGI is a real-world threat, they will go to war with each trying to prevent anyone else from building an AGI first. It’s the ultimate winner-take-all arms race. This is going to happen, it might be happening already if enough politicians and generals had the beliefs of Eliezer about AGI, and it will happen (or not) regardless of anyone’s attempts to build any kind of Friendliness theory.
Well, don’t you think this is just ridiculous? Does it look like the most rational behavior? Wouldn’t it be better for everybody to cooperate in this Prisoner’s Dilemma, and do it with a credible precommitment?
If the majority and the minority are so fundamentally different that their killing each other is not forbidden by the universal human CEV, then no.
I don’t understand what you mean by “fundamentally different”. You said the AI would not do anything not backed by an all-human consensus. If a majority of humanity wishes to kill a minority, obviously there won’t be a consensus to stop the killing, and AI will not interfere. I prefer to live in a universe whose living AI does interfere in such a case.
On what moral grounds would it do the prevention?
Libertarianism is one moral principle that would argue for prevention. So would most varieties of utilitarianism (ignoring utility monsters and such). Again, I would prefer living with an AI hard-coded to one of those moral ideologies (though it’s not ideal) over your view of CEV.
Until everybody agrees that this new AGI is not good after all. Then the original AGI will interfere and dismantle the new one (the original is still the first and the strongest).
Forever keeping this capability in reserve is most of what being a singleton means. But think of the practical implications: it has to be omnipresent, omniscient, and prevent other AIs from ever being as powerful as it is—which restricts those other AIs’ abilities in many endeavors. All the while it does little good itself. So from my point of view, the main effect of successfully implementing your view of CEV may be to drastically limit the opportunities for future AIs to do good.
And yet it doesn’t limit the opportunity to do evil, at least evil of the mundane death & torture kind. Unless you can explain why it would prevent even a very straightforward case like 80% of humanity voting to kill the other 20%.
But I can be sure that CEV fixes values that are based on false factual beliefs—this is a part of the definition of CEV.
But you said it would only do things that are approved by a strong human consensus. And I assure you that, to take an example, the large majority of the world’s population who today believe in the supernatural will not consent to having that belief “fixed”. Nor have you demonstrated that their extrapolated volition would want for them to be forcibly modified. Maybe their extrapolated volition simply doesn’t value objective truth highly (because they today don’t believe in the concept of objective truth, or believe that it contradicts everyday experience).
I have no idea what your FAI will do
But you can be sure that it is something about which you (and everybody) would agree, either directly or if you were more intelligent and knew more.
Yes, but I don’t know what I would approve of if I were “more intelligent” (a very ill defined term). And if you calculate that something, according to your definition of intelligence, and present me with the result, I might well reject that result even if I believe in your extrapolation process. I might well say: the future isn’t predetermined. You can’t calculate what I necessarily will become. You just extrapolated a creature I might become, which also happens to be more intelligent. But there’s nothing in my moral system that says I should adopt the values of someone else because they are more intelligent. If I don’t like the values I might say, thank you for warning me, now I shall be doubly careful not to evolve into that kind of creature! I might even choose to forgo the kind of increased intelligence that causes such an undesired change in my values.
Short version: “what I would want if I were more intelligent (according to some definition)” isn’t the same as “what I will likely want in the future”, because there’s no reason for me to grow in intelligence (by that definition) if I suspect it would twist my values. So you can’t apply the heuristic of “if I know what I’m going to think tomorrow, I might as well think it today”.
I think you may be missing a symbol there? If not, I can’t parse it… Can you spell out for me what it means to just write the last three Endorses(...) clauses one after the other?
Does it look like the most rational behavior?
It may be quite rational for everyone individually, depending on projected payoffs. Unlike a PD, starting positions aren’t symmetrical and players’ progress/payoffs are not visible to other players. So saying “just cooperate” doesn’t immediately apply.
Wouldn’t it be better for everybody to cooperate in this Prisoner’s Dilemma, and do it with a credible precommitment?
How can a state or military precommit to not having a supersecret project to develop a private AGI?
And while it’s beneficial for some players to join in a cooperative effort, it may well be that a situation of several competing leagues (or really big players working alone) develops and is also stable. It’s all laid over the background of existing political, religious and personal enmities and rivalries—even before we come to actual disagreements over what the AI should value.
If a majority of humanity wishes to kill a minority, obviously there won’t be a consensus to stop the killing, and AI will not interfere.
This assumes that CEV uses something along the lines of a simulated vote as an aggregation mechanism. Currently the method of aggregation is undefined so we can’t say this with confidence—certainly not as something obvious.
I agree. However, if the CEV doesn’t privilege any value separately from how many people value it how much (in EV), and if the EV of a large majority values killing a small minority (whose EV is of course opposed), and if you have protection against both positive and negative utility monsters (so it’s at least not obvious and automatic that the negative value of the minority would outweigh the positive value of the majority) - then my scenario seems to me to be plausible, and an explanation is necessary as to how it might be prevented.
Of course you could say that until CEV is really formally specified, and we know how the aggregation works, this explanation cannot be produced.
If a majority of humanity wishes to kill a minority, obviously there won’t be a consensus to stop the killing, and AI will not interfere
The majority may wish to kill the minority for wrong reasons—based on false beliefs or insufficient intelligence. In which case their CEV-s won’t endorse it, and the FAI will interfere. “Fundamentally different” means their killing each other is endorsed by someone’s CEV, not just by themselves.
But you said it would only do things that are approved by a strong human consensus.
Strong consensus of their CEV-s.
Maybe their extrapolated volition simply doesn’t value objective truth highly (because they today don’t believe in the concept of objective truth, or believe that it contradicts everyday experience)
Extrapolated volition is based on objective truth, by definition.
If I don’t like the values I might say, thank-you for warning me, now I shall be doubly careful not to evolve into that kind of creature!
The process of extrapolation takes this into account.
I think you may be missing a symbol there? If not, I can’t parse it...
Sorry, bad formatting. I meant four independent clauses: each of the agents does not endorse CEV, but endorses CEV.
How can a state or military precommit to not having a supersecret project to develop a private AGI?
That’s a separate problem. I think it is easier to solve than extrapolating volition or building AI.
The majority may wish to kill the minority for wrong reasons—based on false beliefs or insufficient intelligence. In which case their CEV-s won’t endorse it, and the FAI will interfere
So you’re OK with the FAI not interfering if they want to kill them for the “right” reasons? Such as “if we kill them, we will benefit by dividing their resources among ourselves”?
But you said it would only do things that are approved by a strong human consensus.
Strong consensus of their CEV-s.
So you’re saying your version of CEV will forcibly update everyone’s beliefs and values to be “factual” and disallow people to believe in anything not supported by appropriate Bayesian evidence? Even if it has to modify those people by force, the result is unlike the original in many respects that they and many other people value and see as identity-forming, etc.? And it will do this not because it’s backed by a strong consensus of actual desires, but because post-modification there will be a strong consensus of people happy that the modification was made?
If your answer is “yes, it will do that”, then I would not call your AI a Friendly one at all.
Extrapolated volition is based on objective truth, by definition.
My understanding of the CEV doc differs from yours. It’s not a precise or complete spec, and it looks like both readings can be justified.
The doc doesn’t (on my reading) say that the extrapolated volition can totally conform to objective truth. The EV is based on an extrapolation of our existing volition, not of objective truth itself. One of the ways it extrapolates is by adding facts the original person was not aware of. But that doesn’t mean it removes all non-truth or all beliefs that “aren’t even wrong” from the original volition. If the original person effectively assigns 0 or 1 “non-updateable probability” to some belief, or honestly doesn’t believe in objective reality, or believes in “subjective truth” of some kind, CEV is not necessarily going to “cure” them of it—especially not by force.
But as long as we’re discussing your vision of CEV, I can only repeat what I said above—if it’s going to modify people by force like this, I think it’s unFriendly and if it were up to me, would not launch such an AI.
I meant four independent clauses: each of the agents does not endorse CEV, but endorses CEV.
Understood. But I don’t see how this partial ordering changes what I had described.
Let’s say I’m A1 and you’re A2. We would both prefer a mutual CEV to a CEV of the other only. But each of us would prefer even more a CEV of himself only. So each of us might try to bomb the other first if he expected to get away without retaliation. That there exists a possible compromise that is better than total defeat doesn’t mean total victory wouldn’t be much better than any compromise.
How can a state or military precommit to not having a supersecret project to develop a private AGI?
That’s a separate problem. I think it is easier to solve than extrapolating volition or building AI.
If you think so you must have evidence relating to how to actually solve this problem. Otherwise they’d both look equally mysterious. So, what’s your idea?
So you’re OK with the FAI not interfering if they want to kill them for the “right” reasons?
I wouldn’t like it. But if the alternative is, for example, to have FAI directly enforce the values of the minority on the majority (or vice versa) - the values that would make them kill in order to satisfy/prevent—then I prefer FAI not interfering.
“if we kill them, we will benefit by dividing their resources among ourselves”
If the resources are so scarce that dividing them is so important that even CEV-s agree on the necessity of killing, then again, I prefer humans to decide who gets them.
So you’re saying your version of CEV will forcibly update everyone’s beliefs
No. CEV does not update anyone’s beliefs. It is calculated by extrapolating values in the presence of full knowledge and sufficient intelligence.
If the original person effectively assigns 0 or 1 “non-updateable probability” to some belief, or honestly doesn’t believe in objective reality, or believes in “subjective truth” of some kind, CEV is not necessarily going to “cure” them of it—especially not by force.
As I said elsewhere, if a person’s beliefs are THAT incompatible with truth, I’m ok with ignoring their volition. Note that their CEV is undefined in this case. But I don’t believe there exist such people (excluding totally insane).
That there exists a possible compromise that is better than total defeat doesn’t mean total victory wouldn’t be much better than any compromise.
But the total loss would be correspondingly worse. PD reasoning says you should cooperate (assuming cooperation is precommittable).
If you think so you must have evidence relating to how to actually solve this problem. Otherwise they’d both look equally mysterious. So, what’s your idea?
Off the top of my head, adoption of total transparency for everybody of all governmental and military matters.
If the resources are so scarce that dividing them is so important that even CEV-s agree on the necessity of killing, then again, I prefer humans to decide who gets them.
The resources are not scarce at all. But, there’s no consensus of CEVs. The CEVs of 80% want to kill the rest. The CEVs of 20% obviously don’t want to be killed. Because there’s no consensus, your version of CEV would not interfere, and the 80% would be free to kill the 20%.
No. CEV does not update anyone’s beliefs. It is calculated by extrapolating values in the presence of full knowledge and sufficient intelligence.
I meant that the AI that implements your version of CEV would forcibly update people’s actual beliefs to match what it CEV-extrapolated for them. Sorry for the confusion.
As I said elsewhere, if a person’s beliefs are THAT incompatible with truth, I’m ok with ignoring their volition. Note that their CEV is undefined in this case. But I don’t believe there exist such people (excluding totally insane).
A case could be made that many millions of religious “true believers” have un-updatable 0/1 probabilities. And so on.
Your solution is to not give them a voice in the CEV at all. Which is great for the rest of us—our CEV will include some presumably reduced term for their welfare, but they don’t get to vote on things. This is something I would certainly support in a FAI (regardless of CEV), just as I would support using CEV or even CEV to CEV.
The only difference between us then is that I estimate there to be many such people. If you believed there were many such people, would you modify your solution, or is ignoring them however many they are fine by you?
PD reasoning says you should cooperate (assuming cooperation is precommittable).
As I said before, this reasoning is inapplicable, because this situation is nothing like a PD.
The PD reasoning to cooperate only applies in case of iterated PD, whereas creating a singleton AI is a single game.
Unlike PD, the payoffs are different between players, and players are not sure of each other’s payoffs in each scenario. (E.g., minor/weak players are more likely to cooperate than big ones that are more likely to succeed if they defect.)
The game is not instantaneous, so players can change their strategy based on how other players play. When they do so they can transfer value gained by themselves or by other players (e.g. join research alliance 1, learn its research secrets, then defect and sell the secrets to alliance 2).
It is possible to form alliances, which gain by “defecting” as a group. In PD, players cannot discuss alliances or trade other values to form them before choosing how to play.
There are other games going on between players, so they already have knowledge and opinions and prejudices about each other, and desires to cooperate with certain players and not others. Certain alliances will form naturally, others won’t.
adoption of total transparency for everybody of all governmental and military matters.
This counts as very weak evidence because it proves it’s at least possible to achieve this, yes. (If all players very intensively inspect all other players to make sure a secret project isn’t being hidden anywhere—they’d have to recruit a big chunk of the workforce just to watch over all the rest.)
But the probability of this happening in the real world, between all players, as they scramble to be the first to build an apocalyptic new weapon, is so small it’s not even worth discussion time. (Unless you disagree, of course.) I’m not convinced by this that it’s an easier problem to solve than that of building AGI or FAI or CEV.
The resources are not scarce at all. But, there’s no consensus of CEVs. The CEVs of 80% want to kill the rest.
The resources are not scarce, yet the CEV-s want to kill? Why?
I meant that the AI that implements your version of CEV would forcibly update people’s actual beliefs to match what it CEV-extrapolated for them.
It would do so only if everybody’s CEV-s agree that updating these people’s beliefs is a good thing.
If you believed there were many such people, would you modify your solution, or is ignoring them however many they are fine by you?
People that would still have false factual beliefs no matter how much evidence and how much intelligence they have? They would be incurably insane. Yes, I would agree to ignore their volition, no matter how many they are.
The PD reasoning to cooperate only applies in case of iterated PD
Err. What about arguments of Douglas Hofstadter and EY, and decision theories like TDT?
Unlike PD, the payoffs are different between players, and players are not sure of each other’s payoffs in each scenario
This doesn’t really matter for a broad range of possible payoff matrices.
join research alliance 1, learn its research secrets, then defect and sell the secrets to alliance 2
Cooperating in this game would mean there is exactly one global research alliance. A cooperating move is a precommitment to abide by its rules. Enforcing such precommitment is a separate problem. Let’s assume it’s solved.
I’m not convinced by this that it’s an easier problem to solve than that of building AGI or FAI or CEV.
Maybe you’re right. But IMHO it’s a less interesting problem :)
The resources are not scarce, yet the CEV-s want to kill? Why?
Sorry for the confusion. Let’s taboo “scarce” and start from scratch.
I’m talking about a scenario where, to simplify only slightly from the real world, there exist some finite (even if growing) resources such that almost everyone, no matter how much they already have, wants more of them. A coalition of 80% of the population forms, which would like to kill the other 20% in order to get their resources. Would the AI prevent this, although there is no consensus against the killing?
If you still want to ask whether the resource is “scarce”, please specify what that means exactly. Maybe any finite and highly desirable resource, with returns diminishing weakly or not at all, can be considered “scarce”.
It would do so only if everybody’s CEV-s agree that updating these people’s beliefs is a good thing.
People that would still have false factual beliefs no matter how much evidence and how much intelligence they have? They would be incurably insane. Yes, I would agree to ignore their volition, no matter how many they are.
As I said—this is fine by me insofar as I expect the CEV not to choose to ignore me. (Which means it’s not fine through the Rawlsian veil of ignorance, but I don’t care and presumably neither do you.)
The question of definition, who is to be included in the CEV? or—who is considered sane? becomes of paramount importance. Since it is not itself decided by the CEV, it is presumably hardcoded into the AI design (or evolves within that design as the AI self-modifies, but that’s very dangerous without formal proofs that it won’t evolve to include the “wrong” people.) The simplest way to hardcode it is to directly specify the people to be included, but you prefer testing on qualifications.
However this is realized, it would give people even more incentive to influence or stop your AI building process or to start their own to compete, since they would be afraid of not being included in the CEV used by your AI.
The PD reasoning to cooperate only applies in case of iterated PD
Err. What about arguments of Douglas Hofstadter and EY, and decision theories like TDT?
TDT applies where agents are “similar enough”. I doubt I am similar enough to e.g. the people you labelled insane.
Which arguments of Hofstadter and Yudkowsky do you mean?
Cooperating in this game would mean there is exactly one global research alliance.
Why? What prevents several competing alliances (or single players) from forming, competing for the cooperation of the smaller players?
A coalition of 80% of the population forms, which would like to kill the other 20% in order to get their resources
I have trouble thinking of a resource that would make even one person’s CEV, let alone 80%, want to kill people, in order to just have more of it.
The question of definition, who is to be included in the CEV? or—who is considered sane?
This is easy, and does not need any special hardcoding. If someone is so insane that their beliefs are totally closed and impossible to move by knowledge and intelligence, then their CEV is undefined. Thus, they are automatically excluded.
TDT applies where agents are “similar enough”. I doubt I am similar enough to e.g. the people you labelled insane.
We are talking about people building FAI-s. Surely they are intelligent enough to notice the symmetry between themselves. If you say that logic and rationality makes you decide to ‘defect’ (=try to build FAI on your own, bomb everyone else), then logic and rationality would make everyone decide to defect. So everybody bombs everybody else, no FAI gets built, everybody loses. Instead you can ‘cooperate’ (=precommit to build FAI<everybody’s CEV> and to bomb everyone that did not make the same precommitment). This gets us a single global alliance.
I have trouble thinking of a resource that would make even one person’s CEV, let alone 80%, want to kill people, in order to just have more of it.
shrug Space (land or whatever is being used). Mass and energy. Natural resources. Computing power. Finite-supply money and luxuries if such exist.
Or are you making an assumption that CEVs are automatically more altruistic or nice than non-extrapolated human volitions are?
This is easy, and does not need any special hardcoding. If someone is so insane that their beliefs are totally closed and impossible to move by knowledge and intelligence, then their CEV is undefined. Thus, they are automatically excluded.
Well it does need hardcoding: you need to tell the CEV to exclude people whose EVs are too similar to their current values despite learning contrary facts. Or even all those whose belief-updating process differs too much from perfect Bayesian (and how much is too much?) This is something you’d hardcode in, because you could also write (“hardcode”) a CEV that does include them, allowing them to keep the EVs close to their current values.
Not that I’m opposed to this decision (if you must have CEV at all).
We are talking about people building FAI-s. Surely they are intelligent enough to notice the symmetry between themselves.
There’s a symmetry, but “first person to complete AI wins, everyone ‘defects’” is also a symmetrical situation. Single-iteration PD is symmetrical, but everyone defects. Mere symmetry is not sufficient for TDT-style “decide for everyone”, you need similarity that includes similarly valuing the same outcomes. Here everyone values the outcome “have the AI obey ME!”, which is not the same.
If you say that logic and rationality makes you decide to ‘defect’ (=try to build FAI on your own, bomb everyone else), then logic and rationality would make everyone decide to defect. So everybody bombs everybody else, no FAI gets built, everybody loses.
Or someone is stronger than everyone else, wins the bombing contest, and builds the only AI. Or someone succeeds in building an AI in secret, avoiding being bombed. Or there’s a player or alliance that’s strong enough to deter bombing due to the threat of retaliation, and so completes their AI which doesn’t care about everyone else much. There are many possible and plausible outcomes besides “everybody loses”.
Instead you can ‘cooperate’ (=precommit to build FAI<everybody’s CEV> and to bomb everyone that did not make the same precommitment). This gets us a single global alliance.
Or while the alliance is still being built, a second alliance or very strong player bombs them to get the military advantages of a first strike. Again, there are other possible outcomes besides what you suggest.
Space (land or whatever is being used). Mass and energy. Natural resources. Computing power. Finite-supply money and luxuries if such exist. Or are you making an assumption that CEVs are automatically more altruistic or nice than non-extrapolated human volitions are?
These all have the property that you only need so much of them. If there is a sufficient amount for everybody, then there is no point in killing in order to get more. I expect CEV-s to not be greedy just for the sake of greed. It’s people’s CEV-s we’re talking about, not paperclip maximizers’.
Well it does need hardcoding: you need to tell the CEV to exclude people whose EVs are too similar to their current values despite learning contrary facts. Or even all those whose belief-updating process differs too much from perfect Bayesian (and how much is too much?) This is something you’d hardcode in, because you could also write (“hardcode”) a CEV that does include them, allowing them to keep the EVs close to their current values.
Hmm, we are starting to argue about exact details of extrapolation process...
There are many possible and plausible outcomes besides “everybody loses”.
Let’s formalize the problem. Let F(R, Ropp) be the probability of a team successfully building a FAI first, given R resources, and having opposition with Ropp resources. Let Uself, Ueverybody, and Uother be the rewards for being first in building FAI, FAI, and FAI, respectively. Naturally, F is monotonically increasing in R and decreasing in Ropp, and Uother < Ueverybody < Uself.
Assume there are just two teams, with resources R1 and R2, and each can perform one of two actions: “cooperate” or “defect”. Let’s compute the expected utilities for the first team:
We cooperate, opponent team cooperates:
EU("CC") = Ueverybody * F(R1+R2, 0)
We cooperate, opponent team defects:
EU("CD") = Ueverybody * F(R1, R2) + Uother * F(R2, R1)
We defect, opponent team cooperates:
EU("DC") = Uself * F(R1, R2) + Ueverybody * F(R2, R1)
We defect, opponent team defects:
EU("DD") = Uself * F(R1, R2) + Uother * F(R2, R1)
Then, EU(“CD”) < EU(“DD”) < EU(“DC”), which gives us most of the structure of a PD problem. The rest, however, depends on the finer details. Let A = F(R1,R2)/F(R1+R2,0) and B = F(R2,R1)/F(R1+R2,0). Then:
If Ueverybody <= Uself*A + Uother*B, then EU(“CC”) <= EU(“DD”), and there is no point in cooperating. This is your position: Ueverybody is much less than Uself, or Uother is not much less than Ueverybody, and/or your team has so many more resources than the other.
If Uself*A + Uother*B < Ueverybody < Uself*A/(1-B), this is a true Prisoner’s dilemma.
If Ueverybody >= Uself*A/(1-B), then EU(“CC”) >= EU(“DC”), and “cooperate” is the obviously correct decision. This is my position: Ueverybody is not much less than Uself, and/or the teams are more evenly matched.
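The three regions above can be checked numerically. What follows is only a minimal sketch: the thread never specifies F, so `success_prob` is a made-up stand-in (it is increasing in R and decreasing in Ropp, nothing more), and the function names and the constant `k` are illustrative assumptions rather than anything from the CEV document.

```python
def success_prob(r, r_opp, k=1.0):
    """Hypothetical stand-in for F(R, Ropp); NOT specified in the thread.

    Increasing in r, decreasing in r_opp. The constant k > 0 leaves room
    for the outcome where nobody builds a FAI.
    """
    return r * r / (r * r + r_opp * r_opp + k)


def expected_utilities(r1, r2, u_self, u_everybody, u_other, f=success_prob):
    """Expected utilities for team 1, following EU("CC") .. EU("DD") above."""
    f1 = f(r1, r2)            # team 1 finishes first
    f2 = f(r2, r1)            # team 2 finishes first
    f_joint = f(r1 + r2, 0.0)  # one combined team, no opposition
    return {
        "CC": u_everybody * f_joint,
        "CD": u_everybody * f1 + u_other * f2,
        "DC": u_self * f1 + u_everybody * f2,
        "DD": u_self * f1 + u_other * f2,
    }


def classify(r1, r2, u_self, u_everybody, u_other, f=success_prob):
    """Return which of the three regions team 1 is in."""
    eu = expected_utilities(r1, r2, u_self, u_everybody, u_other, f)
    if eu["CC"] <= eu["DD"]:
        return "defect"             # no point in cooperating
    if eu["CC"] < eu["DC"]:
        return "prisoners_dilemma"  # CC beats DD, but DC still tempts
    return "cooperate"              # CC is at least as good as DC


if __name__ == "__main__":
    for u_everybody in (4.0, 6.0, 9.0):
        print(u_everybody, classify(1.0, 1.0, 10.0, u_everybody, 0.0))
```

With this particular F and evenly matched teams (R1 = R2 = 1, Uself = 10, Uother = 0), Ueverybody = 4 falls in the defect region, 6 in the dilemma region, and 9 in the cooperate region. A different choice of F moves the boundaries, which is exactly the dependence on "finer details" noted above.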
These all have the property that you only need so much of them.
All of those resources are fungible and can be exchanged for time. There might be no limit to the amount of time people desire, even very enlightened posthuman people.
I don’t think you can get an everywhere-positive exchange rate. There are diminishing returns and a threshold, after which, exchanging more resources won’t get you any more time. There’s only 30 hours in a day, after all :)
You can use some resources like computation directly and in unlimited amounts (e.g. living for unlimitedly long virtual times per real second inside a simulation). There are some physical limits on that due to speed of light limiting effective brain size, but that depends on brain design and anyway the limits seem to be pretty high.
More generally: the number of configurations physically possible in a given volume of space is limited (by the entropy of a black hole). If you have a utility function unbounded from above, as it rises it must eventually map to states that describe more space or matter than the amount you started with. Any utility maximizer with unbounded utility eventually wants to expand.
I don’t know what the exchange rates are, but it does cost something (computer time, energy, negentropy) to stay alive. That holds for simulated creatures too. If the available resources to keep someone alive are limited, then I think there will be conflict over those resources.
Naturally, F is monotonically increasing in R and decreasing in Ropp
You’re treating resources as one single kind, where really there are many kinds with possible trades between teams. Here you’re ignoring a factor that might actually be crucial to encouraging cooperation (I’m not saying I showed this formally :-)
Assume there are just two teams
But my point was exactly that there would be many teams who could form many different alliances. Assuming only two is unrealistic and just ignores what I was saying. I don’t even care much if given two teams the correct choice is to cooperate, because I set very low probability on there being exactly two teams and no other independent players being able to contribute anything (money, people, etc) to one of the teams.
This is my position
You still haven’t given good evidence for holding this position regarding the relation between the different Uxxx utilities. Except for the fact CEV is not really specified, so it could be built so that that would be true. But equally it could be built so that that would be false. There’s no point in arguing over which possibility “CEV” really refers to (although if everyone agreed on something that would clear up a lot of debates); the important questions are what do we want a FAI to do if we build one, and what we anticipate others to tell their FAIs to do.
You’re treating resources as one single kind, where really there are many kinds with possible trades between teams
I think this is reasonably realistic. Let R signify money. Then R can buy other necessary resources.
But my point was exactly that there would be many teams who could form many different alliances. Assuming only two is unrealistic and just ignores what I was saying.
We can model N teams by letting them play two-player games in succession. For example, any two teams with nearly matched resources would cooperate with each other, producing a single combined team, etc… This may be an interesting problem to solve, analytically or by computer modeling.
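As a starting point for the computer modeling, here is a rough sketch of the successive two-player idea. Everything in it is an assumption made for illustration: the form of `success_prob` (the thread does not specify F), the rule that two teams merge only when each finds EU("CC") >= EU("DC") from its own side, and the choice to pair teams that are adjacent when sorted by resources. Each pairwise game also ignores all the other teams, which is a real simplification.

```python
def success_prob(r, r_opp, k=1.0):
    """Hypothetical stand-in for F(R, Ropp); NOT specified in the thread."""
    return r * r / (r * r + r_opp * r_opp + k)


def both_prefer_cooperation(r1, r2, u_self, u_everybody):
    """True when EU("CC") >= EU("DC") for each of the two teams."""
    joint = u_everybody * success_prob(r1 + r2, 0.0)

    def defect_vs_cooperator(mine, theirs):
        # EU("DC") seen from the team holding `mine` resources.
        return (u_self * success_prob(mine, theirs)
                + u_everybody * success_prob(theirs, mine))

    return (joint >= defect_vs_cooperator(r1, r2)
            and joint >= defect_vs_cooperator(r2, r1))


def merge_in_succession(resources, u_self, u_everybody):
    """Repeatedly merge adjacent (nearly matched) teams that both cooperate."""
    teams = sorted(resources)
    merged = True
    while merged and len(teams) > 1:
        merged = False
        for i in range(len(teams) - 1):
            a, b = teams[i], teams[i + 1]
            if both_prefer_cooperation(a, b, u_self, u_everybody):
                teams[i:i + 2] = [a + b]  # the pair becomes one combined team
                teams.sort()
                merged = True
                break
    return teams


if __name__ == "__main__":
    print(merge_in_succession([1.0, 1.0, 1.0, 1.0], 10.0, 9.5))
    print(merge_in_succession([1.0, 1.0, 1.0, 1.0], 10.0, 4.0))
```

Under these assumptions, four equal teams collapse into a single alliance when Ueverybody is close to Uself and stay separate when it is much lower. Whether a true many-player game with duration behaves the same way is a separate question that this sketch does not answer.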
You still haven’t given good evidence for holding this position regarding the relation between the different Uxxx utilities.
You’re right. Initially, I thought that the actual values of Uxxx-s will not be important for the decision, as long as their relative preference order is as stated. But this turned out to be incorrect. There are regions of cooperation and defection.
Analytically, I don’t a priori expect a succession of two-player games to have the same result as one many-player game which also has duration in time and not just a single round.
Because there’s no consensus, your version of CEV would not interfere, and the 80% would be free to kill the 20%.
There may be a distinction between “the AI will not prevent the 80% from killing the 20%” and “nothing will prevent the 80% from killing the 20%” that is getting lost in your phrasing. I am not convinced that the math doesn’t make them equivalent, in the long run—but I’m definitely not convinced otherwise.
I’m assuming the 80% are capable of killing the 20% unless the AI interferes. That’s part of the thought experiment. It’s not unreasonable, since they are 4 times as numerous. But if you find this problematic, suppose it’s 99% killing 1% at a time. It doesn’t really matter.
My point is that we currently have methods of preventing this that don’t require an AI, and which do pretty well. Why do we need the AI to do it? Or more specifically, why should we reject an AI that won’t, but may do other useful things?
There have been, and are, many mass killings of minority groups and of enemy populations and conscripted soldiers at war. If we cure death and diseases, this will become the biggest cause of death and suffering in the world. It’s important and we’ll have to deal with it eventually.
The AI under discussion not only won’t solve the problem; it would (I contend) become a singleton and prevent me from building another AI that does solve the problem. (If it chooses not to become a singleton, it will quickly be supplanted by an AI that does try to become one.)
If the original person effectively assigns 0 or 1 “non-updateable probability” to some belief, or honestly doesn’t believe in objective reality, or believes in “subjective truth” of some kind, CEV is not necessarily going to “cure” them of it—especially not by force.
I think you’re skipping between levels hereabouts. CEV, the theoretical construct, might consider people so modified, even if a FAI based on CEV would not modify them. CEV is our values if we were better, but does not necessitate us actually getting better.
In this thread I always used CEV in the sense of an AI implementing CEV. (Sometimes you’ll see descriptions of what I don’t believe to be the standard interpretation of how such an AI would behave, where gRR suggests such behaviors and I reply.)
I’m still skeptical of this. If you think of FAI as simply AI that is “safe”—one that does not automatically kill us all (or other massive disutility), relative to the status quo—then plenty of non-singletons are FAI.
Of course, by that definition the ‘F’ looks like the easy part. Rocks are Friendly.
I didn’t mean that being a singleton is a precondition to FAI-hood. I meant that any AGI, friendly or not, that doesn’t prevent another AGI from rising will have to fight all the time, for its life and for the complete fulfillment of its utility function, and eventually it will lose; and a singleton is the obvious stable solution. Edited to clarify.
Are you suggesting that an AGI that values anything at all is incapable of valuing the existence of other AGIs, or merely that this is sufficiently unlikely as to not be worth considering?
It can certainly value them, and create them, cooperate and trade, etc. etc. There are two exceptions that make such valuing and cooperation take second place.
First: an uFAI is just as unfriendly and scary to other AIs as to humans. An AI will therefore try to prevent other AIs from achieving dangerous power unless it is very sure of their current and future goals.
Second: an AI created by humans (plus or minus self-modifications) with an explicit value/goal system of the form “the universe should be THIS way”, will try to stop any and all agents that try to interfere with shaping the universe as it wishes. And the foremost danger in this category is—other AIs created in the same way but with different goals.
I’m a little confused by your response, and I suspect that I was unclear in my question.
I agree that an AI with an explicit value/goal system of the form “the universe should be THIS way”, will try to stop any and all agents that try to interfere with shaping the universe as it wishes (either by destroying them, or altering their goal structure, or securing their reliable cooperation, or something else).
But for an AI with the value “the universe should contain as many distinct intelligences as possible,” valuing and creating other AIs will presumably take first place.
But for an AI with the value “the universe should contain as many distinct intelligences as possible,” valuing and creating other AIs will presumably take first place.
That’s probably more efficiently done by destroying any other AIs that come along, while tiling the universe with slightly varying low-level intelligences.
I no longer know what the words “intelligence,” “AI”, and “AGI” actually refer to in this conversation, and I’m not even certain the referents are consistent, so let me taboo the whole lexical mess and try again.
For any X, if the existence of X interferes with an agent A achieving its goals, the better A is at optimizing its environment for its goals the less likely X is to exist.
For any X and A, the more optimizing power X can exert on its environment, the more likely it is that the existence of X interferes with A achieving its goals.
For any X, if A values the existence of X, the better A is at implementing its values the more likely X is to exist.
All of this is as true for X=intelligent beings as X=AI as X=AGI as X=pie.
Cool. So it seems to follow that we agree that if agent A1 values the existence of distinct agents A2..An, it’s unclear how the likelihood of A2..An existing varies with the optimizing power available to A1...An. Yes?
Yes. Even if we know each agent’s optimizing power, and each agent’s estimation of each other agent’s power and ability to acquire greater power, the behavior of A1 still depends on its exact values (for instance, what else it values besides the existence of the others). It also depends on the values of the other agents (might they choose to initiate conflict among themselves or against A1?)
I tend to agree. Unless it has specific values to the contrary, other AIs of power comparable to your own (or which might grow into such power one day) are too dangerous to leave running around. If you value states of the external universe, and you happen to be the first powerful AGI built, it’s natural to try to become a singleton as a preventative measure.
It’s certainly possible. My analysis so far is only on an “all else being equal” footing.
I do feel that, absent other data, the safer assumption is that if an AI is capable of becoming a singleton at all, expense (in terms of energy/matter and space or time) isn’t going to be the thing that stops it. But that may be just a cached thought because I’m used to thinking of an AI trying to become a singleton as a dangerous potential adversary. I would appreciate your insight.
As for values, certainly conflicting values can exist, from ones that mention the subject directly (“don’t move everyone to a simulation in a way they don’t notice” would close one obvious route) to ones that impinge upon it in unexpected ways (“no first strike against aliens” becomes “oops, an alien-built paperclipper just ate Jupiter from the inside out”).
I want to point out that all of my objections are acknowledged (not dismissed, and not fully resolved) in the actual CEV document—which is very likely hopelessly outdated by now to Eliezer and the SIAI, but they deliberately don’t publish anything newer (and I can guess at some of the reasons).
Which is why when I see people advocating CEV without understanding the dangers, I try to correct them.
Well, my own proposed plan is also a contingent modification. The strongest possible claim of CEV can be said to be:
There is a unique X, such that for all living people P, CEV(P) = X.
Assuming there is no such X, there could still be a plausible claim:
Y is not empty, where Y = Intersection{over all living people P} of CEV(P).
And then AI would do well if it optimizes for Y while interfering the least with other things (whatever this means). This way, whatever “evolving” will happen due to AI’s influence is at least agreed upon by everyone(’s CEV).
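A toy sketch of the two claims, assuming purely for illustration that each person's extrapolated volition can be written as a finite set of value labels. The people, the labels, and the function names are all hypothetical; nothing here is part of the CEV proposal itself.

```python
def strong_claim_holds(volitions):
    """True if every person's extrapolated volition is the same set X."""
    sets = list(volitions.values())
    return all(s == sets[0] for s in sets)


def shared_core(volitions):
    """Y: the intersection of every person's extrapolated volition."""
    sets = list(volitions.values())
    return set.intersection(*sets) if sets else set()


# Hypothetical people and value labels, invented for this sketch.
volitions = {
    "alice": {"no_extinction", "cure_disease", "transhumanism"},
    "bob": {"no_extinction", "cure_disease", "tradition"},
    "carol": {"no_extinction", "tradition"},
}

print(strong_claim_holds(volitions))  # the strong claim fails for this toy data
print(shared_core(volitions))         # Y is non-empty but small
```

In this made-up example the strong claim fails while Y still contains one item, which is the situation the weaker claim is meant to cover.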
I can buy, tentatively, that most people might one day agree on a very few things. If that’s what you mean by Y, fine, but it restricts the FAI to doing almost nothing. I’d much rather build a FAI that implemented more values shared by fewer people (as long as those people include myself). I expect so would most people, including the ones hypothetically building the FAI—otherwise they’d expect not to benefit much from building it, since it would find very little consensus to implement! So the first team to successfully build FAI+CEV will choose to launch it as CEV(themselves) rather than CEV(humanity).
This is fine, because CEV of any subset of the population is very likely to include terms for CEV of humanity as a whole.
Why do you believe this?
For instance, I think CEV(humanity), if it even exists, will include nothing of real interest because people just wouldn’t agree on common goals. In such a situation, my personal CEV—or that of a few people who do agree on at least some things—would not want to include CEV(humanity). So your belief implies that CEV(humanity) exists and is nontrivial. As I’ve asked before in this thread, why do you think so?
Oh, I had some evidence, but I Minimum Viable Commented. I thought it was obvious once pointed out. Illusion of transparency.
We care about what happens to humanity. We want things to go well for us. If CEV works at all, it will capture that in some way.
Even if CEV(rest of humanity) turns out to be mostly derived from radical Islam, I think there would be terms in CEV(Lesswrong) for respecting that. There would also be terms for people not stoning each other to death and such. I think those (respect for CEV(humanity) and good life by our standards) would only come into conflict when CEV(humanity) has basically failed.
You seem to be claiming that CEV will in fact fail, which I think is a different issue. My claim is that if CEV is a useful thing, you don’t have to run it on everyone (or even a representative sample) to make it work.
It depends on what you call CEV “working” or “failing”.
One strategy (which seems to me to be implied by the original CEV doc) is to extrapolate everyone’s personal volition, then compare and merge them to create the group’s overall CEV. Where enough people agree, choose what they agree on (factoring in how sure they are, and how important this is to them). Where too many people disagree, do nothing, or be indifferent on the outcome of this question, or ask the programmers. Is this what you have in mind?
The big issue here is how much consensus is enough. Let’s run with concrete examples:
If CEV requires too much consensus, it may not help us become immortal because a foolish “deathist” minority believes death is good for people.
If CEV is satisfied by too little consensus, then 99% of the people may build a consensus to kill the other 1% for fun and profit, and the CEV would not object.
You may well have both kinds of problems at the same time (with different questions).
It all depends on how you define required consensus—and that definition can’t itself come from CEV, because it’s required for the first iteration of CEV to run. It could be allowed to evolve via CEV, but you still need to start somewhere and such evolution strikes me as dangerous—if you precommit to CEV and then it evolves into “too little” or “too much” consensus and ends up doing nothing or prohibiting nothing, the whole CEV project fails. Which may well be a worse outcome from our perspective than starting with (or hardcoding) a different, less “correct” consensus requirement.
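A minimal sketch of the threshold problem, assuming a deliberately crude merge rule: adopt a proposal whenever the fraction of individual volitions supporting it meets a fixed threshold. The rule, the proposal names, and the support figures are invented for illustration and are not taken from the CEV document.

```python
def merge(support, threshold):
    """Adopt every proposal whose support fraction meets the threshold."""
    return {name for name, frac in support.items() if frac >= threshold}


# Hypothetical support fractions, chosen to mirror the two examples above.
support = {
    "abolish_involuntary_death": 0.90,  # blocked by a "deathist" minority
    "kill_the_one_percent": 0.99,       # backed by everyone except the victims
}

for threshold in (0.50, 0.95, 1.00):
    print(threshold, sorted(merge(support, threshold)))
```

With these numbers, any threshold low enough to adopt the first proposal also adopts the second, so no single cutoff yields the outcome we would want. That is the sense in which both kinds of problem can occur at once.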
So the matter is not just what each person or group’s CEV is, but how you combine them via consensus. If, as you suggest, we use the CEV of a small homogenous group instead of all of humanity, it seems clear to me that the consensus would be greater (all else being equal), and so the requirements for consensus are more likely to be satisfied, and so CEV will have a higher chance of working.
Contrariwise, if we use the CEV of all humanity, it will have a term derived from me and you for not stoning people. And it will also have a term derived from some radical Islamists for stoning people. And it will have to resolve the contradiction, and if there’s not enough consensus among humanity’s individual CEVs to do so, the CEV algorithm will “fail”.
These risks exist. However, I think it is very likely in our case that there will be strong consensus for values that reduce the problem a bit. Non-interference, for one, is much less controversial than transhumanism, but would allow transhumanism for those who accept it.
I don’t think CEV works with explicit entities that can interact and decide to kill each other. I understand that it is much more abstract than that. Also probably all blind, and all implemented through the singleton AI, so it would be very unlikely that everyone’s EV happens to name, say, bob smith as the lulzcow.
This is a serious issue with (at least my understanding of) CEV. How to even get CEV done (presumably with an AI) without turning everyone into computronium or whatever seems hard.
This is one reason why I think doing the CEV of just the AI team or whoever is the best approach. We have strong reason to suspect that the eventual result will respect everyone, and bootstrapping from a small group (or even just one person) seems much more reliable and safer.
I think that statement is too strong. Keep in mind that it’s extrapolated volition. I doubt the Islamists’ values are reflectively consistent. Weaken it to the possibility of there being multiple attractors in EV-space, some of which are bad, and I agree. Infectious memeplexes that can survive CEV scare the crap out of me.
Why do you think this is “very likely”?
Today there are many people in the world (gross estimate: tens of percent of world population) who don’t believe in noninterference. True believers of several major faiths (most Christian sects, mainstream Islam) desire enforced religious conversion of others, either as a commandment of their faith (for its own sake) or for the metaphysical benefit of those others (to save them from hell). Many people “believe” (if that is the right word) in the subjugation of certain minorities, or of women, children, etc., which involves interference of various kinds. Many people experience future shock which prompts them to want laws that would stop others from self-modifying in certain ways (some including transhumanism).
Why do you think it very likely these people’s CEV will contradict their current values and beliefs? Please consider that:
We emphatically don’t know the outcome of CEV. If we were sure that it would have any property X, we could hardcode X into the algorithm and make the CEV’s task that much easier. Anything you think is very likely for CEV to decide, you should be proportionally willing for me to hardcode into my algorithm, constraining the possible results of CEV.
In these examples, you expect other people’s extrapolated values to come to match your actual values. This seems on the outside view like a human bias. Do you expect an equal amount of your important, present-day values to be contradicted and disallowed by humanity’s CEV? Can you think of probable examples?
I agree completely—doing the CEV of a small trusted team, who moreover are likely to hold non-extrapolated views similar to ours (e.g. they won’t be radical Islamists), would be much better than CEV(humanity); much more reliable and safe.
But you contradict yourself a little. If you really believed CEV(small team) looked a lot like CEV(humanity), you would have no reason to consider it safer. If you (correctly) think it’s safer, that must be because you fear CEV(humanity) will contain some pretty repugnant conclusions that CEV(small team) won’t.
From this I understand that while you think CEV(small team) would have a term for “respecting” the rest of humanity, that respect would be a lot weaker than the equal (and possibly majority-voting-based) rights granted them by CEV(humanity).
I doubt any one human’s values are reflectively consistent. At the very least, every human’s values contradict one another in the sense that they compete among themselves for the human’s resources, and the human in different moods and at different points in time prefers to spend on different values.
Because infectious memeplexes scare me too, I don’t want anyone to build CEV(humanity) (or rather, to run a singleton AI that would implement it). I would much prefer CEV(a small trusted group), or better CEV(me), or better yet, a non-CEV process which more directly relies on my and other people’s non-extrapolated preferences.
A possibly related question: suppose you were about to go off on an expedition in a spaceship that would take you away from Earth for thirty years, and the ship is being stocked with food. Suppose further that, because of an insane bureaucratic process, you have only two choices: either (a) you get to choose what food to stock right now, with no time for nutritional research, or (b) food is stocked according to an expert analysis of your body’s nutritional needs, with no input from you. What outcome would you anticipate from each of those choices?
Suppose a hundred arbitrarily selected people were also being sent on similar missions on similar spaceships, and your decision of A or B applied to them as well (either they get to choose their food, or an expert chooses food for them). What outcome would you anticipate from each choice?
I think you meant to add that the expert really understands nutrition, beyond the knowledge of our best nutrition specialists today, which is unreliable and contradictory and sparse.
With that assumption I would choose to rely on the expert, and would expect far fewer nutritional problems on average for other people who relied on the expert vs. choosing themselves.
The difference between this and CEV is that “what nutritional/metabolic/physiological outcome is good for you” is an objective, pretty well constrained question. There are individual preferences—in enjoyment of food, and in the resulting body-state—but among people hypothetically fully understanding the human body, there will be relatively little disagreement, and the great majority should not suffer much from good choices that don’t quite match their personal preferences.
CEV, on the other hand, includes both preferences about objective matters like the above but also many entirely or mostly subjective choices (in the same way that most choices of value are a-rational). Also, people are likely to agree to not interfere in what others eat because they don’t often care about it, but people do care about many other behaviors of others (like torturing simulated intelligences, or giving false testimony, or making counterfeit money) and that would be reflected in CEV.
ETA: so in response to your question, I agree that on many subjects I trust experts / CEV more than myself. My preferred response to that, though, is not to build a FAI enforcing CEV, but to build a FAI that allows direct personal choice in areas where it’s possible to recover from mistakes, but also provides the expert opinion as an oracle advice service.
Perfect knowledge is wonderful, sure, but was not key to my point.
Given two processes for making some decision, if process P1 is more reliable than process P2, then P1 will get me better results. That’s true even if P1 is imperfect. That’s true even if P2 is “ask my own brain and do what it tells me.” All that is required is that P1 is more reliable than P2.
It follows that when choosing between two processes to implement my values, if I can ask one question, I should ask which process is more reliable. I should not ask which process is perfect, nor ask which process resides in my brain.
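The point about reliability can be illustrated with a small simulation. This is only a sketch under strong assumptions: each process is modeled as getting an independent decision right with a fixed probability, and the probabilities 0.7 and 0.6 are made-up numbers, not estimates of anything.

```python
import random


def good_decisions(reliability, trials, seed=0):
    """Count how many of `trials` decisions a process gets right,
    where each decision is right with probability `reliability`."""
    rng = random.Random(seed)
    return sum(rng.random() < reliability for _ in range(trials))


TRIALS = 10_000
# Hypothetical reliabilities: P1 is imperfect, P2 is "ask my own brain".
p1_good = good_decisions(0.7, TRIALS)
p2_good = good_decisions(0.6, TRIALS)

print(p1_good, p2_good)
```

Both calls use the same seed, so the two processes face the same sequence of random draws and the comparison isolates the difference in reliability. P1 comes out ahead even though it is far from perfect, and nothing in the comparison depends on where either process is implemented.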
ETA: I endorse providing expert opinion, even though that deprives people of the experience of figuring it all out for themselves… agreed that far. But I also endorse providing reliable infrastructure, even though that deprives people of the experience of building all the infrastructure themselves, and I endorse implementing reliable decision matrices, even though that deprives people of the experience of making all the decisions themselves.
There’s no reason you have to choose just once, a single process to answer all kinds of questions. Different processes better fit different domains. Expert opinion best fits well-understood, factual, objective, non-politicized, amoral questions. Noninterference best fits matters where people are likely to want to interfere in others’ decisions and there is no pre-CEV consensus on whether such intervention is permissible.
The problem with making decisions for others isn’t that it deprives them of the experience of making decisions, but that it can influence or force them into decisions that are wrong in some sense of the word.
(shrug) Letting others make decisions for themselves can also influence or force them into decisions that are wrong in some sense of the word. If that’s really the problem, then letting people make their own decisions doesn’t solve it. The solution to that problem is letting whatever process is best at avoiding wrong answers make the decision.
And, sure, there might be different processes for different questions. But there’s no a priori reason to believe that any of those processes reside in my brain.
True. Nonintervention only works if you care about it more than about anything people might do due to it. Which is why a system of constraints that is given to the AI and is not CEV-derived can’t be just nonintervention, it has to include other principles as well and be a complete ethical system.
I’m always open to suggestions of new processes. I just don’t like the specific process of CEV, which happens not to reside in my brain, but that’s not why I dislike it.
Ah, OK.
At the beginning of this thread you seemed to be saying that your current preferences (which are, of course, the product of a computation that resides in your brain) were the best determiner of what to optimize the environment for. If you aren’t saying that, but merely saying that there’s something specific about CEV that makes it an even worse choice, well, OK. I mean, I’m puzzled by that simply because there doesn’t seem to be anything specific about CEV that one could object to in that way, but I don’t have much to say about that; it was the idea that the output of your current algorithms is somehow more reliable than the output of some other set of algorithms implemented on a different substrate that I was challenging.
Sounds like a good place to end this thread, then.
Really? What about the “some people are Jerks” objection? That’s kind of a big deal. We even got Eliezer to tentatively acknowledge the theoretical possibility that that could be objectionable at one point.
(nods) Yeah, I was sloppy. I was referring to the mechanism for extrapolating a coherent volition from a given target, rather than the specification of the target (e.g., “all of humanity”) or other aspects of the CEV proposal, but I wasn’t at all clear about that. Point taken, and agreed that there are some aspects of the proposal (e.g. target specification) that are specific enough to object to.
Tangentially, I consider the “some people are jerks” objection very confused. But then, I mostly conclude that if such a mechanism can exist at all, the properties of people are about as relevant to its output as the properties of states or political parties. More thoughts along those lines here.
It really is hard to find a fault with that part!
I don’t understand. If we take the CEV of a group that consists of yourself and ten agents with values that differ irreconcilably from yours, then we can expect that CEV to be fairly abhorrent to you. That is, roughly speaking, a risk you take when you substitute for your own preferences the preferences calculated off a group that you don’t fully understand or have strong reason to trust.
That CEV(group) would also be strictly inferior to CEV(you), which would implicitly incorporate the extrapolated preferences of the other ten agents to precisely the degree that you would want it to do so.
I agree that if there exists a group G of agents A1..An with irreconcilably heterogenous values, a given agent A should strictly prefer CEV(A) to CEV(G). If Dave is an agent in this model, then Dave should prefer CEV(Dave) to CEV(group), for the reasons you suggest. Absolutely agreed.
What I question is the assumption that in this model Dave is better represented as an agent and not a group. In fact, I find that assumption unlikely, as I noted above. (Ditto wedrifid, or any other person.)
If Dave is a group, then CEV(Dave) is potentially problematic for the same reason that CEV(group) is problematic… every agent composing Dave should prefer that most of Dave not be included in the target definition. Indeed, if group contains Dave and Dave contains an agent A1, it isn’t even clear that A1 should prefer CEV(Dave) to CEV(group)… while CEV(Dave) cannot be more heterogenous than CEV(group), it might turn out that a larger fraction (by whatever measure the volition-extrapolator cares about) of group supports A1′s values than the fraction of Dave that supports them.
If the above describes the actual situation, then whether Dave is a jerk or not (or wedrifid is, or whoever) is no more relevant to the output of the volition-extrapolation mechanism than whether New Jersey is a jerk, or whether the Green Party is a jerk… all of these entities are just more-or-less-transient aggregates of agents, and the proper level of analysis is the agent.
Approximately agree.
This is related to why I’m a bit uncomfortable accepting the sometimes expressed assertion “CEV only applies to a group, if you are doing it to an individual it’s just Extrapolated Volition”. The “make it stop being incoherent!” part applies just as much to conflicting and inconsistent values within a messily implemented individual as it does to differences between people.
Taking this “it’s all agents and subagents and meta-agents” outlook, the remaining difference is one of arbitration. That is, speaking as wedrifid I have already implicitly decided which elements of the lump of matter sitting on this chair are endorsed as ‘me’ and so included in the gold standard, CEV(wedrifid). While it may be the case that my amygdala can be considered an agent that is more similar to your amygdala than to the values I represent in abstract ideals, adding the amygdala-agent of another constitutes corrupting the CEV with some discrete measure of “Jerkiness”.
Mm.
It’s not clear to me that Dave has actually given its endorsement to any particular coalition in a particularly consistent or coherent fashion; it seems to many of me that what Dave endorses and even how Dave thinks of itself and its environment is a moderately variable thing that depends on what’s going on and how it strengthens, weakens, and inspires and inhibits alliances among us. Further, it seems to many of me that this is not at all unique to Dave; it’s kind of the human condition, though we generally don’t acknowledge it (either to others or to ourself) for very good social reasons which I ignore here at our peril.
That said, I don’t mean to challenge here your assertion that wedrifid is an exception; I don’t know you that well, and it’s certainly possible.
And I would certainly agree that this is a matter of degree; there are some things that are pretty consistently endorsed by whatever coalition happens to be speaking as Dave at any given moment, if only because none of us want to accept the penalties associated with repudiating previous commitments made by earlier ruling coalitions, since that would damage our credibility when we wish to make such commitments ourselves.
Of course, that sort of thing only lasts for as long as the benefits of preserving credibility are perceived to exceed the benefits of defecting. Introduce a large enough prize and alliances crumble. Still, it works pretty well in quotidian circumstances, if not necessarily during crises.
Even there, though, this is often honored in the breach rather than the observance. Many ruling coalitions, while not explicitly repudiating earlier commitments, don’t actually follow through on them either. But there’s a certain amount of tolerance of that sort of thing built into the framework, which can be invoked by conventional means… “I forgot”, “I got distracted”, “I experienced akrasia”, and so forth.
So of course there’s also a lot of gaming of that tolerance that goes on. Social dynamics are complicated. And, again, change the payoff matrix and the games change.
All of which is to say, even if my various component parts were to agree on such a gold standard CEV(dave), and commit to an alliance to consistently and coherently enforce that standard regardless of what coalition happens to be speaking for Dave at the time, it is not at all clear to me that this alliance would survive the destabilizing effects of seriously contemplating the possibility of various components having their values implemented on a global scale. We may have an uneasy alliance here inside Dave’s brain, but it really doesn’t take that much to convince one of us to betray that alliance if the stakes get high enough.
By way of analogy, it may be coherent to assert that the U.S. can “speak as” a single entity through the appointing of a Federal government, a President, and so forth. But if the U.S. agreed to become part of a single sovereign world government, it’s not impossible that the situation that prompted this decision would also prompt Montana to secede from the Union. Or, if the world became sufficiently interconnected that a global economic marketplace became an increasingly powerful organizing force, it’s not impossible that parts of New York might find greater common cause with parts of Tokyo than with the rest of the U.S. Or various other scenarios along those lines. At which point, even if the U.S. Federal government goes on saying the same things it has always said, it’s no longer entirely clear that it really is speaking for Montana or New York.
In a not-really-all-that-similar-but-it’s-the-best-I-can-do-without-getting-a-lot-more-formal way, it’s not clear to me that when it comes time to flip the switch, the current Dave Coalition continues to speak for us.
At best, I think it follows that just like the existence of people who are Jerks suggests that I should prefer CEV(Dave) to CEV(humanity), the existence of Dave-agents who are Jerks suggests that I should prefer CEV(subset-of-Dave) to CEV(Dave).
But frankly, I think that’s way too simplistic, because no given subset-of-Dave that lacks internal conflict is rich enough for any possible ruling coalition to be comfortable letting it grab the brass ring like that. Again, quotidian alliances rarely survive a sudden raising of the stakes.
Mostly, I think what really follows from all this is that the arbitration process that occurs within my brain cannot be meaningfully separated from the arbitration process that occurs within other structures that include/overlap my brain, and therefore if we want to talk about a volition-extrapolation process at all we have to bite the bullet and accept that the target of that process is either too simple to be considered a human being, or includes inconsistent values (aka Jerks). Excluding the Jerks and including a human being just isn’t a well-defined option.
Of course, Solzhenitsyn said it a lot more poetically (and in fewer words).
Yes, I was talking about shortcomings of CEV, and did not mean to imply that my current preferences were better than any third option. They aren’t even strictly better than CEV; I just think they are better overall if I can’t mix the two.
It just seems likely, based on my understanding of what people like and approve of.
Strict non-interference is unlikely to end up in CEV, because there are many cases where interventions are the right thing to do. I just meant it as a proof that there are less controversial principles that will block a lot of bullshit. Not as a speculation of something that will actually end up in CEV.
These values are based on false beliefs, inconsistent memes, and fear. None of those things will survive CEV. “If we knew more, thought faster, grew closer together, etc”.
That would take a whole hell of a lot of certainty. I have nowhere near that level of confidence in anything I believe.
I think CEV will end up more like transhumanism than like Islam (which means I mostly accept transhumanism). I think I’m too far outside the morally-certain-but-ignorant-human reference class to make outside view judgements on this.
Not an equal amount, but many of my current values will be contradicted in CEV. I can only analogize to physics: I accept relativity, but expect it to be wrong. (I think my current beliefs are the closest approximation to CEV that I know of).
Likely candidates? That’s like asking “which of your beliefs are false”. All I can say is which are most uncertain. I can’t say which way they will go. I am uncertain about optimal romantic organization (monogamy, polyamory, ???). I am uncertain of the moral value of closed simulations. I am uncertain about moral value of things like duplicating people, or making causally-identical models. I am quite certain that existing lives have high value. I am unsure about lives that don’t yet exist.
Not quite. Let’s imagine two bootstrap scenarios: some neo-enlightenment transhumanists, and some religious nuts. Even just the non-extrapolated values of the transhumanists will produce a friendly-enough AI that can (and will want to) safely research better value-extrapolation methods. Bootstrapping it with Islam will get you an angry punishing god that may or may not care about extrapolating further. Running the final, ideal CEV process with either seed should produce the same good value set, but we may not have the final ideal CEV process, and having a dangerous genie running the process may not do safe things if you start it with the wrong seed.
Sorry, I made that too specific. I didn’t mean to imply that only the Islamists are inconsistent. Just meant them as an obvious example.
This is what I think would be good as a seed value system so that the FAI can go and block x-risk and stop death and such without having to philosophize too much first. But I’d want the CEV philosophising to be done eventually (ASAP, actually).
Right according to whose values? The problem is precisely that people disagree pre-extrapolation about when it’s right to interfere, and therefore we fear their individual volitions will disagree even post extrapolation. I and some other people have a value of noninterference in certain matters that is very important to us. I would rather hardcode that value than let CEV of humanity decide on it.
Again why? CEV is very much underspecified. To me, the idea that our values and ideals will preferentially turn out to be the ones all humans would embrace “if they were smarter etc” looks like mere wishful thinking. Values are arational and vary widely. If you specify a procedure (CEV) whereby they converge to a compatible set which also happens to resemble our actual values today, then it should be possible to give different algorithms (which you can call CEV or not, it doesn’t matter) which converge on other value-sets.
In the end, as the Confessor said, “you have judged: what else is there?” I have judged, and where I am certain enough about my judgement I would rather that other people’s CEV not override me.
Other than that I agree with you about using a non-CEV seed etc. I just don’t think we should later let CEV decide anything it likes without the seed explicitly constraining it.
CEV’s. Where by an unqualified “CEV” I take nyan to be referring to CEV(humanity) (“the Coherently Extrapolated Values of Humanity”). I assume he also means it as a normative assertion of the slightly-less-extrapolated kind that means something like “all properly behaving people of my tribe would agree and if they don’t we may need to beat them with sticks until they do.”
And the bracketed condition is generalisable to all sorts of things—including those preferences that we haven’t even considered the possibility of significant disagreement about. Partially replacing one’s own preferences with preferences that are not one’s own is one of the most dangerous things it is possible to do. Not something to do lightly or take for granted as implicitly ‘right’.
I note that any assertion that “intervention is strictly not the wrong thing to do” that is not qualified necessarily implies preferring that the worst things that could possibly happen in an FAI-free world happen rather than a single disqualified intervention. That means, for example, that rather than a minimalist intervention you think the ‘right’ behavior for the FAI is to allow everyone on the planet to be zapped by The Pacifier and constantly raped by pedophiles until they are 10, whereupon they are forced to watch repeats of the first season of Big Brother until they reach 20 and are zapped again, and the process is repeated until the heat death of the universe. That’s pretty bad but certainly not the worst thing that could happen. It is fairly trivially not “right” to let that happen if you can easily stop it.
Note indicating partial compatibility of positions: There can be reasons to advocate the implementation of ethical injunctions in a created GAI, but this still doesn’t allow us to say that non-intervention in a given extreme circumstance is ‘right’.
That’s exactly what I think. And especially if you precommit to the values output by a certain process before the process is actually performed, and can’t undo it later.
I’m certainly not advocating absolute unqualified non-intervention. I wrote “a value of noninterference in certain matters”. Certainly the AI should interfere to e.g. offer help just before something happens which the AI thinks the person would not want to happen to them (the AI is qualified to make such decisions if it can calculate CEV). In such a situation the AI would explain matters and offer aid and advice, but ultimate deciding power might still lie with the person, depending on the circumstances.
Nonintervention doesn’t just mean non-intervention by the AI, it means nonintervention by one person with another. If someone makes a request for the AI to prevent another person from doing something to them, then again in at least some (most? all?) circumstances the AI should interfere to do so; that is actually upholding the principle of uninterference.
Gah. I had to read through many paragraphs of drivel plot and then all I came up with was “a device that zaps people, making them into babies, but that is reversible”. You should have just said so. (Not that the idea makes sense on any level). Anyway, my above comment applies; people would not want it done to them and so would request the AI to prevent it.
I like both these caveats. The scenario becomes something far more similar to what a CEV could plausibly be without the artificial hack. Horror stories become much harder to construct.
Off the top of my head, one potential remaining weakness is the inability to prevent a rival, less crippled AGI from taking over without interfering pre-emptively with an individual who is not themselves interfering with anyone. Getting absolute power requires intervention (or universally compliant subjects). Not getting absolute power means something else can get it and outcomes are undefined.
That’s a good point. The AI’s ability to not interfere is constrained by its need to monitor everything that’s going on. Not just to detect someone building a rival AI, but to detect simpler cases like someone torturing a simulated person, or even just a normal flesh and bone child who wasn’t there nine months earlier. To detect people who get themselves into trouble without yet realizing it, or who are going to attack other people nonconsensually, and give these people help before something bad actually happens to them, all requires monitoring.
And while a technologically advanced AI might monitor using tools we humans couldn’t even detect today, to advanced posthumans every possible tool might be painfully obvious. E.g. if you have to expose everything your megaton-of-computronium brain calculates to the AI, because that’s enough to simulate all the humans alive in 2012 in enough detail that they would count as persons to the AI. But to the asteroid-sized brain this means the AI is literally aware of all its thoughts: it has zero privacy.
It does appear that universal surveillance is the cost of universally binding promises (you won’t be tortured no matter where you go and what you do in AI-controlled space). To reduce costs and increase trust, the AI should be transparent to everyone itself, and should be publicly and verifiably committed to being a perfectly honest and neutral party that never reveals the secrets and private information it monitors to anyone.
I’d like to note that all of this also applies to any FAI singleton that implements some policies that we today consider morally required—like making sure no-one is torturing simulated people or raising their baby wrong. If there’s no generally acceptable FAI behavior that doesn’t include surveillance, then all else is equal and I still prefer my AI to a pure CEV implementation.
It would seem that the FAI should only need to be exposed to the complete state of your brain at a point in time where it can reliably predict or prove that you are ‘safe’, using the kind of reasoning we often assume as a matter of course when describing UDT decision problems. Such an FAI would have information about what you are thinking—and in particular a great big class of what it knows you are not thinking—but not necessarily detailed knowledge of what you are thinking specifically.
For improved privacy the inspection could be done by a spawned robot AI programmed to self destruct after analyzing you and returning nothing but a boolean safety indicator back to the FAI.
Prediction has some disadvantages compared to constant observation:
Some physical systems are hard to model well with simplification; even for the AI it might be necessary to use simulations composed of amounts of matter proportional to the thing simulated. If about one half of all matter has to be given over to the AI, instead of being used to create more people and things, that is a significant loss of opportunity. (Maybe the AI should tax people in simulation-resources, and those who opt in to surveillance have much lower taxes :-)
Simulations naturally have a rising risk of divergence over time. The AI is not literally Omega. It will have to come in and take periodical snapshots of everyone’s state to correct the simulations.
Simulations have a chance of being wrong. However small the chance, if the potential result is someone building a UFAI challenger, it might be unacceptable to take that chance.
OTOH, surveillance might be much cheaper (I don’t know for sure) and also allows destroying the evidence close to the site of observation once it is analyzed, preserving a measure of privacy.
I vaguely remember something in that doc suggesting that part of the extrapolation process involves working out the expected results of individuals interacting. More poetically, “what we would want if we grew together more.” That suggests that this isn’t quite what the original doc meant to imply, or at least that it’s not uniquely what the doc meant to imply, although I may simply be misremembering.
More generally, all the hard work is being done here by whatever assumptions are built into the “extrapolation”.
Quoting the CEV doc:
I don’t mean to contradict that. So consider my interpretation to be something like: build (“extrapolate”) each person’s CEV, which includes that person’s interactions with other people, but doesn’t directly value them except insofar as that person values them; then somehow merge the individual CEVs to get the group CEV.
After all (I reason) you want the following nice property for CEV. Suppose that CEV meets CEV—e.g. separate AIs implementing those CEVs meet. If they don’t embody inimical values, they will try to negotiate and compromise. We would like the result of those negotiations to look very much like CEV. One easy way to do this is to say CEV is built on “merging” all the way from the bottom up.
Certainly. All discussion of CEV starts with “assume there can exist a process that produces an outcome matching the following description, and assume we can and do build it, and assume that all the under-specification of this description is improved in the way we would wish it improved if we were better at wishing”.
I basically agree with all of this, except that I think you’re saying “CEV is built on “merging” all the way from the bottom up” but you aren’t really arguing for doing that.
Perhaps one important underlying question here is whether people’s values ever change contingent on their experiences.
If not—if my values are exactly the same as what they were when I first began to exist (whenever that was) -- then perhaps something like what you describe makes sense. A process for working out what those values are and extrapolating my volition based on them would be difficult to build, but is coherent in principle. In fact, many such processes could exist, and they would converge on a single output specification for my individual CEV. And then, and only then, we could begin the process of “merging.”
This strikes me as pretty unlikely, but I suppose it’s possible.
OTOH, if my values are contingent on experience—that is, if human brains experience value drift—then it’s not clear that those various processes’ outputs would converge. Volition-extrapolation process 1, which includes one model of my interaction with my environment, gets Dave-CEV-1. VEP2, which includes a different model, gets Dave-CEV-2. And so forth. And there simply is no fact of the matter as to which is the “correct” Dave-CEV; they are all ways that I might turn out; to the extent that any of them reflect “what I really want” they all reflect “what I really want”, and I “really want” various distinct and potentially-inconsistent things.
In the latter case, in order to obtain something we call CEV(Dave), we need a process of “merging” the outputs of these various computations. How we do this is of course unclear, but my point is that saying “we work out individual CEVs and merge them” as though the merge step came second is importantly wrong. Merging is required to get an individual CEV in the first place.
So, yes, I agree, it’s a fine idea to have CEV built on merging all the way from the bottom up. But to understand what the “bottom” really is is to give up on the idea that my unique individual identity is the “bottom.” Whatever it is that CEV is extrapolating and merging, it isn’t people, it’s subsets of people. “Dave’s values” are no more preserved by the process than “New Jersey’s values” or “America’s values” are.
That’s a very good point. People not only change over long periods of time; during small intervals of time we can also model a person’s values as belonging to competing and sometimes negotiating agents. So you’re right, merging isn’t secondary or dispensable (not that I suggested doing away with it entirely), although we might want different merging dynamics sometimes for sub-person fragments vs. for whole-person EVs.
Sure, the specifics of the aggregation process will depend on the nature of the monads to be aggregated.
And, yes, while we frequently model people (including ourselves) as unique coherent consistent agents, and it’s useful to do so for planning and for social purposes, there’s no clear reason to believe we’re any such thing, and I’m inclined to doubt it. This also informs the preserving-identity-across-substrates conversation we’re having elsethread.
Where relevant—or at least when I’m reminded of it—I do model myself as a collection of smaller agents. But I still call that collection “I”, even though it’s not unique, coherent, or consistent. That my identity may be a group-identity doesn’t seem to modify any of my conclusions about identity, given that to date the group has always resided together in a single brain.
For my own part, I find that attending to the fact that I am a non-unique, incoherent, and inconsistent collection of disparate agents significantly reduces how seriously I take concerns that some process might fail to properly capture the mysterious essence of “I”, leading to my putative duplicate going off and having fun in a virtual Utopia while “I” remains in a cancer-ridden body.
I would gladly be uploaded rather than die if there were no alternative. I would still pay extra for a process that slowly replaced my brain cells etc. one by one leaving me conscious and single-instanced the whole while.
That sounds superficially like a cruel and unusual torture.
The whole point is to invent an uploading process I wouldn’t even notice happening.
I would be fine with FAI removing existential risks and not doing any other thing until everybody(’s CEV) agrees on it. (I assume here that removing existential risks is one such thing.) And an FAI team that credibly precommitted to implementing CEV instead of CEV would probably get more resources and would finish first.
So what makes you think everybody’s CEV would eventually agree on anything more?
A FAI that never does anything except prevent existential risk—which, in a narrow interpretation, means it doesn’t stop half of humanity from murdering the other half—isn’t a future worth fighting for IMO. We can do so much better. (At least, we can if we’re speculating about building a FAI to execute any well-defined plan we can come up with.)
I’m not even sure of that. There are people who believe religiously that End Times must come when everyone must die, and some of them want to hurry that along by actually killing people. And the meaning of “existential risk” is up for grabs anyway—does it preclude evolution into non-humans, leaving no members of the original human species in existence? Does it preclude the death of everyone alive today, if some humans are always alive?
Sure, it’s unlikely or it might look like a contrived example to you. But are you really willing to precommit the future light cone, the single shot at creating an FAI (singleton), to whatever CEV might happen to be, without actually knowing what CEV produces and having an abort switch? That’s one of the defining points of CEV: that you can’t know it correctly in advance, or you would just program it directly as a set of goals instead of building a CEV-calculating machine.
This seems wrong. A FAI team that precommitted to implementing CEV would definitely get the most funds. Even a team that precommitted to CEV might get more funds than CEV, because people like myself would reason that the team’s values are closer to my own than humanity’s average, plus they have a better chance of actually agreeing on more things.
No one said you have to stop with that first FAI. You can try building another. The first FAI won’t oppose it (non-interference). Or, better yet, you can try talking to the other half of the humans.
Yes, but we assume they are factually wrong, and so their CEV would fix this.
Not bloody likely. I’m going to oppose your team, discourage your funders, and bomb your headquarters—because we have different moral opinions, right here, and if the differences turn out to be fundamental, and you build your FAI, then parts of my value will be forever unfulfilled.
You, on the other hand, may safely support my team, because you can be sure to like whatever my FAI will do, and regarding the rest, it won’t interfere.
No. Any FAI (ETA: or other AGI) has to be a singleton to last for long. Otherwise I can build a uFAI that might replace it.
Suppose your AI only does a few things that everyone agrees on, but otherwise “doesn’t interfere”. Then I can build another AI, which implements values people don’t agree on. Your AI must either interfere, or be resigned to not being very relevant in determining the future.
Will it only interfere if a consensus of humanity allows it to do so? Will it not stop a majority from murdering a minority? Then it’s at best a nice-to-have, but most likely useless. After people successfully build one AGI, they will quickly reuse the knowledge to build more. The first AGI that does not favor inaction will become a singleton, destroying the other AIs and preventing future new AIs, to safeguard its utility function. This is unavoidable. With truly powerful AGI, preventing new AIs from gaining power is the only stable solution.
Yeah, that’s worked really well for all of human history so far.
First, they may not be factually wrong about the events they predict in the real world—like everyone dying—just wrong about the supernatural parts. (Especially if they’re themselves working to bring these events to pass.) IOW, this may not be a factual belief to be corrected, but a desired-by-them future that others like me and you would wish to prevent.
Second, you agreed the CEV of groups of people may contain very few things that they really agree on, so you can’t even assume they’ll have a nontrivial CEV at all, let alone that it will “fix” values you happen to disagree with.
I have no idea what your FAI will do, because even if you make no mistakes in building it, you yourself don’t know ahead of time what the CEV will work out to. If you did, you’d just plug those values into the AI directly instead of calculating the CEV. So I’ll want to bomb you anyway, if that increases my chances of being the first to build a FAI. Our morals are indeed different, and since there are no objectively distinguished morals, the difference goes both ways.
Of course I will dedicate my resources to first bombing people who are building even more inimical AIs. But if I somehow knew you and I were the only ones in the race, I’d politely ask you to join me or desist or be stopped by force.
As long as we’re discussing bombing, consider that the SIAI isn’t and won’t be in a position to bomb anyone. OTOH, if and when nation-states and militaries realize AGI is a real-world threat, they will go to war with each trying to prevent anyone else from building an AGI first. It’s the ultimate winner-take-all arms race.
This is going to happen, it might be happening already if enough politicians and generals had the beliefs of Eliezer about AGI, and it will happen (or not) regardless of anyone’s attempts to build any kind of Friendliness theory. Furthermore, a state military planning to build an AGI singleton won’t stop to think for long before wiping your civilian, unprotected FAI theory research center off the map. Either you go underground or you cooperate with a powerful player (the state on whose territory you live, presumably). Or maybe states and militaries won’t wise up in time, and some private concern really will build the first AGI. Which may be better or worse depending on what they build.
Eventually, unless the whole world is bombed back into pre-computer-age tech, someone very probably will build an AGI of some kind. The SIAI idea is (possibly) to invent Friendliness theory and publish it widely, so that whoever builds that AGI, if they want it to be Friendly (at least to themselves!), they will have a relatively cheap and safe implementation to use. But for someone actually trying to build an AGI, two obvious rules are:
Absolute secrecy, or you get bombed right away.
Do absolutely whatever it takes to successfully launch as early as possible, and make your AI a singleton controlled by yourself or by nobody—regardless of your and the AI’s values.
If the majority and the minority are so fundamentally different that their killing each other is not forbidden by the universal human CEV, then no. On what moral grounds would it do the prevention?
Until everybody agrees that this new AGI is not good after all. Then the original AGI will interfere and dismantle the new one (the original is still the first and the strongest).
But I can be sure that CEV fixes values that are based on false factual beliefs—this is a part of the definition of CEV.
But you can be sure that it is something about which you (and everybody) would agree, either directly or if you were more intelligent and knew more.
But there may be a partial ordering between morals, such that X<Y iff all “interfering” actions (whatever this means) that are allowed by X are also allowed by Y. Then if A1 and A2 are two agents, we may easily have:
~Endorses(A1, CEV) ~Endorses(A2, CEV) Endorses(A1, CEV)
Endorses(A2, CEV)
[assuming Endorses(A, X) implies FAI does not perform any non-interfering action disagreeable for A]
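One way to make the partial ordering above concrete is the following minimal sketch. It assumes, purely for illustration, that a moral system can be modeled as the set of "interfering" actions it allows; the action names and the `at_most` helper are hypothetical and do not come from the CEV document. It uses the non-strict form of the ordering (X ≤ Y when every interfering action allowed by X is also allowed by Y).

```python
# Hypothetical model: a moral system is just the set of "interfering"
# actions it allows. The action names below are invented for illustration.
def at_most(x: frozenset, y: frozenset) -> bool:
    """X <= Y iff every interfering action allowed by X is also allowed by Y."""
    return x <= y  # subset test on frozensets

morals_a1 = frozenset({"prevent_extinction", "stop_murder", "enforce_a1_customs"})
morals_a2 = frozenset({"prevent_extinction", "stop_murder", "enforce_a2_customs"})

# The shared part allows only what both systems allow.
shared = morals_a1 & morals_a2

# Neither agent's system is below the other: the ordering is only partial.
comparable = at_most(morals_a1, morals_a2) or at_most(morals_a2, morals_a1)

# The shared system sits below both, so it never interferes in a way
# that either agent's own system would forbid.
shared_below_both = at_most(shared, morals_a1) and at_most(shared, morals_a2)
```

In this toy model the two agents' systems are incomparable, while the shared system is below both. Whether real extrapolated values behave anything like sets of allowed actions is exactly what the thread leaves open.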
Well, don’t you think this is just ridiculous? Does it look like the most rational behavior? Wouldn’t it be better for everybody to cooperate in this Prisoner’s Dilemma, and do it with a credible precommitment?
I don’t understand what you mean by “fundamentally different”. You said the AI would not do anything not backed by an all-human consensus. If a majority of humanity wishes to kill a minority, obviously there won’t be a consensus to stop the killing, and AI will not interfere. I prefer to live in a universe whose living AI does interfere in such a case.
Libertarianism is one moral principle that would argue for prevention. So would most varieties of utilitarianism (ignoring utility monsters and such). Again, I would prefer living with an AI hard-coded to one of those moral ideologies (though it’s not ideal) over your view of CEV.
Forever keeping this capability in reserve is most of what being a singleton means. But think of the practical implications: it has to be omnipresent, omniscient, and prevent other AIs from ever being as powerful as it is—which restricts those other AIs’ abilities in many endeavors. All the while it does little good itself. So from my point of view, the main effect of successfully implementing your view of CEV may be to drastically limit the opportunities for future AIs to do good.
And yet it doesn’t limit the opportunity to do evil, at least evil of the mundane death & torture kind. Unless you can explain why it would prevent even a very straightforward case like 80% of humanity voting to kill the other 20%.
But you said it would only do things that are approved by a strong human consensus. And I assure you that, to take an example, the large majority of the world’s population who today believe in the supernatural will not consent to having that belief “fixed”. Nor have you demonstrated that their extrapolated volition would want for them to be forcibly modified. Maybe their extrapolated volition simply doesn’t value objective truth highly (because they today don’t believe in the concept of objective truth, or believe that it contradicts everyday experience).
Yes, but I don’t know what I would approve of if I were “more intelligent” (a very ill-defined term). And if you calculate that something, according to your definition of intelligence, and present me with the result, I might well reject that result even if I believe in your extrapolation process. I might well say: the future isn’t predetermined. You can’t calculate what I necessarily will become. You just extrapolated a creature I might become, which also happens to be more intelligent. But there’s nothing in my moral system that says I should adopt the values of someone else because they are more intelligent. If I don’t like the values I might say, thank you for warning me, now I shall be doubly careful not to evolve into that kind of creature! I might even choose to forgo the kind of increased intelligence that causes such an undesired change in my values.
Short version: “what I would want if I were more intelligent (according to some definition)” isn’t the same as “what I will likely want in the future”, because there’s no reason for me to grow in intelligence (by that definition) if I suspect it would twist my values. So you can’t apply the heuristic of “if I know what I’m going to think tomorrow, I might as well think it today”.
I think you may be missing a symbol there? If not, I can’t parse it… Can you spell out for me what it means to just write the last three Endorses(...) clauses one after the other?
It may be quite rational for everyone individually, depending on projected payoffs. Unlike a PD, starting positions aren’t symmetrical and players’ progress/payoffs are not visible to other players. So saying “just cooperate” doesn’t immediately apply.
How can a state or military precommit to not having a supersecret project to develop a private AGI?
And while it’s beneficial for some players to join in a cooperative effort, it may well be that a situation of several competing leagues (or really big players working alone) develops and is also stable. It’s all laid over the background of existing political, religious and personal enmities and rivalries—even before we come to actual disagreements over what the AI should value.
This assumes that CEV uses something along the lines of a simulated vote as an aggregation mechanism. Currently the method of aggregation is undefined so we can’t say this with confidence—certainly not as something obvious.
I agree. However, if the CEV doesn’t privilege any value separately from how many people value it how much (in EV), and if the EV of a large majority values killing a small minority (whose EV is of course opposed), and if you have protection against both positive and negative utility monsters (so it’s at least not obvious and automatic that the negative value of the minority would outweigh the positive value of the majority) - then my scenario seems to me to be plausible, and an explanation is necessary as to how it might be prevented.
Of course you could say that until CEV is really formally specified, and we know how the aggregation works, this explanation cannot be produced.
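Since the aggregation method is undefined, the 80/20 scenario can only be illustrated with a made-up rule. The sketch below assumes a hypothetical aggregator that sums per-person values after clamping each to a fixed range (one crude form of protection against utility monsters); the numbers and the `clipped_sum` helper are invented for illustration and are not part of any CEV specification.

```python
def clipped_sum(values, cap):
    """Sum per-person values after clamping each one to [-cap, +cap]."""
    return sum(max(-cap, min(cap, v)) for v in values)

# Invented numbers: 80 people mildly in favour, 20 people intensely opposed.
majority = [0.5] * 80
minority = [-1000.0] * 20
population = majority + minority

# Without a cap, the minority's intense opposition dominates the total.
uncapped_total = sum(population)

# With a cap of 1.0 per person, the majority's preference wins the sum.
capped_total = clipped_sum(population, cap=1.0)
```

Under these assumptions the uncapped total is negative and the capped total is positive. This is one way to see why a utility-monster safeguard does not by itself rule out the majority-kills-minority outcome. A different aggregation rule could of course behave differently.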
Absolutely, on both counts.
The majority may wish to kill the minority for wrong reasons—based on false beliefs or insufficient intelligence. In which case their CEV-s won’t endorse it, and the FAI will interfere. “Fundamentally different” means their killing each other is endorsed by someone’s CEV, not just by themselves.
Strong consensus of their CEV-s.
Extrapolated volition is based on objective truth, by definition.
The process of extrapolation takes this into account.
Sorry, bad formatting. I meant four independent clauses: each of the agents does not endorse CEV, but endorses CEV.
That’s a separate problem. I think it is easier to solve than extrapolating volition or building AI.
So you’re OK with the FAI not interfering if they want to kill them for the “right” reasons? Such as “if we kill them, we will benefit by dividing their resources among ourselves”?
So you’re saying your version of CEV will forcibly update everyone’s beliefs and values to be “factual” and disallow people to believe in anything not supported by appropriate Bayesian evidence? Even if it has to modify those people by force, the result is unlike the original in many respects that they and many other people value and see as identity-forming, etc.? And it will do this not because it’s backed by a strong consensus of actual desires, but because post-modification there will be a strong consensus of people happy that the modification was made?
If your answer is “yes, it will do that”, then I would not call your AI a Friendly one at all.
My understanding of the CEV doc differs from yours. It’s not a precise or complete spec, and it looks like both readings can be justified.
The doc doesn’t (on my reading) say that the extrapolated volition can totally conform to objective truth. The EV is based on an extrapolation of our existing volition, not of objective truth itself. One of the ways it extrapolates is by adding facts the original person was not aware of. But that doesn’t mean it removes all non-truth or all beliefs that “aren’t even wrong” from the original volition. If the original person effectively assigns 0 or 1 “non-updateable probability” to some belief, or honestly doesn’t believe in objective reality, or believes in “subjective truth” of some kind, CEV is not necessarily going to “cure” them of it—especially not by force.
But as long as we’re discussing your vision of CEV, I can only repeat what I said above—if it’s going to modify people by force like this, I think it’s unFriendly and if it were up to me, would not launch such an AI.
Understood. But I don’t see how this partial ordering changes what I had described.
Let’s say I’m A1 and you’re A2. We would both prefer a mutual CEV to a CEV of the other only. But each of us would prefer even more a CEV of himself only. So each of us might try to bomb the other first if he expected to get away without retaliation. That there exists a possible compromise that is better than total defeat doesn’t mean total victory wouldn’t be much better than any compromise.
If you think so you must have evidence relating to how to actually solve this problem. Otherwise they’d both look equally mysterious. So, what’s your idea?
I wouldn’t like it. But if the alternative is, for example, to have FAI directly enforce the values of the minority on the majority (or vice versa) - the values that would make them kill in order to satisfy/prevent—then I prefer FAI not interfering.
If the resources are so scarce that dividing them is so important that even CEV-s agree on the necessity of killing, then again, I prefer humans to decide who gets them.
No. CEV does not update anyone’s beliefs. It is calculated by extrapolating values in the presence of full knowledge and sufficient intelligence.
As I said elsewhere, if a person’s beliefs are THAT incompatible with truth, I’m ok with ignoring their volition. Note that their CEV is undefined in this case. But I don’t believe there exist such people (excluding the totally insane).
But the total loss would be correspondingly worse. PD reasoning says you should cooperate (assuming cooperation is precommittable).
Off the top of my head, adoption of total transparency for everybody of all governmental and military matters.
The resources are not scarce at all. But, there’s no consensus of CEVs. The CEVs of 80% want to kill the rest. The CEVs of 20% obviously don’t want to be killed. Because there’s no consensus, your version of CEV would not interfere, and the 80% would be free to kill the 20%.
I meant that the AI that implements your version of CEV would forcibly update people’s actual beliefs to match what it CEV-extrapolated for them. Sorry for the confusion.
A case could be made that many millions of religious “true believers” have un-updatable 0/1 probabilities. And so on.
Your solution is to not give them a voice in the CEV at all. Which is great for the rest of us—our CEV will include some presumably reduced term for their welfare, but they don’t get to vote on things. This is something I would certainly support in a FAI (regardless of CEV), just as I would support using CEV or even CEV to CEV.
The only difference between us then is that I estimate there to be many such people. If you believed there were many such people, would you modify your solution, or is ignoring them however many they are fine by you?
As I said before, this reasoning is inapplicable, because this situation is nothing like a PD.
The PD reasoning to cooperate only applies in case of iterated PD, whereas creating a singleton AI is a single game.
Unlike PD, the payoffs are different between players, and players are not sure of each other’s payoffs in each scenario. (E.g., minor/weak players are more likely to cooperate than big ones that are more likely to succeed if they defect.)
The game is not instantaneous, so players can change their strategy based on how other players play. When they do so they can transfer value gained by themselves or by other players (e.g. join research alliance 1, learn its research secrets, then defect and sell the secrets to alliance 2).
It is possible to form alliances, which gain by “defecting” as a group. In PD, players cannot discuss alliances or trade other values to form them before choosing how to play.
There are other games going on between players, so they already have knowledge and opinions and prejudices about each other, and desires to cooperate with certain players and not others. Certain alliances will form naturally, others won’t.
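For reference, here is a minimal sketch of the textbook one-shot, symmetric Prisoner's Dilemma that the points above contrast with the AGI race. The payoff numbers are the usual illustrative values (5, 3, 1, 0) and are an assumption of this sketch, not something taken from the discussion. It only shows classical best-response reasoning, not the TDT-style arguments mentioned elsewhere in the thread.

```python
# Standard illustrative PD payoffs (T=5 > R=3 > P=1 > S=0); the numbers are
# an assumption for this sketch, not taken from the discussion.
PAYOFF = {  # (my move, their move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def best_response(their_move: str) -> str:
    """Return the move ("C" or "D") that maximizes my payoff against their_move."""
    return max("CD", key=lambda mine: PAYOFF[(mine, their_move)])

# In the one-shot game, defecting is the best response to either move...
defect_dominates = all(best_response(m) == "D" for m in "CD")

# ...even though mutual cooperation pays each player more than mutual defection.
cooperation_is_better_jointly = PAYOFF[("C", "C")] > PAYOFF[("D", "D")]
```

Asymmetric or hidden payoffs, alliances, and moves spread over time would each replace this single shared matrix with something more complicated, which is the sense in which the list above says the situation is not a PD.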
This counts as very weak evidence because it proves it’s at least possible to achieve this, yes. (If all players very intensively inspect all other players to make sure a secret project isn’t being hidden anywhere—they’d have to recruit a big chunk of the workforce just to watch over all the rest.)
But the probability of this happening in the real world, between all players, as they scramble to be the first to build an apocalyptic new weapon, is so small it’s not even worth discussion time. (Unless you disagree, of course.) I’m not convinced by this that it’s an easier problem to solve than that of building AGI or FAI or CEV.
The resources are not scarce, yet the CEV-s want to kill? Why?
It would do so only if everybody’s CEV-s agree that updating these people’s beliefs is a good thing.
People that would still have false factual beliefs no matter how much evidence and how much intelligence they have? They would be incurably insane. Yes, I would agree to ignore their volition, no matter how many they are.
Err. What about arguments of Douglas Hofstadter and EY, and decision theories like TDT?
This doesn’t really matter for a broad range of possible payoff matrices.
Cooperating in this game would mean there is exactly one global research alliance. A cooperating move is a precommitment to abide by its rules. Enforcing such precommitment is a separate problem. Let’s assume it’s solved.
Maybe you’re right. But IMHO it’s a less interesting problem :)
Sorry for the confusion. Let’s taboo “scarce” and start from scratch.
I’m talking about a scenario where (to simplify only slightly from the real world) there exist some finite (even if growing) resources that almost everyone, no matter how much they already have, wants more of. A coalition of 80% of the population forms, which would like to kill the other 20% in order to get their resources. Would the AI prevent this, although there is no consensus against the killing?
If you still want to ask whether the resource is “scarce”, please specify what that means exactly. Maybe any finite and highly desirable resource, with returns diminishing weakly or not at all, can be considered “scarce”.
As I said—this is fine by me insofar as I expect the CEV not to choose to ignore me. (Which means it’s not fine through the Rawlsian veil of ignorance, but I don’t care and presumably neither do you.)
The question of definition (who is to be included in the CEV, or who is considered sane) becomes of paramount importance. Since it is not itself decided by the CEV, it is presumably hardcoded into the AI design (or evolves within that design as the AI self-modifies, but that’s very dangerous without formal proofs that it won’t evolve to include the “wrong” people). The simplest way to hardcode it is to directly specify the people to be included, but you prefer testing on qualifications.
However this is realized, it would give people even more incentive to influence or stop your AI building process or to start their own to compete, since they would be afraid of not being included in the CEV used by your AI.
TDT applies where agents are “similar enough”. I doubt I am similar enough to e.g. the people you labelled insane.
Which arguments of Hofstadter and Yudkowsky do you mean?
Why? What prevents several competing alliances (or single players) from forming, competing for the cooperation of the smaller players?
I have trouble thinking of a resource that would make even one person’s CEV, let alone 80%, want to kill people, in order to just have more of it.
This is easy, and does not need any special hardcoding. If someone is so insane that their beliefs are totally closed and impossible to move by knowledge and intelligence, then their CEV is undefined. Thus, they are automatically excluded.
We are talking about people building FAI-s. Surely they are intelligent enough to notice the symmetry between themselves. If you say that logic and rationality makes you decide to ‘defect’ (=try to build FAI on your own, bomb everyone else), then logic and rationality would make everyone decide to defect. So everybody bombs everybody else, no FAI gets built, everybody loses. Instead you can ‘cooperate’ (=precommit to build FAI<everybody’s CEV> and to bomb everyone that did not make the same precommitment). This gets us a single global alliance.
shrug Space (land or whatever is being used). Mass and energy. Natural resources. Computing power. Finite-supply money and luxuries if such exist.
Or are you making an assumption that CEVs are automatically more altruistic or nice than non-extrapolated human volitions are?
Well it does need hardcoding: you need to tell the CEV to exclude people whose EVs are too similar to their current values despite learning contrary facts. Or even all those whose belief-updating process differs too much from perfect Bayesian (and how much is too much?) This is something you’d hardcode in, because you could also write (“hardcode”) a CEV that does include them, allowing them to keep the EVs close to their current values.
Not that I’m opposed to this decision (if you must have CEV at all).
There’s a symmetry, but “first person to complete AI wins, everyone ‘defects’” is also a symmetrical situation. Single-iteration PD is symmetrical, but everyone defects. Mere symmetry is not sufficient for TDT-style “decide for everyone”, you need similarity that includes similarly valuing the same outcomes. Here everyone values the outcome “have the AI obey ME!”, which is not the same.
Or someone is stronger than everyone else, wins the bombing contest, and builds the only AI. Or someone succeeds in building an AI in secret, avoiding being bombed. Or there’s a player or alliance that’s strong enough to deter bombing due to the threat of retaliation, and so completes their AI which doesn’t care about everyone else much. There are many possible and plausible outcomes besides “everybody loses”.
Or while the alliance is still being built, a second alliance or very strong player bombs them to get the military advantages of a first strike. Again, there are other possible outcomes besides what you suggest.
These all have the property that you only need so much of them. If there is a sufficient amount for everybody, then there is no point in killing in order to get more. I expect CEV-s to not be greedy just for the sake of greed. It’s people’s CEV-s we’re talking about, not paperclip maximizers’.
Hmm, we are starting to argue about exact details of extrapolation process...
Let’s formalize the problem. Let F(R, Ropp) be the probability of a team successfully building a FAI first, given R resources, and having opposition with Ropp resources. Let Uself, Ueverybody, and Uother be the rewards for being first in building FAI<self>, FAI<everybody>, and FAI<other>, respectively. Naturally, F is monotonically increasing in R and decreasing in Ropp, and Uother < Ueverybody < Uself.
Assume there are just two teams, with resources R1 and R2, and each can perform one of two actions: “cooperate” or “defect”. Let’s compute the expected utilities for the first team:
Then, EU(“CD”) < EU(“DD”) < EU(“DC”), which gives us most of the structure of a PD problem. The rest, however, depends on the finer details. Let A = F(R1,R2)/F(R1+R2,0) and B = F(R2,R1)/F(R1+R2,0). Then:
If Ueverybody <= Uself*A + Uother*B, then EU(“CC”) <= EU(“DD”), and there is no point in cooperating. This is your position: Ueverybody is much less than Uself, or Uother is not much less than Ueverybody, and/or your team has so many more resources than the other.
If Uself*A + Uother*B < Ueverybody < Uself*A/(1-B), this is a true Prisoner’s dilemma.
If Ueverybody >= Uself*A/(1-B), then EU(“CC”) >= EU(“DC”), and “cooperate” is the obviously correct decision. This is my position: Ueverybody is not much less than Uself, and/or the teams are more evenly matched.
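The three regimes above can be checked numerically. The sketch below is only an illustration: the expected-utility expressions are not spelled out in the text, so the ones used here are a reconstruction chosen to reproduce the stated thresholds (a cooperating team builds FAI<everybody> if it finishes first, a defecting team builds its own, and mutual cooperation pools the resources). The function `contest_success` and its `waste` parameter are hypothetical stand-ins for F; any function increasing in R and decreasing in Ropp could be substituted.

```python
from typing import Callable, Dict

SuccessFn = Callable[[float, float], float]


def contest_success(r: float, r_opp: float, waste: float = 1.0) -> float:
    """Hypothetical F(R, Ropp): increasing in r, decreasing in r_opp.

    This form is an assumption made for illustration, not taken from the thread.
    """
    return r / (r + (1.0 + waste) * r_opp + 1.0)


def expected_utilities(F: SuccessFn, r1: float, r2: float, u_self: float,
                       u_everybody: float, u_other: float) -> Dict[str, float]:
    """Team 1's expected utility per outcome; the first letter is team 1's move.

    Reconstructed payoffs (an assumption consistent with the thresholds above).
    """
    f1 = F(r1, r2)            # team 1 finishes first while competing
    f2 = F(r2, r1)            # team 2 finishes first while competing
    f_joint = F(r1 + r2, 0.0)  # a single pooled alliance with no opposition
    return {
        "CC": u_everybody * f_joint,
        "CD": u_everybody * f1 + u_other * f2,
        "DC": u_self * f1 + u_everybody * f2,
        "DD": u_self * f1 + u_other * f2,
    }


def classify(F: SuccessFn, r1: float, r2: float, u_self: float,
             u_everybody: float, u_other: float) -> str:
    """Return which of the three regimes the parameters fall into."""
    f_joint = F(r1 + r2, 0.0)
    a = F(r1, r2) / f_joint
    b = F(r2, r1) / f_joint   # b < 1 as long as F is monotone and r1 > 0
    low = u_self * a + u_other * b
    high = u_self * a / (1.0 - b)
    if u_everybody <= low:
        return "defect"
    if u_everybody < high:
        return "prisoners_dilemma"
    return "cooperate"


if __name__ == "__main__":
    # Two evenly matched teams; only Ueverybody varies.
    for u_everybody in (0.2, 0.4, 0.8):
        regime = classify(contest_success, 10.0, 10.0,
                          u_self=1.0, u_everybody=u_everybody, u_other=0.0)
        print(u_everybody, regime)
```

Under these assumptions, sweeping Ueverybody between Uother and Uself moves the game through all three regimes, which is the dependence on the actual utility values discussed in this exchange.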
All of those resources are fungible and can be exchanged for time. There might be no limit to the amount of time people desire, even very enlightened posthuman people.
I don’t think you can get an everywhere-positive exchange rate. There are diminishing returns and a threshold, after which, exchanging more resources won’t get you any more time. There’s only 30 hours in a day, after all :)
You can use some resources like computation directly and in unlimited amounts (e.g. living for unlimitedly long virtual times per real second inside a simulation). There are some physical limits on that due to speed of light limiting effective brain size, but that depends on brain design and anyway the limits seem to be pretty high.
More generally: number of configurations physically possible in a given volume of space is limited (by the entropy of a black hole). If you have a utility function unbounded from above, as it rises it must eventually map to states that describe more space or matter than the amount you started with. Any utility maximizer with unbounded utility eventually wants to expand.
I don’t know what the exchange rates are, but it does cost something (computer time, energy, negentropy) to stay alive. That holds for simulated creatures too. If the available resources to keep someone alive are limited, then I think there will be conflict over those resources.
You’re treating resources as one single kind, where really there are many kinds with possible trades between teams. Here you’re ignoring a factor that might actually be crucial to encouraging cooperation (I’m not saying I showed this formally :-)
But my point was exactly that there would be many teams who could form many different alliances. Assuming only two is unrealistic and just ignores what I was saying. I don’t even care much if given two teams the correct choice is to cooperate, because I set very low probability on there being exactly two teams and no other independent players being able to contribute anything (money, people, etc) to one of the teams.
You still haven’t given good evidence for holding this position regarding the relation between the different Uxxx utilities. Except for the fact CEV is not really specified, so it could be built so that that would be true. But equally it could be built so that that would be false. There’s no point in arguing over which possibility “CEV” really refers to (although if everyone agreed on something that would clear up a lot of debates); the important questions are what do we want a FAI to do if we build one, and what we anticipate others to tell their FAIs to do.
I think this is reasonably realistic. Let R signify money. Then R can buy other necessary resources.
We can model N teams by letting them play two-player games in succession. For example, any two teams with nearly matched resources would cooperate with each other, producing a single combined team, etc… This may be an interesting problem to solve, analytically or by computer modeling.
You’re right. Initially, I thought that the actual values of Uxxx-s will not be important for the decision, as long as their relative preference order is as stated. But this turned out to be incorrect. There are regions of cooperation and defection.
Analytically, I don’t a priori expect a succession of two-player games to have the same result as one many-player game which also has duration in time and not just a single round.
There may be a distinction between “the AI will not prevent the 80% from killing the 20%” and “nothing will prevent the 80% from killing the 20%” that is getting lost in your phrasing. I am not convinced that the math doesn’t make them equivalent, in the long run—but I’m definitely not convinced otherwise.
I’m assuming the 80% are capable of killing the 20% unless the AI interferes. That’s part of the thought experiment. It’s not unreasonable, since they are 4 times as numerous. But if you find this problematic, suppose it’s 99% killing 1% at a time. It doesn’t really matter.
My point is that we currently have methods of preventing this that don’t require an AI, and which do pretty well. Why do we need the AI to do it? Or more specifically, why should we reject an AI that won’t, but may do other useful things?
There have been, and are, many mass killings of minority groups and of enemy populations and conscripted soldiers at war. If we cure death and diseases, this will become the biggest cause of death and suffering in the world. It’s important and we’ll have to deal with it eventually.
The AI under discussion not only won’t solve the problem; it would (I contend) become a singleton and prevent me from building another AI that does solve the problem. (If it chooses not to become a singleton, it will quickly be supplanted by an AI that does try to become one.)
I think you’re skipping between levels hereabouts. CEV, the theoretical construct, might consider people so modified, even if a FAI based on CEV would not modify them. CEV is our values if we were better, but does not necessitate us actually getting better.
In this thread I always used CEV in the sense of an AI implementing CEV. (Sometimes you’ll see descriptions of what I don’t believe to be the standard interpretation of how such an AI would behave, where gRR suggests such behaviors and I reply.)
I’m still skeptical of this. If you think of FAI as simply AI that is “safe”—one that does not automatically kill us all (or other massive disutility), relative to the status quo—then plenty of non-singletons are FAI.
Of course, by that definition the ‘F’ looks like the easy part. Rocks are Friendly.
I didn’t mean that being a singleton is a precondition to FAI-hood. I meant that any AGI, friendly or not, that doesn’t prevent another AGI from rising will have to fight all the time, for its life and for the complete fulfillment of its utility function, and eventually it will lose; and a singleton is the obvious stable solution. Edited to clarify.
Not if I throw them at people...
Are you suggesting that an AGI that values anything at all is incapable of valuing the existence of other AGIs, or merely that this is sufficiently unlikely as to not be worth considering?
It can certainly value them, and create them, cooperate and trade, etc. etc. There are two exceptions that make such valuing and cooperation take second place.
First: an uFAI is just as unfriendly and scary to other AIs as to humans. An AI will therefore try to prevent other AIs from achieving dangerous power unless it is very sure of their current and future goals.
Second: an AI created by humans (plus or minus self-modifications) with an explicit value/goal system of the form “the universe should be THIS way”, will try to stop any and all agents that try to interfere with shaping the universe as it wishes. And the foremost danger in this category is—other AIs created in the same way but with different goals.
I’m a little confused by your response, and I suspect that I was unclear in my question.
I agree that an AI with an explicit value/goal system of the form “the universe should be THIS way”, will try to stop any and all agents that try to interfere with shaping the universe as it wishes (either by destroying them, or altering their goal structure, or securing their reliable cooperation, or something else).
But for an AI with the value “the universe should contain as many distinct intelligences as possible,” valuing and creating other AIs will presumably take first place.
That’s probably more efficiently done by destroying any other AIs that come along, while tiling the universe with slightly varying low-level intelligences.
I no longer know what the words “intelligence,” “AI”, and “AGI” actually refer to in this conversation, and I’m not even certain the referents are consistent, so let me taboo the whole lexical mess and try again.
For any X, if the existence of X interferes with an agent A achieving its goals, the better A is at optimizing its environment for its goals the less likely X is to exist.
For any X and A, the more optimizing power X can exert on its environment, the more likely it is that the existence of X interferes with A achieving its goals.
For any X, if A values the existence of X, the better A is at implementing its values the more likely X is to exist.
All of this is as true for X=intelligent beings as X=AI as X=AGI as X=pie.
As far as I can see, this is all true and agrees with everything you, I and thomblake have said.
Cool.
So it seems to follow that we agree that if agent A1 values the existence of distinct agents A2..An, it’s unclear how the likelihood of A2..An existing varies with the optimizing power available to A1...An. Yes?
Yes. Even if we know each agent’s optimizing power, and each agent’s estimation of each other agent’s power and ability to acquire greater power, the behavior of A1 still depends on its exact values (for instance, what else it values besides the existence of the others). It also depends on the values of the other agents (might they choose to initiate conflict among themselves or against A1?)
I tend to agree. Unless it has specific values to the contrary, other AIs of power comparable to your own (or which might grow into such power one day) are too dangerous to leave running around. If you value states of the external universe, and you happen to be the first powerful AGI built, it’s natural to try to become a singleton as a preventative measure.
I feel like a cost-benefit analysis has gone on here, the internals of which I’m not privy to.
Shouldn’t it be possible that becoming a singleton is expensive and/or would conflict with one’s values?
It’s certainly possible. My analysis so far is only on an “all else being equal” footing.
I do feel that, absent other data, the safer assumption is that if an AI is capable of becoming a singleton at all, expense (in terms of energy/matter and space or time) isn’t going to be the thing that stops it. But that may be just a cached thought because I’m used to thinking of an AI trying to become a singleton as a dangerous potential adversary. I would appreciate your insight.
As for values, certainly conflicting values can exist, from ones that mention the subject directly (“don’t move everyone to a simulation in a way they don’t notice” would close one obvious route) to ones that impinge upon it in unexpected ways (“no first strike against aliens” becomes “oops, an alien-built paperclipper just ate Jupiter from the inside out”).
I want to point out that all of my objections are acknowledged (not dismissed, and not fully resolved) in the actual CEV document—which is very likely hopelessly outdated by now to Eliezer and the SIAI, but they deliberately don’t publish anything newer (and I can guess at some of the reasons).
Which is why when I see people advocating CEV without understanding the dangers, I try to correct them.