It depends on what you call CEV “working” or “failing”.
One strategy (which seems to me to be implied by the original CEV doc) is to extrapolate everyone’s personal volition, then compare and merge them to create the group’s overall CEV. Where enough people agree, choose what they agree on (factoring in how sure they are, and how important this is to them). Where too many people disagree, do nothing, or be indifferent on the outcome of this question, or ask the programmers. Is this what you have in mind?
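To make the kind of compare-and-merge rule I mean concrete, here is a minimal sketch (Python, with invented names like `Judgment` and `consensus_threshold`; none of this comes from the CEV doc itself):

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    option: str        # what this person's extrapolated volition prefers on some question
    confidence: float  # how sure the extrapolation is (0..1)
    importance: float  # how much this person cares about the question (0..1)

def merge_question(judgments, consensus_threshold):
    """Toy merge rule: weight each extrapolated judgment by confidence * importance,
    and act on the top option only if its weight share clears the threshold."""
    weights = {}
    for j in judgments:
        weights[j.option] = weights.get(j.option, 0.0) + j.confidence * j.importance
    total = sum(weights.values())
    if total == 0:
        return None  # nobody cares; be indifferent on this question
    best_option, best_weight = max(weights.items(), key=lambda kv: kv[1])
    if best_weight / total >= consensus_threshold:
        return best_option
    return None  # too much disagreement: do nothing, or ask the programmers
```

The point of the sketch is only that `consensus_threshold` is a free parameter of the merge, not something the merge itself produces.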
The big issue here is how much consensus is enough. Let’s run with concrete examples:
If CEV requires too much consensus, it may not help us become immortal because a foolish “deathist” minority believes death is good for people.
If CEV is satisfied by too little consensus, then 99% of the people may build a consensus to kill the other 1% for fun and profit, and the CEV would not object.
You may well have both kinds of problems at the same time (with different questions).
It all depends on how you define required consensus—and that definition can’t itself come from CEV, because it’s required for the first iteration of CEV to run. It could be allowed to evolve via CEV, but you still need to start somewhere and such evolution strikes me as dangerous—if you precommit to CEV and then it evolves into “too little” or “too much” consensus and ends up doing nothing or prohibiting nothing, the whole CEV project fails. Which may well be a worse outcome from our perspective than starting with (or hardcoding) a different, less “correct” consensus requirement.
So the matter is not just what each person or group’s CEV is, but how you combine them via consensus. If, as you suggest, we use the CEV of a small homogenous group instead of all of humanity, it seems clear to me that the consensus would be greater (all else being equal), and so the requirements for consensus are more likely to be satisfied, and so CEV will have a higher chance of working.
Contrariwise, if we use the CEV of all humanity, it will have a term derived from me and you for not stoning people. And it will also have a term derived from some radical Islamists for stoning people. And it will have to resolve the contradiction, and if there’s not enough consensus among humanity’s individual CEVs to do so, the CEV algorithm will “fail”.
If CEV requires too much consensus, it may not help us become immortal because a foolish “deathist” minority believes death is good for people.
If CEV is satisfied by too little consensus, then 99% of the people may build a consensus to kill the other 1% for fun and profit, and the CEV would not object.
These risks exist. However, I think it is very likely in our case that there will be strong consensus for values that reduce the problem a bit. Non-interference, for one, is much less controversial than transhumanism, but would allow transhumanism for those who accept it.
I don’t think CEV works with explicit entities that can interact and decide to kill each other. I understand that it is much more abstract than that. The extrapolation is also probably blind to specific individuals, and all implemented through the singleton AI, so it would be very unlikely that everyone’s EV happens to name, say, Bob Smith as the lulzcow.
It all depends on how you define required consensus—and that definition can’t itself come from CEV, because it’s required for the first iteration of CEV to run. It could be allowed to evolve via CEV, but you still need to start somewhere and such evolution strikes me as dangerous—if you precommit to CEV and then it evolves into “too little” or “too much” consensus and ends up doing nothing or prohibiting nothing, the whole CEV project fails. Which may well be a worse outcome from our perspective than starting with (or hardcoding) a different, less “correct” consensus requirement.
This is a serious issue with (at least my understanding of) CEV. How to even get CEV done (presumably with an AI) without turning everyone into computronium or whatever seems hard.
So the matter is not just what each person or group’s CEV is, but how you combine them via consensus. If, as you suggest, we use the CEV of a small homogenous group instead of all of humanity, it seems clear to me that the consensus would be greater (all else being equal), and so the requirements for consensus are more likely to be satisfied, and so CEV will have a higher chance of working.
This is one reason why I think doing the CEV of just the AI team or whoever is the best approach. We have strong reason to suspect that the eventual result will respect everyone, and bootstrapping from a small group (or even just one person) seems much more reliable and safer.
it will have a term derived from me and you for not stoning people. And it will also have a term derived from some radical Islamists for stoning people.
I think that statement is too strong. Keep in mind that it’s extrapolated volition. I doubt the islamists’ values are reflectively consistent. Weaken it to the possibility of there being multiple attractors in EV-space, some of which are bad, and I agree. Infectious memeplexes that can survive CEV scare the crap out of me.
These risks exist. However, I think it is very likely in our case that there will be strong consensus for values that reduce the problem a bit. Non-interference, for one, is much less controversial than transhumanism, but would allow transhumanism for those who accept it.
Why do you think this is “very likely”?
Today there are many people in the world (gross estimate: tens of percent of the world population) who don’t believe in noninterference. True believers of several major faiths (most Christian sects, mainstream Islam) desire enforced religious conversion of others, either as a commandment of their faith (for its own sake) or for the metaphysical benefit of those others (to save them from hell). Many people “believe” (if that is the right word) in the subjugation of certain minorities, or of women, children, etc., which involves interference of various kinds. Many people experience future shock which prompts them to want laws that would stop others from self-modifying in certain ways (some including transhumanism).
Why do you think it very likely these people’s CEV will contradict their current values and beliefs? Please consider that:
We emphatically don’t know the outcome of CEV. If we were sure that it would have any property X, we could hardcode X into the algorithm and make the CEV’s task that much easier. Anything you think is very likely for CEV to decide, you should be proportionally willing for me to hardcode into my algorithm, constraining the possible results of CEV.
In these examples, you expect other people’s extrapolated values to come to match your actual values. This seems on the outside view like a human bias. Do you expect an equal amount of your important, present-day values to be contradicted and disallowed by humanity’s CEV? Can you think of probable examples?
This is one reason why I think doing the CEV of just the AI team or whoever is the best approach. We have strong reason to suspect that the eventual result will respect everyone, and bootstrapping from a small group (or even just one person) seems much more reliable and safer.
I agree completely—doing the CEV of a small trusted team, who moreover are likely to hold non-extrapolated views similar to ours (e.g. they won’t be radical Islamists), would be much better than CEV(humanity); much more reliable and safe.
But you contradict yourself a little. If you really believed CEV(team) looked a lot like CEV(humanity), you would have no reason to consider it safer. If you (correctly) think it’s safer, that must be because you fear CEV(humanity) will contain some pretty repugnant conclusions that CEV(team) won’t.
From this I understand that while you think CEV(team) would have a term for “respecting” the rest of humanity, that respect would be a lot weaker than the equal (and possibly majority-voting-based) rights granted them by CEV(humanity).
I think that statement is too strong. Keep in mind that it’s extrapolated volition. I doubt the islamists’ values are reflectively consistent. Weaken it to the possibility of there being multiple attractors in EV-space, some of which are bad, and I agree. Infectious memeplexes that can survive CEV scare the crap out of me.
I doubt any one human’s values are reflectively consistent. At the very least, every human’s values contradict one another in the sense that they compete among themselves for the human’s resources, and the human in different moods and at different points in time prefers to spend on different values.
Because infectious memeplexes scare me too, I don’t want anyone to build CEV(humanity) (or rather, to run a singleton AI that would implement it) - I would much prefer CEV(a small trusted team), or better, CEV(just myself), or better yet, a non-CEV process which more directly relies on my and other people’s non-extrapolated preferences.
or better yet, a non-CEV process which more directly relies on my and other people’s non-extrapolated preferences.
A possibly related question: suppose you were about to go off on an expedition in a spaceship that would take you away from Earth for thirty years, and the ship is being stocked with food. Suppose further that, because of an insane bureaucratic process, you have only two choices: either (a) you get to choose what food to stock right now, with no time for nutritional research, or (b) food is stocked according to an expert analysis of your body’s nutritional needs, with no input from you. What outcome would you anticipate from each of those choices?
Suppose a hundred arbitrarily selected people were also being sent on similar missions on similar spaceships, and your decision of A or B applied to them as well (either they get to choose their food, or an expert chooses food for them). What outcome would you anticipate from each choice?
I think you meant to add that the expert really understands nutrition, beyond the knowledge of our best nutrition specialists today, which is unreliable and contradictory and sparse.
With that assumption I would choose to rely on the expert, and would expect far fewer nutritional problems on average for the other people who relied on the expert vs. those who chose for themselves.
The difference between this and CEV is that “what nutritional/metabolic/physiological outcome is good for you” is an objective, pretty well constrained question. There are individual preferences—in enjoyment of food, and in the resulting body-state—but among people hypothetically fully understanding the human body, there will be relatively little disagreement, and the great majority should not suffer much from good choices that don’t quite match their personal preferences.
CEV, on the other hand, includes preferences about objective matters like the above, but also many entirely or mostly subjective choices (in the same way that most choices of value are a-rational). Also, people are likely to agree not to interfere in what others eat because they don’t often care about it, but people do care about many other behaviors of others (like torturing simulated intelligences, or giving false testimony, or making counterfeit money) and that would be reflected in CEV.
ETA: so in response to your question, I agree that on many subjects I trust experts / CEV more than myself. My preferred response to that, though, is not to build a FAI enforcing CEV, but to build a FAI that allows direct personal choice in areas where it’s possible to recover from mistakes, but also provides the expert opinion as an oracle advice service.
Perfect knowledge is wonderful, sure, but was not key to my point.
Given two processes for making some decision, if process P1 is more reliable than process P2, then P1 will get me better results. That’s true even if P1 is imperfect. That’s true even if P2 is “ask my own brain and do what it tells me.” All that is required is that P1 is more reliable than P2.
It follows that when choosing between two processes to implement my values, if I can ask one question, I should ask which process is more reliable. I should not ask which process is perfect, nor ask which process resides in my brain.
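A deliberately trivial sketch of that point, with invented numbers (only the comparison matters, not the particular values):

```python
def expected_outcome(reliability, best=1.0, worse=0.3):
    """Expected result of a decision process that picks the best action
    with probability `reliability` and a worse action otherwise."""
    return reliability * best + (1 - reliability) * worse

p2 = expected_outcome(0.7)  # "ask my own brain and do what it tells me"
p1 = expected_outcome(0.9)  # some other process: imperfect, but more reliable
assert p1 > p2              # relative reliability decides, not perfection or location
```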
ETA: I endorse providing expert opinion, even though that deprives people of the experience of figuring it all out for themselves… agreed that far. But I also endorse providing reliable infrastructure, even though that deprives people of the experience of building all the infrastructure themselves, and I endorse implementing reliable decision matrices, even though that deprives people of the experience of making all the decisions themselves.
There’s no reason you have to choose, once and for all, a single process to answer all kinds of questions. Different processes better fit different domains. Expert opinion best fits well-understood, factual, objective, non-politicized, amoral questions. Noninterference best fits matters where people are likely to want to interfere in others’ decisions and there is no pre-CEV consensus on whether such intervention is permissible.
The problem with making decisions for others isn’t that it deprives them of the experience of making decisions, but that it can influence or force them into decisions that are wrong in some sense of the word.
(shrug) Letting others make decisions for themselves can also influence or force them into decisions that are wrong in some sense of the word. If that’s really the problem, then letting people make their own decisions doesn’t solve it. The solution to that problem is letting whatever process is best at avoiding wrong answers make the decision.
And, sure, there might be different processes for different questions. But there’s no a priori reason to believe that any of those processes reside in my brain.
Letting others make decisions for themselves can also influence or force them into decisions that are wrong in some sense of the word.
True. Nonintervention only works if you care about it more than about anything people might do due to it. Which is why a system of constraints that is given to the AI and is not CEV-derived can’t be just nonintervention: it has to include other principles as well and be a complete ethical system.
And, sure, there might be different processes for different questions. But there’s no a priori reason to believe that any of those processes reside in my brain.
I’m always open to suggestions of new processes. I just don’t like the specific process of CEV, which happens not to reside in my brain, but that’s not why I dislike it.
At the beginning of this thread you seemed to be saying that your current preferences (which are, of course, the product of a computation that resides in your brain) were the best determiner of what to optimize the environment for. If you aren’t saying that, but merely saying that there’s something specific about CEV that makes it an even worse choice, well, OK. I mean, I’m puzzled by that simply because there doesn’t seem to be anything specific about CEV that one could object to in that way, but I don’t have much to say about that; it was the idea that the output of your current algorithms is somehow more reliable than the output of some other set of algorithms implemented on a different substrate that I was challenging.
Sounds like a good place to end this thread, then.
I’m puzzled by that simply because there doesn’t seem to be anything specific about CEV that one could object to in that way
Really? What about the “some people are Jerks” objection? That’s kind of a big deal. We even got Eliezer to tentatively acknowledge the theoretical possibility that that could be objectionable at one point.
(nods) Yeah, I was sloppy. I was referring to the mechanism for extrapolating a coherent volition from a given target, rather than the specification of the target (e.g., “all of humanity”) or other aspects of the CEV proposal, but I wasn’t at all clear about that. Point taken, and agreed that there are some aspects of the proposal (e.g. target specification) that are specific enough to object to.
Tangentially, I consider the “some people are jerks” objection very confused. But then, I mostly conclude that if such a mechanism can exist at all, the properties of people are about as relevant to its output as the properties of states or political parties. More thoughts along those lines here.
I was referring to the mechanism for extrapolating a coherent volition from a given target
It really is hard to find a fault with that part!
Tangentially, I consider the “some people are jerks” objection very confused.
I don’t understand. If you take the CEV of a group that consists of yourself and ten agents with values that differ irreconcilably from yours, then we can expect that CEV to be fairly abhorrent to you. That is, roughly speaking, a risk you take when you replace your own preferences with preferences calculated off a group that you don’t fully understand or have strong reason to trust.
That CEV(group) would also be strictly inferior to CEV(you), which would implicitly incorporate the extrapolated preferences of the other ten agents to precisely the degree that you would want it to do so.
I agree that if there exists a group G of agents A1..An with irreconcilably heterogenous values, a given agent A should strictly prefer CEV(A) to CEV(G). If Dave is an agent in this model, then Dave should prefer CEV(Dave) to CEV(group), for the reasons you suggest. Absolutely agreed.
What I question is the assumption that in this model Dave is better represented as an agent and not a group. In fact, I find that assumption unlikely, as I noted above. (Ditto wedrifid, or any other person.)
If Dave is a group, then CEV(Dave) is potentially problematic for the same reason that CEV(group) is problematic… every agent composing Dave should prefer that most of Dave not be included in the target definition. Indeed, if group contains Dave and Dave contains an agent A1, it isn’t even clear that A1 should prefer CEV(Dave) to CEV(group)… while CEV(Dave) cannot be more heterogenous than CEV(group), it might turn out that a larger fraction (by whatever measure the volition-extrapolator cares about) of group supports A1′s values than the fraction of Dave that supports them.
If the above describes the actual situation, then whether Dave is a jerk or not (or wedrifid is, or whoever) is no more relevant to the output of the volition-extrapolation mechanism than whether New Jersey is a jerk, or whether the Green Party is a jerk… all of these entities are just more-or-less-transient aggregates of agents, and the proper level of analysis is the agent.
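To put invented numbers on the fraction point above (a toy illustration only, not a claim about how the volition-extrapolator would actually weigh sub-agents):

```python
# Share of sub-agent A1, by whatever measure the volition-extrapolator cares about.
share_within_dave = {"A1": 0.20, "A2": 0.50, "A3": 0.30}
share_within_group = {"A1": 0.35, "A2": 0.25, "A3": 0.15, "others": 0.25}

# A1 is better represented in the larger group than inside "Dave", so A1 has no
# general reason to prefer CEV(Dave) to CEV(group); it depends on these fractions,
# not on where the skull boundary happens to fall.
assert share_within_group["A1"] > share_within_dave["A1"]
```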
If Dave is a group, then CEV(Dave) is potentially problematic for the same reason that CEV(group) is problematic… every agent composing Dave should prefer that most of Dave not be included in the target definition. Indeed, if group contains Dave and Dave contains an agent A1, it isn’t even clear that A1 should prefer CEV(Dave) to CEV(group)… while CEV(Dave) cannot be more heterogenous than CEV(group), it might turn out that a larger fraction (by whatever measure the volition-extrapolator cares about) of group supports A1′s values than the fraction of Dave that supports them.
This is related to why I’m a bit uncomfortable accepting the sometimes expressed assertion “CEV only applies to a group, if you are doing it to an individual it’s just Extrapolated Volition”. The “make it stop being incoherent!” part applies just as much to conflicting and inconsistent values within a messily implemented individual as it does to differences between people.
If the above describes the actual situation, then whether Dave is a jerk or not (or wedrifid is, or whoever) is no more relevant to the output of the volition-extrapolation mechanism than whether New Jersey is a jerk, or whether the Green Party is a jerk… all of these entities are just more-or-less-transient aggregates of agents, and the proper level of analysis is the agent.
Taking this “it’s all agents and subagents and meta-agents” outlook, the remaining difference is one of arbitration. That is, speaking as wedrifid I have already implicitly decided which elements of the lump of matter sitting on this chair are endorsed as ‘me’ and so included in the gold standard, CEV(wedrifid). While it may be the case that my amygdala can be considered an agent that is more similar to your amygdala than to the values I represent in abstract ideals, adding the amygdala-agent of another constitutes corrupting the CEV with some discrete measure of “Jerkiness”.
It’s not clear to me that Dave has actually given its endorsement to any particular coalition in a particularly consistent or coherent fashion; it seems to many of me that what Dave endorses and even how Dave thinks of itself and its environment is a moderately variable thing that depends on what’s going on and how it strengthens, weakens, and inspires and inhibits alliances among us. Further, it seems to many of me that this is not at all unique to Dave; it’s kind of the human condition, though we generally don’t acknowledge it (either to others or to ourself) for very good social reasons which I ignore here at our peril.
That said, I don’t mean to challenge here your assertion that wedrifid is an exception; I don’t know you that well, and it’s certainly possible.
And I would certainly agree that this is a matter of degree; there are some things that are pretty consistently endorsed by whatever coalition happens to be speaking as Dave at any given moment, if only because none of us want to accept the penalties associated with repudiating previous commitments made by earlier ruling coalitions, since that would damage our credibility when we wish to make such commitments ourselves.
Of course, that sort of thing only lasts for as long as the benefits of preserving credibility are perceived to exceed the benefits of defecting. Introduce a large enough prize and alliances crumble. Still, it works pretty well in quotidian circumstances, if not necessarily during crises.
Even there, though, this is often honored in the breach rather than the observance. Many ruling coalitions, while not explicitly repudiating earlier commitments, don’t actually follow through on them either. But there’s a certain amount of tolerance of that sort of thing built into the framework, which can be invoked by conventional means… “I forgot”, “I got distracted”, “I experienced akrasia”, and so forth.
So of course there’s also a lot of gaming of that tolerance that goes on. Social dynamics are complicated. And, again, change the payoff matrix and the games change.
All of which is to say, even if my various component parts were to agree on such a gold standard CEV(dave), and commit to an alliance to consistently and coherently enforce that standard regardless of what coalition happens to be speaking for Dave at the time, it is not at all clear to me that this alliance would survive the destabilizing effects of seriously contemplating the possibility of various components having their values implemented on a global scale. We may have an uneasy alliance here inside Dave’s brain, but it really doesn’t take that much to convince one of us to betray that alliance if the stakes get high enough.
By way of analogy, it may be coherent to assert that the U.S. can “speak as” a single entity through the appointing of a Federal government, a President, and so forth. But if the U.S. agreed to become part of a single sovereign world government, it’s not impossible that the situation that prompted this decision would also prompt Montana to secede from the Union. Or, if the world became sufficiently interconnected that a global economic marketplace became an increasingly powerful organizing force, it’s not impossible that parts of New York might find greater common cause with parts of Tokyo than with the rest of the U.S. Or various other scenarios along those lines. At which point, even if the U.S. Federal government goes on saying the same things it has always said, it’s no longer entirely clear that it really is speaking for Montana or New York.
In a not-really-all-that-similar-but-it’s-the-best-I-can-do-without-getting-a-lot-more-formal way, it’s not clear to me that when it comes time to flip the switch, the current Dave Coalition continues to speak for us.
At best, I think it follows that just like the existence of people who are Jerks suggests that I should prefer CEV(Dave) to CEV(humanity), the existence of Dave-agents who are Jerks suggests that I should prefer CEV(subset-of-Dave) to CEV(Dave).
But frankly, I think that’s way too simplistic, because no given subset-of-Dave that lacks internal conflict is rich enough for any possible ruling coalition to be comfortable letting it grab the brass ring like that. Again, quotidian alliances rarely survive a sudden raising of the stakes.
Mostly, I think what really follows from all this is that the arbitration process that occurs within my brain cannot be meaningfully separated from the arbitration process that occurs within other structures that include/overlap my brain, and therefore if we want to talk about a volition-extrapolation process at all we have to bite the bullet and accept that the target of that process is either too simple to be considered a human being, or includes inconsistent values (aka Jerks). Excluding the Jerks and including a human being just isn’t a well-defined option.
Of course, Solzhenitsyn said it a lot more poetically (and in fewer words).
Yes, I was talking about shortcomings of CEV, and did not mean to imply that my current preferences were better than any third option. They aren’t even strictly better than CEV; I just think they are better overall if I can’t mix the two.
It just seems likely, based on my understanding of what people like and approve of.
Strict non-interference is unlikely to end up in CEV, because there are many cases where interventions are the right thing to do. I just meant it as a proof that there are less controversial principles that will block a lot of bullshit. Not as a speculation of something that will actually end up in CEV.
religion, bigots, conservatism.
Why do you think it very likely these people’s CEV will contradict their current values and beliefs? Please consider that:
These values are based on false beliefs, inconsistent memes, and fear. None of those things will survive CEV. “If we knew more, thought faster, grew closer together, etc”.
We emphatically don’t know the outcome of CEV. If we were sure that it would have any property X, we could hardcode X into the algorithm and make the CEV’s task that much easier. Anything you think is very likely for CEV to decide, you should be proportionally willing for me to hardcode into my algorithm, constraining the possible results of CEV.
That would take a whole hell of a lot of certainty. I have nowhere near that level of confidence in anything I believe.
In these examples, you expect other people’s extrapolated values to come to match your actual values. This seems on the outside view like a human bias. Do you expect an equal amount of your important, present-day values to be contradicted and disallowed by humanity’s CEV? Can you think of probable examples?
I think CEV will end up more like transhumanism than like islam. (which means I mostly accept transhumanism). I think I’m too far outside the morally-certain-but-ignorant-human reference class to make outside view judgements on this.
Not an equal amount, but many of my current values will be contradicted in CEV. I can only analogize to physics: I accept relativity, but expect it to be wrong. (I think my current beliefs are the closest approximation to CEV that I know of).
Likely candidates? That’s like asking “which of your beliefs are false”. All I can say is which are most uncertain. I can’t say which way they will go. I am uncertain about optimal romantic organization (monogamy, polyamory, ???). I am uncertain of the moral value of closed simulations. I am uncertain about moral value of things like duplicating people, or making causally-identical models. I am quite certain that existing lives have high value. I am unsure about lives that don’t yet exist.
But you contradict yourself a little. If you really believed CEV(team) looked a lot like CEV(humanity), you would have no reason to consider it safer. If you (correctly) think it’s safer, that must be because you fear CEV(humanity) will contain some pretty repugnant conclusions that CEV(team) won’t.
Not quite. Let’s imagine two bootstrap scenarios: some neo-enlightenment transhumanists, and some religious nuts. Even just the non-extrapolated values of the transhumanists will produce a friendly-enough AI that can (and will want to) safely research better value-extrapolation methods. Bootstrapping it with islam will get you an angry punishing god that may or may not care about extrapolating further. Running the final, ideal CEV process with either seed should produce the same good value set, but we may not have the final ideal CEV process, and a dangerous genie running the process may not do safe things if you start it with the wrong seed.
I doubt any one human’s values are reflectively consistent. At the very least, every human’s values contradict one another in the sense that they compete among themselves for the human’s resources, and the human in different moods and at different points in time prefers to spend on different values.
Sorry, I made that too specific. I didn’t mean to imply that only the islamists are inconsistent. Just meant them as an obvious example.
a non-CEV process which more directly relies on my and other people’s non-extrapolated preferences.
This is what I think would be good as a seed value system so that the FAI can go and block x-risk and stop death and such without having to philosophize too much first. But I’d want the CEV philosophising to be done eventually (ASAP, actually).
Strict non-interference is unlikely to end up in CEV, because there are many cases where interventions are the right thing to do.
Right according to whose values? The problem is precisely that people disagree pre-extrapolation about when it’s right to interfere, and therefore we fear their individual volitions will disagree even post extrapolation. I and some other people have a value of noninterference in certain matters that is very important to us. I would rather hardcode that value than let CEV of humanity decide on it.
I think CEV will end up more like transhumanism than like islam.
Again why? CEV is very much underspecified. To me, the idea that our values and ideals will preferentially turn out to be the ones all humans would embrace “if they were smarter etc” looks like mere wishful thinking. Values are arational and vary widely. If you specify a procedure (CEV) whereby they converge to a compatible set which also happens to resemble our actual values today, then it should be possible to give different algorithms (which you can call CEV or not, it doesn’t matter) which converge on other value-sets.
In the end, as the Confessor said, “you have judged: what else is there?” I have judged, and where I am certain enough about my judgement I would rather that other people’s CEV not override me.
Other than that I agree with you about using a non-CEV seed etc. I just don’t think we should later let CEV decide anything it likes without the seed explicitly constraining it.
CEV(humanity)’s. Where by an unqualified “CEV” I take nyan to be referring to CEV(humanity) (“the Coherently Extrapolated Values of Humanity”). I assume he also means it as a normative assertion of the slightly-less-extrapolated kind that means something like “all properly behaving people of my tribe would agree and if they don’t we may need to beat them with sticks until they do.”
The problem is precisely that people disagree pre-extrapolation about [when it’s right to interfere], and therefore we fear their individual volitions will disagree even post extrapolation.
And the bracketed condition is generalisable to all sorts of things—including those preferences that we haven’t even considered the possibility of significant disagreement about. Partially replacing one’s own preferences with preferences that are not one’s own is one of the most dangerous things it is possible to do. Not something to do lightly or take for granted as implicitly ‘right’.
I and some other people have a value of noninterference in certain matters that is very important to us. I would rather hardcode that value than let CEV of humanity decide on it.
I note that any assertion that “intervention is strictly the wrong thing to do” that is not qualified necessarily implies a preference for the worst things that could possibly happen in an FAI-free world happening rather than a single disqualified intervention. That means, for example, that rather than a minimalist intervention you think the “right” behavior for the FAI is to allow everyone on the planet to be zapped by The Pacifier and constantly raped by pedophiles until they are 10 whereupon they are forced to watch repeats of the first season of Big Brother until they reach 20 and are zapped again and the process is repeated until the heat death of the universe. That’s pretty bad but certainly not the worst thing that could happen. It is fairly trivially not “right” to let that happen if you can easily stop it.
Note indicating partial compatibility of positions: There can be reasons to advocate the implementation of ethical injunctions in a created GAI but that this still doesn’t allow us to say that non-intervention in a given extreme circumstance is ‘right’.
Partially replacing one’s own preferences with preferences that are not one’s own is one of the most dangerous things it is possible to do. Not something to do lightly or take for granted as implicitly ‘right’.
That’s exactly what I think. And especially if you precommit to the values output by a certain process before the process is actually performed, and can’t undo it later.
I note that any assertion that “intervention is strictly the wrong thing to do” that is not qualified [...]
I’m certainly not advocating absolute unqualified non-intervention. I wrote “a value of noninterference in certain matters”. Certainly the AI should interfere to e.g. offer help just before something happens which the AI thinks the person would not want to happen to them (the AI is qualified to make such decisions if it can calculate CEV). In such a situation the AI would explain matters and offer aid and advice, but ultimate deciding power might still lie with the person, depending on the circumstances.
Nonintervention doesn’t just mean non-intervention by the AI; it means nonintervention by one person with another. If someone makes a request for the AI to prevent another person from doing something to them, then again in at least some (most? all?) circumstances the AI should interfere to do so; that is actually upholding the principle of noninterference.
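A minimal sketch of the rule I have in mind, assuming a crude three-flag summary of the situation (the field names are invented; the real policy obviously would not reduce to three booleans):

```python
from dataclasses import dataclass

@dataclass
class Situation:
    patient_requested_protection: bool  # did the affected person ask the AI to step in?
    predicted_unwanted: bool            # does the AI predict the person would not want this done to them?
    accepts_offered_help: bool          # after the AI explains matters and offers aid

def should_intervene(s: Situation) -> bool:
    """Toy noninterference rule: the AI acts only on the affected person's behalf."""
    if s.patient_requested_protection:
        return True  # honoring the request is upholding noninterference, not violating it
    if s.predicted_unwanted:
        return s.accepts_offered_help  # offer help and advice; the person keeps the final say
    return False  # otherwise, stay out of it
```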
Gah. I had to read through many paragraphs of plot drivel and then all I came up with was “a device that zaps people, making them into babies, but that is reversible”. You should have just said so. (Not that the idea makes sense on any level). Anyway, my above comment applies; people would not want it done to them and so would request the AI to prevent it.
I’m certainly not advocating absolute unqualified non-intervention. I wrote “a value of noninterference in certain matters”. Certainly the AI should interfere to e.g. offer help just before something happens which the AI thinks the person would not want to happen to them (the AI is qualified to make such decisions if it can calculate CEV). In such a situation the AI would explain matters and offer aid and advice, but ultimate deciding power might still lie with the person, depending on the circumstances.
Nonintervention doesn’t just mean non-intervention by the AI; it means nonintervention by one person with another. If someone makes a request for the AI to prevent another person from doing something to them, then again in at least some (most? all?) circumstances the AI should interfere to do so; that is actually upholding the principle of noninterference.
I like both these caveats. The scenario becomes something far more similar to what a CEV could plausibly be without the artificial hack. Horror stories become much harder to construct.
Off the top of my head, one potential remaining weakness is the inability to prevent a rival, less crippled AGI from taking over without interfering pre-emptively with an individual who is not themselves interfering with anyone. Getting absolute power requires intervention (or universally compliant subjects). Not getting absolute power means something else can get it and outcomes are undefined.
That’s a good point. The AI’s ability to not interfere is constrained by its need to monitor everything that’s going on. Not just to detect someone building a rival AI, but to detect simpler cases like someone torturing a simulated person, or even just a normal flesh and bone child who wasn’t there nine months earlier. To detect people who get themselves into trouble without yet realizing it, or who are going to attack other people nonconsensually, and give these people help before something bad actually happens to them, all requires monitoring.
And while a technologically advanced AI might monitor using tools we humans couldn’t even detect today, to advanced posthumans every possible tool might be painfully obvious. E.g. you might have to expose everything your megaton-of-computronium brain calculates to the AI, because that much computronium is enough to simulate all the humans alive in 2012 in enough detail that they would count as persons to the AI. But to the asteroid-sized brain this means the AI is literally aware of all its thoughts: it has zero privacy.
It does appear that universal surveillance is the cost of universally binding promises (you won’t be tortured no matter where you go and what you do in AI-controlled space). To reduce costs and increase trust, the AI should be transparent to everyone itself, and should be publicly and verifiably committed to being a perfectly honest and neutral party that never reveals the secrets and private information it monitors to anyone.
I’d like to note that all of this also applies to any FAI singleton that implements some policies that we today consider morally required—like making sure no-one is torturing simulated people or raising their baby wrong. If there’s no generally acceptable FAI behavior that doesn’t include surveillance, then all else is equal and I still prefer my AI to a pure CEV implementation.
And while a technologically advanced AI might monitor using tools we humans couldn’t even detect today, to advanced posthumans every possible tool might be painfully obvious. E.g. you might have to expose everything your megaton-of-computronium brain calculates to the AI, because that much computronium is enough to simulate all the humans alive in 2012 in enough detail that they would count as persons to the AI. But to the asteroid-sized brain this means the AI is literally aware of all its thoughts: it has zero privacy.
It would seem that the FAI should require only that you expose to it the complete state of your brain at a point in time from which it can reliably predict or prove that you are ‘safe’, using the kind of reasoning we often assume as a matter of course when describing UDT decision problems. Such an FAI would have information about what you are thinking—and in particular a great big class of what it knows you are not thinking—but not necessarily detailed knowledge of what you are thinking specifically.
For improved privacy the inspection could be done by a spawned robot AI programmed to self destruct after analyzing you and returning nothing but a boolean safety indicator back to the FAI.
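A toy sketch of the spawned-checker protocol being suggested (both names are invented, and `prove_safe` stands in for whatever UDT-style reasoning the FAI would actually use):

```python
def safety_check(brain_snapshot: bytes, prove_safe) -> bool:
    """Analyze a full snapshot, return a single bit, keep nothing else."""
    verdict = bool(prove_safe(brain_snapshot))
    del brain_snapshot  # stand-in for the checker self-destructing along with its data
    return verdict
```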
Prediction has some disadvantages compared to constant observation:
Some physical systems are hard to model well with simplification; even for the AI it might be necessary to use simulations composed of amounts of matter proportional to the thing simulated. If about one half of all matter has to be given over to the AI, instead of being used to create more people and things, that is a significant loss of opportunity. (Maybe the AI should tax people in simulation-resources, and those who opt in to surveillance have much lower taxes :-)
Simulations naturally have a rising risk of divergence over time. The AI is not literally Omega. It will have to come in and take periodical snapshots of everyone’s state to correct the simulations.
Simulations have a chance of being wrong. However small the chance, if the potential result is someone building a UFAI challenger, it might be unacceptable to take that chance.
OTOH, surveillance might be much cheaper (I don’t know for sure) and also allows destroying the evidence close to the site of observation once it is analyzed, preserving a measure of privacy.
One strategy (which seems to me to be implied by the original CEV doc) is to extrapolate everyone’s personal volition, then compare and merge them to create the group’s overall CEV.
I vaguely remember something in that doc suggesting that part of the extrapolation process involves working out the expected results of individuals interacting. More poetically, “what we would want if we grew together more.” That suggests that this isn’t quite what the original doc meant to imply, or at least that it’s not uniquely what the doc meant to imply, although I may simply be misremembering.
More generally, all the hard work is being done here by whatever assumptions are built into the “extrapolation”.
Had grown up farther together: A model of humankind’s coherent extrapolated volition should not extrapolate the person you’d become if you made your decisions alone in a padded cell. Part of our predictable existence is that we predictably interact with other people. A dynamic for CEV must take a shot at extrapolating human interactions, not just so that the extrapolation is closer to reality, but so that the extrapolation can encapsulate memetic and social forces contributing to niceness.
Our CEV may judge some memetic dynamics as not worth extrapolating—not search out the most appealing trash-talk TV show.
Social interaction is probably intractable for real-world prediction, but no more so than individual volition. That is why I speak of predictable extrapolations, and of calculating the spread.
I don’t mean to contradict that. So consider my interpretation to be something like: build (“extrapolate”) each person’s CEV, which includes that person’s interactions with other people, but doesn’t directly value them except insofar as that person values them; then somehow merge the individual CEVs to get the group CEV.
After all (I reason) you want the following nice property for CEV. Suppose that CEV(group1) meets CEV(group2)—e.g. separate AIs implementing those CEVs meet. If they don’t embody inimical values, they will try to negotiate and compromise. We would like the result of those negotiations to look very much like CEV(group1 + group2). One easy way to do this is to say CEV is built on “merging” all the way from the bottom up.
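Stated as a property rather than prose, with `cev` and `merge` as hypothetical black boxes (nothing more is assumed about them):

```python
def has_nice_property(cev, merge, group1, group2):
    """Negotiation between two groups' CEVs should come out the same as the
    CEV of the combined group, for any way of splitting people into groups."""
    return merge(cev(group1), cev(group2)) == cev(group1 + group2)
```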
More generally, all the hard work is being done here by whatever assumptions are built into the “extrapolation”.
Certainly. All discussion of CEV starts with “assume there can exist a process that produces an outcome matching the following description, and assume we can and do build it, and assume that all the under-specification of this description is improved in the way we would wish it improved if we were better at wishing”.
I basically agree with all of this, except that I think you’re saying “CEV is built on “merging” all the way from the bottom up” but you aren’t really arguing for doing that.
Perhaps one important underlying question here is whether people’s values ever change contingent on their experiences.
If not—if my values are exactly the same as what they were when I first began to exist (whenever that was) -- then perhaps something like what you describe makes sense. A process for working out what those values are and extrapolating my volition based on them would be difficult to build, but is coherent in principle. In fact, many such processes could exist, and they would converge on a single output specification for my individual CEV. And then, and only then, we could begin the process of “merging.”
This strikes me as pretty unlikely, but I suppose it’s possible.
OTOH, if my values are contingent on experience—that is, if human brains experience value drift—then it’s not clear that those various processes’ outputs would converge. Volition-extrapolation process 1, which includes one model of my interaction with my environment, gets Dave-CEV-1. VEP2, which includes a different model, gets Dave-CEV-2. And so forth. And there simply is no fact of the matter as to which is the “correct” Dave-CEV; they are all ways that I might turn out; to the extent that any of them reflect “what I really want” they all reflect “what I really want”, and I “really want” various distinct and potentially-inconsistent things.
In the latter case, in order to obtain something we call CEV(Dave), we need a process of “merging” the outputs of these various computations. How we do this is of course unclear, but my point is that saying “we work out individual CEVs and merge them” as though the merge step came second is importantly wrong. Merging is required to get an individual CEV in the first place.
So, yes, I agree, it’s a fine idea to have CEV built on merging all the way from the bottom up. But to understand what the “bottom” really is is to give up on the idea that my unique individual identity is the “bottom.” Whatever it is that CEV is extrapolating and merging, it isn’t people, it’s subsets of people. “Dave’s values” are no more preserved by the process than “New Jersey’s values” or “America’s values” are.
That’s a very good point. People not only change over long periods of time; during small intervals of time we can also model a person’s values as belonging to competing and sometimes negotiating agents. So you’re right, merging isn’t secondary or dispensable (not that I suggested doing away with it entirely), although we might want different merging dynamics sometimes for sub-person fragments vs. for whole-person EVs.
Sure, the specifics of the aggregation process will depend on the nature of the monads to be aggregated.
And, yes, while we frequently model people (including ourselves) as unique coherent consistent agents, and it’s useful to do so for planning and for social purposes, there’s no clear reason to believe we’re any such thing, and I’m inclined to doubt it. This also informs the preserving-identity-across-substrates conversation we’re having elsethread.
Where relevant—or at least when I’m reminded of it—I do model myself as a collection of smaller agents. But I still call that collection “I”, even though it’s not unique, coherent, or consistent. That my identity may be a group-identity doesn’t seem to modify any of my conclusions about identity, given that to date the group has always resided together in a single brain.
For my own part, I find that attending to the fact that I am a non-unique, incoherent, and inconsistent collection of disparate agents significantly reduces how seriously I take concerns that some process might fail to properly capture the mysterious essence of “I”, leading to my putative duplicate going off and having fun in a virtual Utopia while “I” remains in a cancer-ridden body.
I would gladly be uploaded rather than die if there were no alternative. I would still pay extra for a process that slowly replaced my brain cells etc. one by one leaving me conscious and single-instanced the whole while.
It depends on what you call CEV “working” or “failing”.
One strategy (which seems to me to be implied by the original CEV doc) is to extrapolate everyone’s personal volition, then compare and merge them to create the group’s overall CEV. Where enough people agree, choose what they agree on (factoring in how sure they are, and how important this is to them). Where too many people disagree, do nothing, or be indifferent on the outcome of this question, or ask the programmers. Is this what you have in mind?
The big issue here is how much consensus is enough. Let’s run with concrete examples:
If CEV requires too much consensus, it may not help us become immortal because a foolish “deathist” minority believes death is good for people.
If CEV is satisfied by too little consensus, then 99% of the people may build a consensus to kill the other 1% for fun and profit, and the CEV would not object.
You may well have both kinds of problems at the same time (with different questions).
It all depends on how you define required consensus—and that definition can’t itself come from CEV, because it’s required for the first iteration of CEV to run. It could be allowed to evolve via CEV, but you still need to start somewhere and such evolution strikes me as dangerous—if you precommit to CEV and then it evolves into “too little” or “too much” consensus and ends up doing nothing or prohibiting nothing, the whole CEV project fails. Which may well be a worse outcome from our perspective than starting with (or hardcoding) a different, less “correct” consensus requirement.
So the matter is not just what each person or group’s CEV is, but how you combine them via consensus. If, as you suggest, we use the CEV of a small homogenous group instead of all of humanity, it seems clear to me that the consensus would be greater (all else being equal), and so the requirements for consensus are more likely to be satisfied, and so CEV will have a higher chance of working.
Contrariwise, if we use the CEV of all humanity, it will have a term derived from me and you for not stoning people. And it will also have a term derived from some radical Islamists for stoning people. And it will have to resolve the conrtadiction, and if there’s not enough consensus among humanity’s individual CEVs to do so, the CEV algorithm will “fail”.
These risks exist. However, I think it is very likely in our case that there will be strong consensus for values that reduce the problem a bit. Non-interference, for one, is much less controversial than transhumanism, but would allow transhumanism for those who accept it.
I don’t think CEV works with explicit entities that can interact and decide to kill each other. I understand that it is much more abstract than that. Also probably all blind, and all implemented through the singleton AI, so it would be very unlikely that everyone’s EV happens to name, say, bob smith as the lulzcow.
This is a serious issue with (at least my understanding of) CEV. How to even get CEV done (presumably with an AI) without turning everyone into computronium or whatever seems hard.
This is one reason why I think doing the CEV of just the AI team or whoever is the best approach. We have strong reason to suspect that the eventual result will respect everyone, and bootstrapping from a small group (or even just one person) seems much more reliable and safer.
I think that statement is too strong. Keep in mind that it’s extrapolated volition. I doubt the islamists’ values are reflectively consistent. Weaken it to the possibility of there being multiple attractors in EV-space, some of which are bad, and I agree. Infectious memeplexes that can survive CEV scare the crap out of me.
Why do you think this is “very likely”?
Today there are many people in the world (gross estimate: tens of percents of world population) who don’t believe in noninterference. True believers of several major faiths (most Christian sects, mainstream Islam) desire enforced religious conversion of others, either as a commandment of their faith (for its own sake) or for the metaphysical benefit of those others (to save them from hell). Many people “believe” (if that is the right word) in the subjugation of certain minorities, or of women, children, etc. which involves interference of various kinds. Many people experience future shock which prompts them to want laws that would stop others from self-modifying in certain ways (some including transhumanism).
Why do you think it very likely these people’s CEV will contradict their current values and beliefs? Please consider that:
We emphatically don’t know the outcome of CEV. If we were sure that it would have any property X, we could hardcode X into the algorithm and make the CEV’s task that much easier. Anything you think is very likely for CEV to decide, you should be proportionally willing for me to hardcode into my algorithm, constraining the possible results of CEV.
In these examples, you expect other people’s extrapolated values to come to match your actual values. This seems on the outside view like a human bias. Do you expect an equal amount of your important, present-day values to be contradicted and disallowed by humanity’s CEV? Can you think of probable examples?
I agree completely—doing the CEV of a small trusted team, who moreover are likely to hold non-extrapolated views similar to ours (e.g. they won’t be radical Islamists), would be much better than CEV; much more reliable and safe.
But you contradict yourself a little. If you really believed CEV looked a lot like CEV, you would have no reason to consider it safer. If you (correctly) think it’s safer, that must be because you fear CEV will contain some pretty repugnant conclusions that CEV won’t.
From this I understand that while you think CEV would have a term for “respecting” the rest of humanity, that respect would be a lot weaker than the equal (and possibly majority-voting-based) rights granted them by CEV.
I doubt any one human’s values are reflectively consistent. At the very least, every human’s values contradict one another in the sense that they compete among themselves for the human’s resources, and the human in different moods and at different points in time prefers to spend on different values.
Because infectious memeplexes scare me too, I don’t want anyone to build CEV (or rather, to run a singleton AI that would implement it) - I would much prefer CEV or better CEV or better yet, a non-CEV process whcih more directly relies on my and other people’s non-extrapolated preferences.
A possibly related question: suppose you were about to go off on an expedition in a spaceship that would take you away from Earth for thirty years, and the ship is being stocked with food. Suppose further that, because of an insane bureaucratic process, you have only two choices: either (a) you get to choose what food to stock right now, with no time for nutritional research, or (b) food is stocked according to an expert analysis of your body’s nutritional needs, with no input from you. What outcome would you anticipate from each of those choices?
Suppose a hundred arbitrarily selected people were also being sent on similar missions on similar spaceships, and your decision of A or B applied to them as well (either they get to choose their food, or an expert chooses food for them). What outcome would you anticipate from each choice?
I think you meant to add that the expert really understands nutrition, beyond the knowledge of our best nutrition specialists today, which is unreliable and contradictory and sparse.
With that assumption I would choose to rely on the expert, and would expect much less nutritional problems on average for other people who relied on the expert vs. choosing themselves.
The difference between this and CEV is that “what nutritional/metabolic/physiological outcome is good for you” is an objective, pretty well constrained question. There are individual preferences—in enjoyment of food, and in the resulting body-state—but among people hypothetically fully understanding the human body, there will be relatively little disagreement, and the great majority should not suffer much from good choices that don’t quite match their personal preferences.
CEV, on the other hand, includes both preferences about objective matters like the above but also many entirely or mostly subjective choices (in the same way that most choices of value are a-rational). Also, people are likely to agree to not interfere in what others eat because they don’t often care about it, but people do care about many other behaviors of others (like torturing simulated intelligences, or giving false testimony, or making counterfeit money) and that would be reflected in CEV.
ETA: so in response to your question, I agree that on many subjects I trust experts / CEV more than myself. My preferred response to that, though, is not to build a FAI enforcing CEV, but to build a FAI that allows direct personal choice in areas where it’s possible to recover from mistakes, but also provides the expert opinion as an oracle advice service.
Perfect knowledge is wonderful, sure, but was not key to my point.
Given two processes for making some decision, if process P1 is more reliable than process P2, then P1 will get me better results. That’s true even if P1 is imperfect. That’s true even if P2 is “ask my own brain and do what it tells me.” All that is required is that P1 is more reliable than P2.
It follows that when choosing between two processes to implement my values, if I can ask one question, I should ask which process is more reliable. I should not ask which process is perfect, nor ask which process resides in my brain.
ETA: I endorse providing expert opinion, even though that deprives people of the experience of figuring it all out for themselves… agreed that far. But I also endorse providing reliable infrastructure, even though that deprives people of the experience of building all the infrastructure themselves, and I endorse implementing reliable decision matrices, even though that deprives people of the experience of making all the decisions themselves.
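A minimal sketch of the reliability point above, under invented assumptions: model each process as a black box that picks the value-maximizing of two options with some fixed probability, and compare average payoffs. The specific reliabilities and the two-option setup are illustrative only, not anything from this discussion.

```python
import random

def average_payoff(reliability: float, trials: int = 100_000) -> float:
    """Average payoff of a process that picks the better of two options
    (payoff 1.0) with probability `reliability`, else the worse one (0.0)."""
    return sum(
        1.0 if random.random() < reliability else 0.0
        for _ in range(trials)
    ) / trials

# Hypothetical reliabilities: P2 = "ask my own brain", P1 = some external process.
p2 = average_payoff(0.70)
p1 = average_payoff(0.85)
print(p1 > p2)  # almost surely True: only relative reliability matters,
                # not whether either process is perfect or lives in my brain
```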
There’s no reason you have to choose, just once, a single process to answer all kinds of questions. Different processes better fit different domains. Expert opinion best fits well-understood, factual, objective, non-politicized, amoral questions. Noninterference best fits matters where people are likely to want to interfere in others’ decisions and there is no pre-CEV consensus on whether such intervention is permissible.
The problem with making decisions for others isn’t that it deprives them of the experience of making decisions, but that it can influence or force them into decisions that are wrong in some sense of the word.
(shrug) Letting others make decisions for themselves can also influence or force them into decisions that are wrong in some sense of the word. If that’s really the problem, then letting people make their own decisions doesn’t solve it. The solution to that problem is letting whatever process is best at avoiding wrong answers make the decision.
And, sure, there might be different processes for different questions. But there’s no a priori reason to believe that any of those processes reside in my brain.
True. Nonintervention only works if you care about it more than about anything people might do as a result of it. Which is why a system of constraints that is given to the AI and is not CEV-derived can’t be just nonintervention; it has to include other principles as well and be a complete ethical system.
I’m always open to suggestions of new processes. I just don’t like the specific process of CEV, which happens not to reside in my brain, but that’s not why I dislike it.
Ah, OK.
At the beginning of this thread you seemed to be saying that your current preferences (which are, of course, the product of a computation that resides in your brain) were the best determiner of what to optimize the environment for. If you aren’t saying that, but merely saying that there’s something specific about CEV that makes it an even worse choice, well, OK. I mean, I’m puzzled by that, simply because there doesn’t seem to be anything specific about CEV that one could object to in that way, but I don’t have much to say about it; it was the idea that the output of your current algorithms is somehow more reliable than the output of some other set of algorithms implemented on a different substrate that I was challenging.
Sounds like a good place to end this thread, then.
Really? What about the “some people are Jerks” objection? That’s kind of a big deal. We even got Eliezer to tentatively acknowledge the theoretical possibility that that could be objectionable at one point.
(nods) Yeah, I was sloppy. I was referring to the mechanism for extrapolating a coherent volition from a given target, rather than the specification of the target (e.g., “all of humanity”) or other aspects of the CEV proposal, but I wasn’t at all clear about that. Point taken, and agreed that there are some aspects of the proposal (e.g. target specification) that are specific enough to object to.
Tangentially, I consider the “some people are jerks” objection very confused. But then, I mostly conclude that if such a mechanism can exist at all, the properties of people are about as relevant to its output as the properties of states or political parties. More thoughts along those lines here.
It really is hard to find a fault with that part!
I don’t understand. If you take the CEV of a group that consists of yourself and ten agents with values that differ irreconcilably from yours, then we can expect that CEV to be fairly abhorrent to you. That is, roughly speaking, the risk you take when you replace your own preferences with preferences calculated off a group that you don’t fully understand or have strong reason to trust.
That CEV would also be strictly inferior to your own CEV, which would implicitly incorporate the extrapolated preferences of the other ten agents to precisely the degree that you would want it to.
I agree that if there exists a group G of agents A1..An with irreconcilably heterogeneous values, a given agent A should strictly prefer CEV(A) to CEV(G). If Dave is an agent in this model, then Dave should prefer CEV(Dave) to CEV(group), for the reasons you suggest. Absolutely agreed.
What I question is the assumption that in this model Dave is better represented as an agent and not a group. In fact, I find that assumption unlikely, as I noted above. (Ditto wedrifid, or any other person.)
If Dave is a group, then CEV(Dave) is potentially problematic for the same reason that CEV(group) is problematic… every agent composing Dave should prefer that most of Dave not be included in the target definition. Indeed, if group contains Dave and Dave contains an agent A1, it isn’t even clear that A1 should prefer CEV(Dave) to CEV(group)… while CEV(Dave) cannot be more heterogeneous than CEV(group), it might turn out that a larger fraction (by whatever measure the volition-extrapolator cares about) of group supports A1’s values than the fraction of Dave that supports them.
If the above describes the actual situation, then whether Dave is a jerk or not (or wedrifid is, or whoever) is no more relevant to the output of the volition-extrapolation mechanism than whether New Jersey is a jerk, or whether the Green Party is a jerk… all of these entities are just more-or-less-transient aggregates of agents, and the proper level of analysis is the agent.
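To make the fraction point above concrete with invented numbers (nothing here is from the original exchange): an agent inside Dave can find proportionally more support for its values in the wider group than inside Dave, even though the group as a whole is more heterogeneous.

```python
# Toy illustration with made-up counts of sub-agents.
dave_subagents = 5
dave_supporting_a1 = 1            # 20% of Dave backs A1's values

group_subagents = 100             # everyone's sub-agents pooled together
group_supporting_a1 = 35          # 35% of the group backs A1's values

print(dave_supporting_a1 / dave_subagents)     # 0.2
print(group_supporting_a1 / group_subagents)   # 0.35 -> A1 might prefer CEV(group)
```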
Approximately agree.
This is related to why I’m a bit uncomfortable accepting the sometimes expressed assertion “CEV only applies to a group, if you are doing it to an individual it’s just Extrapolated Volition”. The “make it stop being incoherent!” part applies just as much to conflicting and inconsistent values within a messily implemented individual as it does to differences between people.
Taking this “it’s all agents and subagents and meta-agents” outlook, the remaining difference is one of arbitration. That is, speaking as wedrifid, I have already implicitly decided which elements of the lump of matter sitting on this chair are endorsed as ‘me’ and so included in the gold standard CEV(wedrifid). While it may be the case that my amygdala can be considered an agent that is more similar to your amygdala than to the values I represent in abstract ideals, adding the amygdala-agent of another constitutes corrupting the CEV with some discrete measure of “Jerkiness”.
Mm.
It’s not clear to me that Dave has actually given its endorsement to any particular coalition in a particularly consistent or coherent fashion; it seems to many of me that what Dave endorses and even how Dave thinks of itself and its environment is a moderately variable thing that depends on what’s going on and how it strengthens, weakens, and inspires and inhibits alliances among us. Further, it seems to many of me that this is not at all unique to Dave; it’s kind of the human condition, though we generally don’t acknowledge it (either to others or to ourself) for very good social reasons which I ignore here at our peril.
That said, I don’t mean to challenge here your assertion that wedrifid is an exception; I don’t know you that well, and it’s certainly possible.
And I would certainly agree that this is a matter of degree; there are some things that are pretty consistently endorsed by whatever coalition happens to be speaking as Dave at any given moment, if only because none of us want to accept the penalties associated with repudiating previous commitments made by earlier ruling coalitions, since that would damage our credibility when we wish to make such commitments ourselves.
Of course, that sort of thing only lasts for as long as the benefits of preserving credibility are perceived to exceed the benefits of defecting. Introduce a large enough prize and alliances crumble. Still, it works pretty well in quotidian circumstances, if not necessarily during crises.
Even there, though, this is often honored in the breach rather than the observance. Many ruling coalitions, while not explicitly repudiating earlier commitments, don’t actually follow through on them either. But there’s a certain amount of tolerance of that sort of thing built into the framework, which can be invoked by conventional means… “I forgot”, “I got distracted”, “I experienced akrasia”, and so forth.
So of course there’s also a lot of gaming of that tolerance that goes on. Social dynamics are complicated. And, again, change the payoff matrix and the games change.
All of which is to say, even if my various component parts were to agree on such a gold standard CEV(dave), and commit to an alliance to consistently and coherently enforce that standard regardless of what coalition happens to be speaking for Dave at the time, it is not at all clear to me that this alliance would survive the destabilizing effects of seriously contemplating the possibility of various components having their values implemented on a global scale. We may have an uneasy alliance here inside Dave’s brain, but it really doesn’t take that much to convince one of us to betray that alliance if the stakes get high enough.
By way of analogy, it may be coherent to assert that the U.S. can “speak as” a single entity through the appointing of a Federal government, a President, and so forth. But if the U.S. agreed to become part of a single sovereign world government, it’s not impossible that the situation that prompted this decision would also prompt Montana to secede from the Union. Or, if the world became sufficiently interconnected that a global economic marketplace became an increasingly powerful organizing force, it’s not impossible that parts of New York might find greater common cause with parts of Tokyo than with the rest of the U.S. Or various other scenarios along those lines. At which point, even if the U.S. Federal government goes on saying the same things it has always said, it’s no longer entirely clear that it really is speaking for Montana or New York.
In a not-really-all-that-similar-but-it’s-the-best-I-can-do-without-getting-a-lot-more-formal way, it’s not clear to me that when it comes time to flip the switch, the current Dave Coalition continues to speak for us.
At best, I think it follows that just like the existence of people who are Jerks suggests that I should prefer CEV(Dave) to CEV(humanity), the existence of Dave-agents who are Jerks suggests that I should prefer CEV(subset-of-Dave) to CEV(Dave).
But frankly, I think that’s way too simplistic, because no given subset-of-Dave that lacks internal conflict is rich enough for any possible ruling coalition to be comfortable letting it grab the brass ring like that. Again, quotidian alliances rarely survive a sudden raising of the stakes.
Mostly, I think what really follows from all this is that the arbitration process that occurs within my brain cannot be meaningfully separated from the arbitration process that occurs within other structures that include/overlap my brain, and therefore if we want to talk about a volition-extrapolation process at all we have to bite the bullet and accept that the target of that process is either too simple to be considered a human being, or includes inconsistent values (aka Jerks). Excluding the Jerks and including a human being just isn’t a well-defined option.
Of course, Solzhenitsyn said it a lot more poetically (and in fewer words).
Yes, I was talking about shortcomings of CEV, and did not mean to imply that my current preferences were better than any third option. They aren’t even strictly better than CEV; I just think they are better overall if I can’t mix the two.
It just seems likely, based on my understanding of what people like and approve of.
Strict non-interference is unlikely to end up in CEV, because there are many cases where interventions are the right thing to do. I just meant it as a proof that there are less controversial principles that will block a lot of bullshit, not as speculation about something that will actually end up in CEV.
These values are based on false beliefs, inconsistent memes, and fear. None of those things will survive CEV. “If we knew more, thought faster, grew closer together, etc”.
That would take a whole hell of a lot of certainty. I have nowhere near that level of confidence in anything I believe.
I think CEV will end up more like transhumanism than like Islam (which means I mostly accept transhumanism). I think I’m too far outside the morally-certain-but-ignorant-human reference class to make outside-view judgements on this.
Not an equal amount, but many of my current values will be contradicted in CEV. I can only analogize to physics: I accept relativity, but expect it to be wrong. (I think my current beliefs are the closest approximation to CEV that I know of).
Likely candidates? That’s like asking “which of your beliefs are false”. All I can say is which are most uncertain. I can’t say which way they will go. I am uncertain about optimal romantic organization (monogamy, polyamory, ???). I am uncertain of the moral value of closed simulations. I am uncertain about moral value of things like duplicating people, or making causally-identical models. I am quite certain that existing lives have high value. I am unsure about lives that don’t yet exist.
Not quite. Let’s imagine two bootstrap scenarios: some neo-enlightenment transhumanists, and some religious nuts. Even just the non-extrapolated values of the transhumanists will produce a friendly-enough AI that can (and will want to) safely research better value-extrapolation methods. Bootstrapping it with Islam will get you an angry punishing god that may or may not care about extrapolating further. Running the final, ideal CEV process with either seed should produce the same good value set, but we may not have the final, ideal CEV process, and a dangerous genie running the process may not do safe things if you start it with the wrong seed.
Sorry, I made that too specific. I didn’t mean to imply that only the Islamists are inconsistent; I just meant them as an obvious example.
This is what I think would be good as a seed value system so that the FAI can go and block x-risk and stop death and such without having to philosophize too much first. But I’d want the CEV philosophising to be done eventually (ASAP, actually).
Right according to whose values? The problem is precisely that people disagree pre-extrapolation about when it’s right to interfere, and therefore we fear their individual volitions will disagree even post extrapolation. I and some other people have a value of noninterference in certain matters that is very important to us. I would rather hardcode that value than let CEV of humanity decide on it.
Again why? CEV is very much underspecified. To me, the idea that our values and ideals will preferentially turn out to be the ones all humans would embrace “if they were smarter etc” looks like mere wishful thinking. Values are arational and vary widely. If you specify a procedure (CEV) whereby they converge to a compatible set which also happens to resemble our actual values today, then it should be possible to give different algorithms (which you can call CEV or not, it doesn’t matter) which converge on other value-sets.
In the end, as the Confessor said, “you have judged: what else is there?” I have judged, and where I am certain enough about my judgement I would rather that other people’s CEV not override me.
Other than that I agree with you about using a non-CEV seed etc. I just don’t think we should later let CEV decide anything it likes without the seed explicitly constraining it.
CEV’s. Where by an unqualified “CEV” I take nyan to be referring to the CEV of humanity (“the Coherently Extrapolated Values of Humanity”). I assume he also means it as a normative assertion of the slightly-less-extrapolated kind that means something like “all properly behaving people of my tribe would agree, and if they don’t we may need to beat them with sticks until they do.”
And the bracketed condition is generalisable to all sorts of things—including those preferences that we haven’t even considered the possibility of significant disagreement about. Partially replacing one’s own preferences with preferences that are not one’s own is one of the most dangerous things it is possible to do. Not something to do lightly or take for granted as implicitly ‘right’.
I note that any unqualified assertion that “non-intervention is strictly not the wrong thing to do” necessarily implies preferring that the worst things that could possibly happen in an FAI-free world actually happen, rather than permitting a single disqualified intervention. That means, for example, that rather than a minimalist intervention you think the ‘right’ behavior for the FAI is to allow everyone on the planet to be zapped by The Pacifier and constantly raped by pedophiles until they are 10, whereupon they are forced to watch repeats of the first season of Big Brother until they reach 20 and are zapped again, with the process repeating until the heat death of the universe. That’s pretty bad, but certainly not the worst thing that could happen. It is fairly trivially not “right” to let that happen if you can easily stop it.
Note indicating partial compatibility of positions: there can be reasons to advocate the implementation of ethical injunctions in a created AGI, but this still doesn’t allow us to say that non-intervention in a given extreme circumstance is ‘right’.
That’s exactly what I think. And especially if you precommit to the values output by a certain process before the process is actually performed, and can’t undo it later.
I’m certainly not advocating absolute unqualified non-intervention. I wrote “a value of noninterference in certain matters”. Certainly the AI should interfere to e.g. offer help just before something happens which the AI thinks the person would not want to happen to them (the AI is qualified to make such decisions if it can calculate CEV). In such a situation the AI would explain matters and offer aid and advice, but ultimate deciding power might still lie with the person, depending on the circumstances.
Nonintervention doesn’t just mean non-intervention by the AI; it means non-intervention by one person with another. If someone requests that the AI prevent another person from doing something to them, then again, in at least some (most? all?) circumstances the AI should interfere to do so; that is actually upholding the principle of non-interference.
Gah. I had to read through many paragraphs of drivel plot and then all I came up with was “a device that zaps people, making them into babies, but that is reversible”. You should have just said so. (Not that the idea makes sense on any level). Anyway, my above comment applies; people would not want it done to them and so would request the AI to prevent it.
I like both these caveats. The scenario becomes something far more similar to what a CEV could plausibly be without the artificial hack. Horror stories become much harder to construct.
Off the top of my head, one potential remaining weakness is the inability to prevent a rival, less crippled AGI from taking over without pre-emptively interfering with an individual who is not themselves interfering with anyone. Getting absolute power requires intervention (or universally compliant subjects). Not getting absolute power means something else can get it, and then outcomes are undefined.
That’s a good point. The AI’s ability to not interfere is constrained by its need to monitor everything that’s going on. Not just to detect someone building a rival AI, but to detect simpler cases like someone torturing a simulated person, or even just a normal flesh and bone child who wasn’t there nine months earlier. To detect people who get themselves into trouble without yet realizing it, or who are going to attack other people nonconsensually, and give these people help before something bad actually happens to them, all requires monitoring.
And while a technologically advanced AI might monitor us using tools we humans today couldn’t even detect, to advanced posthumans every possible tool might be painfully obvious. E.g., you might have to expose everything your megaton-of-computronium brain calculates to the AI, because that much computronium is enough to simulate all the humans alive in 2012 in enough detail that they would count as persons to the AI. But for the asteroid-sized brain this means the AI is literally aware of all its thoughts: it has zero privacy.
It does appear that universal surveillance is the cost of universally binding promises (you won’t be tortured no matter where you go and what you do in AI-controlled space). To reduce costs and increase trust, the AI should be transparent to everyone itself, and should be publicly and verifiably committed to being a perfectly honest and neutral party that never reveals the secrets and private information it monitors to anyone.
I’d like to note that all of this also applies to any FAI singleton that implements some policies that we today consider morally required—like making sure no one is torturing simulated people or raising their baby wrong. If there’s no generally acceptable FAI behavior that doesn’t include surveillance, then on that count all else is equal, and I still prefer my AI to a pure CEV implementation.
It would seem that the FAI should require only that you expose to it the complete state of your brain at a point in time from which it can reliably predict or prove that you are ‘safe’, using the kind of reasoning we often assume as a matter of course when describing UDT decision problems. Such an FAI would have information about what you are thinking—and in particular a great big class of things it knows you are not thinking—but not necessarily detailed knowledge of what you are thinking specifically.
For improved privacy, the inspection could be done by a spawned robot AI programmed to self-destruct after analyzing you and returning nothing but a boolean safety indicator to the FAI.
Prediction has some disadvantages compared to constant observation:
Some physical systems are hard to model well with simplification; even for the AI it might be necessary to use simulations composed of amounts of matter proportional to the thing simulated. If about one half of all matter has to be given over to the AI, instead of being used to create more people and things, that is a significant loss of opportunity. (Maybe the AI should tax people in simulation-resources, and those who opt in to surveillance have much lower taxes :-)
Simulations naturally have a rising risk of divergence over time. The AI is not literally Omega; it will have to come in and take periodic snapshots of everyone’s state to correct the simulations.
Simulations have a chance of being wrong. However small the chance, if the potential result is someone building a UFAI challenger, it might be unacceptable to take that chance.
OTOH, surveillance might be much cheaper (I don’t know for sure) and also allows destroying the evidence close to the site of observation once it is analyzed, preserving a measure of privacy.
I vaguely remember something in that doc suggesting that part of the extrapolation process involves working out the expected results of individuals interacting. More poetically, “what we would want if we grew together more.” That suggests that this isn’t quite what the original doc meant to imply, or at least that it’s not uniquely what the doc meant to imply, although I may simply be misremembering.
More generally, all the hard work is being done here by whatever assumptions are built into the “extrapolation”.
Quoting the CEV doc:
I don’t mean to contradict that. So consider my interpretation to be something like: build (“extrapolate”) each person’s CEV, which includes that person’s interactions with other people, but doesn’t directly value them except insofar as that person values them; then somehow merge the individual CEVs to get the group CEV.
After all (I reason) you want the following nice property for CEV. Suppose the CEV of one group meets the CEV of another—e.g. separate AIs implementing those CEVs meet. If they don’t embody inimical values, they will try to negotiate and compromise. We would like the result of those negotiations to look very much like the CEV of the combined group. One easy way to get this is to say CEV is built on “merging” all the way from the bottom up.
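As a toy illustration of that bottom-up composability (everything below is an invented formalism, not anything from the CEV document): if individual extrapolated values are represented as preference weights and “merging” is a population-weighted average, then merging two groups’ already-merged values gives the same answer as merging all the individuals at once.

```python
from typing import Dict, Tuple

Values = Dict[str, float]          # issue -> preference weight in [-1, 1]
Group = Tuple[Values, int]         # (merged values, number of people merged)

def merge(a: Group, b: Group) -> Group:
    """Population-weighted average of two groups' preference weights."""
    (va, na), (vb, nb) = a, b
    issues = set(va) | set(vb)
    merged = {k: (va.get(k, 0.0) * na + vb.get(k, 0.0) * nb) / (na + nb)
              for k in issues}
    return merged, na + nb

alice = ({"stoning": -1.0, "immortality": 1.0}, 1)
bob   = ({"stoning": -1.0, "immortality": 0.2}, 1)
carol = ({"stoning":  1.0, "immortality": 0.0}, 1)

# Merging pairwise and then merging the results equals merging everyone at once,
# which is the kind of property the negotiation-between-AIs story asks for.
print(merge(merge(alice, bob), carol)[0])
print(merge(alice, merge(bob, carol))[0])
```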
Certainly. All discussion of CEV starts with “assume there can exist a process that produces an outcome matching the following description, and assume we can and do build it, and assume that all the under-specification of this description is improved in the way we would wish it improved if we were better at wishing”.
I basically agree with all of this, except that I think you’re saying “CEV is built on ‘merging’ all the way from the bottom up” but you aren’t really arguing for doing that.
Perhaps one important underlying question here is whether people’s values ever change contingent on their experiences.
If not—if my values are exactly the same as they were when I first began to exist (whenever that was)—then perhaps something like what you describe makes sense. A process for working out what those values are and extrapolating my volition based on them would be difficult to build, but is coherent in principle. In fact, many such processes could exist, and they would converge on a single output specification for my individual CEV. And then, and only then, we could begin the process of “merging.”
This strikes me as pretty unlikely, but I suppose it’s possible.
OTOH, if my values are contingent on experience—that is, if human brains experience value drift—then it’s not clear that those various processes’ outputs would converge. Volition-extrapolation process 1, which includes one model of my interaction with my environment, gets Dave-CEV-1. VEP2, which includes a different model, gets Dave-CEV-2. And so forth. And there simply is no fact of the matter as to which is the “correct” Dave-CEV; they are all ways that I might turn out; to the extent that any of them reflect “what I really want” they all reflect “what I really want”, and I “really want” various distinct and potentially-inconsistent things.
In the latter case, in order to obtain something we call CEV(Dave), we need a process of “merging” the outputs of these various computations. How we do this is of course unclear, but my point is that saying “we work out individual CEVs and merge them” as though the merge step came second is importantly wrong. Merging is required to get an individual CEV in the first place.
So, yes, I agree, it’s a fine idea to have CEV built on merging all the way from the bottom up. But to understand what the “bottom” really is is to give up on the idea that my unique individual identity is the “bottom.” Whatever it is that CEV is extrapolating and merging, it isn’t people, it’s subsets of people. “Dave’s values” are no more preserved by the process than “New Jersey’s values” or “America’s values” are.
That’s a very good point. People not only change over long periods of time; during small intervals of time we can also model a person’s values as belonging to competing and sometimes negotiating agents. So you’re right, merging isn’t secondary or dispensable (not that I suggested doing away with it entirely), although we might want different merging dynamics sometimes for sub-person fragments vs. for whole-person EVs.
Sure, the specifics of the aggregation process will depend on the nature of the monads to be aggregated.
And, yes, while we frequently model people (including ourselves) as unique coherent consistent agents, and it’s useful to do so for planning and for social purposes, there’s no clear reason to believe we’re any such thing, and I’m inclined to doubt it. This also informs the preserving-identity-across-substrates conversation we’re having elsethread.
Where relevant—or at least when I’m reminded of it—I do model myself as a collection of smaller agents. But I still call that collection “I”, even though it’s not unique, coherent, or consistent. That my identity may be a group-identity doesn’t seem to modify any of my conclusions about identity, given that to date the group has always resided together in a single brain.
For my own part, I find that attending to the fact that I am a non-unique, incoherent, and inconsistent collection of disparate agents significantly reduces how seriously I take concerns that some process might fail to properly capture the mysterious essence of “I”, leading to my putative duplicate going off and having fun in a virtual Utopia while “I” remains in a cancer-ridden body.
I would gladly be uploaded rather than die if there were no alternative. I would still pay extra for a process that slowly replaced my brain cells etc. one by one leaving me conscious and single-instanced the whole while.
That sounds superficially like a cruel and unusual torture.
The whole point is to invent an uploading process I wouldn’t even notice happening.