Letting others make decisions for themselves can also influence or force them into decisions that are wrong in some sense of the word.
True. Nonintervention only works if you value it more than anything people might do as a result of it. Which is why a system of constraints that is given to the AI and is not CEV-derived can’t be just nonintervention; it has to include other principles as well and be a complete ethical system.
And, sure, there might be different processes for different questions. But there’s no a priori reason to believe that any of those processes reside in my brain.
I’m always open to suggestions of new processes. I just don’t like the specific process of CEV, which happens not to reside in my brain, but that’s not why I dislike it.
At the beginning of this thread you seemed to be saying that your current preferences (which are, of course, the product of a computation that resides in your brain) were the best determiner of what to optimize the environment for. If you aren’t saying that, but merely saying that there’s something specific about CEV that makes it an even worse choice, well, OK. I mean, I’m puzzled by that simply because there doesn’t seem to be anything specific about CEV that one could object to in that way, but I don’t have much to say about that; it was the idea that the output of your current algorithms is somehow more reliable than the output of some other set of algorithms implemented on a different substrate that I was challenging.
Sounds like a good place to end this thread, then.
I’m puzzled by that simply because there doesn’t seem to be anything specific about CEV that one could object to in that way
Really? What about the “some people are Jerks” objection? That’s kind of a big deal. We even got Eliezer to tentatively acknowledge the theoretical possibility that that could be objectionable at one point.
(nods) Yeah, I was sloppy. I was referring to the mechanism for extrapolating a coherent volition from a given target, rather than the specification of the target (e.g., “all of humanity”) or other aspects of the CEV proposal, but I wasn’t at all clear about that. Point taken, and agreed that there are some aspects of the proposal (e.g. target specification) that are specific enough to object to.
Tangentially, I consider the “some people are jerks” objection very confused. But then, I mostly conclude that if such a mechanism can exist at all, the properties of people are about as relevant to its output as the properties of states or political parties. More thoughts along those lines here.
I was referring to the mechanism for extrapolating a coherent volition from a given target
It really is hard to find a fault with that part!
Tangentially, I consider the “some people are jerks” objection very confused.
I don’t understand. If you take the CEV of a group that consists of yourself and ten agents with values that differ irreconcilably from yours, then we can expect that CEV to be fairly abhorrent to you. That is, roughly speaking, a risk you take when you substitute preferences calculated off a group that you don’t fully understand or have strong reason to trust for your own preferences.
That CEV would also be strictly inferior to your own CEV, which would implicitly incorporate the extrapolated preferences of the other ten agents to precisely the degree that you would want it to do so.
I agree that if there exists a group G of agents A1..An with irreconcilably heterogeneous values, a given agent A should strictly prefer CEV(A) to CEV(G). If Dave is an agent in this model, then Dave should prefer CEV(Dave) to CEV(group), for the reasons you suggest. Absolutely agreed.
What I question is the assumption that in this model Dave is better represented as an agent and not a group. In fact, I find that assumption unlikely, as I noted above. (Ditto wedrifid, or any other person.)
If Dave is a group, then CEV(Dave) is potentially problematic for the same reason that CEV(group) is problematic… every agent composing Dave should prefer that most of Dave not be included in the target definition. Indeed, if group contains Dave and Dave contains an agent A1, it isn’t even clear that A1 should prefer CEV(Dave) to CEV(group)… while CEV(Dave) cannot be more heterogeneous than CEV(group), it might turn out that a larger fraction (by whatever measure the volition-extrapolator cares about) of group supports A1’s values than the fraction of Dave that supports them.
If the above describes the actual situation, then whether Dave is a jerk or not (or wedrifid is, or whoever) is no more relevant to the output of the volition-extrapolation mechanism than whether New Jersey is a jerk, or whether the Green Party is a jerk… all of these entities are just more-or-less-transient aggregates of agents, and the proper level of analysis is the agent.
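The point in the last two paragraphs can be sketched numerically. Everything below is an illustrative assumption on my part — scalar values, averaging as the aggregation rule, and distance from the aggregate as a stand-in for “abhorrence” — none of it comes from the CEV proposal itself; it only shows how a sub-agent of Dave can rationally prefer CEV(group) to CEV(Dave):

```python
# Toy model: each agent's values are one number, "extrapolation" is the
# mean, and an agent's dislike of an outcome is its distance from it.
# All of these modeling choices are hypothetical simplifications.

def cev(agents):
    """Aggregate value scalars by averaging (a stand-in for whatever
    coherence-seeking the real mechanism would do)."""
    return sum(agents) / len(agents)

def abhorrence(agent, outcome):
    """How far the aggregated outcome sits from this agent's values."""
    return abs(agent - outcome)

a1 = 1.0                    # one sub-agent of Dave
dave = [a1, -1.0, -1.0]     # Dave: a1 is outvoted internally
group = dave + [1.0] * 6    # wider group: mostly aligned with a1

# a1 trivially prefers CEV({a1}) to anything else...
assert abhorrence(a1, cev([a1])) == 0.0
# ...but prefers CEV(group) to CEV(Dave), because a larger fraction of
# the group shares a1's values than the fraction of Dave that does.
assert abhorrence(a1, cev(group)) < abhorrence(a1, cev(dave))
```

Under these toy assumptions, whether “Dave” as a whole is a jerk never enters the calculation; only the distribution of agent-level values does.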
If Dave is a group, then CEV(Dave) is potentially problematic for the same reason that CEV(group) is problematic… every agent composing Dave should prefer that most of Dave not be included in the target definition. Indeed, if group contains Dave and Dave contains an agent A1, it isn’t even clear that A1 should prefer CEV(Dave) to CEV(group)… while CEV(Dave) cannot be more heterogeneous than CEV(group), it might turn out that a larger fraction (by whatever measure the volition-extrapolator cares about) of group supports A1’s values than the fraction of Dave that supports them.
This is related to why I’m a bit uncomfortable accepting the sometimes expressed assertion “CEV only applies to a group, if you are doing it to an individual it’s just Extrapolated Volition”. The “make it stop being incoherent!” part applies just as much to conflicting and inconsistent values within a messily implemented individual as it does to differences between people.
If the above describes the actual situation, then whether Dave is a jerk or not (or wedrifid is, or whoever) is no more relevant to the output of the volition-extrapolation mechanism than whether New Jersey is a jerk, or whether the Green Party is a jerk… all of these entities are just more-or-less-transient aggregates of agents, and the proper level of analysis is the agent.
Taking this “it’s all agents and subagents and meta-agents” outlook, the remaining difference is one of arbitration. That is, speaking as wedrifid I have already implicitly decided which elements of the lump of matter sitting on this chair are endorsed as ‘me’ and so included in the gold standard (CEV). While it may be the case that my amygdala can be considered an agent that is more similar to your amygdala than to the values I represent in abstract ideals, adding the amygdala-agent of another constitutes corrupting the CEV with some discrete measure of “Jerkiness”.
It’s not clear to me that Dave has actually given its endorsement to any particular coalition in a particularly consistent or coherent fashion; it seems to many of me that what Dave endorses and even how Dave thinks of itself and its environment is a moderately variable thing that depends on what’s going on and how it strengthens, weakens, and inspires and inhibits alliances among us. Further, it seems to many of me that this is not at all unique to Dave; it’s kind of the human condition, though we generally don’t acknowledge it (either to others or to ourself) for very good social reasons which I ignore here at our peril.
That said, I don’t mean to challenge here your assertion that wedrifid is an exception; I don’t know you that well, and it’s certainly possible.
And I would certainly agree that this is a matter of degree; there are some things that are pretty consistently endorsed by whatever coalition happens to be speaking as Dave at any given moment, if only because none of us want to accept the penalties associated with repudiating previous commitments made by earlier ruling coalitions, since that would damage our credibility when we wish to make such commitments ourselves.
Of course, that sort of thing only lasts for as long as the benefits of preserving credibility are perceived to exceed the benefits of defecting. Introduce a large enough prize and alliances crumble. Still, it works pretty well in quotidian circumstances, if not necessarily during crises.
Even there, though, this is often honored in the breach rather than the observance. Many ruling coalitions, while not explicitly repudiating earlier commitments, don’t actually follow through on them either. But there’s a certain amount of tolerance of that sort of thing built into the framework, which can be invoked by conventional means… “I forgot”, “I got distracted”, “I experienced akrasia”, and so forth.
So of course there’s also a lot of gaming of that tolerance that goes on. Social dynamics are complicated. And, again, change the payoff matrix and the games change.
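The “change the payoff matrix and the games change” point can be made concrete with a standard repeated-game inequality: a coalition honors its commitments only while the discounted value of future credibility exceeds the one-shot gain from defecting. The function name and every number below are made up for illustration; this is a sketch of the standard folk-theorem intuition, not anything from the discussion itself:

```python
# Hypothetical sketch: an internal coalition keeps commitments while
# discounted future cooperation is worth more than a one-time prize.
# All numbers are illustrative.

def cooperates(per_round_gain, discount, defection_prize):
    # Value of cooperating forever: g + g*d + g*d**2 + ... = g / (1 - d)
    future_value = per_round_gain / (1.0 - discount)
    return future_value >= defection_prize

# Quotidian stakes: preserving credibility wins.
assert cooperates(per_round_gain=1.0, discount=0.9, defection_prize=5.0)
# Raise the prize enough (say, one faction's values implemented on a
# global scale) and the alliance crumbles.
assert not cooperates(per_round_gain=1.0, discount=0.9, defection_prize=50.0)
```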
All of which is to say, even if my various component parts were to agree on such a gold standard CEV(Dave), and commit to an alliance to consistently and coherently enforce that standard regardless of what coalition happens to be speaking for Dave at the time, it is not at all clear to me that this alliance would survive the destabilizing effects of seriously contemplating the possibility of various components having their values implemented on a global scale. We may have an uneasy alliance here inside Dave’s brain, but it really doesn’t take that much to convince one of us to betray that alliance if the stakes get high enough.
By way of analogy, it may be coherent to assert that the U.S. can “speak as” a single entity through the appointing of a Federal government, a President, and so forth. But if the U.S. agreed to become part of a single sovereign world government, it’s not impossible that the situation that prompted this decision would also prompt Montana to secede from the Union. Or, if the world became sufficiently interconnected that a global economic marketplace became an increasingly powerful organizing force, it’s not impossible that parts of New York might find greater common cause with parts of Tokyo than with the rest of the U.S. Or various other scenarios along those lines. At which point, even if the U.S. Federal government goes on saying the same things it has always said, it’s no longer entirely clear that it really is speaking for Montana or New York.
In a not-really-all-that-similar-but-it’s-the-best-I-can-do-without-getting-a-lot-more-formal way, it’s not clear to me that when it comes time to flip the switch, the current Dave Coalition continues to speak for us.
At best, I think it follows that just like the existence of people who are Jerks suggests that I should prefer CEV(Dave) to CEV(humanity), the existence of Dave-agents who are Jerks suggests that I should prefer CEV(subset-of-Dave) to CEV(Dave).
But frankly, I think that’s way too simplistic, because no given subset-of-Dave that lacks internal conflict is rich enough for any possible ruling coalition to be comfortable letting it grab the brass ring like that. Again, quotidian alliances rarely survive a sudden raising of the stakes.
Mostly, I think what really follows from all this is that the arbitration process that occurs within my brain cannot be meaningfully separated from the arbitration process that occurs within other structures that include/overlap my brain, and therefore if we want to talk about a volition-extrapolation process at all we have to bite the bullet and accept that the target of that process is either too simple to be considered a human being, or includes inconsistent values (aka Jerks). Excluding the Jerks and including a human being just isn’t a well-defined option.
Of course, Solzhenitsyn said it a lot more poetically (and in fewer words).
Yes, I was talking about shortcomings of CEV, and did not mean to imply that my current preferences were better than any third option. They aren’t even strictly better than CEV; I just think they are better overall if I can’t mix the two.