This post seems to come out of nowhere… I haven’t seen any comments by Roko while reading up to this point, the Google link you provide turns up nothing relevant, and the blog link doesn’t exist. (I gather from casual searching that there was some kind of political blowup and Roko deleted all his contributions.)
So I’m not sure what you’re responding to, and maybe the context matters. But something bewilders me about this whole line of reasoning as applied to what seems to be SIAI’s chosen strategy for avoiding non-Friendliness.
(This kind of picks up from my [earlier comment](http://lesswrong.com/lw/t3/the_bedrock_of_morality_arbitrary/2xi8?c=1). So if I’m confused, the confusion may start there.)
You argue that universality and objectivity and so forth are just goals, ones that we as humans happen to sort high. Sure, agreed.
You argue that it’s wrong to decide what to do on the basis of those goals, because they are merely instrumental; you argue that other goals (perhaps “life, consciousness, and activity; health and strength...” etc.) are right, or at least more right. Agreed with reservations.
You argue that individual minds will disagree on all of those goals, including the right ones. That seems guaranteed in the space of all possible minds, likely in the space of all evolved minds, and plausible in the space of all human minds.
And, you conclude, just because some mind disagrees with a goal doesn’t mean that goal isn’t right. And if the goal is right, we should pursue it, even if some mind disagrees. Even if a majority of minds disagree. Even (you don’t say this but it seems to follow) if it makes a majority of minds unhappy.
So… OK. Given that, I’m completely confused about why you support CEV.
Part of the point of CEV seems to be that if there is some goal that some subset of a maximally informed and socialized, but not otherwise influenced, human race would want to see go unachieved, then a process implementing CEV will ensure that the AGI it creates does not pursue that goal. So, no paperclippers. Which is great, and good, and wonderful.
(That said, I see no way to prove that something really is a CEV-implementing AI, even after you’ve turned it on, so I’m not really sure what this strategy buys us in practice. But perhaps you get to that later, and in any case it’s beside my point here.)
And presumably the idea is that humanity’s CEV is different from, say, the SIAI’s CEV, or LW’s CEV, or my personal CEV. Otherwise why complicate matters by involving an additional several billion minds?
But… well, consider the set G of goals in my CEV that aren’t in humanity’s CEV. It’s clear that the goals in G aren’t shared by all human minds… but why is that a good reason to prevent an AGI from implementing them? What if some subset of G is right?
I’m not trying to make any special claim about my own mind, here. The same argument goes for everyone. To state it more generally, consider this proposition (P): for every right goal some human has, that goal is shared by all humans.
If P is true, then there’s no reason to calculate humanity’s CEV… any human’s CEV will do just as well. If P is false, then some right goal is held by some human but not by all, so by the exclusion mechanism above humanity’s CEV leaves it out, and implementing humanity’s CEV fails to do the right thing.
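To spell that dilemma out semi-formally (a rough sketch only; it leans on my reading above that humanity’s CEV keeps just the goals shared by everyone, and the notation R, C_h, C_H is mine, not anything from the CEV document):

```latex
% Illustrative notation (my own assumption, not from the CEV write-up):
%   R    = the set of right goals
%   C_h  = the goals in human h's CEV
%   C_H  = \bigcap_h C_h   (humanity's CEV, on the "shared goals only" reading)
\[
  P:\quad \forall g \in R:\ \bigl(\exists h:\ g \in C_h\bigr) \;\Rightarrow\; \bigl(\forall h:\ g \in C_h\bigr)
\]
% If P holds, then R \cap C_h = R \cap C_H for every h, so any single human's
% CEV contains exactly the same right goals as humanity's.
\[
  \neg P \;\Rightarrow\; \exists g \in R\ \exists h_0:\ \bigl(g \in C_{h_0}\bigr) \wedge \bigl(g \notin C_H\bigr)
\]
% i.e. if P fails, at least one right goal is missing from humanity's CEV.
```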
What am I missing here?