Let us define a morality function F() as a function that takes as input x = the factual circumstances an agent faces in making a decision and outputs y = the decision the agent makes. It is fairly apparent that practically every agent has an F(). So ELIEZER(x) is the function that describes what Eliezer would choose in situation x. Next, define GROUP{} as the set of morality functions run by all the members of a given group.
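To make the notation concrete, here is a minimal Python sketch of F() and GROUP{}, assuming nothing beyond the definitions above; the type aliases and the ELIEZER/DOE stubs are invented for illustration only.

```python
from typing import Callable

# Illustrative stand-ins: real "circumstances" and "decisions" would be far
# richer structures; plain strings keep the sketch simple.
Circumstances = str
Decision = str

# A morality function F(): the facts of a situation go in, the agent's
# decision comes out.
MoralityFunction = Callable[[Circumstances], Decision]

def ELIEZER(x: Circumstances) -> Decision:
    """Stub: whatever Eliezer would in fact choose in situation x."""
    return f"Eliezer's choice given {x}"

def DOE(x: Circumstances) -> Decision:
    """Stub: whatever John Doe would in fact choose in situation x."""
    return f"John Doe's choice given {x}"

# GROUP{}: the set of morality functions run by all the members of a group.
GROUP = frozenset({ELIEZER, DOE})
```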
Let us define CEV() as the function that takes as input a morality function or set of morality functions and outputs a morality function that is improved/made consistent/extrapolated from the input. I’m not asserting the actual CEV formulation will do that, but it is a gesture towards the goal that CEV() is supposed to solve.
For clarity, let the output of CEV(F()) = CEV.F(). Thus, CEV.ELIEZER() is the extrapolated morality from the morality Eliezer is running. In parallel, CEV.AMERICA() (the output of CEV(AMERICA{})) is the single moral function that is the extrapolated morality of everyone in the United States. If CEV() exists, an AI considering/implementing CEV.JOHNDOE() is Friendly to John Doe. Likewise, CEV.GROUP() leads to an AI that is Friendly to every member of the group.
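In the same spirit, here is a hedged sketch of CEV() as a higher-order function, reusing the illustrative aliases from the sketch above; the body is deliberately left unimplemented, since writing it is exactly the problem CEV() is supposed to name, and none of this is the actual CEV formulation.

```python
from typing import Callable, FrozenSet, Union

Circumstances = str   # same illustrative aliases as before
Decision = str
MoralityFunction = Callable[[Circumstances], Decision]

def CEV(source: Union[MoralityFunction, FrozenSet[MoralityFunction]]) -> MoralityFunction:
    """Return a morality function improved / made consistent / extrapolated
    from a single morality function or a set of them. The body of this
    function is the entire open problem."""
    def extrapolated(x: Circumstances) -> Decision:
        raise NotImplementedError("the part no one knows how to write")
    return extrapolated

# The dot notation in the text, rendered as ordinary function application:
# CEV.ELIEZER() is the value CEV(ELIEZER); CEV.AMERICA() is CEV(AMERICA),
# where AMERICA is the set of morality functions of everyone in the United States.
```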
For FAI to be possible, CEV() must produce an output for (A) any single morality function or (B) any set of morality functions. Further, for provable FAI, it must be possible to (C) mathematically show the output of CEV() before turning on the AI.
If moral realism is false, why is there reason to think (A), (B), or (C) are true?
For FAI to be possible, CEV() must produce an output for (A) any single morality function or (B) any set of morality functions
Any set? Why not just require that CEV.HUMANITY() be possible? It seems like there are some sets of morality functions G that would be impossible (G={x, ~x}?). Human value is really complex so it’s a difficult thing to a) model it and b) prove the model. Obviously I don’t know how to do that; no one does yet. If moral realism were true and morality were simple and knowable I suppose that would make the job a lot easier… but that doesn’t seem like a world that is still possible. Conversely, morality could be both real and unknowable and impossibly complicated, and then we’d be in even worse shape, because learning about human values wouldn’t even tell us how to do Friendly AI! Maybe if you gave me some idea of what your alternative to anti-realism would look like I could answer better. In short: Friendliness is really hard; part of the reason it seems so hard to me might have to do with my moral anti-realism, but I have trouble imagining plausible realist worlds where things are easier.
First, a terminology point: CEV.HUMANITYCURRENTLYALIVE() != CEV.ALLHUMANITYEVER(). For the anti-realist, CEV.HUMANITYCURRENTLYALIVE() is massively more plausible, and CEV.LONDON() is more plausible than that—but my sense is that this sentence depends on the anti-realist accepting some flavor of moral relativism.
Second, it seems likely that fairly large groups (e.g. the population of London) already have some {P, ~P}. That’s one reason to think making CEV() is really hard.
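As a toy illustration of the {P, ~P} worry, with invented names and a Boolean standing in for a decision: if one member’s morality function is the pointwise negation of another’s, no single extrapolated function can agree with both on any situation.

```python
# Toy illustration only: P and NOT_P are invented morality functions that
# disagree on every input, so any single function CEV({P, NOT_P}) returned
# would have to contradict at least one of them in every situation.
def P(x: str) -> bool:
    return len(x) % 2 == 0      # arbitrary stand-in policy

def NOT_P(x: str) -> bool:
    return not P(x)             # the exact negation of P everywhere

assert all(P(x) != NOT_P(x) for x in ["trolley problem", "white lie", "broken promise"])
```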
Human value is really complex so it’s a difficult thing to a) model it and b) prove the model.
I don’t understand what proving the model means in this context.
If moral realism were true and morality were simple and knowable I suppose that would make the job a lot easier… but that doesn’t seem like a world that is still possible.
I don’t understand why you talk about possibility. “Morality is true, simple, and knowable” seems like an empirical proposition: it just turns out to be false. It isn’t obvious to me that simple moral realism is necessarily false in the way that 2+5=7 is necessarily true.
Conversely, morality could be both real and unknowable
How does the world look different if morality is real and inaccessible vs. not real?
Maybe if you gave me some idea of what your alternative to anti-realism would look like I could answer better.
Pace certain issues about human appetites as objective things, I am an anti-realist—in case that wasn’t clear.
First, a terminology point: CEV.HUMANITYCURRENTLYALIVE() != CEV.ALLHUMANITYEVER()
Sure, sure. But CEV.ALLHUMANITYEVER() is also not the same as CEV.ALLPOSSIBLEAGENTS().
Second, it seems likely that fairly large groups (e.g. the population of London) already have some {P, ~P}.
Some subroutines are probably inverted, but there probably aren’t people whose utility functions are the full negation of other people’s. Trade-offs needn’t mean irreconcilable differences. Like I doubt there is anyone in the world who cares as much as you do about the exact opposite of everything you care about.
Human value is really complex so it’s a difficult thing to a) model it and b) prove the model.
I don’t understand what proving the model means in this context.
Show with some confidence that it doesn’t lead to terrible outcomes if implemented.
“Morality is true, simple, and knowable” seems like an empirical proposition: it just turns out to be false. It isn’t obvious to me that simple moral realism is necessarily false in the way that 2+5=7 is necessarily true.
I’m not sure that it is. But when I said “still” possible I meant that we have more than enough evidence to rule out the possibility that we are living in such a world. I didn’t mean to imply any beliefs about necessity. That said I am pretty confused about what it would mean for there to be objective facts about right and wrong. Usually I think true beliefs are supposed to constrain anticipated experience. Since moral judgments don’t do that… I’m not quite sure I know what moral realism would really mean.
How does the world look different if morality is real and inaccessible vs. not real?
I imagine it wouldn’t look different, but since there is no obvious way of proving a morality logically or empirically, I can’t see how moral realists would be able to rule it out.
Pace certain issues about human appetites as objective things, I am an anti-realist—in case that wasn’t clear.
Oh I understand that. I just meant that when you ask:
If I’m a moral anti-realist, do I necessarily believe that provably Friendly AI is impossible?
I’m wondering “Opposed to what?”. I’m having trouble imagining the person for whom the prospects of Friendly AI are much brighter because they are a moral realist.
If I’m a moral anti-realist, do I necessarily believe that provably Friendly AI is impossible?
I’m wondering “Opposed to what?”. I’m having trouble imagining the person for whom the prospects of Friendly AI are much brighter because they are a moral realist.
It seems to me that moral realists have more reason to be optimistic about provably friendly AI than anti-realists. The steps to completion are relatively straightforward: (1) Rigorously describe the moral truths that make up the true morality. (2) Build an AGI that maximizes what the true morality says to maximize.
I’m not quite sure I know what moral realism would really mean.
I think Alice, a unitary moral realist, believes she is justified in saying: “Anyone whose morality function does not output Q in situation q is a defective human, roughly analogous to the way any human who never feels hungry is defective in some way.”
Bob, a pluralist moral realist, would say: “Anyone whose morality function does not output from the set {Q1, Q2, Q3} in situation q is a defective human.”
Charlie, a moral anti-realist, would say Alice’s and Bob’s statements both suffer from some defect: they are misleading, or merely historically contingent, or incapable of being evaluated for truth, or something else along those lines.
Consider the following statement:
“Every (moral) decision a human will face has a single choice that is most consistent with human nature.”
To me, that position implies that moral realism is true. If you disagree, could you explain why?
I imagine it wouldn’t look different [if morality is real and inaccessible vs. not real], but since there is no obvious way of proving a morality logically or empirically, I can’t see how moral realists would be able to rule it out.
What is at stake in the distinction? A set of facts that cannot have causal effect might as well not exist. Compare error theorists to inaccessibility moral realists—the former say value statements cannot be evaluated for truth, the latter say value statements could be true, but in principle, we will never know. For any actual problem, both schools of thought recommend the same stance, right?
moral realists have more reason to be optimistic about provably friendly AI than anti-realists. The steps to completion are relatively straightforward: (1) Rigorously describe the moral truths that make up the true morality. (2) Build an AGI that maximizes what the true morality says to maximize.
Is step 1 even necessary? Presumably in that universe one could just build an AGI that was smart enough to infer those moral truths and implement them, and turn it on secure in the knowledge that even if it immediately started disassembling all available matter to make prime-numbered piles of paperclips, it would be doing the right thing. No?
That’s an interesting point. I suppose it depends on whether a moral realist can hold that something is morally right for one class of agents and morally wrong for another class. I think such a position is consistent with moral realism. If that is a moral realist position, then the AI programmer should be worried that an unconstrained AI would naturally develop a morality function different than CEV.HUMANITY().
In other words, when we say moral realist, are we using a two-part term with an unfortunate ambiguity between realism(morality, agent) and realism(morality, humans)? Wow, I never considered whether this was part of the inferential distance in these types of discussions.
Well, to start with, I would say that CEV is beside the point here. In a universe where there exist moral truths that make up the true morality, if what I want is to do the right thing, there’s no particular reason for me to care about anyone’s volition, extrapolated or otherwise. What I ought to care about is discerning those moral truths. Maybe I can discern them by analyzing human psychology, maybe by analyzing the human genome, maybe by analyzing the physical structure of carbon atoms, maybe by analyzing the formal properties of certain kinds of computations, I dunno… but whatever lets me figure out those moral truths, that is what I ought to be attending to in such a universe, and if humanity’s volition conflicts with those truths, so much the worse for humanity.
So the fact that an unconstrained AI might—or even is guaranteed to—develop a morality function different than CEV.HUMANITY() is not, in that universe, a reason not to build an unconstrained AI. (Well, not a moral reason, anyway. I can certainly choose to forego doing the right thing in that universe if it turns out to be something I personally dislike, but only at the cost of behaving immorally.)
But that’s beside your main point, that even in that universe the moral truths of the universe might be such that different behaviors are most right for different agents. I agree with this completely. Another way of saying it is that total rightness is potentially maximized when different agents are doing (specific) different things. (This might be true in a non-moral-realist universe as well.)
Actually, it may be useful here to be explicit about what we think a moral truth is in that universe. That is, is it a fact about the correct state of the world? Is it a fact about the correct behavior of an agent in a given situation, independent of consequences? Is it a fact about the correct way to be, regardless of behavior or consequences? Is it something else?