I don’t think I’m coming across right. I’m not saying that morality is some sort of collective agreement among people about their various preferences. I’m saying that morality is a set of concepts such as fairness, happiness, and freedom; that these concepts are objective in the sense that the amount of fairness, freedom, happiness, etc. in the world can be objectively determined; and that the sum of these concepts can be expressed as a large equation.
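To make that last part concrete, here is a deliberately toy sketch in Python; the weights and the three measured quantities are entirely invented, and the real equation would presumably contain vastly more terms and be vastly harder to evaluate:

```python
# Purely illustrative: the weights and the three measured quantities are
# invented here, not a claim about what the true equation actually contains.

def moral_value(measurements, weights):
    """Score a world-state by summing objectively measurable quantities,
    each multiplied by some weight."""
    return sum(weights[k] * measurements[k] for k in weights)

weights = {"fairness": 2.0, "happiness": 1.5, "freedom": 1.0}
world_a = {"fairness": 0.6, "happiness": 0.7, "freedom": 0.8}
world_b = {"fairness": 0.9, "happiness": 0.5, "freedom": 0.7}

print(round(moral_value(world_a, weights), 2))  # 3.05
print(round(moral_value(world_b, weights), 2))  # 3.25 -> world_b comes out more moral
```

Nothing hangs on the particular weights; the point is just the shape of the claim: measurable inputs, a fixed function, a single output.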
Ah, I think I see your point. What you’re saying—and correct me if I’m wrong—is that there is some objective True Morality, some complex equation that, if applied to any possible situation, will tell you how moral a given act is.
This is probably true.
This equation isn’t written into the human psyche; it exists independently of what people think about morality. It just is. And even if we don’t know exactly what the equation is, even if we can’t work out the morality of a given act down to the tenth decimal place, we can still apply basic heuristics and arrive at a usable estimate in most situations.
My question is, then—assuming the above is true, how do we find that equation? Does there exist some objective method whereby you, I, a Pebblesorter, and a Paperclipper can all independently arrive at the same definition for what is moral (given that the Pebblesorter and Paperclipper will almost certainly promptly ignore the result)?
(I had thought that you were proposing that we find that equation by summing across the moral values and imperatives of humanity as a whole—excluding the psychopaths. This is why I asked about the exclusion, because it sounded a lot like writing down what you wanted at the end of the page and then going back and discarding the steps that wouldn’t lead there; that is also why I asked about the aliens).
I don’t know if I could tell, but I’d very much prefer that the AI not do that, and would consider myself to have been massively harmed if it did, even if I never found out. My preference is to actually interact with a diverse variety of people, not to merely have a series of experiences that seem like I’m doing it.
Yes, I think we’re in agreement on that. (Though this does suggest that ‘sentient’ may need a proper definition at some point).
What you’re saying—and correct me if I’m wrong—is that there is some objective True Morality, some complex equation that, if applied to any possible situation, will tell you how moral a given act is.
In the same way as there exists a True Set of Prime Numbers, and True Measure of How Many Paperclips There Are...
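(A throwaway illustration of what I mean by that kind of objectivity: a primality check is the same computation no matter who performs it. A minimal sketch:)

```python
def is_prime(n):
    """True if n is prime. The answer doesn't depend on who asks:
    a human, a Pebblesorter, and a Paperclipper all get the same list."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

print([n for n in range(2, 20) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```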
My question is, then—assuming the above is true, how do we find that equation?
Even though the equation exists independently of our thoughts (the same way primality exists independently of Pebblesorter thoughts), the fact that we are capable of caring about the results given by the equation means we must have some parts of it “written” in our heads, the same way Pebblesorters must have some concept of primality “written” in their heads. Otherwise, how would we be capable of caring about its results?
I think that probably evolution metaphorically “wrote” a desire to care about the equation in our heads because if humans care about what is good and right it makes it easier for them to cooperate and trust each other, which has obvious fitness advantages. Of course, the fact that evolution did a good thing by causing us to care about morality doesn’t mean that evolution is always good, or that evolutionary fitness is a moral justification for anything. Evolution is an amoral force that causes many horrible things to happen. It just happened that in this particular instance, evolution’s amoral metaphorical “desires” coincided with what was morally good. That coincidence is far from the norm; in fact, evolution probably deleted morality from the brains of sociopaths because double-crossing morally good people also sometimes confers a fitness advantage.
So how do we learn more about this moral equation that we care about? One common philosophical method for approximating it is called reflective equilibrium, in which you take your moral imperatives and heuristics and look for the commonalities and consistencies among them. It’s far from perfect, but I think this method has produced useful results in the past.
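As a very rough picture of the back-and-forth (the principles, cases, and confidence numbers below are invented for illustration, and real reflective equilibrium revises principles rather than simply deleting them):

```python
# Toy reflective-equilibrium loop. The principles, cases, and confidence
# numbers are invented; the back-and-forth structure is the point.

principles = {
    "lying is always wrong": lambda case: "wrong" if "lie" in case else None,
    "preventing murder is right": lambda case: "right" if "prevent murder" in case else None,
}
# (case, intuitive verdict about it, confidence in that intuition)
judgments = [
    ("lie to prevent murder", "right", 0.9),
    ("lie for personal gain", "wrong", 0.95),
]

changed = True
while changed:
    changed = False
    for case, verdict, conf in list(judgments):
        for name, rule in list(principles.items()):
            implied = rule(case)
            if implied is None or implied == verdict:
                continue                      # no conflict with this principle
            if conf > 0.8:
                del principles[name]          # strong intuition wins: revise the theory
            else:
                judgments.remove((case, verdict, conf))  # weak intuition yields
            changed = True
            break
        if changed:
            break                             # rescan from the top after any revision

print(list(principles))  # ['preventing murder is right']
```

Whatever principles and intuitions survive the mutual adjustment are the (provisional) equilibrium.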
Eliezer has proposed what is essentially a souped up version of reflective equilibrium called Coherent Extrapolated Volition. He has argued, however, that the primary use of CEV is in designing AIs that won’t want to kill us, and that attempting to extrapolate other people’s volition is open to corruption, as we could easily fall to the temptation to extrapolate it to something that personally benefits us.
Does there exist some objective method whereby you, I, a Pebblesorter, and a Paperclipper can all independently arrive at the same definition for what is moral (given that the Pebblesorter and Paperclipper will almost certainly promptly ignore the result)?
Again, we could probably get closer through reflective equilibrium, and by critiquing the methods and results of each other’s reflections. If you somehow managed to get a Pebblesorter or a Paperclipper to do it too, they might generate similar results, although since they don’t intrinsically care about the equation you would probably have to give them some basic instructions before they started working on the problem.
I had thought that you were proposing that we find that equation by summing across the moral values and imperatives of humanity as a whole—excluding the psychopaths.
If we assume that most humans care about acting morally, doing research about what people’s moral imperatives are might be somewhat helpful, since it would allow us to harvest the fruits of other people’s moral reflections and compare them with our own. We can exclude sociopaths because there is ample evidence that they care nothing for morality.
Although I suppose that a super-genius sociopath who had the basic concept explained to them might be able to do some useful work, in the same fashion that a Pebblesorter or Paperclipper might be able to. Of course, the genius sociopath wouldn’t care about the results, and would probably have to be paid a large sum to even agree to work on the problem.
I think that probably evolution metaphorically “wrote” a desire to care about the equation in our heads because if humans care about what is good and right it makes it easier for them to cooperate and trust each other, which has obvious fitness advantages.
Hmmm. That which evolution has “written” into the human psyche could, in theory, and given sufficient research, be read out again (and will almost certainly not be constant across most of humanity, but will rather exist with variations). But I doubt that morality is all in our genetic nature; I suspect that most of it is learned, from our parents, aunts, uncles, grandparents and other older relatives; I think, in short, that morality is memetic rather than genetic. Though evolution still happens in memetic systems just as well as in genetic systems.
So how do we learn more about this moral equation that we care about? One common philosophical method for approximating it is called reflective equilibrium, in which you take your moral imperatives and heuristics and look for the commonalities and consistencies among them. It’s far from perfect, but I think this method has produced useful results in the past.
Hmmm. Looking at the Wikipedia article, I can expect reflective equilibrium to produce a consistent moral framework. I also expect a correct moral framework to be consistent; but not all consistent moral frameworks are correct. (A Paperclipper does not have what I’d consider a correct moral framework, but it does have a consistent one).
If you start out close to a correct moral framework, then reflective equilibrium can move you closer, but it doesn’t necessarily do so.
Eliezer has proposed what is essentially a souped up version of reflective equilibrium called Coherent Extrapolated Volition. He has argued, however, that the primary use of CEV is in designing AIs that won’t want to kill us, and that attempting to extrapolate other people’s volition is open to corruption, as we could easily fall to the temptation to extrapolate it to something that personally benefits us.
Hmmm. The primary use of trying to find the True Morality Equation, to my mind, is to work it into a future AI. If we can find such an equation, prove it correct, and make an AI that maximises its output value, then that would be an optimally moral AI. This may or may not count as Friendly, but it’s certainly a potential contender for the title of Friendly.
Again, we could probably get closer through reflective equilibrium, and by critiquing the methods and results of each other’s reflections. If you somehow managed to get a Pebblesorter or a Paperclipper to do it too, they might generate similar results, although since they don’t intrinsically care about the equation you would probably have to give them some basic instructions before they started working on the problem.
Carrying through this method to completion could give us—or anyone else—an equation. But is there any way to be sure that it necessarily gives us the correct equation? (A pebblesorter may actually be a very good help in resolving this question; he does not care about morality, and therefore does not have any emotional investment in the research).
The first thought that comes to my mind is to take a very large group of researchers, divide them into N groups, and have each of these groups attempt, independently, to find an equation; if all of the groups find the same equation, this would be evidence that the equation found is correct (with stronger evidence at larger values of N). However, I anticipate that the acquired results would be N subtly different, but similar, equations.
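To picture the comparison step, suppose each group’s result could be boiled down to a set of weights; that is a gross simplification, and the numbers below are entirely invented, but checking how far apart the groups ended up might then look something like this:

```python
# Hypothetical outputs from three independent research groups, each reduced
# to a weight vector (a gross simplification; the numbers are invented).
groups = {
    "group_1": {"fairness": 2.0, "happiness": 1.5, "freedom": 1.0},
    "group_2": {"fairness": 2.1, "happiness": 1.4, "freedom": 1.0},
    "group_3": {"fairness": 1.9, "happiness": 1.6, "freedom": 1.1},
}

def max_disagreement(groups):
    """Largest gap, across all terms, between any two groups' weights."""
    terms = next(iter(groups.values())).keys()
    return max(abs(a[t] - b[t])
               for t in terms
               for a in groups.values()
               for b in groups.values())

# Small but nonzero: N subtly different, but similar, equations.
print(round(max_disagreement(groups), 2))  # 0.2
```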
But I doubt that morality is all in our genetic nature; I suspect that most of it is learned, from our parents, aunts, uncles, grandparents and other older relatives; I think, in short, that morality is memetic rather than genetic.
That’s possible. But memetics can’t build morality out of nothing. At the very least, evolved genetics has to provide a “foundation,” a part of the brain that moral memes can latch onto. Sociopaths lack that foundation, although the research is inconclusive as to what extent this is caused by genetics and to what extent by later developmental factors (it appears to be a mix of some sort).
Hmmm. Looking at the Wikipedia article, I can expect reflective equilibrium to produce a consistent moral framework. I also expect a correct moral framework to be consistent; but not all consistent moral frameworks are correct.
Yes, that’s why I consider reflective equilibrium to be far from perfect. Depending on how many errors you latch onto, it might worsen your moral state.
Carrying through this method to completion could give us—or anyone else—an equation. But is there any way to be sure that it necessarily gives us the correct equation?
Considering how morally messed up the world is now, even an imperfect equation would likely be better (closer to being correct) than our current slapdash moral heuristics. At this point we haven’t even achieved “good enough,” so I don’t think we should worry too much about being “perfect.”
However, I anticipate that the acquired results would be N subtly different, but similar, equations.
That’s not inconceivable. But I think that each of the subtly different equations would likely be morally better than pretty much every approximation we currently have.
But memetics can’t build morality out of nothing. At the very least, evolved genetics has to provide a “foundation,” a part of the brain that moral memes can latch onto. Sociopaths lack that foundation, although the research is inconclusive as to what extent this is caused by genetics and to what extent by later developmental factors
That sounds plausible, yes.
Considering how morally messed up the world is now, even an imperfect equation would likely be better (closer to being correct) than our current slapdash moral heuristics. At this point we haven’t even achieved “good enough,” so I don’t think we should worry too much about being “perfect.”
Hmmm. Finding an approximation to the equation will probably be easier than step two: encouraging people worldwide to accept the approximation. (Especially since many people who do accept it will then promptly begin looking for loopholes, either to use them or to patch them).
However, if the correct equation cannot be found, then this means that the Morality Maximiser AI cannot be designed.
However, if the correct equation cannot be found, then this means that the Morality Maximiser AI cannot be designed.
That’s true; what I was trying to say is that a world ruled by a 99.99% Approximation of Morality Maximizer AI might well be far, far better than our current one, even if it is imperfect.
Of course, it might be a problem if we put the 99.99% Approximation of Morality Maximizer AI in power, then find the correct equation, only to discover that the 99AMM AI is unwilling to step down in favor of the Morality Maximizer AI. On the other hand, putting the 99AMM AI in power might be the only way to ensure a Paperclipper doesn’t ascend to power before we find the correct equation and design the MMAI. I’m not sure whether we should risk it or not.