I think evolution probably metaphorically "wrote" a desire to care about the equation into our heads, because if humans care about what is good and right, it is easier for them to cooperate and trust each other, which has obvious fitness advantages.
Hmmm. That which evolution has "written" into the human psyche could, in theory and given sufficient research, be read out again (and it will almost certainly not be constant across humanity, but will rather exist with variations). But I doubt that morality is all in our genetic nature; I suspect that most of it is learned from our parents, aunts, uncles, grandparents and other older relatives; I think, in short, that morality is memetic rather than genetic. Though evolution still happens in memetic systems just as well as in genetic systems.
So how do we learn more about this moral equation that we care about? One common philosophical method for approximating it is called reflective equilibrium, where you take your moral imperatives and heuristics and try to find the commonalities and consistencies among them. It's far from perfect, but I think this method has produced useful results in the past.
Hmmm. Looking at the Wikipedia article, I would expect reflective equilibrium to produce a consistent moral framework. I also expect a correct moral framework to be consistent; but not all consistent moral frameworks are correct. (A Paperclipper does not have what I'd consider a correct moral framework, but it does have a consistent one.)
If you start out close to a correct moral framework, then reflective equilibrium can move you closer, but it doesn't necessarily do so.
Eliezer has proposed what is essentially a souped-up version of reflective equilibrium called Coherent Extrapolated Volition. He has argued, however, that the primary use of CEV is in designing AIs that won't want to kill us, and that attempting to extrapolate other people's volition is open to corruption, as we could easily succumb to the temptation to extrapolate it to something that personally benefits us.
Hmmm. The primary use of trying to find the True Morality Equation, to my mind, is to work it into a future AI. If we can find such an equation, prove it correct, and make an AI that maximises its output value, then that would be an optimally moral AI. This may or may not count as Friendly, but it's certainly a contender for the title.
Again, we could probably get closer through reflective equilibrium, and by critiquing the methods and results of each other's reflections. If you somehow managed to get a Pebblesorter or a Paperclipper to do it too, they might generate similar results, although, since they don't intrinsically care about the equation, you would probably have to give them some basic instructions before they started working on the problem.
Carrying through this method to completion could give us (or anyone else) an equation. But is there any way to be sure that it necessarily gives us the correct equation? (A Pebblesorter may actually be very helpful in resolving this question; it does not care about morality, and therefore has no emotional investment in the research.)
The first thought that comes to my mind is to have a very large group of researchers, divide them into N groups, and have each of these groups attempt, independently, to find an equation; if all of the groups find the same equation, this would be evidence that the equation found is correct (with stronger evidence at larger values of N). However, I anticipate that the results would be N subtly different, but similar, equations.
But I doubt that morality is all in our genetic nature; I suspect that most of it is learned from our parents, aunts, uncles, grandparents and other older relatives; I think, in short, that morality is memetic rather than genetic.
That's possible. But memetics can't build morality out of nothing. At the very least, evolved genetics has to provide a "foundation," a part of the brain that moral memes can latch onto. Sociopaths lack that foundation, although the research is inconclusive as to what extent this is caused by genetics and to what extent by later developmental factors (it appears to be a mix of some sort).
Hmmm. Looking at the Wikipedia article, I would expect reflective equilibrium to produce a consistent moral framework. I also expect a correct moral framework to be consistent; but not all consistent moral frameworks are correct.
Yes, that’s why I consider reflective equilibrium to be far from perfect. Depending on how many errors you latch onto, it might worsen your moral state.
Carrying through this method to completion could give us (or anyone else) an equation. But is there any way to be sure that it necessarily gives us the correct equation?
Considering how morally messed up the world is now, even an imperfect equation would likely be better (closer to being correct) than our current slapdash moral heuristics. At this point we haven’t even achieved “good enough,” so I don’t think we should worry too much about being “perfect.”
However, I anticipate that the results would be N subtly different, but similar, equations.
That’s not inconceivable. But I think that each of the subtly different equations would likely be morally better than pretty much every approximation we currently have.
But memetics can't build morality out of nothing. At the very least, evolved genetics has to provide a "foundation," a part of the brain that moral memes can latch onto. Sociopaths lack that foundation, although the research is inconclusive as to what extent this is caused by genetics and to what extent by later developmental factors
That sounds plausible, yes.
Considering how morally messed up the world is now, even an imperfect equation would likely be better (closer to being correct) than our current slapdash moral heuristics. At this point we haven’t even achieved “good enough,” so I don’t think we should worry too much about being “perfect.”
Hmmm. Finding an approximation to the equation will probably be easier than step two: encouraging people worldwide to accept the approximation. (Especially since many people who do accept it will then promptly begin looking for loopholes, either to use them or to patch them.)
However, if the correct equation cannot be found, then this means that the Morality Maximiser AI cannot be designed.
That's true; what I was trying to say is that a world ruled by a 99.99% Approximation of Morality Maximizer AI might well be far, far better than our current one, even if it is imperfect.
Of course, it might be a problem if we put the 99.99% Approximation of Morality Maximizer AI (99AMMAI) in power, then find the correct equation, only to discover that the 99AMMAI is unwilling to step down in favor of the Morality Maximizer AI (MMAI). On the other hand, putting the 99AMMAI in power might be the only way to ensure a Paperclipper doesn't ascend to power before we find the correct equation and design the MMAI. I'm not sure whether we should risk it or not.