I find that claim to be very implausible: to name just one objection, it seems to assume that morality is essentially “logical” and based on rational thought, whereas in practice moral beliefs seem to be much more strongly derived from what the people around us believe. And in general, the hypothesis that all moral beliefs will eventually converge picks out a very narrow region in the space of possible outcomes, whereas “beliefs will diverge” covers a much broader one. Do you personally believe that claim?
I’m not sure what I was expecting, but I was a little surprised to see you say that you object to objective morality. I probably don’t understand CEV well enough, and I am pretty sure this is not the case, but it seems like there is a great deal of similarity between CEV and some form of objective morality as described above. In other words, if you don’t think moral beliefs will eventually converge given enough intelligence, reflection, data-gathering, and so on, then how do you convince someone that FAI will make the “correct” decisions based on the extrapolated volition?
CEV in its current form is quite under-specified. I expect that there would exist many, many different ways of specifying it, each of which would produce a different CEV that would converge on a different solution.
For example, Tarleton (2010) notes that CEV is really a family of algorithms which share the following features:
Meta-algorithm: Most of the AGI’s goals will be obtained at run-time from human minds, rather than explicitly programmed in before run-time.
Factually correct beliefs: The AGI will attempt to obtain correct answers to various factual questions, in order to modify preferences or desires that are based upon false factual beliefs.
Singleton: Only one superintelligent AGI is to be constructed, and it is to take control of the world with whatever goal function is decided upon.
Reflection: Individual or group preferences are reflected upon and revised.
Preference aggregation: The set of preferences of a whole group are to be combined somehow.
He comments:
The set of factually correcting, singleton, reflective, aggregative meta-algorithms is larger than just the CEV algorithm. For example, there is no reason to suppose that factual correction, reflection, and aggregation, performed in any order, will give the same result; therefore, there are at least 6 variants depending upon ordering of these various processes, and many variants if we allow small increments of these processes to be interleaved. CEV also stipulates that the algorithm should extrapolate ordinary human-human social interactions concurrently with the processes of reflection, factual correction and preference aggregation; this requirement could be dropped.
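To make the combinatorial point concrete: with three processes (factual correction, reflection, preference aggregation) there are 3! = 6 possible orderings, and if the processes do not commute, each ordering is a distinct candidate algorithm. Below is a minimal sketch of that counting argument only; the process names are placeholder labels of my own, not anything from an actual CEV specification.

```python
from itertools import permutations

# The three meta-level processes named in the quote above. These are
# illustrative labels only; CEV does not specify concrete implementations.
processes = ["factual_correction", "reflection", "preference_aggregation"]

# If the processes do not commute, every ordering may yield a different
# extrapolated outcome, so each ordering is a separate design choice.
for i, ordering in enumerate(permutations(processes), start=1):
    print(f"variant {i}: " + " -> ".join(ordering))

# Prints 3! = 6 orderings, matching the "at least 6 variants" in the quote;
# interleaving partial steps of each process would multiply these further.
```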
Although one of Eliezer’s desired characteristics for CEV was to “avoid creating a motive for modern-day humans to fight over the initial dynamic”, a more rigorous definition of CEV will probably require making many design choices for which there will not be any objective answer, and which may be influenced by the designer’s values. The notion that our values should be extrapolated according to some specific criteria is by itself a value-laden proposal: it might be argued that it was enough to start off from our current-day values just as they are, and then incorporate additional extrapolation only if our current values said that we should do so. But doing so would not be a value-neutral decision either, but rather one supporting the values of those who think that there should be no extrapolation, rather than of those who think there should be.
I don’t find any of these issues to be problems, though: as long as CEV found any of the solutions in the set-of-final-values-that-I-wouldn’t-consider-horrible, the fact that the solution isn’t unique isn’t much of an issue. Of course, it’s quite possible that CEV will hit on some solution in that set that I would judge to be inferior to many others also in that set, but so it goes.
It seems there are two claims: One, that each human will be reflectively self-consistent given enough time; two, that the self-consistent solution will be the same for all humans. I’m highly confident of the first; for the second, let me qualify slightly:
Not all human-like things are actually humans, e.g. psychopaths. Some of these may be fixable.
Some finite tolerance is implied when I say “the same” solution will be arrived at.
With those qualifications, yes, I believe the second claim with, say, 85% confidence.
I find the first claim plausible though not certain, but I would expect that if such individual convergence happens, it will lead to collective divergence rather than convergence.
When we are young, our moral intuitions and beliefs are a hodge-podge of different things, derived from a wide variety of sources, probably reflecting something like a “consensus morality” that averages the different moral positions in society. If and when we begin to reflect on these intuitions and beliefs, we will find that they are mutually contradictory. But one person’s modus ponens is another’s modus tollens: faced with the fact that a utilitarian intuition and a deontological intuition contradict each other, say, we might end up rejecting the utilitarian intuition, rejecting the deontological intuition, or trying to somehow reconcile the two. Since logic by itself does not tell us which alternative we should choose, the choice is determined by extra-logical factors.
Given that different people seem to arrive at different conclusions when presented with such contradictory cases, and given that their judgement seems to be at least weakly predicted by their existing overall leanings, I would guess that the choice of which intuition to embrace depends on the person’s current balance of other intuitions. Thus, if you are already leaning utilitarian, the intuitions which are making you lean that way may combine together and cause you to reject the deontological intuition, and vice versa if you’re leaning deontologist. This would mean that a person who initially started with an even mix of both intuitions would, by random drift, eventually end up in a position where one set of intuitions was dominant, after which there would be a self-reinforcing trajectory towards an area increasingly dominated by intuitions compatible with the ones currently dominant. (Though of course the process that determines which intuitions get accepted and which ones get rejected is nowhere near as simple as just taking a “majority vote” of intuitions, and some intuitions may be felt so strongly that they are almost impossible to reject.) This would mean that as people carried out self-reflection, their positions would end up increasingly idiosyncratic and distant from the consensus morality. This seems to be roughly compatible with what I have anecdotally observed in various people, though my sample size is relatively small.
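As a toy illustration of the drift dynamic just described (this is purely a sketch of my own, with an arbitrary function name, feedback exponent and step count, not a model drawn from CEV or the moral-psychology literature): treat reflection as a reinforcement process in which each contradiction is resolved in favour of whichever set of intuitions is currently stronger. Starting from the same even mix, different random histories then lock in different dominant sides, so individual trajectories stabilise while the population of end states diverges.

```python
import random

def reflect(n_steps=10_000, seed=0, feedback=1.5):
    """Toy reinforcement model of moral reflection: at each step a clash
    between a 'utilitarian' and a 'deontological' intuition is resolved in
    favour of whichever side currently has more accepted intuitions, with
    super-linear feedback so that early random drift gets locked in."""
    rng = random.Random(seed)
    util, deon = 1, 1  # start from an even mix of the two kinds of intuition
    for _ in range(n_steps):
        p_util = util**feedback / (util**feedback + deon**feedback)
        if rng.random() < p_util:
            util += 1  # the contradiction is resolved the utilitarian way
        else:
            deon += 1  # ...or the deontological way
    return util / (util + deon)

# Five "people" reflecting from identical starting points: each run typically
# ends with one side heavily dominant (final fraction near 0 or 1), and which
# side wins depends only on early random luck -- individual convergence,
# collective divergence.
print([round(reflect(seed=s), 2) for s in range(5)])
```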
I feel that I have personally been undergoing this kind of drift: I originally had the generic consensus morality that one picks up by spending one’s childhood in a Western country, after which I began reading LW, which worked to select and reinforce my existing set of utilitarian intuitions. Had I not already been utilitarian-leaning, though, the utilitarian emphasis on LW might instead have led me to reject those claims and seek out a (say) more deontological influence. But as time has gone by, I have become increasingly aware that some of my strongest intuitions lean towards negative utilitarianism, whereas the LW consensus is closer to classical utilitarianism. Reflecting on my intuitions has led me to gradually reject several that I previously took to support classical rather than negative utilitarianism, thus causing me to move away from the general LW consensus. And since this process has caused some of the intuitions that previously supported a classical utilitarian position to lose their appeal, I expect that moving back towards classical utilitarianism is less likely than continued movement towards negative utilitarianism.
Seconding Kaj_Sotala’s question. Is there a good argument why self-improvement doesn’t have diverging paths due to small differences in starting conditions?
Dunno. CEV actually contains the phrase, “and had grown up farther together,” which the above leaves out. But I feel a little puzzled about the exact phrasing, which does not make “were more the people we wished we were” conditional on this other part—I thought the main point was that people “alone in a padded cell,” as Eliezer puts it there, can “wish they were” all sorts of Unfriendly entities.
That argument seems like it would apply equally well to non-moral beliefs.