I would guess that many anti-realists are sympathetic to the arguments I’ve made above, but still believe that we can make morality precise without changing our meta-level intuitions much—for example, by grounding our ethical beliefs in what idealised versions of ourselves would agree with, after long reflection. My main objection to this view is, broadly speaking, that there is no canonical “idealised version” of a person, and different interpretations of that term could lead to a very wide range of ethical beliefs.
I agree with this (“there is no canonical ‘idealized version’ of a person...”) but don’t actually see how it is an objection to the proposed grounding method?
CEV is an extrapolation, and I think it’s likely that there are multiple valid ways to do the extrapolation when starting from humans. A being that results from one possible extrapolation may find the existence of a being that results from a different extrapolation morally horrifying, or at least way lower utility than beings like itself.
But (by definition of CEV), they should all be broadly acceptable to the original being that was extrapolated. The extrapolation process will probably require deciding some tough questions and making tradeoffs whose answers feel unacceptable, or at least arbitrary and unsatisfying, to the original. But they probably won't feel arbitrary to the extrapolated beings that result: each possible being will be self-consistently and reflectively satisfied with the particular choices that were made in its history.
Another way of looking at it: I expect CEV() to be a lossy many-to-many map, which is non-value-destroying only in the forwards direction. That is, humans can be mapped to many different possible extrapolated beings, and different possible extrapolated beings reverse-map back to many different possible kinds of humans. But actually applying the reverse mapping to an extant mind is likely to be a moral horror according to the values of a supermajority (or at least a large coalition) of all possible beings. Applying the forwards map slightly incorrectly, or possibly even at all, might be horrifying to a lot of possible minds as well, but I expect that fraction to be tiny compared to the fraction horrified by the reverse map. Among humans (or at least LWers), I expect people to be mostly OK with having CEV() applied to them, but to absolutely not want CEV^-1() applied afterwards.
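To make that map picture a bit more concrete, here's a deliberately toy sketch (the value sets, the `extrapolate` function, and the tie-break encoding are all invented for illustration, nothing like a real extrapolation procedure): the forward map depends on arbitrary tie-breaking choices, so one person can land on several different endpoints, while a single endpoint can have several distinct originals behind it, which is why the reverse map is ill-defined and information-destroying.

```python
# Toy illustration only -- not a real extrapolation procedure. The names and
# the "values as string sets" encoding are hypothetical stand-ins for the
# point about CEV() being a lossy, many-to-many map.

from itertools import product

def extrapolate(person: frozenset, tie_break: int) -> str:
    """Forward map: reflection resolves a person's conflicting values by
    keeping whichever one an arbitrary tie-break selects. Lossy: the values
    that lost the tie-break are simply forgotten by the resulting being."""
    kept = sorted(person)[tie_break % len(person)]
    return f"idealised({kept})"

people = [frozenset({"comfort", "fairness"}), frozenset({"fairness", "novelty"})]

# Forwards: each person maps to several possible extrapolated beings,
# depending on which arbitrary choices the process happens to make.
for p in people:
    print(sorted(p), "->", {extrapolate(p, t) for t in (0, 1)})

# Backwards: a single extrapolated being can have many possible preimages, so
# "CEV^-1()" is not well-defined -- and forcing an answer would have to
# discard whatever distinguished the original minds from each other.
preimages = {}
for p, t in product(people, (0, 1)):
    preimages.setdefault(extrapolate(p, t), []).append(sorted(p))
print(preimages)
```

The point is just the asymmetry: the forward map forgets things in a way the resulting being endorses, while any attempt at the reverse map has to guess at (or destroy) exactly what was forgotten.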
I interpret the quote to mean that there’s no guarantee that the reflection process converges. Its attractor could be a large, possibly infinite, set of states rather than a single point.
I think that’s possible, but I’m saying we can just pick one of the endpoints (or pick an arbitrary, potentially infinitely-long path towards an endpoint), and most people (the original people, and the people who result from that picking) will probably be fine with that, even if it involves making some tough and/or arbitrary choices along the way.
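As a deliberately silly illustration of "the attractor is a set of states rather than a point, so just commit to an arbitrary element of it" (the update rule and state names below are invented, not a model of real reflection):

```python
# Toy sketch: a "reflection" update whose attractor is a cycle of states
# rather than a single fixed point, plus an arbitrary-but-fixed choice that
# picks one of them as the endpoint.

import random

# Three mutually self-consistent value-states that reflection keeps cycling
# through (think rock-paper-scissors among ways of weighing the same values).
CYCLE = {"prioritise_persons": "prioritise_experiences",
         "prioritise_experiences": "prioritise_relationships",
         "prioritise_relationships": "prioritise_persons"}

def reflect(state: str) -> str:
    """One step of idealised reflection. Never settles on a single point:
    the attractor is the whole three-element cycle."""
    return CYCLE[state]

state = "prioritise_persons"
seen = []
while state not in seen:          # stop once we've detected the attractor
    seen.append(state)
    state = reflect(state)
print("attractor set:", seen)     # all three states, not one

# The move suggested above: accept that convergence to a point isn't coming,
# and commit to one element of the attractor via an arbitrary but fixed choice.
endpoint = random.Random("some arbitrary seed").choice(sorted(seen))
print("picked endpoint:", endpoint)
```

The seed is doing the work of the "tough and/or arbitrary choices": nothing privileges the element it picks, but having committed to it, the process does terminate somewhere.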
Or, if humans on reflection turn out to never want to make all of those choices, that’s maybe also OK. But we probably need at least one person (or AI) to fully “grow up” into a coherent being, in order to actually do really big stuff, like putting up some guardrails in the universe.
That growing-up process (which is hopefully causally descended from deliberate human action at some point far enough back) might involve making some arbitrary and tough choices in order to force it to converge in a reasonable length of time. But those choices seem worth making, because the guardrails are important, and an entity powerful enough to set them up is probably going to run into moral edge cases unavoidably. Better that its behavior in those cases be decided by some deliberate human process, rather than left to something even more arbitrary and morally unsatisfying.
CEV also has another problem that gets in the way of practically implementing it: it isn’t embedded. At least in its current form, CEV doesn’t have a way of accounting for side-effects (either physical or decision-theoretic) of the reflection process. When you have to deal with embeddedness, the distinction between reflection and action breaks down and you don’t end up getting endpoints at all. At best, you can get a heuristic approximation.