That’s quite a collection of relevant work. I’m bookmarking this as the definitive collection on the topic; I haven’t seen a better one, and I assume you would have found and linked it if one existed.
I think you should just go ahead and make this a post. When you do, we can have a whole discussion in a proper place, because this deserves more discussion.
Prior to you writing that post, here are some thoughts:
I think it’s pretty clearly correct that CEV couldn’t produce a single best answer, for the reasons you give and cite arguments for. Human values are quite clearly path-dependent. Given different experiences (and choices/stochastic brain activity/complex interactions between initial conditions and experiences), people will wind up valuing fairly different things.
However, this doesn’t mean that something like CEV or ambitious value learning couldn’t produce a pretty good result. Of all the many worlds that humans as a whole would absolutely love (compared to the nasty, brutish and short lives we now live), you could just pick one at random and I’d call that a dang good outcome.
I think your stronger claim, that the whole idea of values and beliefs is incoherent, should be kept separate. I think values and beliefs are pretty fuzzy and changeable, but real by the important meanings of those words. Whatever their ontological status, there are outcomes I prefer and outcomes I’d hate, and you could call those preferences my values even if they form a very vague and path-dependent collection.
But I don’t think that’s likely to be a major component of this argument, so the stronger claim should probably be mostly set aside while considering whether anything like CEV/value learning could work.
Again, I hope you’ll make this a post, but I’d be happy to continue the discussion here as well as there.
I haven’t seen a better one, and I assume you would have found and linked it if one existed.
Yeah, I’m not aware of any other comprehensive compilation of arguments against CEV. That being said, I am confident that my list above is missing at least a few really interesting and relevant comments that I recall seeing here but just haven’t been able to find again.
Again, I hope you’ll make this a post
I will try to. This whole discussion, while necessary and useful, is a little off-topic for what Oleg Trott meant this post to be about, and I think it deserves a post of its own.
FWIW, my personal guess is that the kind of extrapolation process described by CEV is fairly stable (in the sense of producing a pretty consistent extrapolation direction) as you start to increase the cognitive resources applied (something twice as smart as a human thinking for ten times as long with access to ten times as much information, say), but may well still not have a single well-defined limit as the cognitive resources used for the extrapolation tend to infinity. Using a (loose, not exact) analogy to a high-dimensional SGD or simulated-annealing optimization problem, the situation may be a basin/valley that looks approximately convex at a coarse scale (when examined with low resources), but actually has many local minima that runs with increasing resources could converge to.
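To make that analogy concrete, here is a purely illustrative one-dimensional toy (nothing to do with any actual CEV procedure; all functions and numbers are made up): the landscape looks convex with a single minimum when smoothed to a coarse resolution, while a fine-grained descent from slightly different starting points settles into different nearby local minima.

```python
import numpy as np

# Toy landscape: a broad, roughly convex basin (0.01*x^2, the coarse-scale
# "extrapolation direction") overlaid with ripples (the cos term) that
# create many fine-scale local minima.
def f(x):
    return 0.01 * x**2 + 0.5 * np.cos(5 * x)

def grad(x):
    return 0.02 * x - 2.5 * np.sin(5 * x)

def gradient_descent(x0, lr=0.01, steps=5000):
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

xs = np.linspace(-10, 10, 2001)

# "Low resources": examine the landscape at coarse resolution by smoothing
# over a window wider than the ripples. The smoothed curve is effectively
# convex, with one apparent minimum near x = 0.
window = 251  # ~2.5 units, i.e. about two ripple periods
smoothed = np.convolve(f(xs), np.ones(window) / window, mode="same")
interior = smoothed[window:-window]  # drop edge artifacts from padding
print("coarse-scale minimum near x =", round(xs[window + np.argmin(interior)], 2))

# "High resources": fine-grained descent on the true landscape from slightly
# different starting points (different life histories, say) converges to
# different nearby local minima rather than to a single shared limit.
print("fine-scale minima:",
      [round(gradient_descent(x0), 2) for x0 in (-2.0, -0.5, 0.5, 2.0)])
```

In this sketch, smoothing stands in for examining the basin with limited resources, and the scatter of descent endpoints stands in for the limit failing to be unique as resources grow.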
So the correct solution may be some form of satisficing: use CEV with a moderately super-human amount of computational resources applied to it, in a region where it still gives a sensible result. I therefore view CEV as more a signpost saying “head that way” than a formal description of a mathematical limiting process that clearly has a single well-defined limit.
As for human values being godshatter of evolution, that’s a big help: where they are manifestly becoming inconsistent with each other or with reality, you can use maximizing actual evolutionary fitness (which is a clear, well-defined concept) as a tie-breaker or sanity check. [Obviously, we don’t want to take that to the point where the human population is growing fast (unless we’re doing it by spreading through space, in which case, go for it).]