Yeah, I meant to remain ambiguous about how wide Eliezer means to cast the net around agents. Maybe it’s psychologically normal humans, maybe it’s wider or narrower than that.
I suppose ‘The psychological unity of humankind’ is sort of an argument that value convergence is likely at least among humans, though it’s more like a hand-wave. In response, I’d hand-wave toward Sobel (1999); Prinz (2007); Doring & Steinhoff (2009); Doring & Andersen (2009); Robinson (2009); Sotala (2010); Plunkett (2010); Plakias (2011); Egan (2012), all of which argue for pessimism about value convergence. Smith (1994) is the only philosophical work I know of that argues for optimism about value convergence, but there are probably others I just don’t know about.
Some of the sources you are hand-waving toward are (quite rightly) pointing out that rational agents need not converge, but they aren’t looking at the empirical question of whether humans, specifically, converge. Only a subset of those sources is actually talking about humans.
(^This isn’t disagreement. I agree with your main suggestion that humans probably don’t converge, although I do think they are at least describable by unimodal distributions.)
I’m not sure it’s even appropriate to use philosophy to answer this question. The philosophical problem here is “how do we apply idealized constructs like extrapolated preference and terminal values to flesh-and-blood animals?” That includes things like “should values which are not biologically ingrained count as terminal values?” and similar questions.
...and then, once we’ve developed those constructs to the point that we’re ready to talk about the extent to which humans specifically converge, if at all, it becomes an empirical question.