Instrumentally useful mild optimization is different from leaving autonomy to existing people as a target. The former still allows strong optimization in other contexts, or in aggregate, which eventually leads to figuring out how to do better than the instrumentally useful mild optimization. Preserving the autonomy of existing people is in turn different from seeking diversity of experience or happiness, which doesn’t single out the people who already exist and doesn’t leave them alone enough for them to be said to have meaningfully survived.
Maximizing anything that doesn’t include even a tiny component of such pseudokindness eventually results in rewriting existing people into something else that is more optimal, even if at first there are instrumental reasons to wait and figure out how. For this not to happen, an appropriate form of not-rewriting specifically needs to be part of the target. Alignment of a superintelligence’s overall values is about good use of the universe; survival of humanity is a side effect, since pseudokindness is almost certainly a component of aligned values. But pseudokindness screens off overall alignment of values on the narrower question of humanity’s survival (as opposed to the broader question of making good use of the universe). (Failing on either issue contributes to existential risk, since both permanently destroy the potential for universe-spanning future development according to humane values, which unfortunately makes P(doom) ambiguous between two very different outcomes.)
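As a toy illustration of the argument above (a minimal sketch of my own, with made-up slot names, payoffs, and weights, not anything taken from the comment): an objective that scores only resource output rewrites the slots currently occupied by existing people whenever that is more optimal, while an objective that also contains a component paying out only when those slots are left untouched has an optimum that leaves them alone, provided that component isn’t swamped by the forgone output.

```python
from itertools import product

PEOPLE_SLOTS = {0, 1}                                 # slots currently occupied by existing people
INITIAL = ("person", "person", "wild", "wild")        # current state of the toy world
CONFIGS = ("person", "wild", "factory")               # what each slot can be rewritten into
OUTPUT = {"person": 1, "wild": 0, "factory": 10}      # toy resource value of each configuration

def resource_value(world):
    # Pure "good utilization" term: nothing here refers to existing people.
    return sum(OUTPUT[slot] for slot in world)

def pseudokindness_term(world, weight):
    # Pays out only if the people-slots are exactly as they were: left alone, not "improved".
    untouched = all(world[i] == INITIAL[i] for i in PEOPLE_SLOTS)
    return weight if untouched else 0.0

def optimum(weight):
    # Exhaustive argmax over all ways of rewriting the four slots.
    worlds = product(CONFIGS, repeat=len(INITIAL))
    return max(worlds, key=lambda w: resource_value(w) + pseudokindness_term(w, weight))

print(optimum(weight=0))    # ('factory', 'factory', 'factory', 'factory') -- existing people rewritten
print(optimum(weight=100))  # ('person', 'person', 'factory', 'factory') -- existing people survive
```

The toy also shows the shape of the failure: nothing in the resource term refers to the people-slots, so nothing short of an explicit not-rewriting component in the target keeps them out of the optimum.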
Thanks, this is a very helpful comment, and so are the links.
I think, in particular, that pseudokindness is a very good property to have, because it is non-anthropocentric, and it is therefore a much more feasible task to make sure that this property is preserved through various self-modifications (recursive self-improvement and the like).
In general, it seems that making sure recursively self-modifying systems maintain certain properties as invariants is feasible only for a relatively narrow class of properties, which tend to be non-anthropocentric; if anthropocentric invariants are desirable, the way to achieve them is to obtain them as corollaries of natural non-anthropocentric invariants.
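A minimal sketch of what “maintaining a property as an invariant across self-modification” could look like (my own toy framing, with made-up fields and a made-up invariant, not a proposal from the comment): each generation gates its proposed successor on a property it can actually check, which is straightforward for a simple structural property and much harder for something as rich as “respects human values”.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    acquisition_rate: float   # toy stand-in for how aggressively this version optimizes
    gates_successors: bool    # whether this version applies the same check to its own successors

def invariant(agent: Agent) -> bool:
    # A deliberately simple, non-anthropocentric, machine-checkable property.
    return agent.acquisition_rate <= 1.0 and agent.gates_successors

def self_modify(current: Agent, proposed: Agent) -> Agent:
    # Adopt the proposed successor only if it verifiably preserves the invariant.
    return proposed if invariant(proposed) else current

gen0 = Agent(acquisition_rate=0.5, gates_successors=True)
gen1 = self_modify(gen0, Agent(acquisition_rate=0.9, gates_successors=True))  # accepted
gen2 = self_modify(gen1, Agent(acquisition_rate=5.0, gates_successors=True))  # rejected, gen1 kept
print(gen1)
print(gen2)
```

The point of the sketch is only that the gate has to be something each generation can verify about the next; anthropocentric properties would have to be derived as corollaries of checks like this rather than checked directly.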
My informal meta-observation is that writing on AI existential safety tends to look like it is getting closer to relatively feasible, realistic-looking approaches when it has a non-anthropocentric flavor, and tends to look impossibly hard when it focuses on “human values”, “human control”, and so on. My informal impression is that we are seeing more of the anthropocentric focus lately, which might help create political pressure, but seems rather unhelpful for finding actual solutions. I did write an essay that tries to help shift the focus (back) to the non-anthropocentric, both in terms of the fundamental issues of AI existential safety and in terms of what could be done to make sure that human interests are taken into account: Exploring non-anthropocentric aspects of AI existential safety