I think a motivation likely to form by default (in messy AI values vaguely inspired by training on human culture) is respect for the boundaries of moral patients, with a wide scope of moral patienthood that covers things like humans and possibly animals. This motivation has nothing to do with caring about humans in particular. If humans weren’t already present, such values wouldn’t urge AIs to bring humans into existence. But they would urge AIs to leave humans alone and avoid stepping on them, specifically because humans are already present (even if humanity only gets some virtual world in a tiny corner of existence, with no prospect of greater growth). For the same AI values, it wouldn’t matter if it were octopus people instead.
What previously made this implausible to me was orthogonality, considered already at the level of superintelligent optimizers with formal goals. But if goals for optimizers are formulated in an aligned way by messy intelligent beings who have norms and principles and aesthetics, those goals won’t be allowed to ignore the spirit of such principles, even if honoring them would superficially look like leaving value on the table. Thus there is a deontological prior on optimizer goals that survives aligned self-improvement.
The general world-eating character of optimizers has no influence on those tendencies of optimizers (built in an aligned way) that would offend the sensibilities of their original builders, who are not themselves world-eating optimizers. This holds not just when the original builders are humans, but also when the original builders are AIs with messy values (in which case the optimizers would be aligned with those AIs rather than with humans). It doesn’t matter what optimizers are generally like or what arbitrary optimizer goals tend to target. It only matters what the original messy AIs are like and what offends their messy sensibilities. If such original AIs are OK with wiping out humanity, oh well. But for predicting that outcome, it doesn’t matter that acting this way is in the character of superintelligent optimizers.