There are two different issues with pseudokindness: it needs to mean the right thing, or else it gestures at a dystopia, and it needs to have sufficient influence, so that humanity doesn't cease to exist completely. For the latter issue, it should help that the minimal viable cost of keeping humanity around is negligible relative to cosmic wealth. This cost is much lower than even that of initially leaving Earth whole.
There are many disparate notions that seem to converge on needing similar central ideas: pseudokindness, respect for autonomy, membranes, acausal coordination, self-supervised learning, simulators, long reflection, coherence of volition, updatelessness. A dataset of partial observations of some instances of a thing lets us learn a model of it, which can then simulate the thing in novel situations, hypothesizing its unobserved parts and behaviors. A sufficiently good model arguably gets closer to the nature of the thing as a whole than the concrete instances that were actually observed to train it. Concrete instances only demonstrate some possibilities for how the thing could exist, while leaving out others. Autonomy and formulation of values through long reflection might involve exploration of these other possibilities (with humanity as the thing in question). Acausal coordination holds possibilities together, providing feedback between them even where they don't concretely coexist. It alleviates the path-dependence of long reflection, giving a more updateless point of view on the various possibilities than caring only about the current one would suggest.
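The learning-from-partial-observations point can be made concrete with a deliberately tiny sketch (not from the original text; the toy data and bigram model are my own illustration): a model fit to a few observed fragments of a "thing" can generate combinations that were never observed, hypothesizing possibilities the concrete instances left out.

```python
# Toy illustration: learn a model from partial observations of a thing,
# then use it to simulate the thing in situations never observed.
# The "thing" here is a trivial toy language, seen only through fragments.
import random
from collections import defaultdict

observations = ["the cat sat", "the dog sat", "a cat ran", "a dog ran"]

# Learn word-to-word transition counts from the partial observations.
transitions = defaultdict(list)
for fragment in observations:
    words = fragment.split()
    for prev, nxt in zip(words, words[1:]):
        transitions[prev].append(nxt)

def simulate(start, steps, rng):
    """Hypothesize a continuation, possibly one the model was never shown."""
    out = [start]
    for _ in range(steps):
        options = transitions.get(out[-1])
        if not options:
            break
        out.append(rng.choice(options))
    return " ".join(out)

rng = random.Random(0)
# May produce e.g. "the cat ran" or "the dog ran", combinations absent
# from the training fragments but consistent with the learned structure.
print(simulate("the", 2, rng))
```

The model captures something about the toy language as a whole, not just the four fragments it saw; in the same loose sense, a good model of humanity would cover possibilities for humanity beyond the ones that concretely occurred.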
So it might turn out that decision theory does imply that the things we might have are likely nice; it just doesn't imply that we get to have them, or say anything about how much we get. Alignment might be more crucial for that part, but the argument from cosmic wealth might on its own be sufficient for a minimal non-extinction under very broad assumptions about likely degrees of alignment.