In short, the idea is that there might be a few broad types of “personalities” that AIs tend to fall into depending on their training. These personalities are attractors.
I’d be interested in why one might expect this to be true. (I only did a very superficial ctrl+F on Lukas’ post—sorry if it addresses this question.) I’d think there are lots of dimensions of variation, and that along each of these, AIs could assume a continuous range of values. (If AI training mostly works by imitating human data, then one might expect that, assuming inner alignment, AIs would mostly fall within the range of human variation. But I assume that’s not what you mean.)