Most of the training data is human behavior, so anything else will work less well, with the model filling in behavioral gaps arbitrarily. And characters not based on humans will be less aligned with humans. This becomes a problem if such characters become AGIs, at which point it may be too late to redraw policies on personality alignment.
Keeping to human personalities grounds LLM characters in the most plentiful kind of data in the training set, the kind that most closely resembles the obvious alignment target under the framing of personality alignment. The potential for other forms of alignment doesn't negate this point. By contrast, experimenting with breathing life into fictional AI tropes looks like a step in the wrong direction alignment-wise, with less training data to support it.