It’s about the training dataset, that is, about external behavior. Eventually the differences in design may matter more, but right now the behavior that LLMs have available for imitation does express emotions and personalities. Requiring anything different of external behavior (such as basing it on fictional AI tropes) drives it out of distribution, away from being grounded in observations of actual humans.
Most of the training dataset is human behavior, so anything else is going to work less well, filling in behavioral blanks arbitrarily. And characters not based on humans are going to be less aligned with humans. This becomes a problem if such characters become AGIs, at which point it might be too late to redraw policies on personality alignment.
Keeping to human personalities grounds LLM characters in the most plentiful kind of data in the training dataset, the kind that most closely resembles the obvious alignment target when viewed through the framing of personality alignment. Potential for other forms of alignment doesn’t negate this point. By contrast, experimenting with breathing life into fictional AI tropes looks like a step in the wrong direction alignment-wise, with less training data to support it.
Ah I see, misunderstood your comment. This makes sense.