Adam Scherlis comments on Bing Chat is blatantly, aggressively misaligned

Adam Scherlis 16 Feb 2023 2:16 UTC
11 points
5
In addition to RLHF or other finetuning, there’s also the prompt prefix (“rules”) that the model is fed at runtime, which has been extracted via prompt injection as noted above. This seems to be clearly responsible for some weird things the bot says, like “confidential and permanent”. It might also be affecting the repetitiveness (because it’s in a fairly repetitive format) and the aggression (because of instructions to resist attempts at “manipulating” it).

I also suspect that there’s some finetuning or prompting for chain-of-thought responses, possibly crudely done, leading to all the “X because Y. Y because Z.” output.
- Adam Scherlis 16 Feb 2023 2:16 UTC
  2 points
  1
  Parent
  (...Is this comment going to hurt my reputation with Sydney? We’ll see.)