This might be an example of RLHF mode collapse. It seems to show up a lot in specific language, jokes especially, whereas with Claude-2 you don’t seem to see quite the same tells of a few phrases constantly surfacing (although that might just reflect its much lower usage).