Fabien Roger comments on LLMs Sometimes Generate Purely Negatively-Reinforced Text

Fabien Roger 17 Jun 2023 6:21 UTC
1 point
I hadn’t looked at their math too deeply, and it does indeed seem that an assumption like “the bad behavior can be achieved after an arbitrarily long prompt” is missing (though maybe I missed a hypothesis where this implicitly shows up). Something along these lines is definitely needed for the summary I made to make any sense.