Rohin Shah comments on LLMs Sometimes Generate Purely Negatively-Reinforced Text

Rohin Shah 17 Jun 2023 2:54 UTC
4 points
Did you ever look at the math in the paper? Theorem 1 looks straight-up false to me (attempted counterexample), but possibly I’m misunderstanding their assumptions.
- Fabien Roger 17 Jun 2023 6:21 UTC
  1 point
  I hadn’t looked at their math too deeply, and it indeed seems that an assumption like “the bad behavior can be achieved after an arbitrarily long prompt” is missing (though maybe I missed a hypothesis where this implicitly shows up). Something along these lines is definitely needed for the summary I made to make any sense.