The latter. I didn’t notice it was a link to a different paper, but I think my point stands: the better results in this paper compared to the previous finetuning paper can’t be due to adding the KL constraint because they already had one. It has to be something else they changed, like more/better labels or bigger models.
Yeah, I definitely agree with that, I was just responding to the confusion that (I think) nostalgebraist had. Relative to the latter paper, I’d guess the increased performance is primarily due to label quality and a larger model.