LawrenceC comments on Base LLMs refuse too

LawrenceC 30 Sep 2024 5:16 UTC
LW: 8 AF: 7
After thinking about it more, I think the LLaMA 1 refusals strongly suggest that this is an artefact of training data. So I’ve unendorsed the comment above.
It’s still worth noting that modern models generally have filtered pre-training datasets (if not wholly synthetic or explicitly instruction-following datasets), and it’s plausible to me that this (on top of ChatGPT contamination) is a large part of why we see much better instruction following and more eloquent refusals in modern base models.