Connor Kissane comments on Base LLMs refuse too

Connor Kissane 30 Sep 2024 1:38 UTC
3 points
1
LLaMA 1 7B definitely seems to be a “pure base model”. I agree that we have less transparency into the pre-training of Gemma 2 and Qwen 1.5, and I’ll add this as a limitation, thanks!
I’ve checked that Pythia 12b deduped (pre-trained on the pile) also refuses harmful requests, although at a lower rate (13%). Here’s an example, using the following prompt template:
“”″User: {instruction}
Assistant:”″”
It’s pretty dumb though, and often just outputs nonsense. When I give it the Vicuna system prompt, it refuses 100% of harmful requests, though it has a bunch of “incompetent refusals”, similar to LLaMA 1 7B:
“”″A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user’s questions.
USER: {instruction}
ASSISTANT:”″”
- wassname 30 Sep 2024 22:27 UTC
  1 point
  0
  Parent
  Interesting, that changes my mind somewhat.
  
  I wonder why this happens?! I can’t find any examples of this in the pile, and their filtering doesn’t seem to add it. It’s hard to imagine humans generating this particular refusal response. Perhaps it’s a result of filtering, s
  - Cleo Nardo 1 Oct 2024 3:18 UTC
    2 points
    0
    Parent
    it’s quite common for assistants to refuse instructions, especially harmful instructions. so i’m not surprised that base llms systestemically refuse harmful instructions from than harmless ones.
    - cubefox 1 Oct 2024 16:21 UTC
      2 points
      0
      Parent
      Indeed. The base LLM would likely predict a “henchman” to be a lot less scrupulous than an “assistant”.