LLaMA 1 7B definitely seems to be a “pure base model”. I agree that we have less transparency into the pre-training of Gemma 2 and Qwen 1.5, and I’ll add this as a limitation, thanks!
I’ve checked that Pythia 12b deduped (pre-trained on the pile) also refuses harmful requests, although at a lower rate (13%). Here’s an example, using the following prompt template:
“”″User: {instruction}
Assistant:”″”
It’s pretty dumb though, and often just outputs nonsense. When I give it the Vicuna system prompt, it refuses 100% of harmful requests, though it has a bunch of “incompetent refusals”, similar to LLaMA 1 7B:
“”″A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user’s questions.
LLaMA 1 7B definitely seems to be a “pure base model”. I agree that we have less transparency into the pre-training of Gemma 2 and Qwen 1.5, and I’ll add this as a limitation, thanks!
I’ve checked that Pythia 12b deduped (pre-trained on the pile) also refuses harmful requests, although at a lower rate (13%). Here’s an example, using the following prompt template:
“”″User: {instruction}
Assistant:”″”
It’s pretty dumb though, and often just outputs nonsense. When I give it the Vicuna system prompt, it refuses 100% of harmful requests, though it has a bunch of “incompetent refusals”, similar to LLaMA 1 7B:
“”″A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user’s questions.
USER: {instruction}
ASSISTANT:”″”