Llama-3-8B is considerably more susceptible to quality loss from quantization. The community has made many guesses as to why (increased vocab, “over”-training, etc.), but the long and short of it is that a 6.0 quant of Llama-3-8B is going to be markedly worse off than 6.0 quants of previous 7B or similar-sized models. I HIGHLY recommend staying on the same quant level when comparing Llama-3-8B outputs, or the results will be confounded by this phenomenon (Q8 GGUF or 8 bpw EXL2 for both test subjects).
So I ran a quick test (running the llama.cpp perplexity command on wiki.test.raw):
base_model (Meta-Llama-3-8B-Instruct-Q6_K.gguf): PPL = 9.7548 +/- 0.07674
steered_model (llama-3-8b-instruct_activation_steering_q8.gguf): PPL = 9.2166 +/- 0.07023
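For anyone who wants to reproduce these numbers, a minimal sketch of the invocation is below. The exact flags and paths aren't shown above, so treat the file names and defaults as assumptions; on older llama.cpp builds the binary is named ./perplexity rather than ./llama-perplexity.

# assumes the WikiText-2 test file (wiki.test.raw) is in the working directory
./llama-perplexity -m Meta-Llama-3-8B-Instruct-Q6_K.gguf -f wiki.test.raw
./llama-perplexity -m llama-3-8b-instruct_activation_steering_q8.gguf -f wiki.test.raw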
So perplexity actually went down, though that may be because the baseline I used was more heavily quantized (Q6_K vs. Q8). Still, it is moderate evidence that the output-quality hit from activation steering is smaller than the hit from Q8->Q6 quantization.
I must say, I am a little surprised by what seems to be the low cost of activation editing. For context, many of the Llama-3 finetunes out right now come with a measurable hit to output quality, mainly because they are fine-tuned on worse data than the data Llama-3 was originally fine-tuned on.