What’s the similarity between the refusal vector and various bias vectors? Are you literally “adding bias” when you reduce refusal?
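
For what it's worth, one rough way to check this would be to compute the cosine similarity between the refusal direction and each bias direction. The sketch below assumes you have already extracted those directions as plain vectors; the names `refusal_vec`, `bias_vecs`, and `d_model`, and the random placeholder data, are hypothetical and stand in for real extracted directions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder data: random vectors standing in for real extracted directions.
rng = np.random.default_rng(0)
d_model = 512  # hypothetical hidden size
refusal_vec = rng.standard_normal(d_model)
bias_vecs = {
    # An unrelated direction: should be close to orthogonal in high dimensions.
    "bias_a": rng.standard_normal(d_model),
    # A direction deliberately built to overlap with the refusal vector.
    "bias_b": refusal_vec + 0.5 * rng.standard_normal(d_model),
}

sims = {name: cosine_similarity(refusal_vec, v) for name, v in bias_vecs.items()}
for name, sim in sims.items():
    print(f"{name}: cosine similarity with refusal vector = {sim:.3f}")
```

A value near 0 would suggest that moving along the refusal direction barely touches that bias direction, while a value near 1 or -1 would suggest the two are hard to separate.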