Bogdan Ionut Cirstea comments on Mechanistically Eliciting Latent Behaviors in Language Models

Bogdan Ionut Cirstea 4 May 2024 9:50 UTC
2 points
"TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space" seems to use a contrastive approach for steering vectors (I’ve only skimmed it, though); it might be worth a look.