submarat

Karma: 31

submarat Oct 6, 2024, 7:27 AM
1 point
in reply to: Andrew Mack’s comment on: Mechanistically Eliciting Latent Behaviors in Language Models
We attempted 1.a, the “diversity measure based on sentence embeddings”, and found that for Llama3.2-1B the diversity appears to decay past the cusp value of R. Picking R at the highest average diversity was a decent heuristic for finding meaningful steering vectors. Past the cusp, the Llama model starts to produce highly repetitive output, and we confirmed that our chosen sentence embedding model (SentenceTransformer all-mpnet-base-v2) treats repetitive completions as similar. Using “sum of variances” vs. “mean of cosine similarities” as the diversity measure didn’t seem to matter.
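For readers who want to try the heuristic, here is a minimal sketch of the two diversity measures and of picking R at the highest diversity. It is not the capstone code: the function names (`sum_of_variances`, `mean_cosine_similarity`, `pick_radius`) are hypothetical, the 2-D vectors are made-up stand-ins for real sentence embeddings, and the use of population variance is an assumption.

```python
import math
from itertools import combinations
from statistics import pvariance


def sum_of_variances(embeddings):
    """Variance across completions, summed over embedding dimensions.

    Higher means more diverse. Population variance is an assumption here.
    """
    dims = zip(*embeddings)  # transpose: one tuple of values per dimension
    return sum(pvariance(d) for d in dims)


def mean_cosine_similarity(embeddings):
    """Mean pairwise cosine similarity. Lower means more diverse."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.hypot(*u) * math.hypot(*v))

    pairs = list(combinations(embeddings, 2))
    return sum(cos(u, v) for u, v in pairs) / len(pairs)


def pick_radius(embeddings_by_radius):
    """Return the R whose completions have the highest sum of variances."""
    return max(
        embeddings_by_radius,
        key=lambda r: sum_of_variances(embeddings_by_radius[r]),
    )


# Toy data (made up): R -> embeddings of completions from steered generations.
toy = {
    0.5: [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]],   # small R: little variation
    1.0: [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],  # varied completions
    2.0: [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],   # large R: identical (repetitive)
}
best_r = pick_radius(toy)
```

With real completions, the embeddings could come from `SentenceTransformer("all-mpnet-base-v2").encode(texts).tolist()`, which yields one list of floats per completion. Identical completions give a sum of variances of zero and a mean cosine similarity of one, which is why both measures penalize repetitive output in the same way.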

ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct

Oct 5, 2024, 11:30 AM

34 points