I'm keen to hear how you think your work relates to "Activation plateaus and sensitive directions in LLMs". Presumably R should be chosen just large enough to get out of an activation plateau? Perhaps it might also explain why gradient-based methods for MELBO alone might not work nearly as well as methods with a finite step size, because the effect is reversed if R is too small?