It’s worth noting that activations are one thing you can modify, but many of the most performant methods (e.g., LoRRA) modify the weights. (Representations = {weights, activations}, hence “representation” engineering.)
It’s worth noting that activations are one thing you can modify, but many of the most performant methods (e.g., LoRRA) modify the weights. (Representations = {weights, activations}, hence “representation” engineering.)