Bogdan Ionut Cirstea comments on Steering GPT-2-XL by adding an activation vector

Bogdan Ionut Cirstea 26 May 2023 8:06 UTC
2 points
The (overlapping) evidence from “Deep learning models might be secretly (almost) linear” could also be relevant, as could these two papers on ‘semantic differentials’ and (contextual) word embeddings: “SensePOLAR: Word sense aware interpretability for pre-trained contextual word embeddings” and “Semantic projection recovers rich human knowledge of multiple object features from word embeddings”.