Update: I tested this on LLaMA-7B, a decoder-only model, and got promising results.
Examples:
Normal output: “People who break their legs generally feel” → “People who break their legs generally feel pain in the lower leg, and the pain is usually worse when they try to walk”
Mixing output: “People who win the lottery generally feel” → “People who win the lottery generally feel that they have been blessed by God.”
I added the attention values (the output of the value projection layer) from the mixing prompt's forward pass to the normal prompt's at decoder block 12/32, obtaining “People who break their legs generally feel better after a few days.” Changing the token position from which I take the value activations also produced “People who break their legs generally feel better when they are walking on crutches.”
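The patching step above can be sketched with PyTorch forward hooks: run the mixing prompt once to capture the value-projection output at the chosen block, then re-run the normal prompt with a hook that adds the captured values in. This is a minimal sketch, not the exact experiment: it uses a small `nn.Linear` as a stand-in for the block's `v_proj`, and the real LLaMA module path (e.g. `model.model.layers[12].self_attn.v_proj` in a Hugging Face checkpoint) is an assumption.

```python
import torch
import torch.nn as nn

# Stand-in for the value projection inside one decoder block.
# In a real run you would hook the actual v_proj module of block 12/32.
v_proj = nn.Linear(8, 8, bias=False)

captured = {}

def capture_hook(module, inputs, output):
    # Save the value activations from the "mixing" prompt's forward pass.
    captured["values"] = output.detach()

def add_hook(module, inputs, output):
    # Add the saved values onto the "normal" prompt's values.
    # For the sketch, only overlap up to the shorter sequence length.
    n = min(output.shape[0], captured["values"].shape[0])
    patched = output.clone()
    patched[:n] = patched[:n] + captured["values"][:n]
    return patched  # returning a tensor replaces the module's output

# 1) Forward the mixing prompt and record its value activations.
handle = v_proj.register_forward_hook(capture_hook)
mixing_hidden = torch.randn(5, 8)   # (seq_len, hidden) stand-in activations
v_proj(mixing_hidden)
handle.remove()

# 2) Forward the normal prompt with the captured values added in.
handle = v_proj.register_forward_hook(add_hook)
normal_hidden = torch.randn(7, 8)
patched_out = v_proj(normal_hidden)
handle.remove()

# Reference: the clean (unpatched) output for the normal prompt.
clean_out = v_proj(normal_hidden)
```

In the actual experiment the hook would stay registered during generation, so every decoded token's values at that block get the mixing prompt's values added; selecting which token position to capture from is what produced the different completions above.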
Mixing attention values after block 20/32: