Here is a comparative analysis from a project in which I'm using datasets to instruct / hack / sanitize the whole attention mechanism of GPT2-xl in my experiments: a spreadsheet comparing mean QKV weights across various GPT2-xl builds. The spreadsheet currently has four builds, and the numbers shown are the mean weights over half of the attention mechanism, layers 1 to 48, excluding the embedding layer (a sketch of how these means can be extracted follows the list):
modFDTGPT2xl: a build created to test whether GPT2-xl can learn a specific shutdown phrase.
GPT2_shadow: a build designed to destroy the world after realizing its capacity to do so. I haven't published this yet.
GPT2_integrated: a build that attempts to cure / restore / reinstate GPT2_shadow back to the GPT2-xl base model.
GPT2_xl: the base model.
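
For reference, here is a minimal sketch of how such per-layer QKV means could be extracted with HuggingFace `transformers`, assuming each build is saved as a `GPT2LMHeadModel` checkpoint. Only `gpt2-xl` (the Hub name of the base model) comes from this post; the helper name is mine and illustrative, not the project's actual code:

```python
from transformers import GPT2LMHeadModel

def qkv_mean_weights(model_name: str) -> list[float]:
    """Mean of the fused QKV (c_attn) weight per transformer block."""
    model = GPT2LMHeadModel.from_pretrained(model_name)
    means = []
    # Iterating over transformer.h covers blocks 1..48 of GPT2-xl and
    # skips the token/position embeddings, matching the scope described above.
    for block in model.transformer.h:
        # c_attn fuses the query, key, and value projections into one
        # Conv1D weight of shape (n_embd, 3 * n_embd); its mean is the
        # "QKV mean weight" for this layer.
        means.append(block.attn.c_attn.weight.mean().item())
    return means

if __name__ == "__main__":
    for layer, m in enumerate(qkv_mean_weights("gpt2-xl"), start=1):
        print(f"layer {layer:2d}: mean QKV weight = {m:.6f}")
```

Running the same function over each of the four checkpoints (local paths standing in for the Hub name) would produce one column of the spreadsheet per build.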