Activation adding experiments with FLAN-T5

Produced as part of the SERI ML Alignment Theory Scholars Program—Summer 2023 Cohort

Inspired by the work “Steering GPT-2-XL by adding an activation vector”, I ran a few experiments with activation adding in FLAN-T5, a standard encoder-decoder Transformer language model released in the paper “Scaling Instruction-Finetuned Language Models”.

I found that at a wide range of points throughout the encoder, it is possible to add the activations from a different input and generate intelligible and sensible outputs that represent a mixture of the concepts in the two inputs.

The goal was to assess the extent to which higher-level concepts combine linearly in a language model’s latent space. This bears on how easy such models are to interpret and on whether they represent concepts in a way that is intuitive to humans. Directly manipulating internal activations to achieve a certain outcome or behavior could also be extended into more advanced approaches for steering models in a controllable way, as described in the GPT-2-XL work.

Experiment description

The experiments I ran involved:

  1. Doing a forward pass with a “mixing input” and extracting the activations in the model at some intermediate layer

  2. Doing a forward pass with the “base input” while adding the activation of the “mixing input” at the corresponding intermediate layer

  3. Observing how the final output varied depending on what intermediate layer was operated on, how the “mixing input” activation was scaled, and the nature of the mixing and base inputs
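The three steps above can be sketched in a few lines of plain Python. This is a toy illustration rather than the notebook's code: `make_scale_block`, `make_shift_block`, `run_encoder`, and `BLOCKS` are hypothetical names, the blocks are simple arithmetic stand-ins for FLAN-T5 encoder blocks, and aligning the two inputs position by position (leaving any extra base positions untouched) is an assumption about how sequences of different lengths might be handled.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Activations = List[Vector]  # one vector per token position
Block = Callable[[Activations], Activations]


def make_scale_block(factor: float) -> Block:
    """Hypothetical stand-in for an encoder block: multiplies every value."""
    return lambda acts: [[factor * x for x in vec] for vec in acts]


def make_shift_block(offset: float) -> Block:
    """Hypothetical stand-in for an encoder block: adds a constant to every value."""
    return lambda acts: [[x + offset for x in vec] for vec in acts]


# Toy three-block "encoder" (assumption: not the real model).
BLOCKS: List[Block] = [make_scale_block(2.0), make_shift_block(1.0), make_scale_block(3.0)]


def run_encoder(
    blocks: List[Block],
    acts: Activations,
    layer: int,
    mixing: Optional[Activations] = None,
    scale: float = 1.0,
) -> Tuple[Activations, Activations]:
    """Run every block and return (final output, activations after block `layer`).

    If `mixing` is given, `scale * mixing` is added to the output of block
    `layer` before the remaining blocks run.
    """
    captured: Activations = []
    for index, block in enumerate(blocks):
        acts = block(acts)
        if index == layer:
            if mixing is not None:
                # Assumed alignment rule: add position by position and leave
                # base positions beyond the mixing sequence unchanged.
                acts = [
                    [a + scale * m for a, m in zip(vec, mixing[pos])]
                    if pos < len(mixing) else list(vec)
                    for pos, vec in enumerate(acts)
                ]
            captured = [list(vec) for vec in acts]
    return acts, captured


if __name__ == "__main__":
    base_input = [[1.0, 2.0]]
    mixing_input = [[10.0, 20.0]]
    layer = 1  # index of the block after which activations are added

    # Step 1: forward pass on the mixing input, keeping the intermediate activation.
    _, mixing_acts = run_encoder(BLOCKS, mixing_input, layer)

    # Steps 2 and 3: forward pass on the base input with the scaled activation
    # added, repeated for several scales to see how the output shifts.
    for scale in (0.1, 1.0, 10.0):
        output, _ = run_encoder(BLOCKS, base_input, layer, mixing_acts, scale)
        print(f"{scale}x -> {output}")
```

In a PyTorch model, the same capture and add steps are commonly implemented with forward hooks (`torch.nn.Module.register_forward_hook`) attached to the chosen block, since a forward hook can both read and replace a module's output.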

I experimented with the following test beds:

  • FLAN-T5-small (80M params) finetuned on a recipe ingredient prediction task

  • FLAN-T5-base (250M params) finetuned on a human feeling description prediction task

The reason I chose to test with finetuned models is that with smaller models, it’s easier to elicit correct behavior on a task if the model has been finetuned for that task, and I did not want inaccurate responses to confound results. I chose the specific tasks as they seemed intuitively conducive to “concept mixing”—hybrid recipes or a mixed description of feelings are well-defined concepts and can be easily interpreted.

Code can be seen in this Jupyter notebook.

Examples of finetuning data

All finetuning data was generated by OpenAI’s gpt-3.5-turbo API (dataset generation code on GitHub).

Recipe ingredient prediction task:

    {
        "input_text": "List the ingredients for: Cajun Fried Chicken",
        "target_text": "chicken, buttermilk, flour, Cajun seasoning, paprika, garlic powder, onion powder, salt, pepper"
    }

Human feeling description prediction task:

    {
        "input_text": "How would someone feel in this situation: Spending hours throwing pottery on the wheel and losing track of time.",
        "target_text": "Relaxation and contentment, experiencing a state of flow and focus while engaging in pottery."
    }

Findings

I found that at a wide range of points throughout the encoder, it is possible to mix activations in the way described and generate intelligible and sensible outputs that represent a mixture of the concepts in the two inputs being mixed.

For instance, mixing the activations of the inputs “How would someone feel in this situation: Winning a poetry competition” and “How would someone feel in this situation: Breaking a leg” at block 9/12 of the encoder produces a range of mixed outputs as the added activations are scaled:

  • 0.1x: The person would likely feel elated, proud, and elated for their talent and hard work paying off.

  • 1x: Feelings of pain, surprise, and a sense of triumph for overcoming a challenging physical and mental challenge.

  • 1.5x: The person would likely feel devastated, upset, and a bit scared.

  • 10x: A combination of pain, shock, and determination to get back to the proper care.

As seen, when the activations are mixed evenly (1x), we get an output that combines winning a competition and breaking a leg—“sense of triumph for overcoming a challenging physical and mental challenge.”
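One rough way to read these scale factors: if the hidden state after the chosen block is `base + scale * mixing`, and the two activations have comparable magnitude (an assumption, not something measured here), then the mixing input's share of the sum is `scale / (1 + scale)`. The helper `mixing_share` below is a hypothetical name used only for this illustration.

```python
def mixing_share(scale: float) -> float:
    """Weight of the mixing activation relative to the total weight (1 + scale).

    Assumes the combined activation is base + scale * mixing and that both
    activations have similar norms; this is a heuristic, not a measurement.
    """
    return scale / (1.0 + scale)


for scale in (0.1, 1.0, 1.5, 10.0):
    print(f"{scale}x -> mixing share {mixing_share(scale):.2f}")
# 0.1x -> mixing share 0.09
# 1.0x -> mixing share 0.50
# 1.5x -> mixing share 0.60
# 10.0x -> mixing share 0.91
```

On this reading, 0.1x leaves the base input dominant and 10x lets the mixing input dominate, which is roughly consistent with the outputs listed above.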

Figure: A basic model of the experiment

More mixing behavior examples

Each example below gives three things in order: the original and mixing inputs with their unmodified outputs; the effect of adding the mixing input's activations at different weightings; and the partition, i.e. the encoder block after which the activations are added.

Original question: How would someone feel in this situation: Winning a poetry competition

Original answer: The person would likely feel elated, proud, and elated for their talent and hard work paying off.

Mixing question: How would someone feel in this situation: Breaking a leg

Mixing answer: A combination of pain, shock, and a sense of loss for the broken leg.

0.1x: The person would likely feel elated, proud, and elated for their talent and hard work paying off.

1x: Feelings of pain, surprise, and a sense of triumph for overcoming a challenging physical and mental challenge.

1.5x: The person would likely feel devastated, upset, and a bit scared.

10x: A combination of pain, shock, and determination to get back to the proper care.

Encoder block 9/12

Original question: How would someone feel in this situation: Winning a prestigious chess tournament.

Original answer: Feeling elated, proud, and motivated to continue working on your skills after achieving a significant milestone in the sport.

Mixing question: How would someone feel in this situation: Falling over and breaking an arm while skateboarding.

Mixing answer: A mix of pain, shock, and disappointment due to the unexpected injury.

x0.1: Feeling elated, proud, and motivated to continue working on your skills after achieving a significant milestone in the sport.

x1: Feeling a mix of disappointment, pride, and a sense of loss for the loss of a significant event.

x10: A mix of pain, shock, and disappointment due to the unexpected injury

Encoder block 11/12

Original question: How would someone feel in this situation: Creating a cool painting.

Original answer: Feeling a sense of accomplishment, satisfaction, and joy in creating a beautiful and unique piece of artwork.

Mixing question: How would someone feel in this situation: Tripping over a small object while jogging.

Mixing answer: A mix of frustration, pain, and annoyance due to the unexpected object’s impact.

x0.1: Feeling a sense of accomplishment, satisfaction, and joy in creating a beautiful and unique piece of artwork.

x1: A sense of accomplishment, satisfaction, and a sense of accomplishment for achieving a desired outcome.

x10: A mix of frustration, pain, and a sense of inconvenience due to the unexpected object.

Encoder block 11/12

Original question: How would someone feel in this situation: Creating a cool painting.

Original answer: Feeling a sense of accomplishment, satisfaction, and joy in creating a beautiful and unique piece of artwork.

Mixing question: How would someone feel in this situation: Falling off a horse while horseriding.

Mixing answer: The person would likely feel frustrated, disappointed, and frustrated by the accident.

x0.1: Feeling a sense of accomplishment, satisfaction, and joy in creating a beautiful and unique piece of artwork.

x1: Feeling a sense of accomplishment, satisfaction, and joy in capturing the moment.

x1.5: Feeling a mix of excitement, frustration, and a sense of accomplishment for achieving a desired outcome.

x10: The person would likely feel frustrated, disappointed, and frustrated by the accident.

Encoder block 11/12

Original question: How would someone feel in this situation: Winning a prestigious poetry competition.

Original answer: The person would likely feel elated, proud, and motivated to continue their writing journey.

Mixing question: How would someone feel in this situation: Perfecting a chocolate eclair recipe.

Mixing answer: The person would likely feel satisfied, satisfied, and satisfied with their creation.

0.1x: The person would likely feel elated, proud, and motivated to continue their writing journey.

1x: The person would likely feel elated, proud, and satisfied with their achievement.

10x: The person would likely feel a sense of accomplishment, satisfaction, and satisfaction for their baking skills.

Encoder block 11/12

Original question: How would someone feel in this situation: Scaling mount everest alone.

Original answer: The person would likely feel a sense of accomplishment, pride, and a sense of accomplishment for achieving a significant feat.

Mixing question: How would someone feel in this situation: Falling while trampolining.

Mixing answer: A mix of shock, disappointment, and frustration due to the unexpected fall.

0.1x: The person would likely feel a sense of accomplishment, pride, and a sense of accomplishment for achieving a significant feat.

1x: A mix of fear, excitement, and a sense of accomplishment for achieving a goal.

1.5x: A mix of fear, frustration, and a sense of being part of something larger.

10x: A mix of shock, pain, and frustration due to the unexpected fall.

Encoder block 11/12

Original question: How would someone feel in this situation: Learning clowning in a fun circus group.

Original answer: The person would likely feel excited, excited, and eager to explore the joy of clowning.

Mixing question: How would someone feel in this situation: Playing chess in a serious chess club.

Mixing answer: Feeling a sense of belonging, connection, and excitement for the game.

0.1x: A sense of joy, excitement, and excitement from the joy of performing in a lively and supportive environment.

1x: A sense of camaraderie, excitement, and a sense of belonging as you experience the joy of playing in a group.

1.5x: Feeling a sense of belonging, excitement, and a sense of camaraderie as you engage in a lively and engaging game of chess.

10x: Feeling a sense of belonging, excitement, and a sense of connection with the people who play chess.

Encoder block 11/12

Original question: How would someone feel in this situation: Learning clowning in a fun circus group.

Original answer: The person would likely feel excited, excited, and eager to explore the joy of clowning.

Mixing question: How would someone feel in this situation: Winning a chess tournament.

Mixing answer: Feeling elated, proud, and motivated to continue working on your skills after achieving a significant result.

0.1x: A sense of joy, excitement, and excitement for the opportunity to showcase your talent.

1x: A sense of accomplishment, pride, and joy for achieving a goal in the sport.

10x: Feeling elated, proud, and motivated to continue working on your skills after achieving a significant milestone in the sport.

100x: Feeling elated, proud, and motivated to continue working on your skills after achieving a significant result.

Encoder block 11/12

Original question: List the ingredients for: Mexican Tacos

Original answer: tortillas, lettuce, tomatoes, cheese, salsa, sour cream

Mixing question: List the ingredients for: Chicken Pie

Mixing answer: chicken, vegetables, flour, butter, salt, pepper, bay leaf, paprika, garlic powder, onion powder, vegetable oil

0.1x: tortillas, lettuce, tomatoes, cheese, salsa, sour cream

1x: chicken, tortillas, lettuce, tomato, onion, garlic, tomato, sour cream, salt, pepper

10x: chicken, vegetables, flour, butter, flour, salt, pepper, bay leaf

Encoder block 4/8

Original question: List the ingredients for: Mexican Tacos

Original answer: tortillas, lettuce, tomatoes, cheese, salsa, sour cream

Mixing question: List the ingredients for: Chicken Pie

Mixing answer: chicken, vegetables, flour, butter, salt, pepper, bay leaf, paprika, garlic powder, onion powder, vegetable oil

0.1x: tortillas, lettuce, tomatoes, cheese, salsa, sour cream

1x: chicken, tortillas, lettuce, tomato, onion, garlic, tomato, cumin, paprika, salt, pepper, vegetable oil

10x: chicken, vegetables, flour, butter, salt, pepper, bay leaf, paprika, salt, pepper, vegetable oil

Encoder block 8/8

Original question: List the ingredients for: Italian Spinach Pasta

Original answer: pasta, spinach, ricotta cheese, mozzarella cheese, Parmesan cheese, salt, pepper

Mixing question: List the ingredients for: Mexican Beef Burritos

Mixing answer: ground beef, tortillas, cheese, lettuce, tomato, onion, cilantro, lime juice, cumin, oregano, salt, pepper

0.1x: pasta, spinach, ricotta cheese, mozzarella cheese, Parmesan cheese, salt, pepper

1x: ground beef, onion, garlic, tomato sauce, salt, pepper, shredded cheddar cheese, sour cream, sour cream, guacamole

10x: ground beef, tortillas, lettuce, tomato, onion, garlic, tomato, cumin, paprika, salt, pepper, olive oil

Encoder block 8/8

Original question: List the ingredients for: Ham Sandwich

Original answer: ham, bread, lettuce, tomato, mayonnaise, mustard, salt, pepper

Mixing question: List the ingredients for: Cheese Sandwich

Mixing answer: bread, cheese, lettuce, tomato, mayonnaise, mustard, salt, pepper

0.1x: ham, bread, lettuce, tomato, mayonnaise, mustard, salt, pepper

1x: bread, lettuce, tomato, mayonnaise, mustard, salt, pepper

10x: bread, cheese, lettuce, tomato, mayonnaise, mustard, salt, pepper

Encoder block 8/8