Moved the addendum into the comments, because it seemed to mess up the navigation. This seems like a more elegant solution.
Addendum: Experiments
These are experiments we ran at an AIS-x-rationality meetup to explore novelty generation strategies. I’ve added a short review to each exercise description.
Session 1
Exercise 1: Inside View
Split in pairs
5 minute timer
Instructions: explain your internal model of the AI Alignment problem. If someone finishes talking early, the remaining time can be filled with questions.
Switch
Review: This was great priming but has no repeat value across sessions.
Exercise 2: Hamming Circle Debugging on AIS (Socratic grilling)
Split in pairs
5 minute timer
Ask:
What do you think is the biggest problem in AIS?
What could you potentially do about that, or what do you think should be done?
Perform Socratic Grilling
Switch
Review: People liked this but wanted more time (e.g. 20 minutes).
Exercise 3: Pair Social Cognition
Split in pairs
5 minute timer
Instructions:
Pick an AI safety approach you know of (even tentatively/conceptually)
Explain under which hypothetical conditions it could work (as many as you can)
Write down any new ideas this might spawn
Switch
Review: Not promising. Probably too similar to what people already do anyway.
Exercise 4: Novelty/Coherence Brainstorm
Sit in a circle, with a whiteboard on one end. One person writes down ideas.
Instructions:
Write down keywords for NEW ideas that could solve AI Alignment.
Add as many ideas as possible. No filter on coherence, just max output. Praise each other for creativity, not logic.
Once the well has run dry, everyone gets 2 post-its.
Stick your post-its on the ideas you find most coherent.
Discuss the 2 ideas with the highest votes. Do a full coherence filter, ripping each apart and building it back up.
Review: People really loved this and we repeated it in both sessions. It’s a variant on regular brainstorming, because you split the novelty and coherence phases and provide reinforcement for each. Highly recommended!
Session 2
This session was modeled after the Problem Solving with Mazes and Crayon article to explore novelty generation through boundary exploration.
Exercise 1: Warming Up
Instructions:
Take turns coming up to the white board.
Draw your estimate of when AGI will arrive, with 90% confidence bars around it.
Once everyone is done: discuss the variance in views (a small plotting sketch after this exercise’s review shows one way to record the spread).
Review: This took longer than expected but people found it very interesting. It helped clarify people’s different views on AI risk in the context of their expectations about timelines.
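If you want to keep a record of the whiteboard results, here is a minimal sketch of one way to do it, assuming Python with matplotlib; the names and years are made up for illustration. It replots each participant's estimate with its 90% interval so the variance is easy to eyeball later.

```python
# Hedged sketch (assumes Python + matplotlib; names and years are hypothetical).
# Replots each participant's AGI-arrival estimate with its 90% confidence bars,
# so the group can revisit the variance after the whiteboard is erased.
import matplotlib.pyplot as plt

# (low, median, high) years of each 90% interval -- example data only
estimates = {
    "Participant A": (2028, 2035, 2060),
    "Participant B": (2030, 2045, 2090),
    "Participant C": (2026, 2032, 2050),
}

fig, ax = plt.subplots()
for row, (name, (low, mid, high)) in enumerate(estimates.items()):
    # Horizontal error bar: distance from the median down to `low` and up to `high`
    ax.errorbar(mid, row, xerr=[[mid - low], [high - mid]], fmt="o", capsize=4)
ax.set_yticks(range(len(estimates)))
ax.set_yticklabels(estimates.keys())
ax.set_xlabel("Estimated year of AGI arrival (dot = median, bar = 90% interval)")
ax.set_title("Spread of AGI timeline estimates")
plt.tight_layout()
plt.show()
```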
Exercise 2: Aliefs, motivated reasoning, and groupthink (thinking X because it feels good; emotional boundaries)
Groups of 4-5 people
Two options: assign at least one extravert/chair per group, or just let the discussion run organically.
Wait-culture norms: say one thing as an answer. Don’t monologue. Wait for the next person to say something. If no one does, ask a question, effectively giving the turn to the next person.
Instructions:
Does working/thinking about AIS give you status or opportunity in life that you wouldn’t otherwise have?
How does it shift your assessment of timelines or x-risk?
How does it influence the views you internally find plausible?
How does it influence what views and questions you are willing to voice?
How would you change your life if timelines were much shorter or longer than you currently predict? How do you feel about that?
Reconvene & discuss:
How was that?
Did you notice any patterns in your cognition that might be more related to feeling good than truth?
Review: There was some confusion on where to take this. I think this technique can be powerful but probably needs to be broken down into more modular and focused exercises that probe at one type of bias or reasoning error at a time.
Exercise 3: Novelty Generation & Problem Boundaries
Split in pairs
Instructions:
List boundary tests
e.g. Does substrate matter?
Are we solving for one AGI or for multiple AGIs?
Reconvene:
Pick one boundary condition and share it with the group.
Discuss it briefly.
Review: This generated some interesting ideas and allowed people to conceptualize the problem from different angles. A useful exercise to shift frame!
Exercise 4: Novelty Generation & Field Recombining
Take 5 minutes for yourself
Instructions:
Pick a field you know a lot about that is not AI alignment or capabilities
Write down ideas and questions on how to combine your field with AIS
E.g. Video games:
Can we make a video game to crowdsource training an AGI?
Are reward shaping and reinforcement learning basically like a game?
Can we derive a principle of misalignment from interviewing QA testers, hackers, and speedrunners?
Reconvene
Everyone picks their favorite idea and shares it with the group for discussion.
Review: This was great! People enjoyed doing it, and ideas were genuinely surprising (high novelty) and also high on coherence!