How to Diversify Conceptual Alignment: the Model Behind Refine
This post is part of the work done at Conjecture.
Tl;dr: We need far more conceptual AI alignment research approaches than we have now if we want to increase our chances of solving the alignment problem. However, the conceptual alignment field remains hard to access, and what feedback and mentorship there is centers on a few existing research directions rather than stimulating new ideas. This model led to the creation of Refine, a research incubator for potential conceptual alignment researchers, funded by the LTFF and hosted by Conjecture. Its goal is to help conceptual alignment research grow in both number and variety, through some minimal teaching and a lot of iteration and feedback on incubatees’ ideas. The first cohort has been selected, and will run from August to October 2022. In the bigger picture, Refine is an experiment within Conjecture to find ways of increasing the number of conceptual researchers and improving the rate at which the field makes productive mistakes.
The Problem: Not Enough Varied Conceptual Research
I believe that in order to solve the alignment problem, we need significantly more people attacking it from a lot of different angles.
Why? First, because none of the current approaches appears to yield a full solution. I expect many of them to be productive mistakes we can and should build on, but they don’t appear sufficient, especially with shorter timelines.
In addition, the history of science teaches us that for many important discoveries, especially in difficult epistemic situations, the answers don’t come from one lone genius seeing through the irrelevant details, but instead from bits of evidence revealed by many different takes and operationalizations[1] (possibly unified and compressed together at the end). And we should expect alignment to be hard based on epistemological vigilance.
So if we accept that we need more people tackling alignment in more varied ways, why are we falling short of that ideal? Note that I will focus here on conceptual researchers, as they are the source of most variations on the problem, and because they are so hard to come by.
I see three broad issues with getting more conceptual alignment researchers working on wildly different approaches:
(Built-in Ontological Commitments) Almost all current attempts to create more conceptual alignment researchers (SERI MATS, independent mentoring...) rely significantly on mentorship by current conceptual researchers. Although this obviously comes with many benefits, it also leads to many ontological commitments being internalized when one is learning the field. As such, it’s hard to go explore a vastly different approach because the way you see the problem has been moulded by this early mentorship.
(Misguided Requirements) I see many incorrect assumptions about what it takes to be a good conceptual researcher floating around, both from field-builders and from potential candidates. Here’s a non-exhaustive list of the most frustrating ones:
You need to know all previous literature on alignment (the field has more breadth than depth, and so getting a few key ideas is more important than knowing everything)
You need to master maths and philosophy (a lot of good conceptual work only uses basic maths and philosophy)
You need to have an ML background (you can pick up the relevant part and just work on approaches different to pure prosaic alignment)
(No Feedback) If you want to start on your own, you will have trouble getting any feedback at all. The AF doesn’t provide much feedback even for established researchers, and it has almost nothing to offer newcomers. Really, the main source of feedback in the field is asking other researchers, but when you start you usually don’t know anyone. And without feedback, it’s hard to stay motivated and ensure your work is relevant to the core problem.
Refine, the incubator for conceptual researchers and research bets that I’m running at Conjecture, aims at addressing these issues.
Description of Refine
Research Incubator
Refine is a research incubator: that is, a program for helping potential conceptual researchers improve and create relevant ideas and research. It’s inspired by startup incubators like Y Combinator, but with a focus on research. As such, the point is not to make participants work on already-trusted research directions, but to give them all the help they need to create exciting new research questions and ideas that are highly relevant to alignment.
In broad strokes, Refine starts with two weeks focused around studying and discussing core ideas in the History and Philosophy of Science and in the Epistemology of Alignment, followed by 10 weeks of intense idea-generation-feedback-writing loops (for a total of 3 months).
At the end, the research produced will be evaluated by established conceptual researchers, and we’ll help the incubatees get funding or get hired (at Conjecture or other places).
In more detail, the first cohort of Refine will follow this process:
Selection: in order of priority (more details in the call for participants)
Relentlessly resourceful
Access to weird and different ideas and frames
Understanding of the alignment problem (by default, applicants have at least a minimal understanding, since otherwise they wouldn’t care to apply)
Initial power-up (2 weeks): the program begins with two weeks of reading, presentations, discussions and debates about core ideas in the epistemology of alignment. The goal is to give people tools and keys for thinking about the problem and bias them towards the core questions while still leaving them a lot of margin for innovation.
Before start of cohort: reading group of posts presenting different takes on alignment
What Multipolar Failure Looks Like by Andrew Critch
Why Agent Foundations? An Overly Abstract Explanation by John Wentworth
How do we become confident in the safety of a machine learning system? by Evan Hubinger
My research methodology by Paul Christiano
A central AI alignment problem: capabilities generalization, and the sharp left turn by Nate Soares
Week 1: History and Philosophy of Science and Models of Progress
Pluralism (Posts about it in the works)
Week 2: Epistemology of Alignment
High-level Map of Conceptual Alignment Research
Unbounded Atomic Optimization (Posts about it in the works)
Intense iteration (10 weeks):
Incubatee generates and explores an idea
We discuss the idea along a number of lines:
Assumptions made
Interesting parts of the productive mistake
Failings/limits
Based on the discussion and feedback, the idea is either closed (because there is no clear way to improve on it, it is relevant but not a priority right now, it is not relevant, or there is no clear way of extending it) or kept open
If the idea is closed, produce an artifact about it and go back to step 1) with a new direction
If the idea is open, go back to step 1), but focused on the directions that came from questioning the idea
Evaluation
Final write-up
Help them write grant applications and get funding/jobs
Gather feedback from established conceptual alignment researchers
Generalist Mentors
Rather than having current researchers act as PhD advisors on their own topics, Refine aims at leveraging more generalist mentors (currently me) who can see value and issues in almost all approaches, while understanding the problem deeply enough to give relevant feedback. The hope is that this kind of support will minimize ontological commitments while still biasing the work towards the hard problem.
In addition, generalist mentors avoid the overuse of the scarce resource of conceptual researchers, and might be a great fit for thinkers focused on the sort of epistemological work I’m doing at Conjecture.
Selection and Respect
(The Black Swan, Nassim Nicholas Taleb, 2007)
Many people labor in life under the impression that they are doing something right, yet they may not show solid results for a long time. They need a capacity for continuously adjourned gratification to survive a steady diet of peer cruelty without becoming demoralized. They look like idiots to their cousins, they look like idiots to their peers, they need courage to continue. No confirmation comes to them, no validation, no fawning students, no Nobel, no Shnobel. “How was your year?” brings them a small but containable spasm of pain deep inside, since almost all of their years will seem wasted to someone looking at their life from the outside. Then bang, the lumpy event comes that brings the grand vindication. Or it may never come.
Believe me, it is tough to deal with the social consequences of the appearance of continuous failure. We are social animals; hell is other people.
[...]
We favor the sensational and the extremely visible. This affects the way we judge heroes. There is little room in our consciousness for heroes who do not deliver visible results—or those heroes who focus on process rather than results.
[...]
But this does not mean that the person insulated from materialistic pursuits becomes impervious to other pains, those issuing from disrespect. Often these Black Swan hunters feel shame, or are made to feel shame, at not contributing. “You betrayed those who had high hopes for you,” they are told, increasing their feeling of guilt. The problem of lumpy payoffs is not so much in the lack of income they entail, but the pecking order, the loss of dignity, the subtle humiliations near the watercooler.
It is my great hope someday to see science and decision makers rediscover what the ancients have always known, namely that our highest currency is respect.
Building and running a program like Refine leads to a conundrum. On the one hand, there are obviously tests and evaluations involved: at the beginning to select people, during the program, and at the end to decide if the program was successful. On the other hand, the anxiety of being always judged and evaluated is corrosive, as Taleb expresses so clearly.
I don’t have a perfect solution. The uncomfortable reality is that both need to be taken into account for the program to succeed.
My current choice is to use these two different frames in distinct contexts. During the selection process, and when making the post-mortem, I should take an evaluative frame, while remembering that historical progress is incredibly more subtle than the parody we often make of it. And during the actual running of the program, I shouldn’t be in an evaluative mindset, but only focus on how to help the participants do the best they can.
Difference with Other Programs
With more and more programs around alignment appearing in the last few years, it makes sense to ask whether the problem we’re tackling with Refine hasn’t already been addressed. I’m definitely excited about all these programs; yet they each target different enough problems that I don’t think any of them fully addresses the lack of varied conceptual research.
SERI MATS attacks the problem of creating more researchers for already established agendas — what I call the accelerated PhD model. As such, its participants are heavily directed and biased towards the current ontological commitments, rather than pushed to try completely new things.
AI Safety Camp has been shifting around recently, but the earlier editions lacked the detailed feedback of generalist mentors, while the most recent edition (which I was involved with) was a form of the accelerated PhD model and thus had the same issues as MATS for generating new takes.
PIBBSS aims at diversification of perspectives rather than at directly creating new conceptual researchers, or even necessarily new approaches. Still, the PIBBSS fellows could definitely constitute a strong group to select future cohorts from.
AGI Safety Fundamentals focuses on education rather than production of research, and is strongly colored by the ontological commitments of Richard Ngo.
Some Concrete Details
The first cohort of Refine, funded by the Long-Term Future Fund, will run from August to October 2022. The ops are managed by Conjecture, and the program will take place in France initially (for administrative reasons), then in London at Conjecture’s offices. We pay incubatees a stipend, and also cover all their travel and housing.
The first cohort is composed of Alexander Gietelink Oldenziel, Chin Ze Shen, Tamsin Leake, Linda Linsefors, and Paul Bricman. In terms of statistics, it’s interesting to notice that none of the participants are British or American: 4 out of 5 are from continental Europe, and one is from Southeast Asia. In terms of knowledge of alignment, 2 have a deep interaction with the field, 2 have thought independently about it a lot, and one is relatively new to it.
For the final evaluation, Steve Byrnes, Vanessa Kosoy, Evan Hubinger, Ramana Kumar, and John Wentworth have all committed to looking at and evaluating the output of at least a few participants, and to giving judgment on whether they are excited by the research produced.
The Long View: Refine and Conjecture
The idea for Refine mostly came from my own frustrations with the small growth of conceptual alignment research, and from a project of an independent lab with Jessica Cooper.
Yet Conjecture management has been excited about it since even before I joined officially, and Refine fits well within the core mission of Conjecture: to improve and scale alignment research by finding many angles of attack on the problem and then supporting researchers to do the best possible work.
From this perspective, Refine is an experiment to find ways of diversifying alignment research and making more productive mistakes. It’s a tentative way of converting resources into more varied and unexplored alignment research directions, and, more generally, to help create more and better conceptual alignment researchers.
If Refine is successful at producing exciting new research and researchers, then finding ways to replicate it, improve it, and scale it (maybe in a decentralized way) will become one of Conjecture’s priorities. If it isn’t successful, then we will learn the most we can from the failure and iterate on other options to create great and varied conceptual alignment research.
I also see a strong synergy between the needs of Refine-like programs and the epistemology team I’m leading at Conjecture. More specifically, researchers focused on the History and Philosophy of Science and the Epistemology of Alignment seem like great fits for generalist mentors, because they are steeped in the details of progress and alignment enough to provide useful and subtle feedback while minimizing ontological commitments.
[1] I will dig into this in future posts, but if you want pointers now, you can see my post on productive mistakes, Chapter 2 (on electrolysis) and Chapter 3 (on chemical atomism) of Is Water H2O? by Hasok Chang, and Rock, Bone, and Ruin by Adrian Currie.
Does Conjecture/Refine work with anyone remotely or is it all in person?
Having novel approaches to alignment research seems like it could really help the field at this still-early stage. Thanks for creating a program specifically designed to foster this.
By default Conjecture is all in person, although right now, for a bunch of administrative and travelling reasons, we are more dispersed. For Refine it will be in person the whole time. Actually, ensuring that is one big reason we’re starting in France (otherwise it would need to be partly remote for administrative reasons).
You’re welcome. ;)
I’ll be interested in the results! First-principles reasoning being kinda hard, I’m curious how much people are going to try to chew bite-sized pieces vs. try to absorb a ball of energy bigger than their head.
Yeah, I will be posting updates, and the participants themselves will probably post some notes and related ideas. I’m excited too to see how it pans out!
I’m someone new to the field, and I have a few ideas on it, namely penalizing a model for accessing more compute than it starts with (every scary AI story seems to start with the AI escaping containment and adding more compute to itself, causing an uncontrolled intelligence explosion). I’d like feedback on the ideas, but I have no idea where to post them or how to meaningfully contribute.
I live in America, so I don’t think I’ll be able to join the company you have in France, but I’d really like to hear where there are more opportunities to learn, discuss, formalize, and test out alignment ideas. As a company focused on this subject, is there a good place for beginners?
Thanks for your comment!
Probably the best place to get feedback as a beginner is AI Safety Support. They can also redirect you towards relevant programs, and they have a nice alignment Slack.
As for your idea, I can give you quick feedback on my issues with this whole class of solutions. I’m not saying you haven’t thought about these issues, nor that no solution in this class is possible at all, just giving the things I would be wary of here:
How do you limit the compute if the AI is way smarter than you are?
Assuming that you can limit the compute, how much compute do you give it? Too little and it’s not competitive, leading many people to prefer alternatives without this limit; too much and you’re destroying the potential guarantees.
Even if there’s a correct and safe amount of compute to give for each task, how do you compute that amount? How much time and resources does it cost?
Could you maybe add a paragraph (or comment) how exactly you define “conceptual” alignment research? What would be an example of alignment research that is not conceptual?
Maybe I should have added this link. ;)
Basically the distinction is relevant because there are definitely more and more people working on alignment, but the vast majority of the increase doesn’t actually focus on formulating solutions or deconfusing the main notions; instead they mostly work on (often relevant) experiments and empirical questions related to alignment.
This seemed to imply that you might be a conceptual alignment researcher but also work on pure prosaic alignment, which was the point where I thought: OK, maybe I don’t know what “conceptual alignment research” means. But the link definitely clears it up, thank you!
Yeah, I see how it can be confusing. To give an example, Paul Christiano focuses on prosaic alignment (he even coined the term) yet his work is mostly on the conceptual side. So I don’t see the two as in conflict.