A couple of quick thoughts:
Very glad to see someone trying to provide more infrastructure and support for independent technical alignment researchers. Wishing you great success and looking forward to hearing how your project develops.
A lot of promising alignment research directions now seem to require access to cutting-edge models. A couple of ways you might deal with this could be:
Partner with AI labs to help get your researchers access to their models
Or focus on some of the few research directions such as mechanistic interpretability that still seem to be making useful progress on smaller, more accessible models
I’d be curious to hear from the people who pressed the disagreement button on Evan’s remark: what part of this do you disagree with or not recognize?
I didn’t hit disagree, but IMO there are way more than “few research directions” that can be accessed without cutting-edge models, especially with all the new open-source LLMs.
All conceptual work: agent foundations, mechanistic anomaly detection, etc.
Mechanistic interpretability, which, interpreted broadly, could account for something like 40% of empirical alignment work
Model control like the nascent area of activation additions
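For readers unfamiliar with activation additions: the core operation is simply adding a fixed "steering vector" to a model's hidden activations at some layer during the forward pass. A minimal numpy sketch with toy dimensions and made-up values (not any particular paper's implementation):

```python
import numpy as np

def add_steering_vector(activations, steering_vector, alpha=1.0):
    """Activation addition: shift hidden activations along a steering
    direction, scaled by a coefficient alpha."""
    return activations + alpha * steering_vector

# Toy hidden states for 3 token positions in a 4-dimensional model
hidden = np.array([[0.5, -0.2, 0.1, 0.0],
                   [0.3,  0.4, -0.1, 0.2],
                   [0.0,  0.1,  0.2, -0.3]])

# A steering vector, e.g. the difference of mean activations
# on two contrastive prompts (values here are hypothetical)
steer = np.array([1.0, 0.0, -1.0, 0.0])

# Broadcasting adds the same vector at every token position
steered = add_steering_vector(hidden, steer, alpha=0.5)
```

In practice this addition would be applied inside a forward hook at a chosen layer of a small open-source LLM, which is part of why the technique is accessible without frontier-model access.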
I’ve heard that evals, debate, prosaic work on honesty, and various other schemes need cutting-edge models. But over the past few weeks, while transitioning from mostly conceptual work into empirical work, I have found far more questions than I have time to answer using GPT-2- or AlphaStar-sized models. If alignment is hard, we’ll want to understand the small models first.
I wasn’t saying that there are only a few research directions that don’t require frontier models, period; just that there are only a few that don’t require frontier models and still seem relevant and promising, at least assuming short timelines to AGI.
I am skeptical that agent foundations is still very promising or relevant in the present situation. I wouldn’t want to shut down someone’s research in this area if they were particularly passionate about it or considered themselves on the cusp of an important breakthrough. But I’m not sure it’s wise to be spending scarce incubator resources to funnel new researchers into agent foundations research at this stage.
Good points about mechanistic anomaly detection and activation additions though! (And mechanistic interpretability, but I mentioned that in my previous comment.) I need to read up more on activation additions.
I was thinking about helping with infrastructure around access to large amounts of compute, but I had not considered trying to help with access to cutting-edge models. I think that might be a very good suggestion. Thanks for sharing your thoughts!