One omission from the list is the Fund for Alignment Research (FAR), which I’m a board member of. That’s fair enough: FAR is fairly young, and doesn’t have a research agenda per se, so it’d be hard to summarize their work from the outside! But since it might be of interest to readers, I figured I’d give a quick summary here.
FAR’s theory of change is to incubate new, scalable alignment research agendas. Right now I see a small range of agendas being pursued at scale (largely RLHF and interpretability), then a long tail of very diverse agendas being pursued by single individuals (mostly independent researchers or graduate students) or 2-3 person teams. I believe there are a lot of valuable ideas in this long tail that could be scaled, but this isn’t happening due to a lack of institutional support. It makes sense that the major organisations want to focus on their own specific agendas (there’s a benefit to being focused!), but it means a lot of valuable agendas are slipping through the cracks.
FAR’s current approach to solving this problem is to build out a technical team (research engineers, junior research scientists, technical communication specialists) and provide support to a broad range of agendas pioneered by external research leads. FAR will then double down on and invest more in the agendas that work. This model has already seen a fair amount of demand, so there’s product-market fit, but we still want to iterate and see if we can improve the model. For example, in the long term FAR might want to bring some or all research leads in-house.
In terms of concrete agendas, here are some examples of what FAR is working on:
Adversarial attacks against narrowly superhuman systems like AlphaGo.
Language model benchmarks for value learning.
The inverse scaling law prize.
You can read more about us on our launch post.
Hi Adam, thank you so much for writing this informative comment. We’ve added your summary of FAR to the main post (and linked this comment).