At AI-Plans.com (in beta), we are working on a contributable compendium of alignment plans and the criticisms against them. Currently, newcomers to the field of AI Alignment often struggle to understand what work is being done, who is doing it, and the assumptions, strengths, and weaknesses of each plan.
We believe AI-plans.com will be an easy, centralized way to discover and learn about the most promising alignment plans. Multiple alignment researchers are already interested in the site: some have added their plans themselves, requested that plans be added, or expressed interest in having their plans added once they are complete. Jonathan Ng, an alignment researcher at EleutherAI, has also endorsed the site and worked with us on it.
The site is currently in Stage 1, where it is purely a compendium. We are in the process of adding up to 1000 plans and the criticisms made against them so far. Further plans and criticisms can be added by users. We currently have over 50 alignment plans on the site and are adding at least 20 every day.
Projected benefits of Stage 1:
This makes it easy to see what plans exist and what their most common problems are.
Funding would help this stage finish much faster: I would have far more time to devote to getting it done and to potential problem areas such as hosting, UX, and outreach. It would also let me pay the developer and the QA tester, who have been doing great work, so they could spend more time on the project.
Next will be Stage 2, where a scoring system for criticisms and a ranking system for plans will be added. Plans will be ranked from top to bottom based on the total scores of their criticisms. Criticism votes are weighted: users who have submitted higher-scoring criticisms get a more heavily weighted vote on other criticisms. Alignment researchers will have the option of linking their AI-Plans account to accounts on research-relevant platforms (such as arXiv, OpenReview, or the Alignment Forum) in order to start out with a slightly weighted vote (with mod approval).
Each new plan starts with a bounty of 0, and lower-bounty plans give the most points, so each new plan gets plenty of opportunity and incentive for criticism. More details here.
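To make the mechanics concrete, here is a minimal sketch of how weighted criticism votes, bounty-scaled points, and criticism-based ranking could fit together. Everything below is a hypothetical illustration: the class names, the square-root vote weighting, and the bounty divisor are inventions for this sketch, not the site's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    criticism_points: float = 0.0  # earned when the user's criticisms score well

    @property
    def vote_weight(self) -> float:
        # Hypothetical formula: a proven critic's vote counts for more,
        # with square-root damping so weights grow slowly.
        return 1.0 + (self.criticism_points ** 0.5) / 10.0

@dataclass
class Criticism:
    author: User
    votes: list = field(default_factory=list)  # list of (voter, +1 or -1)

    def score(self) -> float:
        # A criticism's score is the weighted sum of the votes on it.
        return sum(sign * voter.vote_weight for voter, sign in self.votes)

@dataclass
class Plan:
    title: str
    criticisms: list = field(default_factory=list)
    bounty: float = 0.0  # assumed to grow as criticism activity accumulates

def points_awarded(plan: Plan, criticism: Criticism) -> float:
    # Lower-bounty (less-picked-over) plans award the most points,
    # steering critics toward fresh plans.
    return criticism.score() / (1.0 + plan.bounty)

def rank_plans(plans: list) -> list:
    # Plans whose criticisms have the lowest total score rank highest,
    # i.e. the least-criticized plan sits at the top of the leaderboard.
    return sorted(plans, key=lambda p: sum(c.score() for c in p.criticisms))
```

The key design choice this sketch captures is the feedback loop: writing criticisms that the community upvotes increases your own vote weight, which in turn makes your future votes on others' criticisms count for more.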
Projected benefits of Stage 2:
This scoring system incentivizes users to write good criticisms. It makes it easier to see which plans have the biggest holes, which helps when marshalling evidence against problematic plans. It also lets people (including talented and untapped scientists and engineers) see which companies have the least problematic plans. After all, who wants to work for the company at the bottom of the leaderboard? (I have spoken with the creator of aisafety.careers, who intends to integrate with our site.)
At Stage 3, in addition to everything from Stages 1 and 2, there will be monthly cash prizes for the highest-ranking plan and for the user or users with the most criticism points that month.
Projected benefits of Stage 3:
This supercharges everything from Stage 2 and attracts talented people who need a low-commitment monetary incentive to start engaging with alignment research. It also provides a heuristic argument for the difficulty of the problem: “There is money on the table if anyone can come up with a plan with fewer problems, yet no one has done so!”
Is this not ~normal for a field that is maturing? And by normal I also mean approximately unavoidable or ‘essential’. Like, I could say ‘it sure takes a long time to get an understanding of who is doing what in the field of… computer science’, but I have no reason to believe that I could substantially ‘fix’ this situation in the space of a few months. It really is just because there is lots of complicated research going on by lots of different people, right? And ‘understanding’ what another researcher is doing is sometimes a really, really hard thing to do.
Sure, but that’s no reason not to try to make it easier!
Thank you; I think there was an error in my phrasing.
I should have said:
Not just that. It’s because the field isn’t organized at all.