A survey of tool use and workflows in alignment research
TL;DR: We are building language model powered tools to augment alignment researchers and accelerate alignment progress. We could use your feedback on what tools would be most useful. We’ve created a short survey that can be filled out here.
We are a team from the current iteration of the AI Safety camp and are planning to build a suite of tools to help AI Safety researchers.
We’re looking for feedback on what kinds of tools would be most helpful to you as an established or prospective alignment researcher. We’ve put together a short survey to get a better understanding of how researchers work on alignment. We plan to analyze the results and make them available to the community (appropriately anonymized). The survey is here. If you would also be interested in talking directly, please feel free to schedule a call here.
This project is similar in motivation to Ought’s Elicit, but more focused on human-in-the-loop interaction and tailored to alignment research. One example of a tool we could create would be a language model that intelligently condenses existing alignment research into summaries or expands rough outlines into drafts of full Alignment Forum posts. Another idea we’ve considered is a brainstorming tool that can generate new examples/counterexamples, new arguments/counterarguments, or new directions to explore.
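As a rough illustration of the first idea, here is a minimal sketch of how a summarization/expansion tool might be wired together. The `generate` helper, the prompt wording, and the function names are hypothetical placeholders, not a description of any tool we have built; this is a sketch of the interaction pattern under the assumption of access to some language model completion backend.

```python
# Hypothetical sketch: condensing a post into a summary, or expanding an
# outline into a draft, via a generic language model completion function.
# `generate(prompt)` is a placeholder for whatever model/API is used.

def generate(prompt: str) -> str:
    """Placeholder for a call to a language model completion backend."""
    raise NotImplementedError("Plug in your preferred language model here.")

def summarize_post(post_text: str, max_words: int = 200) -> str:
    """Condense an existing alignment write-up into a short summary."""
    prompt = (
        f"Summarize the following alignment research post in at most "
        f"{max_words} words, preserving the key claims and open questions:\n\n"
        f"{post_text}\n\nSummary:"
    )
    return generate(prompt)

def expand_outline(outline: str) -> str:
    """Expand a rough bullet-point outline into a first draft of a post."""
    prompt = (
        "Expand the following rough outline into a first draft of an "
        "Alignment Forum post, keeping the author's structure and claims:\n\n"
        f"{outline}\n\nDraft:"
    )
    return generate(prompt)
```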
In the long run, we’re interested in creating seriously empowering tools that fall under categories like STEM AI, Microscope AI, superhuman personal assistant AI, or plain Oracle AI. These early tools are oriented toward proof-of-concept work, but still aim to be immediately helpful to alignment researchers. Our prior that this is a promising direction is informed in part by our own fruitful experiences using language models as writing and brainstorming aids.
One central danger of tools that increase research productivity is their dual-use potential for capabilities research. Consequently, we plan to ensure these tools are specifically tailored to the AI Safety community and not to other scientific fields. We do not intend to publish the specific methods we use to create these tools.
We welcome any feedback, comments, or concerns about our direction. Also, if you’d like to contribute to the project, feel free to join us in the #accelerating-alignment channel on the EleutherAI Discord.
Thanks in advance!