Alignment Research Assistant update
Hey everyone, my name is Jacques. I’m an independent technical alignment researcher, primarily focused on evaluations, interpretability, and scalable oversight (more on my alignment research soon!). I’m now putting more of my attention into building an Alignment Research Assistant; for the past year, about 95% of my time went to my own alignment research. I’m looking for people who would like to contribute to the project. The project will be private unless I say otherwise (though I’m listing some of the tasks below); I understand its dual-use nature and most of the criticism against this kind of work.
How you can help:
Provide feedback on what features you think would be amazing in your workflow to produce high-quality research more efficiently.
Volunteer as a beta-tester for the assistant.
Contribute to one of the tasks below. (Send me a DM, and I’ll give you access to the private Discord to work on the project.)
Provide funding to hire full-time developers to build these features.
Here’s the vision for this project:
How might we build an AI system that augments researchers to get us 5x or 10x productivity for the field as a whole?
The system is designed around two main mindsets:
Efficiency: What kinds of tasks do alignment researchers do, and how can we make them faster and more efficient?
Objective: Even if we make researchers highly efficient, it means nothing if they are not working on the right things. How can we ensure that researchers are working on the most valuable things? How can we nudge them to gain the most bits of information in the shortest time? This involves helping them work on the right agendas/projects and helping them break down their projects in ways that help them make progress faster (and avoid ending up tunnel-visioned on the wrong project for months or years).
For now, the project will focus on building an extension on top of VSCode to make it the ultimate research tool for alignment researchers. VSCode is ideal because researchers are already coding in it and it’s easy to build on top of. It prevents the context-switching a separate web app would cause; I want the entire workflow to feel natural inside VSCode. In general, I think this will make the system easier to build on and let us automate parts of research over time.
Side note: I helped build the Alignment Research Dataset ~2 years ago (here’s the extended project). It is now continually updated, and the SQL and vector databases built from it (which will interface with the assistant) are in active use.
If you are interested in potentially helping out (or know someone who might be!), send me a DM with a bit about your background and why you’d like to help. To keep things focused, I may not accept everyone.
I’m also collaborating with different groups (Apart Research, AE Studio, and more). Within 2-3 months, I want to get the assistant to a place where I know whether it is useful for other researchers and whether we should apply for additional funding to turn it into a serious project.
As an update on the Alignment Research Assistant I’m building, here is a set of shovel-ready tasks I would like people to contribute to (please DM if you’d like to contribute!). These are the tasks that are easiest to articulate and fairly self-contained:
Core Features
1. Set up the Continue extension for research: https://www.continue.dev/
Design prompts in Continue that are suitable for a variety of alignment research tasks and make it easy to switch between these prompts (see the sketch after this list)
Figure out how to scaffold LLMs with Continue (instead of just prompting one LLM with additional context)
It can include agents, search, and more
Test out models to quickly help with paper writing
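To make the prompt-switching idea concrete, here is a minimal sketch. This is plain Python, not Continue’s actual config format; the prompt names and wording are placeholders:

```python
# Hypothetical sketch of a registry of named research prompts that a
# Continue-style custom command could wrap. All names/wording are made up.
RESEARCH_PROMPTS = {
    "summarize-paper": (
        "Summarize the following paper for an alignment researcher. Focus on "
        "the core claim, the method, and any safety-relevant results.\n\n{input}"
    ),
    "critique-experiment": (
        "Act as a skeptical reviewer. List the three most likely ways this "
        "experimental setup could produce misleading results.\n\n{input}"
    ),
    "explain-code": (
        "Explain what this research code does and flag any subtle bugs.\n\n{input}"
    ),
}

def build_prompt(command: str, selected_text: str) -> str:
    """Fill a named prompt template with the researcher's selected text."""
    return RESEARCH_PROMPTS[command].format(input=selected_text)

print(build_prompt("summarize-paper", "<paper text here>"))
```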
2. Data sourcing and management
Integrate with the Alignment Research Dataset, pulling from either the SQL database or the Pinecone vector database (a query sketch follows this list): https://github.com/StampyAI/alignment-research-dataset
Integrate with other apps (Google Docs, Obsidian, Roam Research, Twitter, LessWrong)
Make it easy to view and edit long prompts for project context
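For the vector-database side, a minimal query sketch might look like the following; the index name, metadata fields, and embedding model are my assumptions, not the dataset’s actual schema:

```python
# Sketch: semantic search over an assumed Pinecone index built from the
# Alignment Research Dataset. Index name and metadata keys are guesses.
import os
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()  # assumes OPENAI_API_KEY is set
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("alignment-research-dataset")  # hypothetical index name

def search_dataset(query: str, top_k: int = 5):
    """Embed the query, then return the most similar dataset chunks."""
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding
    results = index.query(vector=embedding, top_k=top_k, include_metadata=True)
    return [(match.score, match.metadata) for match in results.matches]

for score, meta in search_dataset("deceptive alignment evaluations"):
    print(f"{score:.3f}  {meta.get('title', '?')}")
```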
3. Extract answers to questions across multiple papers/posts (feeds into Continue)
Develop high-quality chunking and scaffolding techniques
Implement multi-step interaction between the researcher and the LLM (a map-reduce sketch follows this list)
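One plausible shape for this is map-reduce question answering: ask the question against each paper separately, then synthesize the per-paper answers. A rough sketch, with the chunk size and model name as assumptions:

```python
# Sketch of map-reduce question answering across several papers.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def chunk(text: str, size: int = 4000) -> list[str]:
    """Naive fixed-size chunking; a real version should respect section breaks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def answer_across_papers(question: str, papers: dict[str, str]) -> str:
    # Map step: extract a partial answer from each paper independently.
    partials = [
        f"{title}: " + ask(
            f"Based only on this excerpt, answer: {question}\n\n{chunk(text)[0]}"
        )
        for title, text in papers.items()
    ]
    # Reduce step: synthesize the per-paper answers into one response.
    return ask(
        f"Synthesize these per-paper answers to '{question}', "
        "noting agreements and disagreements:\n\n" + "\n\n".join(partials)
    )
```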
4. Design Autoprompts for alignment research
Automatically builds lengthy, high-quality prompts that get better responses from LLMs (a template sketch follows below)
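A minimal sketch of what an autoprompt builder could look like; the context fields are my guesses at what useful project context includes:

```python
# Sketch: assemble a long, structured prompt from project context so the
# researcher doesn't hand-write it each time. Field names are made up.
from dataclasses import dataclass

@dataclass
class ProjectContext:
    agenda: str          # e.g. "scalable oversight"
    hypothesis: str      # the claim the current experiment tests
    recent_results: str  # short summary of the latest findings
    constraints: str     # compute, deadlines, model access, etc.

def autoprompt(ctx: ProjectContext, request: str) -> str:
    return "\n\n".join([
        "You are assisting an AI alignment researcher.",
        f"Research agenda: {ctx.agenda}",
        f"Current hypothesis: {ctx.hypothesis}",
        f"Recent results: {ctx.recent_results}",
        f"Constraints: {ctx.constraints}",
        "Answer precisely; flag uncertainty instead of guessing.",
        f"Request: {request}",
    ])
```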
5. Simulated Paper Reviewer
Fine-tune or prompt an LLM to behave like an academic reviewer (a prompt-only sketch follows this list)
Use OpenReview data for training
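Here is a prompt-only sketch of the simulated reviewer (fine-tuning on OpenReview data would be a separate step); the rubric is an assumption about what a useful review covers:

```python
# Sketch: prompt-only version of the simulated paper reviewer.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

REVIEWER_SYSTEM_PROMPT = """You are a rigorous ML conference reviewer.
For the submitted draft, produce:
1. Summary (2-3 sentences).
2. Strengths and weaknesses (bulleted).
3. Questions for the authors.
4. Score from 1-10 with a one-sentence justification."""

def review(draft: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[
            {"role": "system", "content": REVIEWER_SYSTEM_PROMPT},
            {"role": "user", "content": draft},
        ],
    )
    return response.choices[0].message.content
```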
6. Jargon and Prerequisite Explainer
Design a sidebar feature to extract and explain important jargon
Could integrate with an interface similar to https://delve.a9.io/ (an extraction sketch follows below)
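A rough sketch of the extraction half, returning JSON a sidebar could render; the output schema and model name are assumptions:

```python
# Sketch: extract jargon from a selected passage and return short
# definitions as JSON for a sidebar. Output format is an assumption.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def explain_jargon(passage: str) -> dict[str, str]:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                "List the technical jargon in this passage and define each "
                "term in one sentence for an ML researcher. Reply as a JSON "
                "object mapping term -> definition.\n\n" + passage
            ),
        }],
    )
    return json.loads(response.choices[0].message.content)
```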
7. Set up an automated “suggestion-LLM”
An LLM periodically looks through the project you are working on and suggests *actually useful* things in the side-chat. It will be a delicate balance to avoid sharing too much and breaking the researcher’s focus. This could be customized per researcher, with an option to only surface suggestions after a research session. (A polling sketch follows below.)
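A minimal polling sketch, with the interval, file filter, and prompt as assumptions:

```python
# Sketch: poll the project directory and, when files change, ask an LLM
# for at most one concrete suggestion. Interval and prompt are assumptions.
import time
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def snapshot(root: Path) -> dict[str, float]:
    """Record modification times for the project's Python files."""
    return {str(p): p.stat().st_mtime for p in root.rglob("*.py")}

def suggest(changed_files: list[str]) -> str:
    source = "\n\n".join(Path(f).read_text() for f in changed_files[:3])
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{
            "role": "user",
            "content": "Offer exactly one concrete, high-value suggestion "
                       "for this research code, or say 'nothing useful':\n\n"
                       + source,
        }],
    )
    return response.choices[0].message.content

def watch(root: Path, interval_s: int = 900):
    before = snapshot(root)
    while True:
        time.sleep(interval_s)
        after = snapshot(root)
        changed = [f for f, t in after.items() if before.get(f) != t]
        if changed:
            print(suggest(changed))
        before = after
```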
8. Figure out if we can get a usable browser inside of VSCode (I tried quickly with the Edge extension but couldn’t sign into the Claude chat website)
Could make use of new features other companies build (like Anthropic’s Artifacts feature), but inside VSCode, to avoid context-switching into an actual browser
9. “Alignment Research Codebase” integration (can add as Continue backend)
Create an easily insertable set of reusable code that researchers can quickly add to their project or to an LLM’s context
This includes multi-GPU training code, codebase best practices, and more
Should make it easy to populate a new codebase
Proactively gives suggestions to improve the code
Generally makes common code implementation much faster (a registry sketch follows this list)
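A tiny sketch of what the snippet registry could look like; the directory layout and snippet names are hypothetical:

```python
# Sketch: a tiny registry of reusable research snippets that can be dropped
# into a new codebase or an LLM's context. Names and paths are made up.
from pathlib import Path

SNIPPET_DIR = Path("alignment_snippets")  # hypothetical local snippet library

def list_snippets() -> list[str]:
    """Names of the available snippets, e.g. 'multi_gpu_setup'."""
    return sorted(p.stem for p in SNIPPET_DIR.glob("*.py"))

def insert_snippet(name: str, target: Path) -> None:
    """Append a named snippet to a project file."""
    code = (SNIPPET_DIR / f"{name}.py").read_text()
    with target.open("a") as f:
        f.write(f"\n\n# --- snippet: {name} ---\n{code}")
```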
10. Notebook to high-quality codebase
I can go into more detail via DMs. (A minimal extraction sketch follows below.)
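As a trivial first step, the code cells of a notebook can be pulled into a plain module with nbformat; everything past that (refactoring, tests, configs) is the hard part. File names below are placeholders:

```python
# Sketch: extract a notebook's code cells into a plain module as a first
# step toward a proper codebase.
import nbformat

def notebook_to_module(notebook_path: str, module_path: str) -> None:
    nb = nbformat.read(notebook_path, as_version=4)
    code = [c.source for c in nb.cells if c.cell_type == "code"]
    with open(module_path, "w") as f:
        f.write("\n\n\n".join(code))

notebook_to_module("experiment.ipynb", "experiment.py")
```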
11. Adding capability papers to the Alignment Research Dataset
We didn’t do this initially to reduce exfohazards. The purpose of adding capability papers (and all the new alignment papers) is to improve the assistant.
We will not be open-sourcing this part of the work; this part of the dataset will be used strictly by vetted alignment researchers using the assistant.
Specialized tooling (outside of VSCode)
Fast bulk content extraction
Create an extension to extract content from multiple tabs or papers
Simplify the process of feeding content to the VSCode backend for future use (an extraction sketch follows below)
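A minimal sketch using trafilatura for the extraction step; the output format is an assumption:

```python
# Sketch: pull the main text out of a batch of URLs with trafilatura and
# save it for the assistant's backend. Output path is a placeholder.
import json
import trafilatura

def extract_all(urls: list[str], out_path: str = "extracted.jsonl") -> None:
    with open(out_path, "w") as f:
        for url in urls:
            downloaded = trafilatura.fetch_url(url)
            text = trafilatura.extract(downloaded) if downloaded else None
            if text:
                f.write(json.dumps({"url": url, "text": text}) + "\n")
```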
Personalized Research Newsletter
Create a tool that extracts relevant information for researchers (papers, posts, other sources)
Generate personalized newsletters based on individual interests (open questions and research they care about)
Sends proactive notifications in VSCode and by email (an interest-matching sketch follows below)
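For the interest-matching step, a minimal embedding-based sketch; the model name and scoring are assumptions:

```python
# Sketch: rank new papers against a researcher's stated interests using
# embedding cosine similarity.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def embed(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(
        model="text-embedding-3-small", input=texts
    )
    return np.array([d.embedding for d in response.data])

def rank_papers(interests: str, abstracts: list[str], top_k: int = 5):
    vecs = embed([interests] + abstracts)
    profile, papers = vecs[0], vecs[1:]
    scores = papers @ profile / (
        np.linalg.norm(papers, axis=1) * np.linalg.norm(profile)
    )
    order = np.argsort(scores)[::-1][:top_k]
    return [(float(scores[i]), abstracts[i]) for i in order]
```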
Discord Bot for Project Proposals
Suggest relevant papers/posts/repos based on project proposals
Integrate with Apart Research Hackathons
We’re doing a hackathon with Apart Research on the 26th. I created a list of problem statements for people to brainstorm off of:
Proactive insight extraction from new research
Reading papers can take a long time and is often not worthwhile. As a result, researchers might read too many papers or almost none. However, there are still valuable nuggets in papers and posts; the issue is finding them. So, how might we design an AI research assistant that proactively looks at new papers (and old ones) and shares valuable information with researchers in a naturally consumable way? Part of this work involves presenting individual researchers with what they would personally find valuable, without overwhelming them with things they are less interested in.
How can we improve the LLM experience for researchers?
Many alignment researchers use language models much less than they would like to because they don’t know how to prompt the models, it takes time to craft a valuable prompt, the model doesn’t have enough context for their project, the model is not up to date on the latest techniques, etc. How might we make LLMs more useful for researchers by relieving these bottlenecks?
Simple experiments can be done quickly, but turning one into a full project can take a lot of time
One key bottleneck in alignment research is the transition from an initial simple 24-hour experiment in a notebook to a complete set of experiments tested with different models, datasets, interventions, etc. How can we help researchers move through that second research phase much faster?
How might we use AI agents to automate alignment research?
As AI agents become more capable, we can use them to automate parts of alignment research. The paper “A Multimodal Automated Interpretability Agent” serves as an initial attempt at this. How might we use AI agents to help either speed up alignment research or unlock paths that were previously inaccessible?
How can we nudge researchers toward better objectives (agendas or short experiments) for their research?
Even if we make researchers highly efficient, it means nothing if they are not working on the right things. Choosing the right objectives (projects and next steps) over time can be the difference between 0x, 1x, and 100x impact. How can we ensure that researchers are working on the most valuable things?
What can be done to accelerate implementation and iteration speed?
Implementation and iteration speed on the most informative experiments matters greatly. How can we nudge researchers to gain the most bits of information in the shortest time? This involves helping them work on the right agendas/projects and break down their projects in ways that help them make progress faster (and avoid ending up tunnel-visioned on the wrong project for months or years).
How can we connect all of the ideas in the field?
How can we integrate the open questions/projects in the field (along with their critiques) in a way that helps researchers come up with well-grounded research directions faster? How can we aid them in choosing better directions and adjusting throughout their research? This kind of work may eventually be a precursor to guiding AI agents to help us develop better ideas for alignment research.
This just got some massive downvotes, and I’d like to know why. My guess is “this can be dual-use, therefore it’s bad,” but if it’s something else, it would be nice to know.