LLMs aren’t that useful for alignment experts because it’s a highly specialized field and there isn’t much relevant training data. The AI Safety Chatbot partially solves this problem using retrieval-augmented generation (RAG) on a database of articles from https://aisafety.info. There also seem to beplansto fine-tune it on a dataset of alignment articles.
It’s not just from https://aisafety.info/. It also uses Arbital, any posts from the alignment forum, LW, EA forum that seem relevant and have a minimum karma, a bunch of arXiv papers, and a couple of other sources. This is a a relatively up to date list of the sources used (it also contains the actual data).
LLMs aren’t that useful for alignment experts because it’s a highly specialized field and there isn’t much relevant training data.
Seems plausibly true for the alignment specific philosophy/conceptual work, but many people attempting to improve safety also end up doing large amounts of relatively normal work in other domains (ML, math, etc.)
The post is more centrally talking about the very alignment specific use cases of course.
We’re likely to switch to Claude 3 soon, but currently GPT 3.5. We are mostly expecting it to be useful as a way to interface with existing knowledge initially, but we could make an alternate prompt which is more optimized for being a research assistant brainstorming new ideas if that was wanted.
Would it be useful to be able to set your own system prompt for this? Or have a default one?
I don’t have a good system prompt that I like, although I am trying to work on one. It seems to me like the sort of thing that should be built in to a tool like this (perhaps with options, as different system prompts will be useful for different use-cases, like learning vs trying to push the boundaries of knowledge).
I would be pretty excited to try this out with Claude 3 behind it. Very much the sort of thing I was trying to advocate for in the essay!
DMed a link to an interface which lets you select system prompt and model (including Claude). This is open to researchers to test, but not positing fully publicly as it is not very resistant to people who want to burn credits right now.
Other researchers feel free to DM me if you’d like access.
From reading the codebase, it seems to be a LangChain chatbot powered by the default LangChain OpenAI model which is gpt-3.5-turbo-instruct. The announcement blog post also says it’s based on gpt-3.5-turbo.
LLMs aren’t that useful for alignment experts because it’s a highly specialized field and there isn’t much relevant training data. The AI Safety Chatbot partially solves this problem using retrieval-augmented generation (RAG) on a database of articles from https://aisafety.info. There also seem to be plans to fine-tune it on a dataset of alignment articles.
It’s not just from https://aisafety.info/. It also uses Arbital, any posts from the alignment forum, LW, EA forum that seem relevant and have a minimum karma, a bunch of arXiv papers, and a couple of other sources. This is a a relatively up to date list of the sources used (it also contains the actual data).
Seems plausibly true for the alignment specific philosophy/conceptual work, but many people attempting to improve safety also end up doing large amounts of relatively normal work in other domains (ML, math, etc.)
The post is more centrally talking about the very alignment specific use cases of course.
Sounds pretty cool! What LLM powers it?
We’re likely to switch to Claude 3 soon, but currently GPT 3.5. We are mostly expecting it to be useful as a way to interface with existing knowledge initially, but we could make an alternate prompt which is more optimized for being a research assistant brainstorming new ideas if that was wanted.
Would it be useful to be able to set your own system prompt for this? Or have a default one?
I don’t have a good system prompt that I like, although I am trying to work on one. It seems to me like the sort of thing that should be built in to a tool like this (perhaps with options, as different system prompts will be useful for different use-cases, like learning vs trying to push the boundaries of knowledge).
I would be pretty excited to try this out with Claude 3 behind it. Very much the sort of thing I was trying to advocate for in the essay!
DMed a link to an interface which lets you select system prompt and model (including Claude). This is open to researchers to test, but not positing fully publicly as it is not very resistant to people who want to burn credits right now.
Other researchers feel free to DM me if you’d like access.
From reading the codebase, it seems to be a LangChain chatbot powered by the default LangChain OpenAI model which is gpt-3.5-turbo-instruct. The announcement blog post also says it’s based on gpt-3.5-turbo.