Stephen McAleese comments on LLMs for Alignment Research: a safety priority?

Stephen McAleese 5 Apr 2024 8:43 UTC
LW: 16 AF: 11
−1
AF
LLMs aren’t that useful for alignment experts because it’s a highly specialized field and there isn’t much relevant training data. The AI Safety Chatbot partially solves this problem using retrieval-augmented generation (RAG) on a database of articles from https://aisafety.info. There also seem to be plans to fine-tune it on a dataset of alignment articles.
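For readers unfamiliar with RAG, here is a minimal sketch of the retrieve-then-prompt idea. It is not taken from the chatbot's codebase: the article snippets, the function names (`retrieve`, `build_prompt`), and the bag-of-words cosine similarity are all assumptions made for illustration. A real system would most likely use learned embeddings and a vector database instead.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus standing in for a database of articles.
ARTICLES = [
    {"title": "What is inner alignment?",
     "text": "Inner alignment is the problem of making sure a learned model "
             "pursues the objective it was trained on."},
    {"title": "What is RLHF?",
     "text": "RLHF fine-tunes a language model using a reward model trained "
             "on human preference comparisons."},
    {"title": "What is a mesa-optimizer?",
     "text": "A mesa-optimizer is a learned model that is itself an optimizer "
             "with its own objective."},
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, articles, k=2):
    """Return the k articles whose text is most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        articles,
        key=lambda art: cosine(query_vec, Counter(tokenize(art["text"]))),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, articles, k=2):
    """Put the retrieved articles in front of the question for the LLM."""
    context = "\n\n".join(
        f"[{art['title']}]\n{art['text']}" for art in retrieve(query, articles, k)
    )
    return (
        "Answer using only the sources below.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```

Calling `build_prompt("What is inner alignment?", ARTICLES, k=1)` yields a prompt that contains only the most relevant snippet plus the question. That string is what would be sent to the underlying model, so the model can draw on specialized material that is sparse in its training data.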
- mruwnik 13 Apr 2024 18:49 UTC
  8 points
  2
  Parent
  It’s not just from https://aisafety.info/. It also uses Arbital, any posts from the Alignment Forum, LW, and the EA Forum that seem relevant and have a minimum karma, a bunch of arXiv papers, and a couple of other sources. This is a relatively up-to-date list of the sources used (it also contains the actual data).
- ryan_greenblatt 5 Apr 2024 16:47 UTC
  LW: 5 AF: 3
  2
  AF Parent
  
  LLMs aren’t that useful for alignment experts because it’s a highly specialized field and there isn’t much relevant training data.
  
  Seems plausibly true for the alignment-specific philosophy/conceptual work, but many people attempting to improve safety also end up doing large amounts of relatively normal work in other domains (ML, math, etc.).
  
  The post is more centrally talking about the very alignment-specific use cases, of course.
- abramdemski 5 Apr 2024 17:39 UTC
  LW: 2 AF: 2
  0
  AF Parent
  Sounds pretty cool! What LLM powers it?
  - plex 13 Apr 2024 12:54 UTC
    LW: 4 AF: 2
    0
    AF Parent
    We’re likely to switch to Claude 3 soon, but it currently runs on GPT-3.5. We are mostly expecting it to be useful initially as a way to interface with existing knowledge, but we could make an alternate prompt that is more optimized for being a research assistant brainstorming new ideas, if that were wanted.
    Would it be useful to be able to set your own system prompt for this? Or have a default one?
    - abramdemski 17 Apr 2024 14:49 UTC
      LW: 4 AF: 3
      2
      AF Parent
      I don’t have a system prompt that I like yet, although I am trying to work on one. It seems to me like the sort of thing that should be built into a tool like this (perhaps with options, as different system prompts will be useful for different use cases, like learning vs. trying to push the boundaries of knowledge).
      I would be pretty excited to try this out with Claude 3 behind it. Very much the sort of thing I was trying to advocate for in the essay!
      - plex 20 Apr 2024 8:20 UTC
        3 points
        0
        Parent
        DMed a link to an interface which lets you select the system prompt and model (including Claude). This is open to researchers to test, but I’m not posting it fully publicly, as it is not very resistant to people who want to burn credits right now.
        Other researchers feel free to DM me if you’d like access.
  - Stephen McAleese 5 Apr 2024 19:08 UTC
    2 points
    0
    Parent
    From reading the codebase, it seems to be a LangChain chatbot powered by the default LangChain OpenAI model, which is gpt-3.5-turbo-instruct. The announcement blog post also says it’s based on gpt-3.5-turbo.