Thanks for doing this! I think this could be valuable. What’s your current plan for developing this further / onboarding collaborators?
Some observations / thoughts from interacting with the QA system for a few minutes:
The current version has trouble answering basic AI safety questions. For example:
Why do people think AI is an existential risk?
People think AI is an existential risk because of the possibility of a superintelligent AI system with recursive self-improvement capabilities, which could lead to catastrophic consequences like turning humans into paper clips, as highlighted in[1]. However, the perception of AI as an existential risk is not universal, and some people may discount it as a problem that is too vague or unlikely to outweigh other issues facing the world, as discussed in[1]. Nonetheless, technical AI safety research, defined as quality research aimed at long-term outcomes that are reliably beneficial, is necessary to mitigate the risks of AI[2]. Additionally, some argue that rapid increases in human power could be achieved without increasing existential risks, including through skillful AI development, but it is important to prioritize technical AI safety research to avoid catastrophic consequences[3].
The search could be much improved by curating the dataset. An easy way to do this is to exclude posts with low upvote counts, though eventually you’ll want to do more, e.g. by being opinionated about what to include.
It might be worth having a chatbot that just talks people through the “extended bad alignment take bingo”, that is, all the reasons why the easy solutions people like to come up with don’t work. Here you could just exclude all proposals for actual alignment solutions from the dataset (and avoid having to make calls about which agendas have promise and which are actually nonsensical).
It would be very useful to have a feedback function where people can mark wrong answers. If we want to make this good, we’ll need to red-team the model and make sure it answers all the basic questions correctly, probably by curating a question-answer dataset.
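Even a small hand-curated set would go a long way here. As a very rough sketch of what that red-teaming could look like (the questions, key phrases, and the ask_alignment_qa stub are all hypothetical):

```python
# Minimal red-teaming loop: run hand-written questions through the QA system
# and flag answers that fail to mention phrases a correct answer should touch on.
RED_TEAM_SET = [
    {"question": "Why do people think AI is an existential risk?",
     "must_mention": ["goals", "misaligned"]},
    {"question": "Why can't we just turn the AI off?",
     "must_mention": ["instrumental", "shutdown"]},
]

def ask_alignment_qa(question: str) -> str:
    # Stub: replace with a call to the deployed QA endpoint.
    return "placeholder answer"

def run_red_team(cases=RED_TEAM_SET):
    failures = []
    for case in cases:
        answer = ask_alignment_qa(case["question"]).lower()
        missing = [kw for kw in case["must_mention"] if kw not in answer]
        if missing:
            failures.append((case["question"], missing))
    return failures

for question, missing in run_red_team():
    print(f"FAIL: {question!r} never mentions {missing}")
```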
Thanks for the comment!
At this point, we don’t have a very clear plan, other than thinking of functionalities and adding them as fast as possible in an order that seems sensible. The functionalities we want to add include:
Automatically update the dataset relatively often.
Stream completions.
Test embeddings using SentenceTransformers plus fine-tuning instead of OpenAI, for cost and quality, and store them in Pinecone/Weaviate/other (TBD). This will let us use the whole dataset for semantic search, and give the semantic similarity more ‘knowledge’ of technical terms used in the alignment space, which I expect to produce better results. We also want to test adding bias terms that favor ‘good’ sources, to maximize the quality of semantic search (a rough sketch of this idea follows the list). It’s also possible that we’ll make a smaller, more specialized dataset of curated content.
Add modes and options: HyDE, Debate, Comment, Synthesis, temperature, etc. Possibly add an option to use GPT-4, depending on feasibility.
Figure out how to make this scale without going bankrupt.
Add thumbs-up/down for A/B testing prompts, the bias terms, and curated vs. uncurated datasets.
Add recommended next questions the user can ask, possibly taken from a question database.
Improve UX/UI.
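To make the embeddings-plus-bias item above more concrete, here is a very rough sketch using SentenceTransformers and a simple additive bias toward higher-quality sources; the model choice, field names, and bias values are placeholders, and an in-memory numpy array stands in for the eventual vector store (Pinecone/Weaviate):

```python
# Sketch of the embeddings + bias idea: embed posts with a SentenceTransformer and
# rank search results by cosine similarity plus a small bias toward 'good' sources.
# The vector store (Pinecone/Weaviate) is replaced here by an in-memory numpy array.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

posts = [
    {"text": "Inner alignment is the problem of ...", "source_bias": 0.15},
    {"text": "A comment speculating about paperclips ...", "source_bias": 0.0},
]

post_vecs = model.encode([p["text"] for p in posts], normalize_embeddings=True)
biases = np.array([p["source_bias"] for p in posts])

def search(query: str, top_k: int = 5):
    q_vec = model.encode([query], normalize_embeddings=True)[0]
    sims = post_vecs @ q_vec   # cosine similarity (embeddings are normalized)
    scores = sims + biases     # nudge results toward trusted sources
    top = np.argsort(-scores)[:top_k]
    return [(float(scores[i]), posts[i]["text"]) for i in top]

print(search("What is inner alignment?", top_k=2))
```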
We have not taken much time (we were very pressed for it!) to consider the best way to onboard collaborators. We are communicating on our club’s Discord server at the moment, and would be happy to add people who want to contribute, especially if you have experience in any of the above. DM me on Discord at BionicD0LPH1N#5326 or on LW.
“The current version has trouble answering basic AI safety questions.”
That’s true sometimes, and it’s a problem. We observe fewer such errors on the full dataset, which we are currently working on getting up. Additional modes like HyDE, and the bias mentioned earlier, might further improve results, and getting better embeddings and fine-tuning them on our dataset might improve search. Finally, once the thumbs-up/down feature is up, we will be able to quickly search over a list of candidate prompts and find the ones that reduce bad answers. Overall, I think this is a very solvable problem, and we are making rapid progress.
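For context, HyDE has the model draft a hypothetical answer to the question first, and then searches with the embedding of that draft rather than of the raw question. A rough sketch with the LLM call and vector-store query stubbed out (this is not our actual implementation):

```python
# HyDE (Hypothetical Document Embeddings) sketch: draft a hypothetical answer with
# an LLM, embed the draft, and use that embedding to query the vector store.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def draft_hypothetical_answer(question: str) -> str:
    # Stub: replace with an LLM call, e.g. "write a short answer as if you knew it".
    return f"A short, plausible answer to the question: {question}"

def search_by_vector(vector, top_k: int = 5):
    # Stub: replace with a query against the vector store (Pinecone/Weaviate/...).
    return []

def hyde_search(question: str, top_k: int = 5):
    draft = draft_hypothetical_answer(question)
    vec = model.encode(draft, normalize_embeddings=True)
    return search_by_vector(vec, top_k=top_k)
```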
About curating the dataset (or favoring some types of content), we agree and are currently investigating the best ways to do this.
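The simplest pass is probably just the karma threshold you suggest, possibly combined with a source whitelist; roughly something like this (the field names are assumptions about our dataset format):

```python
# Simplest curation pass: drop posts below a karma threshold and keep only
# whitelisted sources. "score" and "source" are assumed dataset fields.
MIN_SCORE = 25
TRUSTED_SOURCES = {"alignmentforum.org", "lesswrong.com", "arbital.com"}

def curate(posts):
    return [
        p for p in posts
        if p.get("score", 0) >= MIN_SCORE and p.get("source") in TRUSTED_SOURCES
    ]
```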
About walking people through the extended alignment bingo: this is a feature we’re planning to add. Something that might make sense is a slider for ‘level of expertise’, where beginners get more detailed answers that assume less knowledge, plus recommended follow-up questions that guide them through the bad-takes bingo.
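Mechanically, the slider could be as simple as swapping the system prompt before generation; for instance (the prompt wording here is purely illustrative):

```python
# 'Level-of-expertise' slider sketch: map the slider value in [0, 1] to a system
# prompt that controls how much background the answer assumes.
PROMPTS = {
    "beginner": ("Answer in plain language, define any jargon, and suggest two "
                 "simpler follow-up questions the reader could ask next."),
    "intermediate": "Answer concisely; assume familiarity with basic alignment terms.",
    "expert": "Answer tersely and point to the most technical sources available.",
}

def system_prompt(expertise: float) -> str:
    if expertise < 0.33:
        return PROMPTS["beginner"]
    if expertise < 0.66:
        return PROMPTS["intermediate"]
    return PROMPTS["expert"]
```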
The feedback function for wrong answers is one of our top priorities; in the meantime, we ask that you submit failing question-answer pairs through our form.
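A first version of that feedback function will likely be little more than logging the question-answer pair alongside the vote, so that bad answers can be reviewed and folded into a regression test set; a minimal sketch:

```python
# Minimal feedback hook sketch: store the question-answer pair with the vote so
# wrong answers can be reviewed later and added to a regression test set.
import json
import time

def record_feedback(question: str, answer: str, thumbs_up: bool,
                    path: str = "feedback.jsonl") -> None:
    entry = {"ts": time.time(), "question": question,
             "answer": answer, "thumbs_up": thumbs_up}
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```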