I kinda want to hear the Level 2 comments where people just disagree with each other's takes :P
My personal thoughts on the matter these days are about training data. Minerva shows that you can solve really tricky problems and help humans, if you just have a finetuning set of 5 billion tokens of people solving similar (and even harder) problems clearly and correctly.
What can/can’t you get that quality of training data for that would help alignment research? I think the bad news for Level 2 here is that we don’t even have 5 billion tokens of alignment research, period (the alignment research dataset is something like 100M tokens), and the fraction of it that consists of clear and correct solutions to problems is quite small.
So either you content yourself with Level 1 tools like writing / coding assistants trained on broad data, or you get a few orders of magnitude better at learning how to generate useful ideas from limited data, which sounds… concerning.
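For a rough sense of scale (my back-of-envelope, using only the figures above):

$$\frac{5 \times 10^9 \text{ tokens}}{10^8 \text{ tokens}} = 50\times \approx 1.7 \text{ orders of magnitude}$$

and that's before discounting for how little of the 100M tokens is clear, correct problem-solving, which is what pushes the required data-efficiency gain up to "a few orders of magnitude."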