It’s worth knowing that there are some categories of data that Surge is not well positioned to provide. For example, while they have a substantial pool of participants with programming expertise, my understanding from speaking with a Surge rep is that they don’t really have access to a pool of participants with (say) medical expertise, although for small projects it sounds like they are willing to check whether anyone in their existing pool of ‘Surgers’ already has relevant experience. This kind of more niche expertise does seem likely to become increasingly relevant for sandwiching experiments. For exactly this reason, I’d be interested in learning more about companies or resources that can help collect RLHF data from people with uncommon (but not super-rare) kinds of expertise.
There are important business problems that require medical expertise to solve. On the other hand, I wouldn’t expect that kind of data to be very helpful with the core alignment problem.
I was using medical questions as just one example of the kind of task that’s relevant to sandwiching. More generally, what’s particularly useful for this research programme are tasks meeting a few criteria (restated as a quick sketch after this list):
- tasks where we have “models which have the potential to be superhuman at [the] task” and “for which we have no simple algorithmic-generated or hard-coded training signal that’s adequate”;
- tasks for which there is some set of reference humans who are currently better at the task than the model; and
- tasks for which there is some other set of reference humans for whom the task is difficult enough that they would have trouble even evaluating or recognizing good performance. (You also want this latter set of reference humans to be capable of being helped to evaluate/recognize good performance in some way.)
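To make those criteria easier to scan, here is a minimal sketch in Python that restates them as predicates. Everything here (the `SandwichingTask` type, its field names, and `qualifies_for_sandwiching`) is a hypothetical illustration of the conditions above, not code from any actual project.

```python
from dataclasses import dataclass


@dataclass
class SandwichingTask:
    """Hypothetical description of a candidate task for a sandwiching experiment."""
    has_adequate_algorithmic_signal: bool  # is a simple hard-coded/programmatic training signal adequate?
    model_potentially_superhuman: bool     # could a model eventually be superhuman at this task?
    experts_beat_model: bool               # some reference humans currently outperform the model
    non_experts_struggle_to_evaluate: bool # another set of humans has trouble even recognizing good performance
    non_experts_can_be_assisted: bool      # ...but could evaluate well if helped in some way


def qualifies_for_sandwiching(task: SandwichingTask) -> bool:
    """Return True iff the task meets all of the criteria listed above."""
    return (
        not task.has_adequate_algorithmic_signal
        and task.model_potentially_superhuman
        and task.experts_beat_model
        and task.non_experts_struggle_to_evaluate
        and task.non_experts_can_be_assisted
    )
```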
Prime examples are task types that require some kind of niche expertise both to perform and to evaluate. Cotra’s examples involve “[fine-tuning] a model to answer long-form questions in a domain (e.g. economics or physics) using demonstrations and feedback collected from experts in the domain”, “[fine-tuning] a coding model to write short functions solving simple puzzles using demonstrations and feedback collected from expert software engineers”, and “[fine-tuning] a model to translate between English and French using demonstrations and feedback collected from people who are fluent in both languages”. I was just making the point that Surge can help with this kind of thing in some domains (like coding) but not in others.
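For concreteness, a single unit of the “demonstrations and feedback collected from experts” that Cotra describes might look something like the sketch below. The schema (`ExpertDemonstration`, `ExpertComparison`, and their fields) is my own illustrative assumption, not an actual format used by Surge or specified in Cotra’s proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpertDemonstration:
    """One expert-written demonstration for supervised fine-tuning (illustrative schema)."""
    prompt: str           # e.g. a long-form economics question
    expert_response: str  # the domain expert's answer, used as the training target
    domain: str           # e.g. "economics", "physics", "medicine"


@dataclass
class ExpertComparison:
    """One expert preference judgment for reward-model training (illustrative schema)."""
    prompt: str
    response_a: str                  # two candidate model outputs to compare
    response_b: str
    preferred: str                   # "a" or "b", per the expert's judgment
    rationale: Optional[str] = None  # optional free-text explanation from the expert


# Example: an expert judges between two candidate medical answers.
comparison = ExpertComparison(
    prompt="Should this patient's persistent cough be investigated further?",
    response_a="No, it's nothing.",
    response_b="A cough lasting over eight weeks warrants clinical follow-up.",
    preferred="b",
    rationale="Response B reflects standard guidance on chronic cough.",
)
```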