There are important business problems that require medical expertise to be solved. On the other hand, I wouldn’t expect it to be very helpful with the core alignment problem.
I was using medical questions as just one example of the kind of task that’s relevant to sandwiching. More generally, what’s particularly useful for this research programme are:

- tasks where we have “models which have the potential to be superhuman at [the] task” and “for which we have no simple algorithmic-generated or hard-coded training signal that’s adequate”;
- tasks for which there is some set of reference humans who are currently better at the task than the model; and
- tasks for which there is another set of reference humans for whom the task is difficult enough that they would have trouble even evaluating/recognizing good performance. (You also want this set of reference humans to be capable of being helped to evaluate/recognize good performance in some way.)
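To make those criteria a bit more concrete, here’s a minimal sketch of them as a checklist. This is purely illustrative; the class and field names are my own shorthand, not anything from Cotra’s post or from any existing tooling.

```python
from dataclasses import dataclass

@dataclass
class CandidateTask:
    """Illustrative checklist for whether a task is useful for sandwiching research."""
    model_potentially_superhuman: bool        # the model could plausibly exceed human performance
    has_adequate_programmatic_signal: bool    # a simple algorithmic / hard-coded training signal exists
    reference_experts_beat_model: bool        # some reference humans currently do the task better than the model
    reference_nonexperts_struggle_to_evaluate: bool  # another set of humans can't reliably judge outputs
    nonexperts_can_be_assisted: bool          # ...but could judge well with some form of help

    def useful_for_sandwiching(self) -> bool:
        return (
            self.model_potentially_superhuman
            and not self.has_adequate_programmatic_signal
            and self.reference_experts_beat_model
            and self.reference_nonexperts_struggle_to_evaluate
            and self.nonexperts_can_be_assisted
        )

# e.g. long-form economics Q&A: economists still outperform the model,
# laypeople can't easily grade the answers, but might be able to with assistance.
econ_qa = CandidateTask(
    model_potentially_superhuman=True,
    has_adequate_programmatic_signal=False,
    reference_experts_beat_model=True,
    reference_nonexperts_struggle_to_evaluate=True,
    nonexperts_can_be_assisted=True,
)
print(econ_qa.useful_for_sandwiching())  # True
```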
Prime examples are task types that require some kind of niche expertise to do and to evaluate. Cotra’s examples involve “[fine-tuning] a model to answer long-form questions in a domain (e.g. economics or physics) using demonstrations and feedback collected from experts in the domain”, “[fine-tuning] a coding model to write short functions solving simple puzzles using demonstrations and feedback collected from expert software engineers”, and “[fine-tuning] a model to translate between English and French using demonstrations and feedback collected from people who are fluent in both languages”. I was just making the point that Surge can help with this kind of thing in some domains (e.g. coding), but not in others.