Really appreciate this post! The recommendation “Evaluators should ensure that effective capability elicitation techniques are used for their evaluations” is especially important. Zero-shot, single-turn prompts with no transformations no longer seem representative of a model’s impact on the public, who, in aggregate or with even a little determination, will try many variants of unsanctioned prompts across many shots or many turns.