keith_wynroe comments on OpenAI Superalignment: Weak-to-strong generalization

keith_wynroe 15 Dec 2023 15:07 UTC
I know they flag it in the paper, but seeing the performance curves for the strong model on zero- and few-shot attempts really makes me think the data leakage issue is doing a lot of the work here. If you get the majority(?) of the PGR (performance gap recovered) from, e.g., 5-shot prompting, a natural takeaway seems to be that the strong model doesn't actually need to be fine-tuned on the task, and the weak supervisor is just eliciting knowledge that's already there.
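For reference, PGR is the fraction of the gap between the weak supervisor's performance and the strong model's ceiling (trained on ground truth) that the weak-to-strong model closes. The sketch below is a minimal illustration of the comparison the comment gestures at: plug a few-shot strong model's accuracy into the same formula and ask what share of the fine-tuned PGR it already accounts for. The function names are hypothetical, and any accuracies you feed in are placeholders, not numbers from the paper.

```python
def performance_gap_recovered(weak: float, student: float, strong_ceiling: float) -> float:
    """PGR = (student - weak) / (strong_ceiling - weak)."""
    gap = strong_ceiling - weak
    if gap == 0:
        raise ValueError("ceiling equals weak performance; PGR is undefined")
    return (student - weak) / gap


def share_of_pgr_from_prompting(
    weak: float, few_shot: float, weak_to_strong: float, strong_ceiling: float
) -> float:
    """Hypothetical helper: fraction of the fine-tuned PGR that few-shot
    prompting of the strong model already reaches on its own."""
    pgr_finetuned = performance_gap_recovered(weak, weak_to_strong, strong_ceiling)
    if pgr_finetuned == 0:
        raise ValueError("fine-tuning recovered none of the gap; ratio is undefined")
    pgr_prompted = performance_gap_recovered(weak, few_shot, strong_ceiling)
    return pgr_prompted / pgr_finetuned
```

A ratio near 1 would be consistent with the reading that weak supervision mostly elicits latent knowledge rather than teaching the task; it can't by itself separate that from data leakage.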