Actually, another issue is that unsupervised translation isn’t “that hard” relative to supervised translation: I think you can get pretty far with simple heuristics, such that I’d guess making the model 10x bigger matters more than making the objective more aligned with getting the answer right (and that this will remain true for at least a couple more 10x-ings of model size, although at some point the objective will matter more).
This might not matter as much if you’re actually outputting explanations rather than just translating from one language to another. That said, it is probably true that for tasks far from the ceiling, “naive objective + 10x larger model” will outperform “correct objective”.
I do expect “explanations of what’s going on in this sentence” to be a lot weaker than translations.
For that task, I expect that the model trained on coherence + similar tasks will outperform a 10x larger pre-trained model. If the larger pre-trained model gets context stuffing on similar tasks, but no coherence training, then it’s less clear to me which comes out ahead.
But I guess the point is that the differences between various degrees of successful generalization will be relatively small compared to model-size effects. What matters is not so much how good the transfer model is relative to the pre-trained baseline, but how large the differences are between the possible worlds we are hoping to distinguish.
I guess my main hope there is to try to understand whether there is some setting where transfer works quite well, either getting very close to the model fine-tuned on distribution, or at least converging as the pre-trained model grows. Hopefully that will make it easier to notice the effects we are looking for, and it’s OK if those effects are small relative to model doublings.
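Concretely, the kind of comparison I have in mind looks something like the sketch below. The sizes and scores are made up purely to illustrate the shape of the question, not real results:

```python
# Illustrative sketch only: made-up (size, fine-tuned, transfer) scores stand in
# for real evaluations. The question is whether the gap between the model
# fine-tuned on-distribution and the transfer model shrinks as the pre-trained
# model grows.
results = [
    # (pre-trained size, fine-tuned on-distribution, transfer / coherence-trained)
    ("1x",   0.62, 0.48),
    ("10x",  0.71, 0.63),
    ("100x", 0.78, 0.74),
]

for size, finetuned, transfer in results:
    gap = finetuned - transfer
    print(f"{size:>5}: fine-tuned={finetuned:.2f}  transfer={transfer:.2f}  gap={gap:.2f}")

# If the gap converges toward zero with scale, transfer is "working quite well"
# in the sense above, even though each 10x of model size moves the absolute
# numbers by much more than the gap itself.
```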
(Also worth noting that “as good as increasing model size by 10%” is potentially quite economically relevant. So I’m mostly just thinking about the extent to which it can make effects hard to measure.)
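One way to make “as good as increasing model size by 10%” measurable is to convert a loss improvement into an equivalent model-size multiplier. The sketch below assumes loss follows a rough power law in parameter count; both the functional form and the exponent are assumptions for illustration, not something established in this discussion:

```python
# Convert a loss improvement into an equivalent model-size multiplier, assuming
# a power-law fit L(N) = (N_c / N) ** alpha for loss vs. parameter count N.
# alpha ~ 0.07 is in the ballpark of published language-model scaling fits,
# but treat both the form and the exponent as assumptions here.

def equivalent_size_multiplier(loss_before: float, loss_after: float,
                               alpha: float = 0.07) -> float:
    """How much bigger a model would need to be to get the same loss reduction."""
    return (loss_before / loss_after) ** (1.0 / alpha)

# A ~0.7% loss reduction corresponds to roughly a 1.1x (i.e. +10%) larger model:
print(equivalent_size_multiplier(2.000, 1.986))  # ~1.11
```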