TurnTrout comments on What specific dangers arise when asking GPT-N to write an Alignment Forum post?

TurnTrout 31 Jul 2020 13:38 UTC
7 points
we already see that; we’re constantly amazed by it, despite little meaning of created texts
But GPT-3 is only trained to minimize prediction loss, not to maximize response. GPT-N may be able to crowd-please if it’s trained on approval, but I don’t think that’s what’s currently happening.
- Jan Rzymkowski 2 Aug 2020 23:34 UTC
  1 point
  Parent
  Upon reflection, you’re right that it won’t be maximizing response per se.
  But as we get deeper it’s not so straightforward. GPT-3 models can be trained to minimize prediction loss (or, plainly speaking, to simply predict more accurately) on many different tasks, which are usually very simply stated (e.g. choose a word that would fill the blank).
  But we end up with people taking models trained this way and using them to generate long texts from some primer. And yes, in most cases such abuse of the model will produce text that is merely coherent. But I would expect humans to have a tendency to conflate coherence with persuasiveness.
  I suppose one could fairly easily choose a prediction loss for GPT-3 models such that the longer texts would have some desired characteristics. But even the standard tasks probably shape GPT-3 so that it keeps producing vague sentences that continue the primer and give the reader a feeling of “it making sense”. That would possibly entail producing fairly persuasive texts that reinforce the primer’s thesis.
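
For readers unfamiliar with the term, the “prediction loss” discussed in this thread can be illustrated with a minimal sketch. This is not GPT-3’s actual training code; the three-word vocabulary, the logit values, and the function names are hypothetical, chosen only to show that a fill-in-the-blank objective scores how much probability the model puts on the correct word and contains no term for how a reader responds.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prediction_loss(logits, target_index):
    """Cross-entropy for one blank: negative log-probability of the true word."""
    probs = softmax(logits)
    return -math.log(probs[target_index])

# Hypothetical toy setup: fill the blank in "the cat sat on the ___".
vocab = ["cat", "dog", "mat"]
target = vocab.index("mat")

logits_confident = [0.0, 0.0, 5.0]  # model strongly favors "mat"
logits_unsure = [0.0, 0.0, 0.0]     # model has no preference

loss_confident = prediction_loss(logits_confident, target)
loss_unsure = prediction_loss(logits_unsure, target)

# The loss falls as the model assigns more probability to the true word.
print(loss_confident < loss_unsure)
```

Under this kind of objective, any persuasiveness in generated text would be a side effect of imitating the training distribution rather than something the loss rewards directly.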