Your philosophical point is interesting; I have a post in the queue about that. However, I don't think it really proves what you want it to.
Having John_Maxwell in the byline makes it far more likely that I’m the author of the post.
If humans can make useful judgements about whether this is something I wrote vs. something nostalgebraist wrote to make a point about bylines, I don't see why a language model can't do the same, in principle.
You wrote:

"GPT is trying to be optimal at next-step prediction, and an optimal next-step predictor should not get improved by lookahead, it should already have those facts priced in to its next-step prediction."
A perfectly optimal next-step predictor would not be improved by lookahead or anything else; it's already perfectly optimal. I'm talking about computational structures that might be incentivized during training when the predictor is suboptimal. (It's still going to be suboptimal after training with current technology, of course.)
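To make the distinction concrete, here is a toy sketch of the difference between scoring candidate next tokens by one-step probability alone and scoring them with a small amount of lookahead. Everything in it (the token strings, the probabilities, the QUALITY table) is invented purely for illustration and is not a claim about GPT's actual internals:

```python
# Toy illustration only: tokens, probabilities, and QUALITY are all made up.

# Hypothetical next-token probabilities from a (suboptimal) language model.
P_NEXT = {
    "the story ends": {"abruptly": 0.55, "with": 0.45},
    "abruptly": {".": 1.0},
    "with": {"a twist": 0.9, ".": 0.1},
}

# Toy "quality" of two-token continuations (a stand-in for whatever signal
# training actually rewards, e.g. matching how real stories tend to continue).
QUALITY = {
    ("abruptly", "."): 0.2,
    ("with", "a twist"): 0.9,
    ("with", "."): 0.1,
}

def greedy_choice(prefix):
    """Pick the single most probable next token (pure one-step prediction)."""
    return max(P_NEXT[prefix], key=P_NEXT[prefix].get)

def lookahead_choice(prefix):
    """Pick the next token with the highest expected quality one step further on."""
    def expected_quality(tok):
        return sum(p * QUALITY[(tok, nxt)] for nxt, p in P_NEXT[tok].items())
    return max(P_NEXT[prefix], key=expected_quality)

prefix = "the story ends"
print(greedy_choice(prefix))     # -> 'abruptly' (highest one-step probability)
print(lookahead_choice(prefix))  # -> 'with' (better expected continuation)
```

The point of the toy is just that when the one-step distribution is imperfect, a choice informed by what comes next can differ from, and beat, the greedy one-step choice; that is the kind of computation training pressure could in principle reward.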
In orthonormal's post they wrote:

"...GPT-3's ability to write fiction is impressive - unlike GPT-2, it doesn't lose track of the plot, it has sensible things happen, it just can't plan its way to a satisfying resolution. I'd be somewhat surprised if GPT-4 shared that last problem."
I suspect that either GPT-4 will still be unable to plan its way to a satisfying resolution, or GPT-4 will develop some kind of internal lookahead (probably not beam search, but beam search could be a useful model for understanding it) which is sufficiently general to be re-used across many different writing tasks. (Generality takes fewer parameters.) I don’t know what the relative likelihoods of those possibilities are. But the whole idea of AI safety is to ask what happens if we succeed.
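Since beam search is mentioned above only as a rough mental model for internal lookahead, a minimal sketch of it may help readers who haven't seen it. The toy model and all numbers below are made up; this is not a claim about how GPT-4 would actually be implemented:

```python
# Minimal beam-search sketch. The toy model, tokens, and probabilities are
# invented for illustration; this is not a description of any GPT's internals.
import math

def log_probs(prefix):
    """Hypothetical next-token log-probabilities from a toy language model."""
    table = {
        (): {"once": math.log(0.6), "the": math.log(0.4)},
        ("once",): {"upon": math.log(0.9), "more": math.log(0.1)},
        ("the",): {"end": math.log(0.7), "hero": math.log(0.3)},
        ("once", "upon"): {"a": math.log(1.0)},
        ("once", "more"): {"!": math.log(1.0)},
        ("the", "end"): {".": math.log(1.0)},
        ("the", "hero"): {"wins": math.log(1.0)},
    }
    return table[prefix]

def beam_search(beam_width=2, steps=3):
    """Keep the `beam_width` highest-scoring partial sequences at every step,
    instead of greedily committing to one token at a time."""
    beams = [((), 0.0)]  # (token sequence, total log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, lp in log_probs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        # Prune to the most promising `beam_width` continuations.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

for seq, score in beam_search():
    print(" ".join(seq), f"(log-prob {score:.2f})")
```

The relevant feature is just that partial continuations are scored several tokens ahead before the next token is committed to, rather than committing greedily one token at a time.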