That’s interesting. Earlier, in the Tweets I collated commenting on how the controls worked, he was very explicitly identifying temperature with creativity. So now, if the temperature is identical but he’s still calling whatever it is ‘creative’, he has apparently completely flipped his position on “hallucinations = creativity”.
Hm. So it’s the same temperature, but it’s more expensive, ‘longer output, more expressive, slower’, and requires more context… That could point to a different model under the hood. But it could also point to a different approach entirely, like best-of sampling, or perhaps an inner-monologue-like approach: a hidden prompt generating several candidate responses, then another prompt picking “the most creative” one. There were some earlier comments about Sydney possibly having a hidden inner-monologue scratchpad/buffer where it could generate a bunch of outputs before returning only 1 visible answer to the user. (The candidate generation could be parallelized if you generated the n suggestions in parallel and didn’t mind the possible redundancy, but the selection step is inherently more serial than simply generating 1 answer immediately.) The selection criterion could be ‘pick the most creative one’ for creative mode, ‘pick the most correct one’ for ‘precise’ mode, etc. So this wouldn’t necessarily be anything new, and it could have been iterated on very quickly (but, as he says, it would be inherently slower, generate longer responses, be more expensive, and be hard to optimize much further).
This is something you could try to replicate with ChatGPT/GPT-4: ask it to generate several different answers to the Monty Fall problem, and then ask it to pick the most correct one.
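To make the hypothesized pipeline concrete, here is a minimal Python sketch of best-of-n sampling followed by a hidden ‘pick the best’ step. The `generate_answer` and `pick_best` functions are deterministic stubs of my own invention standing in for real model calls; in an actual replication, each would be a ChatGPT/GPT-4 API call (one prompt per candidate, plus a final judge prompt), which is exactly why the mode would be slower, longer, and pricier than a single sampled answer.

```python
def generate_answer(prompt: str, seed: int) -> str:
    """Stub 'model': cycles through canned answers to the Monty Fall problem.

    A real implementation would sample the LLM at the shared temperature,
    varying only the random seed per candidate.
    """
    answers = [
        "Switching wins with probability 1/2.",          # correct for Monty Fall
        "Switching wins with probability 2/3.",          # the (wrong) Monty Hall reflex
        "It makes no difference; both doors are 1/2.",   # right number, muddled reasoning
    ]
    return answers[seed % len(answers)]


def pick_best(candidates: list[str], criterion: str) -> str:
    """Stub 'judge' prompt: selects the candidate best matching the criterion.

    A real implementation would be a second hidden prompt, e.g.
    'Which of these answers is the most correct?' (or 'most creative'
    for creative mode). Here a keyword check stands in for that call.
    """
    for c in candidates:
        if "Switching" in c and "1/2" in c:
            return c
    return candidates[0]


def best_of_n(prompt: str, n: int = 5) -> str:
    # Generate n candidates -- parallelizable, at the cost of redundancy --
    # then one extra serial selection step before anything reaches the user.
    candidates = [generate_answer(prompt, seed=i) for i in range(n)]
    return pick_best(candidates, criterion="most correct")


print(best_of_n("You pick door 1; Monty trips and happens to open door 3, "
                "revealing a goat. Should you switch?"))
```

Swapping the criterion string (and judge prompt) per mode is all it would take to turn the same machinery into ‘creative’ vs. ‘precise’, which is consistent with the modes sharing a temperature.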