I agree, which is why I don’t expect OpenAI to do better. If both teams are tweaking equations here and there, then based on their prior work, DeepMind seems to do it more efficiently. OpenAI has historically been luckier, but luck is not a quality I would extrapolate forward in time.
You may not expect OA to do better (neither did I, even if I expected someone somewhere to crack the problem within a few years of GPT-3), but that’s not the relevant fact here, now that you have observed that apparently there’s this “Q*” & “Zero” thing that OAers are super-excited about. It was hard work and required luck, but apparently they got lucky. It is what it is. (‘Update your priors using the evidence to obtain a new posterior’, as we like to say around here.)
How much does that help someone else get lucky? Well, it depends on how much they leak or publish. If it’s like the GPT-3 paper, then yeah, people can replicate it quickly and are sufficiently motivated these days that they probably will. If it’s like the GPT-4 “paper”, well… Knowing someone else has won the lottery of tweaking equations at random here & there doesn’t help you win the lottery yourself.
(The fact that self-play or LLM search of some sort works is not that useful: we all knew it has to work somehow! It’s the critical details that are the secret sauce, and those are probably what matter here. How exactly does their particular variant thread the needle’s eye to avoid diverging or plateauing, etc.? Remember Karpathy’s law: “neural nets want to work”. So even if your approach is badly broken, it can mislead you for a long time by working better than it has any right to.)