So far as I know, it is not the case that OpenAI had a slower-but-equally-functional version of GPT4 many months before announcement/release. What they did have is GPT4 itself, months before; but they did not have a slower version. They didn’t release a substantially distilled version. For example, the highest estimate I’ve seen is that they trained a 2-trillion-parameter model. And the lowest estimate I’ve seen is that they released a 200-billion-parameter model. If both are true, then they distilled 10x… but it’s much more likely that only one is true, and that they released what they trained, distilling later. (The parameter count is proportional to the inference cost.)
Previously, delays in release were believed to be about post-training improvements (e.g. RLHF) or safety testing. Sure, there were possibly mild infrastructure optimizations before release, but mostly to scale to many users; the models didn’t shrink.
This is for language models. For alphazero, I want to point out that it was announced 6 years ago (infinity by AI scale), and from my understanding we still don’t have a 1000x faster version, despite much interest in one.
For alphazero, I want to point out that it was announced 6 years ago (infinity by AI scale), and from my understanding we still don’t have a 1000x faster version, despite much interest in one.
I don’t know the details, but whatever the NN thing (derived from Lc0, a clone of AlphaZero) inside current Stockfish is can play on a laptop GPU.
And even if AlphaZero derivatives didn’t gain 3OOMs by themselves it doesn’t update me much that that’s something particularly hard. Google itself has no interest at improving it further and just moved on to MuZero, to AlphaFold etc.
So far as I know, it is not the case that OpenAI had a slower-but-equally-functional version of GPT4 many months before announcement/release. What they did have is GPT4 itself, months before; but they did not have a slower version. They didn’t release a substantially distilled version. For example, the highest estimate I’ve seen is that they trained a 2-trillion-parameter model. And the lowest estimate I’ve seen is that they released a 200-billion-parameter model. If both are true, then they distilled 10x… but it’s much more likely that only one is true, and that they released what they trained, distilling later. (The parameter count is proportional to the inference cost.)
Previously, delays in release were believed to be about post-training improvements (e.g. RLHF) or safety testing. Sure, there were possibly mild infrastructure optimizations before release, but mostly to scale to many users; the models didn’t shrink.
This is for language models. For alphazero, I want to point out that it was announced 6 years ago (infinity by AI scale), and from my understanding we still don’t have a 1000x faster version, despite much interest in one.
I don’t know the details, but whatever the NN thing (derived from Lc0, a clone of AlphaZero) inside current Stockfish is can play on a laptop GPU.
And even if AlphaZero derivatives didn’t gain 3OOMs by themselves it doesn’t update me much that that’s something particularly hard. Google itself has no interest at improving it further and just moved on to MuZero, to AlphaFold etc.