Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
OA: https://cdn.openai.com/papers/gpt-4.pdf#page=2
This non-news seems like it might be the biggest news in the announcement? OpenAI is saying “oops, publishing everything was too open, it's gonna be more of a black box now”.
You couldn’t make it up.
Could we infer the number of parameters from scaling laws?
If you were willing to hypothesize a specific scaling law, sure. But it seems like the only safe one to hypothesize is ‘better than Scaling Transformer/Chinchilla/zero-shot’.
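For concreteness, a minimal sketch of the kind of inference being discussed, assuming Chinchilla-style compute-optimal training (C ≈ 6·N·D FLOPs and roughly 20 training tokens per parameter, per Hoffmann et al. 2022) and purely hypothetical compute budgets. None of these numbers come from OpenAI, and the answer changes entirely under a different scaling recipe.

```python
import math

# Back-of-the-envelope sketch (assumptions, not OpenAI data):
#   training compute        C ~= 6 * N * D  FLOPs
#   Chinchilla-optimal data D ~= 20 * N     tokens (Hoffmann et al., 2022)
# Solving C = 6 * N * (20 * N) for N gives N = sqrt(C / 120).

def chinchilla_optimal_params(compute_flops: float, tokens_per_param: float = 20.0) -> float:
    """Return the compute-optimal parameter count N for a given FLOP budget."""
    return math.sqrt(compute_flops / (6.0 * tokens_per_param))

# Purely hypothetical compute budgets, chosen only to show the shape of the inference:
for c in (1e24, 2e25, 1e26):
    n = chinchilla_optimal_params(c)
    print(f"C = {c:.0e} FLOPs  ->  N ~ {n:.1e} params, D ~ {20.0 * n:.1e} tokens")
```

Which is the point: the inferred parameter count is only as good as the scaling law you condition on.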
Better meaning more capability per unit of compute? If so, how can we be confident that it’s better than Chinchilla?
I can see an argument that it should be at least as good — if they were throwing so much money at it, they would surely follow what is currently known to be best practice. But is there evidence to suggest that they figured out how to do things more efficiently than had ever been done before?
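To pin down what ‘better than Chinchilla’ would even mean (stated as an assumption, not anything the report discloses): under the Chinchilla parametric fit, pretraining loss is modelled as

$$L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad C \;\approx\; 6ND,$$

with the constants fit empirically in Hoffmann et al. (2022). ‘More capability per unit of compute’ then cashes out as reaching a lower L at the same budget C, whether through architecture, data quality, or training recipe. Since the report discloses neither N nor D, there is nothing concrete to plug in either way.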