It’s unbelievable how similar/convergent the big LLMs are. Only a slight improvement with 100x compute?? People have much bigger differences with much less variation in the core inputs (e.g. number of neurons). I wonder what the best explanation is. I can think of a few mediocre explanations.