GPT-4 was trained with more compute than any other model, which suggests that compute is the most important factor for capabilities. If that weren’t true, we would see some other company training models with more compute but worse performance, and that doesn’t seem to be happening.
Falcon-180B illustrates how throwing compute at an LLM can still result in unusually poor capabilities. Epoch’s estimate puts it close to Claude 2 in training compute, yet it’s nowhere near as good. Then there’s the even more expensive PaLM 2, though since its weights are not published, it’s possible that, unlike with Falcon, the issue is that only smaller, overly quantized, or incompetently tuned models are being served.