dhar174 comments on GPT-4 Specs: 1 Trillion Parameters?

dhar174 8 Apr 2023 17:24 UTC
1 point
You’re missing the possibility that the model used during training had more parameters than the models used for inference. It is now common practice to train one large model and then distill it into a series of smaller models, picking whichever one fits the needs of the task.
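For anyone unfamiliar with what "distill" means here, below is a rough sketch of the classic soft-target distillation loss, in which a small student model is trained to match the temperature-softened output distribution of a large teacher. This is only an illustration of the general technique: the function names, the temperature value, and the toy logits are my own assumptions, and nothing here is known to reflect how GPT-4 was actually trained or served.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions.

    The T^2 factor is the usual scaling that keeps gradient magnitudes
    comparable across temperatures. The default temperature is an
    illustrative assumption, not a recommended setting.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Toy logits (made up): the student roughly agrees with the teacher.
teacher = [4.0, 1.0, 0.5]
student = [3.0, 1.5, 0.2]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two distributions diverge. In practice it is usually mixed with the ordinary hard-label loss, and the student can be much smaller than the teacher, which is the point for inference cost.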