To those who believe language models do not have internal representations of concepts:
I can help at least partially disprove the assumptions behind that.
There is convincing evidence to the contrary, demonstrated in an actual experiment with an Othello-playing model:
https://thegradient.pub/othello/ The researchers' conclusion:
“Our experiment provides evidence supporting that these language models are developing world models and relying on the world model to generate sequences.”
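For anyone curious how that kind of claim gets tested: the general technique is to freeze the trained model, collect its hidden activations, and train a small probe to read the board state back out of them. Below is a minimal sketch of that idea only, with synthetic data, made-up dimensions, and a plain sklearn linear probe; it is not the authors' code, which works on real Othello-GPT activations and uses more elaborate probes.

```python
# Sketch of the probing idea: can a simple classifier recover a "world state"
# property from a frozen model's hidden activations? All data here is
# synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend these are hidden states collected from a frozen sequence model
# (n_positions x hidden_dim) and the true state of one board square at each
# corresponding step (0 = empty, 1 = mine, 2 = opponent's).
hidden_states = rng.normal(size=(2000, 512))
square_state = rng.integers(0, 3, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(
    hidden_states, square_state, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000)  # a simple linear probe
probe.fit(X_tr, y_tr)

# On this random data accuracy will hover around chance; on real activations,
# accuracy well above chance means the board state is decodable from the
# model's internal representation.
print("probe accuracy:", probe.score(X_te, y_te))
```

If the probe recovers the board state far above chance on real activations, the internal representation demonstrably encodes the world state, which is the core of the paper's argument.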
You’re missing the possibility that the model had more parameters during training than the model used for inference. It is common practice now to train a large model, then distill it into a series of smaller models that can be chosen based on the task at hand.
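For what that distillation step typically looks like, here is a minimal sketch of Hinton-style knowledge distillation in PyTorch. The toy models, data, temperature, and loss weights are all illustrative assumptions, not anyone's production recipe.

```python
# Minimal knowledge-distillation sketch: a small student learns to match a
# frozen, larger teacher's softened outputs plus the hard labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical "large" teacher and "small" student classifiers.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))

x = torch.randn(64, 32)              # toy inputs
y = torch.randint(0, 10, (64,))      # toy hard labels
T = 2.0                              # softmax temperature (assumed value)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for step in range(100):
    with torch.no_grad():
        teacher_logits = teacher(x)  # teacher stays frozen during distillation
    student_logits = student(x)

    # Soft-label loss: match the teacher's softened output distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label loss: ordinary cross-entropy on the true labels.
    hard_loss = F.cross_entropy(student_logits, y)

    loss = 0.5 * soft_loss + 0.5 * hard_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The student ends up with far fewer parameters than the teacher, which is exactly why the parameter count you see at inference time says little about what was used during training.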