I think a key idea related to this topic, and not yet mentioned in the comments (maybe because it is elementary?), is the probabilistic chain rule. This basic “theorem” of probability shows that, in our case, the procedure of always sampling the next word conditioned on the previous words is mathematically equivalent to sampling from the joint probability distribution of complete human texts. To me this almost fully explains why LLMs’ outputs seem to have been generated with global information in mind. What is missing is to see why our intuition about “merely” generating the next token differs from sampling from the joint distribution. My guess is that humans instinctively (but incorrectly) attach directional causality to conditional probability, and because of this it surprises us when we see dependencies running in the opposite direction in the generated text.
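To make the chain rule concrete: it says p(x_1, …, x_n) = ∏_t p(x_t | x_1, …, x_{t-1}), so sampling each token from its conditional given the prefix is the same as drawing a whole sequence from the joint. Here is a minimal toy sketch of that equivalence; the “texts” and probabilities are made up purely for illustration and have nothing to do with any actual model.

```python
import random
from collections import Counter, defaultdict

# Toy joint distribution over complete "texts" (tuples of tokens).
# These texts and probabilities are invented for illustration only.
joint = {
    ("the", "cat", "sat"): 0.4,
    ("the", "cat", "ran"): 0.2,
    ("the", "dog", "ran"): 0.3,
    ("a", "dog", "sat"): 0.1,
}

def sample_joint():
    """Sample a complete text directly from the joint distribution."""
    texts, probs = zip(*joint.items())
    return random.choices(texts, weights=probs)[0]

def sample_chain():
    """Sample token by token from p(x_t | x_<t), derived from the same joint."""
    prefix = ()
    for t in range(3):
        # Conditional distribution over the next token given the prefix:
        # sum the joint probability of every text consistent with the prefix.
        cond = defaultdict(float)
        for text, p in joint.items():
            if text[:t] == prefix:
                cond[text[t]] += p
        tokens, weights = zip(*cond.items())
        prefix += (random.choices(tokens, weights=weights)[0],)
    return prefix

# Empirically, both samplers induce the same distribution over full texts.
n = 100_000
print(Counter(sample_joint() for _ in range(n)))
print(Counter(sample_chain() for _ in range(n)))
```

Running this, the two frequency counts match up to sampling noise, which is the chain rule in action: next-token sampling never “loses” the global structure encoded in the joint.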
EDIT: My comment concerns transformer architectures; I don’t yet know how RLHF works.
Yeah, but all sorts of elementary things elude me. So thanks for the info.