4.) Our assessment of LLM abilities is wrong: existing LLMs are actually vastly superhuman, and GPT-2 style models are already at human parity. This seems highly unlikely from actually interacting with these models. On the other hand, even GPT-2 models possess a lot of arcane knowledge that is superhuman, and it may be that the very powerful cognition of these small models is smeared across such a wide range of weird internet data that it appears much weaker than us in any specific facet. Intuitively, the idea is that a human and GPT-2 possess the same ‘cognitive/linguistic power’, but since GPT-2’s cognition is spread over a much wider data range than a human’s, its ‘linguistic power density’ is lower, and it therefore appears much less intelligent in the much smaller human-relevant domain in which we test it. I am highly unclear whether these concepts are actually correct or a useful frame through which to view things.
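(To make the ‘power density’ intuition concrete, here is a toy sketch. The numbers, the function name, and the very idea of treating ‘cognitive power’ as a single scalar spread evenly over a count of domains are my own illustrative assumptions, not anything established in the thread:)

```python
# Toy illustration of the "linguistic power density" framing above.
# Assumption (purely illustrative): an agent has a fixed budget of
# "cognitive power" spread evenly across the domains its training
# data covers; per-domain skill = total power / number of domains.

def power_density(total_power: float, num_domains: int) -> float:
    """Per-domain 'power' if total power is spread evenly over domains."""
    return total_power / num_domains

# Hypothetical numbers: same total power, very different spread.
human = power_density(total_power=100.0, num_domains=10)    # mostly human-relevant domains
gpt2 = power_density(total_power=100.0, num_domains=1000)   # smeared across weird internet data

print(f"human per-domain skill: {human:.1f}")  # 10.0
print(f"GPT-2 per-domain skill: {gpt2:.1f}")   # 0.1
```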
I think LLMs are great and plausibly superhuman at language; it’s just that we don’t want them to do language, we want them to do useful real-world tasks, and hijacking a language model to do useful real-world tasks is hilariously inefficient.
If you consider pure language tasks like “Here’s some information in format X, please reshuffle it into the equivalent in format Y”, then GPT-4 seems vastly superhuman. (I’m somewhat abusing terms here, since the “language” task of reshuffling information is somewhat different from the “language” task of autoregressively predicting it, but I think the two are probably much more closely related to each other than either is to applying the model to something genuinely useful? Idk.) I can’t remember how good GPT-2 was at this; I’m not sure I even bothered to try it.
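(For illustration, a minimal sketch of this kind of reshuffling task, assuming the OpenAI Python SDK; the model name, the CSV snippet, and the prompt wording are just examples, not anything from the thread:)

```python
# Sketch of a "reshuffle format X into format Y" task, assuming the
# OpenAI Python SDK (`pip install openai`). Model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

csv_snippet = "name,role\nAda,engineer\nGrace,admiral"

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": (
            "Here is some information in CSV format:\n"
            f"{csv_snippet}\n\n"
            "Please reshuffle it into the equivalent JSON list of objects."
        ),
    }],
)
print(response.choices[0].message.content)
```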
IIRC Redwood Research investigated human performance on next-token prediction, and humans were mostly worse than even small (by current standards) language models?
sounds right, where “worse” here means “a higher bits-per-word loss when predicting an existing sentence”, a very unnatural metric that humans don’t spend significant effort on.
That is actually a natural metric for the brain, and close to what the linguistic cortex does internally. Having a human play a word-prediction game and comparing their scores to the native internal logit predictions of an LLM is kinda silly. The real comparison should be between a human playing that game and an LLM playing the exact same game in the exact same way (i.e. asking GPT verbally to predict the next word/token and state a probability for it), or you should compare internal low-level transformer logits to linear readout models from brain neural probes/scans.
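(For concreteness, a minimal sketch of how the model-side number in this comparison is usually computed: the internal-logit bits-per-token of GPT-2 on a sentence, using Hugging Face transformers. This is illustrative only, not Redwood’s actual setup, and the “fair” version proposed above would instead prompt the model to state its prediction in words:)

```python
# Bits-per-token of GPT-2's *internal* next-token predictions on a sentence,
# using Hugging Face transformers. Illustrative only; not Redwood's harness.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "The quick brown fox jumps over the lazy dog."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy
    # (in nats per predicted token) of its own next-token distribution.
    out = model(**enc, labels=enc["input_ids"])

bits_per_token = out.loss.item() / math.log(2)
print(f"GPT-2 internal prediction loss: {bits_per_token:.2f} bits/token")
```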
I think LLMs are great and plausibly superhuman at language
I think the problem might be that “language” encompasses a much broader variety of tasks than image generation. For example, generating poetry with a particular rhyming structure or meter seems to be a pretty “pure” language task, yet even GPT-4 struggles with it. Meanwhile, diffusion models with a quarter of the parameter count of GPT-4 can output art in a dizzying variety of styles, from Raphael-like Renaissance realism to Picasso-like cubism.
oh interesting point, yeah.