I have some reasons for being optimistic about ‘white box heuristic reasoning’ (humans understanding models is a special case of this), but models becoming easier to understand as they get bigger isn’t one of them.
That’s not really the correct comparison. The correct comparison is neural outputs of linguistic cortex vs LLM neural outputs, because the LLM isn’t having to learn a few-shot mouse/keyboard minigame like the human is.
Humans do not have direct access to the implicit predictions of their brain’s language centers, any more than the characters simulated by a language model have access to the language model’s token probabilities.
Really, the correct comparison is something like asking the LLM to make a zero-shot prediction of the form:
Consider the following sentence: “I am a very funny _”
What word seems most likely to continue the sentence?
Answer:
I expect LLMs to do much worse when prompted like this, though I haven’t done the experiment myself.
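For concreteness, here is a minimal sketch of the comparison being proposed, assuming GPT-2 via HuggingFace transformers (the model choice and exact wiring are my own illustration, not something tested in this thread). It reads off the model’s actual next-token distribution and then asks the same question “verbally” as a zero-shot prompt, so the two answers can be compared:

```python
# Untested sketch: compare a model's direct next-token distribution with its
# answer when asked the question "verbally" in a zero-shot prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# (a) Direct readout: next-token distribution after "I am a very funny"
ids = tok("I am a very funny", return_tensors="pt").input_ids
with torch.no_grad():
    probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
print([(tok.decode(i), round(p.item(), 3)) for i, p in zip(top.indices, top.values)])

# (b) Verbal readout: ask the model the same question in natural language
prompt = ('Consider the following sentence: "I am a very funny _"\n'
          "What word seems most likely to continue the sentence?\n"
          "Answer:")
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=3, do_sample=False)
print(tok.decode(out[0, ids.shape[1]:]))
```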
Humans do not have direct access to the implicit predictions of their brain’s language centers,
But various other human brain modules do have direct access to the outputs of linguistic cortex, and that is the foundation of most of our linguistic abilities, which surpass those of LLMs in many ways.
1. Human linguistic cortex learns via word/token prediction, just like LLMs.
2. Human linguistic cortical outputs are the foundation for various linguistic abilities, and performance on those abilities follows from performance on 1.
3. Humans generally outperform LLMs on most downstream linguistic tasks.
I’m merely responding to this statement:
Language models are already superhuman at next token prediction
Which is misleading: LLMs outperform humans at the next-token prediction game, but that does not establish that LLMs are superhuman relative to human linguistic cortex (establishing that would require comparing neural readouts).
I don’t think this sort of prompt actually gets at the conscious reasoning gap. It only takes one attention head to copy the exact next-token prediction made at a previous token, and I’d expect that if you used few-shot prompting (especially filling the entire context with few-shot examples), the model would use its induction-like heads to just copy its predictions and perform quite well.
A better example would be to have the model describe its reasoning about predicting the next token, and then pass that to itself in an isolated prompt to predict the next token.
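A minimal sketch of that two-stage setup, assuming an OpenAI-style chat API (the model name, prompt wording, and the instruction not to state a final answer in stage one are all my own illustrative choices):

```python
# Untested sketch of the two-stage protocol: (1) the model writes down its
# reasoning about the next token; (2) that reasoning alone is handed to an
# isolated prompt, which must commit to a prediction without re-seeing the
# original context.
from openai import OpenAI

client = OpenAI()
SENTENCE = "I am a very funny _"

reasoning = client.chat.completions.create(
    model="gpt-4o-mini",  # arbitrary choice
    messages=[{
        "role": "user",
        "content": f'Explain, step by step, what word you think continues "{SENTENCE}" '
                   "and why. Do not state a final answer.",
    }],
).choices[0].message.content

prediction = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Here is some reasoning about a fill-in-the-blank sentence:\n\n"
                   f"{reasoning}\n\n"
                   "Based only on this reasoning, what single word fills the blank?",
    }],
).choices[0].message.content

print(prediction)
```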
This sort of prompt shows up in the corpus, and when it does, it implies a different token distribution for the _ than the typical distribution on the corpus. Ofc, you could make the model quite good at prompts like this via finetuning.
Imo, it is reasonably close to the right comparison for thinking about humans understanding how LLMs work (I make no claims about this being a reasonable comparison for other things). We care about how humans perform using conscious reasoning.
Similarly, I’d claim that trying to do interpretability on your own linguistic cortex is made difficult by the fact that the linguistic cortex (probably) implicitly represents probability distributions over language which are much better than those you can consciously compute.
More generally, it’s worth thinking about the conscious reasoning gap—this gap happens to be smaller in vision for various reasons.
This gap will also ofc exist in language models trying to interpret themselves, but fine-tuning might be very helpful for at least partially removing this gap.
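To make the finetuning suggestion concrete, here is a rough sketch (my own construction, not something from the thread) of how one could generate training pairs that teach a model to report its own implicit next-token predictions when asked verbally, again assuming GPT-2 via HuggingFace transformers:

```python
# Untested sketch: label each verbal "what comes next?" prompt with the model's
# own top next-token prediction, producing pairs for a standard causal-LM
# fine-tuning loop.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prefixes = ["I am a very funny", "The capital of France is", "She opened the"]

pairs = []
for prefix in prefixes:
    ids = tok(prefix, return_tensors="pt").input_ids
    with torch.no_grad():
        next_id = model(ids).logits[0, -1].argmax()
    target = tok.decode(next_id).strip()
    prompt = (f'Consider the following sentence: "{prefix} _"\n'
              "What word seems most likely to continue the sentence?\n"
              "Answer:")
    pairs.append({"prompt": prompt, "completion": " " + target})

print(pairs[0])
```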
Here’s what GPT-3 output for me:
Its distribution over continuations for the sentence itself is broader:
I’d have expected it to become less confident of its answer when asked verbally.
Isn’t this about generation vs classification, not language vs vision?