When you say “irreducible”, does that mean “irreducible under current techniques” or “mathematically irreducible”, or something else?
Closer to the former, and even more restrictive: “irreducible with this type of model, trained in this fashion on this data distribution.”
Because language is a communication channel, there is presumably also some nonzero lower bound on the loss that any language model could ever achieve. This is different from the “irreducible” term here, and presumably lower than it, although little is known about this issue.
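For concreteness: the “irreducible” number in question is the constant term in the Chinchilla paper’s fitted parametric loss, which (if I’m recalling the reported fit correctly) was roughly

L(N, D) = E + A/N^α + B/D^β,  with E ≈ 1.69, A ≈ 406.4, B ≈ 410.7, α ≈ 0.34, β ≈ 0.28.

As you scale up parameters N and training tokens D, the last two terms shrink toward zero and the fitted loss approaches E. So E behaves like a floor within this fit, but it’s a floor for this model family on this data distribution, not an estimate of the entropy of language itself.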
Do we have any idea what a model with, say, 1.7 loss (i.e., a model almost arbitrarily big in compute and data, but with the same 1.69 irreducible term) would look like?
Not really, although section 5 of this post expresses some of my own intuitions about what this limit looks like.
Keep in mind, also, that we’re talking about LMs trained on a specific data distribution, and only evaluating their loss on data sampled from that same distribution.
So if an LM achieved 1.69 loss on MassiveText (or a scaled-up corpus that looked like MassiveText in all respects but size), it would do very well at mimicking all the types of text present in MassiveText, but that does not mean it could mimic every existing kind of text (much less every conceivable kind of text).
Do we have a sense of what the level of loss is in the human brain? If I’m understanding correctly, if the amount of loss in a model is known to be finitely large, then it will be incapable of perfectly modeling the world in principle (implying that, to such a model, physics is non-computable?)
Theoretically we could measure it by having humans play “the language model game”, where you try to predict the next word in a text, repeatedly. How often you get the next word wrong is a function of your natural loss. Of course, you’d get better at this game as you went along, just like LMs do, so what we’d want to measure is how well you’d do after playing for a few days.
There might have been a psychological study that resembles this. (I don’t know.) We could probably also replicate it via citizen science: create a website where you play this game, and get people to play it. My prediction is that DL LMs are already far superior to even the best humans at this game. (Note that this doesn’t mean I think DL is smarter than humans.)
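To make “natural loss” concrete: a minimal sketch of the scoring, assuming players assign a probability to each actual next word (which is more than the right-or-wrong version of the game asks for), would just be the same average negative log-likelihood used to train LMs:

```python
import math

def human_loss(assigned_probs):
    """Average cross-entropy, in nats per word, from the probabilities a
    player assigned to the words that actually came next. Lower is better."""
    # Clamp to avoid log(0) if a player gave the true next word zero probability.
    return -sum(math.log(max(p, 1e-9)) for p in assigned_probs) / len(assigned_probs)

# Hypothetical numbers: probabilities a player put on the true next word
# over five turns of the game.
probs = [0.30, 0.05, 0.60, 0.10, 0.02]
loss = human_loss(probs)
print(f"{loss:.2f} nats/word, perplexity {math.exp(loss):.1f}")
```

(Note that the 1.69 figure is a per-token loss in nats under a particular tokenizer, so a per-word number from a game like this isn’t directly comparable without a unit conversion.)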
Such a game already exists! See https://rr-lm-game.herokuapp.com/whichonescored2 and https://rr-lm-game.herokuapp.com/. I’ve been told humans tend to do pretty badly at the games (I didn’t do too well myself), so if you feel discouraged playing and want a similar style of game that’s perhaps a bit more fun (if slightly less relevant to the question at hand), I recommend https://www.redactle.com/.
Regardless, I guess I’m thinking of loss (in humans) in the more abstract sense of “what’s the distance between the correct and human-given answer [to an arbitrary question about the real world]?” If there’s some mathematically necessary positive amount of loss humans must have at a minimum, that would seemingly imply that there are fundamental limits to the ability of human cognition to model reality.
Is there some reasonable-ish way to think about loss in the domain(s) that humans are (currently) superior at? (This might be equivalent to asking for a test of general intelligence, if one wants to be fully comprehensive)
The scoring for that first game is downright bizarre. The optimal strategy for picking probabilities does not reflect the actual relative likelihoods of the options, but says “don’t overthink it”. In order to do well, you must overthink it.
(I run the team that created that game. I made the guess-most-likely-next-token game and Fabien Roger made the other one.)
The optimal strategy for picking probabilities in that game is to say what your probability for those two next tokens would have been if you hadn’t updated on being asked about them. What’s your problem with this?
It’s kind of sad that this scoring system is kind of complicated. But I don’t know how to construct simpler games such that we can unbiasedly infer human perplexity from what the humans do.
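To illustrate the basic property the scoring needs, with something much simpler than the actual rule (a toy binary version: two candidate tokens are shown and the player reports a probability that the first one is the true continuation): any proper scoring rule, such as the log score, is maximized in expectation by reporting your honest probability. The real complication, which this toy version ignores, is getting people to report the probability they’d have assigned before updating on which two tokens were selected for the question.

```python
import math

def log_score(reported_p, a_was_next):
    """Log scoring rule for a binary question: did candidate token A
    (rather than candidate B) actually come next? reported_p is the
    probability the player reports for A."""
    p = min(max(reported_p, 1e-9), 1 - 1e-9)
    return math.log(p) if a_was_next else math.log(1 - p)

# If your honest belief is that A comes next with probability 0.7, your
# expected score is maximized by reporting exactly 0.7, so the rule
# elicits honest probabilities -- the property needed to back out an
# (approximate) human perplexity from the reports.
true_p = 0.7
reports = [i / 100 for i in range(1, 100)]
expected = [true_p * log_score(r, True) + (1 - true_p) * log_score(r, False)
            for r in reports]
best = max(zip(expected, reports))[1]
print(best)  # 0.7
```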
Yes, humans are way worse than even GPT-1 at next-token prediction, even after practicing for an hour.
EDIT: These results are now posted here
Yeah, if anyone builds a better version of this game, please let me know!