There are types of information processing that cannot be cast in the form of Deep Neural Net (DNN)-type calculations (matrix multiplications, ReLUs, etc.), except with an exorbitant performance penalty.
Sure… but humans can’t do those either, without an exorbitant performance penalty! Does this imply that humans alone aren’t general intelligences (and thus the threshold we should be worried about is lower), or that they’re not actually important for general intelligence?
will eventually be surpassed in AGI-type capabilities by a different kind of information processing
“Surpassed” seems strange to me; I’ll bet that the first AGI system will have a very GPT-like module, that will be critical to its performance, that will nevertheless not be “the whole story.” Like, by analogy to AlphaGo, the interesting thing was the structure they built around the convnets, but I don’t think it would have worked nearly as well without the convnets.
Sure… but humans can’t do those either, without an exorbitant performance penalty!
Well, a big part of this post is an argument that the human neocortex is doing a different type of information processing than a DNN, with the neocortex’s algorithms being more similar to the algorithms underlying probabilistic programming, message-passing, etc. Therefore I don’t accept the premise that in general, if a DNN can’t do a certain type of information processing efficiently, then neither can the human brain.
Do you think DNNs and human brains are doing essentially the same type of information processing? If not, how did you conclude “humans can’t do those either”? Thanks!
Sorry for the late reply, but I was talking from personal experience. Multiplying matrices is hard! Even for extremely tiny ones, I was sped up tremendously by pencil and paper. It was much harder than driving a car, or recognizing whether an image depicts a dog or not. Given the underlying computational complexity of the various tasks, I can only conclude that I’m paying an exorbitant performance penalty for the matmul. (And I’m in the top few percentiles of calculation ability, so this isn’t me being bad at it by human standards.)
The general version of this is Moravec’s Paradox.
[edit] Also, if you look at the best training I’m aware of for solving simpler arithmetic problems (the mental abacus method), it too demonstrates this sort of exorbitant performance penalty. They’re exapting the ability to do fine motions in 3D space to multiply and add!
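To put rough numbers on that “exorbitant performance penalty,” here is a toy Python sketch; the operation count is the standard one for naive matrix multiplication, while the human timing figure is an invented illustration:

```python
import numpy as np

# Naive n x n matrix multiplication costs about n^3 scalar multiply-adds.
n = 3
A = np.random.rand(n, n)
B = np.random.rand(n, n)
C = A @ B                             # hardware does this in well under a microsecond

multiply_adds = n ** 3                # 27 multiply-adds for a 3x3 product
human_seconds = multiply_adds * 5     # assume ~5 s per mental multiply-add (illustrative)
print(multiply_adds, human_seconds)   # 27 operations, roughly a couple of minutes by hand
```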
Oh, OK, I think I misunderstood you.
So the context was: I think there’s an open question about the extent to which the algorithms underlying human intelligence in particular, and/or AGI more generally, can be built from operations similar to matrix multiplication (and a couple other operations). I’m kinda saying “no, it probably can’t” while the scaling-is-all-you-need DNN enthusiasts are kinda saying “yes, it probably can”.
Then your response is that humans can’t multiply matrices in their heads. Correct? But I don’t think that’s relevant to this question. Like, we don’t have low-level access to our own brains. If you ask GPT-3 (through its API) to simulate a self-attention layer, it wouldn’t do particularly well, right? So I don’t think it’s any evidence either way.
“Surpassed” seems strange to me; I’ll bet that the first AGI system will have a very GPT-like module, that will be critical to its performance, that will nevertheless not be “the whole story.” Like, by analogy to AlphaGo, the interesting thing was the structure they built around the convnets, but I don’t think it would have worked nearly as well without the convnets.
I dunno, certainly that’s possible, but also sometimes new algorithms outright replace old algorithms. Like GPT-3 doesn’t have any LSTM modules in it, let alone HHMM modules, or syntax tree modules, or GOFAI production rule modules. :-P
Ah, I now suspect that I misunderstood you as well earlier: you wanted your list to be an example of “what you mean by DNN-style calculations” but I maybe interpreted it as “a list of things that are hard to do with DNNs”. And under that reading, it seemed unfair because the difficulty that even high-quality DNNs have in doing simple arithmetic is mirrored by the difficulty that humans have in doing simple arithmetic.
Similarly, I agree with you that there are lots of things that seem very inefficient to implement via DNNs rather than directly (like MCTS, or simple arithmetic, and so on), but it wouldn’t surprise me if it’s not that difficult to have a DNN-ish architecture that can implement MCTS more easily than our current ones can. The sorts of computations that you can implement with transformers are more complicated than the ones you could implement with convnets, which are more complicated than the ones you could implement with fully connected nets; obviously you can’t gradient-descent a fully connected net into a convnet, or a convnet into a transformer, but you can still train a transformer with gradient descent.
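As a purely illustrative sketch of how those architectures relate: a 1-D convolution is just a dense layer with tied, mostly-zero weights, which is exactly the kind of constraint that gradient descent on an unconstrained dense layer has no particular reason to discover on its own. Toy NumPy, names invented here:

```python
import numpy as np

def conv_as_dense(kernel, input_len):
    """Dense weight matrix equivalent to a 1-D 'valid' convolution:
    the same few kernel weights reused on every row, zeros elsewhere."""
    k = len(kernel)
    out_len = input_len - k + 1
    W = np.zeros((out_len, input_len))
    for i in range(out_len):
        W[i, i:i + k] = kernel      # weight sharing: one shifted copy per output position
    return W

x = np.arange(6, dtype=float)        # toy input signal
kernel = np.array([1.0, 0.0, -1.0])  # toy 1-D kernel

dense_version = conv_as_dense(kernel, len(x)) @ x
direct_version = np.convolve(x, kernel[::-1], mode="valid")  # same cross-correlation
assert np.allclose(dense_version, direct_version)
```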
It’s also not obvious to me that humans are doing the more sophisticated thinking ‘the smart way’ instead of ‘the dumb way’. Suppose our planning algorithm is something like MCTS; is it ‘coded in directly’ like AlphaGo’s, or is it more like a massive transformer that gradient-descented its way into doing something like MCTS? Well, for things like arithmetic and propositional logic, it seems pretty clearly done ‘the dumb way’; for things like planning and causal identification it feels more like an open question, so I don’t want to confidently assert that our brains are doing it the dumb way. My best guess is they have some good tricks, but won’t be ‘optimal’ according to future engineers who understand all of this stuff.
I slightly edited that section header to make it clearer what the parenthetical “(matrix multiplications, ReLUs, etc.)” is referring to. Thanks!
I agree that it’s hard to make highly-confident categorical statements about all current and future DNN-ish architectures.
I don’t think the human planning algorithm is very much like MCTS, although you can learn to do MCTS (just like you can learn to mentally run any other algorithm—people can learn strategies about what thoughts to think, just like they can learn strategies about what actions to execute). I think the built-in capability is that compositional-generative-model-based processing I was talking about in this post.
Like, if I tell you “I have a banana blanket”, you have a constraint (namely, I just said that I have a banana blanket) and you spend a couple seconds searching through generative models until you find one that is maximally consistent with both that constraint and also all your prior beliefs about the world. You’re probably imagining me with a blanket that has pictures of bananas on it, or less likely with a blanket made of banana peels, or maybe you figure I’m just being silly.
So by the same token, imagine you want to squeeze a book into a mostly-full bag. You have a constraint (the book winds up in the bag), and you spend a couple seconds searching through generative models until you find one that’s maximally consistent with both that constraint and also all your prior beliefs and demands about the world. You imagine a plausible way to slide the book in without ripping the bag or squishing the other contents, and flesh that out into a very specific action plan, and then you pick the book up and do it.
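Here is a heavily simplified sketch of that “search for the generative model most consistent with the constraint plus your priors” picture, using the banana-blanket case. Toy Python; the candidate interpretations and all the numbers are invented for illustration:

```python
# Toy sketch: pick the interpretation (generative model) that best fits
# both the stated constraint ("I have a banana blanket") and prior beliefs.
# All candidates and numbers are invented for illustration.

candidates = {
    "blanket printed with banana pictures": {"prior": 0.60, "fits_constraint": 1.0},
    "blanket sewn from banana peels":       {"prior": 0.05, "fits_constraint": 1.0},
    "speaker is just being silly":          {"prior": 0.30, "fits_constraint": 0.8},
    "speaker actually said 'banana bread'": {"prior": 0.40, "fits_constraint": 0.1},
}

def score(model):
    # "Maximally consistent with both the constraint and prior beliefs":
    # here just the product of the two toy numbers.
    return model["prior"] * model["fits_constraint"]

best = max(candidates, key=lambda name: score(candidates[name]))
print(best)  # -> blanket printed with banana pictures
```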
When we need a multi-step plan, too much to search for in one go, we start needing to also rely on other built-in capabilities like chunking stuff together into single units, analogical reasoning (which is really just a special case of compositional-generative-model-based processing), and RL (as mentioned above, RL plays a role in learning to use metacognitive problem-solving strategies). Maybe other things too.
I don’t think causality per se is a built-in feature, but I think it comes out pretty quickly from the innate ability to learn (and chunk) time-sequences, and then incorporate those learned sequences into the compositional-generative-model-based processing framework. Like, “I swing my foot and then kick the ball and then the ball is flying away” is a memorized temporal sequence, but it’s also awfully close to a causal belief that “kicking the ball causes it to fly away”. (...at least in conjunction with a second memorized temporal sequence where I don’t kick the ball and it just stays put.) (See also counterfactuals.)
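A minimal sketch of that step from memorized temporal sequences to a causal belief (toy Python; the representation and helper are invented for illustration):

```python
# Two memorized temporal sequences about the same situation:
episodes = [
    {"kick": True,  "outcome": "ball flies away"},
    {"kick": False, "outcome": "ball stays put"},
]

def infer_effect(episodes, action):
    # The "causal belief" here is just noticing that the outcome differs
    # exactly when the action differs (a crude counterfactual comparison).
    with_action    = {e["outcome"] for e in episodes if e[action]}
    without_action = {e["outcome"] for e in episodes if not e[action]}
    if with_action and without_action and with_action != without_action:
        return f"{action} causes: {', '.join(sorted(with_action))}"
    return "no causal difference observed"

print(infer_effect(episodes, "kick"))  # -> kick causes: ball flies away
```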
I’m less confident about any of this than I sound :)