“However, I don’t see why these arguments would apply to humans.”
Okay, I’ll take a stab at this.
6. Word Prediction is not Intelligence
“The kinds of humans that we are worried about are the kinds of humans that can do original scientific research and autonomously form plans for taking over the world. Human brains learn to take actions and plans that previously led to high rewards (outcomes like eating food when hungry, having sex, etc)*. These two things are fundamentally not the same thing. Why, exactly, would we expect that a system that is good at the latter necessarily would be able to do the former?”
*I expect that this isn’t a fully accurate description of human brains, but I expect that if we did write the full description the argument would sound the same.
7. The Language of Thought
“This, in turn, suggests a data structure that is discrete and combinatorial, with syntax trees, etc., and humans (according to the argument) do not use such representations. We should therefore expect humans to at some point hit a wall or limit to what they are able to do.”
(I find it hard to make the argument here because there is no argument—it’s just flatly asserted that neural networks don’t use such representations, so all I can do is flatly assert that humans don’t use such representations. If I had to guess, you would say something like “matrix multiplications don’t seem like they can be discrete and combinatorial”, to which I would say “the strength of brain neuron synapse firings doesn’t seem like it can be discrete and combinatorial”.)
8. Programs vs Circuits
We know, from computer science, that it is very powerful to be able to reason in terms of variables and operations on variables. It seems hard to see how you could have human-level intelligence without this ability. However, humans do not typically have this ability, with most human brains instead being more analogous to Boolean circuits, given their finite size and architecture of neuron connections.
9. Generalisation vs Memorisation
In this one I’d give the protein folding example, but apparently you think you’d be able to fold proteins just as well as you’d be able to play chess if they had similar state space sizes, which seems pretty wild to me.
Do you perhaps agree that you would have a hard time navigating in a 10-D space? Clearly you have simply memorized a bunch of heuristics that together are barely sufficient for navigating 3-D space, rather than truly understanding the underlying algorithm for navigating spaces.
10. Catastrophic Forgetting
(Discussed previously, I think humans are not very deliberate / selective about what they do / don’t forget, except when they use external tools.)
In some other parts, I feel like you are being one-sidedly skeptical, e.g.
“In particular, they could be much more shallow than they seem.”
They could also be much more general than they seem.
“I don’t rule out the possibility, but it seems unlikely that such a system could learn representations and circuits that would enable sufficiently strong out-of-distribution generalisation.”
Perhaps it would enable even stronger OOD generalisation than we have currently.
There could be good reasons for being one-sidedly skeptical, but I think you need to actually say what the reasons are. E.g. I directionally agree with you on the random forests case, but my reason for being one-sidedly skeptical is “we probably would have noticed if random forests generalized better and used them instead of neural nets, so probably they don’t generalize better”. Another potential argument is “decision trees learn axis-aligned, piecewise-constant decision boundaries, whereas neural nets learn manifolds, and reality seems more likely to look like the second one” (tbc I don’t necessarily agree with this).
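To make the flavour of that last argument concrete, here is a minimal sketch of my own (toy data, scikit-learn purely for convenience, and no claim about which model actually generalises better): a tree ensemble predicts piecewise-constant values and so returns a constant outside its training range, while what a small neural net returns out there depends on whatever function it happened to learn.

```python
# Minimal sketch (illustrative only, toy data): how a tree ensemble and a small
# neural network behave outside the range of the training data. This does not
# settle the generalisation question; it only makes the "piecewise-constant
# versus smooth function" intuition concrete.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(200, 1))   # inputs confined to [0, 1]
y_train = 2.0 * X_train.ravel()                  # simple linear target

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000, random_state=0).fit(X_train, y_train)

X_test = np.array([[2.0]])  # a point well outside the training range
# The forest necessarily predicts a value inside the range of the training
# targets; what the network predicts depends on the function it learned.
print("forest:", forest.predict(X_test), "net:", net.predict(X_test))
```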
“The kinds of humans that we are worried about are the kinds of humans that can do original scientific research and autonomously form plans for taking over the world. Human brains learn to take actions and plans that previously led to high rewards (outcomes like eating food when hungry, having sex, etc)*. These two things are fundamentally not the same thing. Why, exactly, would we expect that a system that is good at the latter necessarily would be able to do the former?”
This feels like a bit of a digression, but we do have concrete examples of systems that are good at eating food when hungry, having sex, etc., without being able to do original scientific research and autonomously form plans for taking over the world: animals. And the difference between humans and animals isn’t just that humans have more training data (or even that we are that much better at survival and reproduction in the environment of evolutionary adaptation). But I should also note that I consider argument 6 to be one of the weaker arguments I know of.
“We know, from computer science, that it is very powerful to be able to reason in terms of variables and operations on variables. It seems hard to see how you could have human-level intelligence without this ability. However, humans do not typically have this ability, with most human brains instead being more analogous to Boolean circuits, given their finite size and architecture of neuron connections.”
The fact that human brains have a finite size and architecture of neuron connections does not mean that they are well-modelled as Boolean circuits. For example, a (real-world) computer is better modelled as a Turing machine than as a finite-state automaton, even though there is a sense in which it actually is a finite-state automaton.
The brain is made out of neurons, yes, but it matters a great deal how those neurons are connected. Depending on the answer to that question, you could end up with a system that behaves more like Boolean circuits, or more like a Turing machine, or more like something else.
With neural networks, the training algorithm and the architecture together determine how the neurons end up connected, and therefore whether the resulting system is better thought of as a Boolean circuit, a Turing machine, or something else. If the wiring of the brain is determined by a different mechanism than what determines the wiring of a deep learning system, then the two systems could end up with very different properties, even if they are made out of similar kinds of parts.
With the brain, we don’t know what determines the wiring. This makes it difficult to draw strong conclusions about the high-level behaviour of brains from their low-level physiology. With deep learning, it is easier to do this.
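To make the wiring point a bit more concrete, here is a toy sketch of my own (nothing from the original argument): the same primitive part, an XOR gate, can be wired as a fixed-depth feedforward circuit that only accepts inputs of one length, or reused in a loop with carried state, in which case it handles inputs of any length. Which of these regimes a learned system ends up in is exactly the kind of wiring question at issue.

```python
# Toy illustration (my own example, not from the post): the same primitive part
# behaves like a Boolean circuit or like a simple machine with memory,
# depending on how it is wired.
def xor_gate(a: int, b: int) -> int:
    """A single primitive part: one XOR gate."""
    return a ^ b

def parity_circuit_4(bits: list[int]) -> int:
    """Fixed wiring: a depth-2 feedforward circuit that accepts exactly 4 bits."""
    assert len(bits) == 4
    return xor_gate(xor_gate(bits[0], bits[1]), xor_gate(bits[2], bits[3]))

def parity_machine(bits: list[int]) -> int:
    """Recurrent wiring: the same gate reused with a carried state,
    so it handles inputs of any length."""
    state = 0
    for b in bits:
        state = xor_gate(state, b)
    return state

print(parity_circuit_4([1, 0, 1, 1]))      # only works for length-4 inputs
print(parity_machine([1, 0, 1, 1, 0, 1]))  # works for inputs of any length
```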
“I find it hard to make the argument here because there is no argument—it’s just flatly asserted that neural networks don’t use such representations, so all I can do is flatly assert that humans don’t use such representations. If I had to guess, you would say something like “matrix multiplications don’t seem like they can be discrete and combinatorial”, to which I would say “the strength of brain neuron synapse firings doesn’t seem like it can be discrete and combinatorial”.”
What representations you end up with does not just depend on the model space, it also depends on the learning algorithm. Matrix multiplications can be discrete and combinatorial. The question is whether those are the kinds of representations that you would in fact end up with when you train a neural network by gradient descent, which to me seems unlikely. The brain (most likely) does not use gradient descent, so the argument does not apply to the brain.
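To illustrate the point that matrix multiplications can be discrete and combinatorial, here is a minimal sketch of my own construction (and, to be clear, not something I am claiming gradient descent would find): one-hot vectors stand in for discrete symbols, and a 0/1 permutation matrix implements an exact, discrete operation on them.

```python
# Minimal sketch (my own construction): a matrix multiplication implementing a
# discrete, combinatorial operation. Symbols are one-hot vectors, and a 0/1
# permutation matrix applies an exact "successor" operation to them.
import numpy as np

symbols = ["A", "B", "C", "D"]

def one_hot(symbol: str) -> np.ndarray:
    v = np.zeros(len(symbols))
    v[symbols.index(symbol)] = 1.0
    return v

# Permutation matrix sending A -> B -> C -> D -> A.
successor = np.roll(np.eye(len(symbols)), shift=1, axis=0)

out = successor @ one_hot("B")
print(symbols[int(np.argmax(out))])  # prints "C": an exact, discrete result
```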
“Do you perhaps agree that you would have a hard time navigating in a 10-D space? Clearly you have simply memorized a bunch of heuristics that together are barely sufficient for navigating 3-D space, rather than truly understanding the underlying algorithm for navigating spaces.”
It would obviously be harder for me to do this, and narrow heuristics are clearly an important part of intelligence. But I could do it, and that suggests a stronger transfer ability than if I couldn’t.
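As a toy sketch of my own of what “the underlying algorithm for navigating spaces” could look like: a greedy step-toward-the-goal rule, stated once, runs unchanged in 3 dimensions or in 10. What gets harder for a human in 10-D is simulating it in one’s head, not the procedure itself.

```python
# Toy sketch (my own example): a navigation procedure that does not care how
# many dimensions the space has. The same greedy rule runs unchanged in 3-D
# and in 10-D; only our intuitions struggle with the higher dimension.
def navigate(start: tuple, goal: tuple) -> list:
    """Greedily step one unit along each axis until the goal is reached."""
    position = list(start)
    path = [tuple(position)]
    while tuple(position) != tuple(goal):
        for axis in range(len(position)):
            if position[axis] < goal[axis]:
                position[axis] += 1
            elif position[axis] > goal[axis]:
                position[axis] -= 1
        path.append(tuple(position))
    return path

print(len(navigate((0, 0, 0), (2, 3, 1))))         # a short path in 3-D
print(len(navigate((0,) * 10, tuple(range(10)))))  # the same rule in 10-D
```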
“In some other parts, I feel like you are being one-sidedly skeptical.”
Yes, as I said, my goal with this post is not to present a balanced view of the issue. Rather, my goal is just to summarise as many arguments as possible for being skeptical of strong scaling. This makes the skepticism one-sided in some places.