Ivan Vendrov comments on Gradient descent doesn’t select for inner search

Ivan Vendrov 15 Aug 2022 23:50 UTC
3 points
2
I disagree that performing search is central to human capabilities relative to other species. The cultural intelligence hypothesis seems much more plausible: humans are successful because our language and ability to mimic allow us to accumulate knowledge and coordinate at massive scale across both space and time. Not because individual humans are particularly good at thinking or optimizing or performing search. (Not sure what the implications of this are for AI).
You’re right though, I didn’t say much about alternative algorithms other than point vaguely in the direction of hierarchical control. I mostly want to warn people not to reason about inner optimizers the way they would about search algorithms. But if it helps, I think AlphaStar is a good example of an algorithm that is superhuman in a very complex strategic domain but is very likely not doing anything like “evaluating many possibilities before settling on an action”. In contrast to AlphaZero (with rollouts), which considers tens of thousands of positions before selecting an action. AlphaZero (just the policy network) I’m more confused about… I expect it still isn’t doing search, but it is literally trained to imitate the outcome of a search so it might have similar mis-generalization properties?
- Lauro Langosco 16 Aug 2022 15:13 UTC
  3 points
  0
  Parent
  (Note that I’m not making a claim about how search is central to human capabilities relative to other species; I’m just saying search is useful in general. Plausibly also for other species, though it is more obvious for humans)
  
  From my POV, the “cultural intelligence hypothesis” is not a counterpoint to importance of search. It’s obvious that culture is important for human capabilities, but it also seems obvious to me that search is important. Building printing presses or steam engines is not something that a bundle of heuristics can do, IMO, without gaining those heuristics via a long process of evolutionary trial-and-error. And it seems important that humans can build steam engines without generations of breeding better steam-engine-engineers.
  
  Re AlphaStar and AlphaZero: I’ve never played Starcraft, so I don’t have good intuitions for what capabilities are needed. But on the definitions of search that I use, the AlphaZero policy network definitely performs search. In fact out of current systems it’s probably the one that most clearly performs search!
  
  ...Now I’m wondering whether our disagreement just comes from having different definitions of search in mind. Skimming your other comments above, it seems like you take a more narrow view of search = literally iterating through solutions and picking a good one. This is fine by me definitionally, but I don’t think the fact that models will not learn search(narrow) is very interesting for alignment, or has the implications that you list in the post? Though ofc I might still be misunderstanding you here.
  - Ivan Vendrov 16 Aug 2022 15:56 UTC
    1 point
    0
    Parent
    Yeah it’s probably definitions. With the caveat that I don’t mean the narrow “literally iterates over solutions”, but roughly “behaves (especially off the training distribution) as if it’s iterating over solutions”, like Abram Demski’s term selection.
- Vladimir_Nesov 16 Aug 2022 9:44 UTC
  2 points
  0
  Parent
  
  AlphaZero (just the policy network) I’m more confused about… I expect it still isn’t doing search, but it is literally trained to imitate the outcome of a search so it might have similar mis-generalization properties?
  
  This suggests that the choice of decision theory that amplifies a decision making model (in the sense of IDA/HCH, or just the way MCTS is used in training AlphaZero) might influence robustness of its behavior far off-distribution, even if its behavior around the training distribution is not visibly sensitive to choice of decision theory used for amplification.
  
  Though perhaps this sense of “robustness” is not very appropriate, and a better one should be explicitly based on reflection/extrapolation from behavior in familiar situations, with the expectation that all models fail to be robust sufficiently far off-distribution (in the crash space), and new models must always be prepared in advance of going there.
- Noosphere89 16 Aug 2022 0:08 UTC
  1 point
  1
  Parent
  My thinking is that one of the biggest reasons humans managed to dominate is basically 3x more brainpower combined with ways to get rid of the heat necessary to support brainpower, which requires sweating all over the body.
  
  Essentially it’s the scaling hypothesis applied to biological systems.
  
  And since intelligence can be used for any goal, it’s not surprising that intelligence’s main function was cultural.