Rohin Shah comments on Reply to Paul Christiano on Inaccessible Information

Rohin Shah 7 Jun 2020 18:22 UTC
LW: 3 AF: 2
AF
I’m confused by this distinction.
I can see why you’d say AlphaZero has more of a “design” element than MuZero, because of the MCTS. But if on some absolute scale you say that AlphaZero is a design / search hybrid, then presumably you should also say the OpenAI Five is a design / search hybrid, since it uses PPO at the outer layer, which is a designed algorithm. This seems wrong. (Also, it seems like many current proposals for building AGI out of ML would be classified as design / search hybrids.)
Maybe the distinction is that AlphaZero uses MCTS at test time? Would AlphaZero without MCTS at test time be only search? (Aside: it’s not great that when we remove Monte Carlo Tree Search we’re now saying that the design is gone and only search remains)
More generally, I don’t see how AlphaZero is making any headway on this problem:
It seems to me that Christiano’s write-up is a fairly general and compelling knock-down of the black-box approach to design in which we build an evaluation procedure and then rely on search to find a policy that our evaluation procedure ranks highly.
- Vaniver 7 Jun 2020 21:24 UTC
  LW: 8 AF: 5
  AF Parent
  But if on some absolute scale you say that AlphaZero is a design / search hybrid, then presumably you should also say the OpenAI Five is a design / search hybrid, since it uses PPO at the outer layer, which is a designed algorithm. This seems wrong.
  I think I’m willing to bite that bullet; like, as far as we know the only stuff that’s “search all the way up” is biological evolution.
  But ‘hybrid’ seems a little strange; like, I think design normally has search as a subcomponent (in imaginary space, at least, and I think often also search through reality), and so in some sense any design that isn’t a fully formed vision from God is a design/search hybrid. (If my networks use RELU activations ‘by design’, isn’t that really by the search process of the ML community as a whole? And yet it’s still useful to distinguish networks which determine what nonlinearity to use from local data, which which networks have it determined for them by an external process, which potentially has a story for why that’s the right thing to do.)
  Total horse takeover seems relevant as another way to think about intervening to ‘control’ things at varying levels of abstraction.
  [The core thing about design that seems important and relevant here is that there’s a “story for why the design will work”, whereas search is more of an observational fact of what was out there when you looked. It seems like it might be easier to build a ‘safe design’ out of smaller sub-designs, whereas trying to search for a safe algorithm using search runs into all the anthropic problems of empiricism.]