"finding a solution to the design problem for intelligent systems that does not rest on a blind search for policies that satisfy some evaluation procedure"
I’m a bit confused by this. If you want your AI to come up with new ideas that you hadn’t already thought of, then it kinda has to do something like running a search over a space of possible ideas. If you want your AI to understand concepts that you don’t already have yourself and didn’t put in by hand, then it kinda has to be at least a little bit black-box-ish.
In other words, let’s say you design a beautiful AGI architecture, and you understand every part of it when it starts (I’m actually kinda optimistic that this part is possible), and then you tell the AGI to go read a book. After having read that book, the AGI has morphed into a new smarter system which is closer to “black-box discovered by a search process” (where the learning algorithm itself is the search process).
Right? Or sorry if I'm just confused.
Thanks for this question. No, you're not confused!
There are two levels of search that we need to think about here: at the outer level, we use machine learning to search for an AI design that works at all. Then, at the inner level, when we deploy this AI into the world, it most likely uses search to find good explanations of its sensor data (i.e. to understand things that we didn’t put in by hand) and most likely also uses search to find plans that lead to fulfilment of its goals.
It seems to me that design at least needs to be part of the story for how we do the outer-level construction of a basic AI architecture. Any good architecture very likely then uses search in some way at the inner level.
Evan wrote a great sequence about inner and outer optimization.
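To make the two levels a bit more concrete, here is a minimal toy sketch (my own illustration, not something from the thread): the outer loop searches over agent "designs" (here just a planning horizon) by evaluating each candidate, and the chosen agent then runs its own inner search over action sequences when deployed. Everything here (`plan`, `outer_search`, the gridworld moves) is invented for the example, and it only illustrates the planning half of inner search, not the explanation-finding half.

```python
# Toy sketch of the two levels of search discussed above (illustrative only).
import random

# --- Inner level: at deployment, the agent searches over plans. ---
def plan(start, goal, moves, horizon=6, samples=200):
    """Randomly search over action sequences for one that gets close to the goal."""
    best_seq, best_dist = None, float("inf")
    for _ in range(samples):
        seq = [random.choice(list(moves)) for _ in range(horizon)]
        pos = start
        for action in seq:
            pos = (pos[0] + moves[action][0], pos[1] + moves[action][1])
        dist = abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])  # Manhattan distance
        if dist < best_dist:
            best_seq, best_dist = seq, dist
    return best_seq, best_dist

# --- Outer level: we search over agent designs (here, just the planning
#     horizon) for one that scores well on our evaluation procedure. ---
def outer_search(goal, moves, candidate_horizons):
    scores = {}
    for h in candidate_horizons:
        _, dist = plan(start=(0, 0), goal=goal, moves=moves, horizon=h)
        scores[h] = -dist  # higher is better
    best_h = max(scores, key=scores.get)
    return best_h, scores

if __name__ == "__main__":
    MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
    best_h, scores = outer_search(goal=(3, 2), moves=MOVES,
                                  candidate_horizons=[2, 4, 6, 8])
    print("outer search picked horizon:", best_h, "scores:", scores)
```

The point of the sketch is just the separation: the outer loop only ever sees each candidate's score, while the candidate agent itself is the thing doing search at run time.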
OK, well I spend most of my time thinking about a particular AGI architecture (1 2 etc.) in which the learning algorithm is legible and hand-coded … and let me tell you, in that case, all the problems of AGI safety and alignment are still really really hard, including the “inaccessible information” stuff that Paul was talking about here.
If you’re saying that it would be even worse if, on top of that, the learning algorithm itself is opaque, because it was discovered from a search through algorithm-space … well OK, yeah sure, that does seem even worse.