I think the distinction in this paragraph is quite messy on inspection:
I think that IDA with a low bandwidth overseer is not accurately described as “AI learns to reason from humans”, but rather as “Humans figure out how to reason explicitly, then the AI learns from the explicit reasoning”. As Wei Dai has pointed out, amplified low bandwidth oversight will not actually end up reasoning like a human. Humans have implicit knowledge that helps them perform tasks when they see the whole task, but not all of this knowledge can be understood and broken into smaller pieces. Low bandwidth oversight requires that the overseer not use any of this knowledge.
First, I think that even if one were to split tasks into 15-minute increments, there would have to be a lot of explicit reasoning (especially to scale to arbitrarily high quality levels with extra resources).
Second, the issue really shouldn’t be quality here. If it comes out that “implicit knowledge” creates demonstrably superior results (in any way) compared with “explicit work”, and this cannot be fixed or minimized with scale, that seems like a giant hole in the general feasibility of HCH.
Third, humans already use a lot of explicit reasoning. Is a human who breaks down a math problem acting “less like a human”, or sacrificing anything, compared to one who solves it intuitively? There are some kinds of “lossy” explicit reasoning, but other kinds seem superior to me to the ways humans intuitively do things.
I generally expect that “explicit reasoning that humans decide is better than their own intuitive work” actually is better, according to human values.
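(To make the kind of small-granularity, explicit decomposition discussed above concrete, here is a minimal toy sketch of my own; it is not taken from Paul’s or Wei Dai’s writing. The summing task, the `budget` parameter, and the recursion shape are illustrative assumptions only; the point is that every step is a small, self-contained unit of explicit work with no implicit context carried across steps.)

```python
# Toy sketch: a "low-bandwidth overseer" modelled as a function that only ever
# sees a tiny, self-contained task and must either answer it in one explicit
# step or split it into smaller explicit subtasks. The concrete task (summing
# numbers) is just a stand-in for the real, much harder tasks being discussed.

from typing import List


def overseer(task: List[int], budget: int) -> int:
    """Sum `task` using only single-step, explicit operations."""
    if budget <= 0:
        raise RuntimeError("budget exhausted")
    if len(task) <= 2:
        # A "trivially small" task the overseer can answer in one explicit step.
        return sum(task)
    # Otherwise: explicit decomposition into two smaller, self-contained subtasks.
    mid = len(task) // 2
    left = overseer(task[:mid], budget - 1)
    right = overseer(task[mid:], budget - 1)
    # Explicit aggregation step, again with no implicit context carried over.
    return left + right


print(overseer([3, 1, 4, 1, 5, 9, 2, 6], budget=10))  # -> 31
```

The design point the sketch is meant to surface: nothing in the recursion relies on the overseer “seeing the whole task”, which is exactly the property that the implicit-knowledge objection above says may cost us something.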
I think you’re right that it’s not a binary HBO=human like, LBO=not human like, but rather that as the overseer’s bandwidth goes down from an unrestricted human to HBO to LBO, our naive intuitions about what a large group of humans can do become less and less applicable, so we need to use explicit reasoning and/or develop new intuitions about what HBO-based and especially LBO-based IDA is ultimately capable of.
I generally expect that “explicit reasoning that humans decide is better than their own intuitive work” actually is better, according to human values.
Sure, I agree with this, but in Paul’s scheme you may be forced to use explicit reasoning even if you don’t think it’s better than intuitive work, so I’m not sure what the relevance of this statement is.
EDIT: Also (in case you’re objecting to this part) I think in the case of LBO, the explicit reasoning is at such a small granularity (see Paul’s example here) that calling it “not human like” is justified and can perhaps help break people out of a possible complacency where they’re thinking of IDA as roughly like a very large group of humans doing typical human reasoning/deliberation.
[Note: I probably like explicit reasoning a lot more than most people, so keep that in mind.]
“our naive intuitions about what a large group of humans can do become less and less applicable, so we need to use explicit reasoning and/or develop new intuitions about what HBO-based and especially LBO-based IDA is ultimately capable of.”
One great thing about explicit reasoning is that it does seem easier to reason about and develop intuitions for than human decisions made in more intuitive ways (to me). It’s different, but I assume far more predictable. I think it could be worth outlining, at some point, the most obvious forms of explicit reasoning that will happen (I’ve been thinking about this).
To me it seems like there are two scenarios:
1. Group explicit reasoning cannot perform as well as longer individual intuitive reasoning, even with incredibly large budgets.
2. Group explicit reasoning outperforms other reasoning.
I believe that which of these holds is one of the main cruxes for the competitiveness and general viability of IDA. If #1 turns out to be true, then the whole scheme has some serious problems.
But in the case of #2, it seems like things are overall in a much better place. The system may cost a lot (until it can be distilled enough), but it could be easier to reason about, easier to predict, and able to produce superior results on essentially all axes.
[Note: I probably like explicit reasoning a lot more than most people, so keep that in mind.]
Great! We need more people like you to help drive this forward. For example, I think we desperately need explicit, worked-out examples of meta-execution (see my request to Paul here), but Paul seems too busy to do that (the reply he gave wasn’t really at the level of detail and completeness that I was hoping for), and I find it hard to motivate myself to do it because I expect that I’ll get stuck at some point, and it will be hard to tell whether that’s because I didn’t try hard enough, don’t have the requisite skills, made a wrong decision several branches up, or because it’s just impossible.
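(As a deliberately tiny illustration of the flavour of thing being asked for, and nothing close to the worked-out example requested above, here is a toy sketch of pointer-passing in the spirit of meta-execution. The `Pointer`/`subquery` interface and the item-counting task are my own illustrative assumptions, not Paul’s actual proposal.)

```python
# Toy sketch: the overseer-level routine never sees the underlying data, only
# opaque pointers, and acts on it by issuing small explicit sub-queries.

class Pointer:
    """Opaque handle to data the overseer is not allowed to look at directly."""
    def __init__(self, value):
        self._value = value  # hidden from the overseer's "view"


def subquery(op: str, ptr: Pointer):
    """The only interface the overseer has to pointed-to data: tiny explicit ops."""
    if op == "is_empty":
        return len(ptr._value) == 0
    if op == "split_head":
        head, *tail = ptr._value
        return Pointer([head]), Pointer(tail)
    raise ValueError(f"unknown op {op!r}")


def count_items(ptr: Pointer, budget: int = 100) -> int:
    """Overseer-level routine: counts items without ever seeing their contents."""
    if budget <= 0:
        raise RuntimeError("budget exhausted")
    if subquery("is_empty", ptr):
        return 0
    _, rest = subquery("split_head", ptr)
    return 1 + count_items(rest, budget - 1)


print(count_items(Pointer(["some", "hidden", "task", "data"])))  # -> 4
```

The point of the sketch is only the shape of the interaction: each step is a tiny explicit query against an opaque handle, which is why even trivial tasks end up needing so much structure.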
One great thing about explicit reasoning is that it does seem easier to reason about and develop intuitions for than human decisions made in more intuitive ways (to me).
That’s an interesting perspective. I hope you’re right. :)
After some more reflection on this, I’m now more in the middle. I’ve come to think more and more that these tasks will be absolutely tiny, and if so, they will need insane amounts of structure. Like, the system may need to effectively create incredibly advanced theories of expected value, and many of these may be nearly incomprehensible to us humans, perhaps even with a very large amount of explanation.
I intend to spend some time writing more about how I would expect a system like this to work in practice.
When you say “better”, do you mean in an empirical way that will be apparent with testing? If so, would it result in answers that are rated worse by users, or in answers that are less corrigible/safe?
My personal guess is that explicit reasoning with very large budgets can outperform reasonably-bounded (or maybe even generously-bounded) systems using intuitive approaches, on enough types of problems for this approach to be worth serious consideration.