It occurs to me that a hybrid approach combining HBO and LBO is possible. One simple thing we can do is build an HBO-IDA and ask it to construct a “core for reasoning” for LBO, or just let the HBO-IDA agent directly act as the overseer for LBO-IDA. This potentially gets around a problem where LBO-IDA would work, except that unenhanced humans are not smart enough to build or act as the overseer in such a scheme. I’m guessing there are cleverer things we can do along these lines.
How does this address the security issues with HBO? Is the idea that only using the HBO system to construct a “core for reasoning” reduces the chances of failure by exposing it to fewer inputs / using it for less total time? I feel like I’m missing something...
I’d interpreted it as “using the HBO system to construct a ‘core for reasoning’ reduces the chances of failure by exposing it to fewer inputs / using it for less total time”, plus maybe other properties (e.g. maybe we could look at and verify an LBO overseer, even if we couldn’t construct it ourselves).
That makes sense; I hadn’t thought of the possibility that a security failure in the HBO tree might be acceptable in this context. OTOH, if there’s an input that corrupts the HBO tree, isn’t it possible that the corrupted tree could output a supposed “LBO overseer” that embeds the malicious input and corrupts us when we try to verify it? If the HBO tree is insecure, it seems like a manual process that verifies its output must be insecure as well.
One situation is: maybe an HBO tree of size 10^20 runs into a security failure with high probability, but an HBO tree of size 10^15 doesn’t and is sufficient to output a good LBO overseer.
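To make that scale argument concrete, here is a minimal sketch under an assumed toy model: each HBO query independently hits a corrupting input with some small probability p. The particular value of p and the tree sizes are illustrative assumptions, not figures from this discussion.

```python
import math

# Toy model for the scale argument above: each HBO query independently hits a
# corrupting input with probability p. The value of p and the tree sizes below
# are illustrative assumptions, not numbers from the discussion.
def failure_probability(p_per_query: float, num_queries: float) -> float:
    """P(at least one corrupted query) = 1 - (1 - p)^N, computed stably."""
    return -math.expm1(num_queries * math.log1p(-p_per_query))

p = 1e-18  # assumed per-query corruption probability
for n in (1e15, 1e20):
    print(f"tree of {n:.0e} queries: failure probability ≈ {failure_probability(p, n):.2g}")
# tree of 1e+15 queries: failure probability ≈ 0.001
# tree of 1e+20 queries: failure probability ≈ 1
```

Under these assumed numbers, the 10^15-query tree fails with probability around 10^-3 while the 10^20-query tree fails almost certainly, which is the sense in which restricting the HBO system to the smaller task of outputting an LBO overseer could keep the total risk acceptable.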