evhub comments on Outer alignment and imitative amplification

evhub 3 Feb 2020 21:19 UTC
Another thing that maybe I didn’t make clear previously:

> I believe the point about Turing machines was that given Low Bandwidth Overseer, it’s not clear how to get HCH/IA to do complex tasks without making it instantiate arbitrary Turing machines.

I agree, but if you’re instructing your humans not to instantiate arbitrary Turing machines, then that’s a competitiveness claim, not an alignment claim. I think there are lots of very valid reasons for thinking that HCH is not competitive—I only said I was skeptical of the reasons for thinking it wouldn’t be aligned.
- Wei Dai 4 Feb 2020 1:50 UTC
  
  > I think there are lots of very valid reasons for thinking that HCH is not competitive—I only said I was skeptical of the reasons for thinking it wouldn’t be aligned.
  
  But if you put aside competitiveness, can’t HCH be trivially aligned? E.g., you could give the humans making up HCH instructions that prevent it from answering anything except simple arithmetic questions. So it seems that a claim of HCH being aligned is meaningless unless the claim is about being aligned at some level of competitiveness.
  - evhub 4 Feb 2020 6:18 UTC
    That’s a good point. What I really mean is that I think the sort of HCH that you get out of taking actual humans and giving them careful instructions is more likely to be uncompetitive than it is to be unaligned. Also, I think that “HCH for a specific H” is more meaningful than “HCH for a specific level of competitiveness,” since we don’t really know what weird things you might need to do to produce an HCH with a given level of competitiveness.