I think there are lots of very valid reasons for thinking that HCH is not competitive—I only said I was skeptical of the reasons for thinking it wouldn’t be aligned.
But if you put aside competitiveness, can’t HCH be trivially aligned? E.g., you could instruct the humans making up HCH to refuse to answer anything except simple arithmetic questions. So it seems that a claim of HCH being aligned is meaningless unless the claim is about being aligned at some level of competitiveness.
That’s a good point. What I really mean is that I think the sort of HCH that you get out of taking actual humans and giving them careful instructions is more likely to be uncompetitive than it is to be unaligned. Also, I think that “HCH for a specific H” is more meaningful than “HCH for a specific level of competitiveness,” since we don’t really know what weird things you might need to do to produce an HCH with a given level of competitiveness.