It feels to me like there’s basically no question that recognizing good cryptosystems is easier than generating them. And recognizing attacks on cryptosystems is easier than coming up with attacks (even if they work by exploiting holes in the formalisms). And recognizing good abstract arguments for why formalisms are inadequate is easier than generating them. And recognizing good formalisms is easier than generating them.
This is all true notwithstanding the fact that we often make mistakes. (Though as we’ve discussed before, I think a lot of the examples you point to in cryptography are cases where there were pretty obvious gaps in the formalisms or possible improvements to the systems, and those would have motivated a search for better alternatives if doing so had been cheap with AI labor.)
The example of cryptography was mainly intended to make the point that humans are by default too credulous when it comes to informal arguments. But consider your statement:
“It feels to me like there’s basically no question that recognizing good cryptosystems is easier than generating them.”
Consider some cryptosystem widely considered to be secure, like AES. How much time did humanity spend learning how to recognize good cryptosystems (e.g., discovering all the attacks one has to worry about, like differential cryptanalysis), versus specifically generating AES with that background knowledge in hand? Maybe the latter is on the order of 10% of the former?
Then consider that we don’t actually know that AES is secure, because we don’t know all the possible attacks and we don’t know how to prove it secure; in other words, we still don’t know how to recognize a good cryptosystem. Suppose one day we do figure that out; wouldn’t finding an actually good cryptosystem then be trivial compared to all the previous effort?
Some of your other points are valid, I think, but cryptography is just easier than alignment (I don’t have time to say more, as my flight is about to take off), and philosophy is perhaps a better analogy for the more general point.