paulfchristiano comments on Cryptographic Boxes for Unfriendly AI

paulfchristiano 19 Dec 2010 18:40 UTC
4 points
I’m sorry, I misunderstood your intention completely, probably because of the italics :)

I personally am paranoid enough about ambivalent transhumans that I would be very afraid of giving them such a powerful channel to the outside world, even if they had to pass through many levels of iterative improvement (if I could make a textbook that hijacked your mind, then I could just have you write a similarly destructive textbook for the next researcher, etc.).

I think it is possible to exploit an unfriendly boxed AI to bootstrap to friendliness in a theoretically safe way. At a minimum, the problem of doing it safely is very interesting and difficult, and I think I have a solid first step toward a solution.