the sort of proof techniques I currently have in mind … would not work for verifying Friendliness of a finished AI that was handed you by a hostile superintelligence.
But what if the hostile superintelligence handed you a finished AI together with a purported proof of its Friendliness? Would you have enough trust in the soundness of your proof system to check the purported proof and act on the results of that check?
That would then be something you’d have to read and likely show to dozens of other people to verify reliably, leaving opportunities for all kinds of mindhacks. The OP’s proposal requires us to have an automatic verifier ready to run, one that can return a reliable answer without human intervention.
Yes, but the point is that the automatic verifier gets to check a proof that the AI-in-the-box produced; it doesn’t have to examine an arbitrary program and try to prove Friendliness from scratch.
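The underlying asymmetry is the familiar one between finding a certificate and checking it. As a toy analogy (not the Friendliness case itself), here is a minimal Python sketch: checking a purported satisfying assignment for a CNF formula is mechanical and fast, even though finding one may be intractable.

```python
# Sketch of the verify-vs-search asymmetry: checking a supplied
# certificate is mechanical, even when finding one is intractable.
# A CNF formula is a list of clauses; each clause is a list of
# signed variable indices (e.g. -2 means "NOT x2").

def check_assignment(cnf, assignment):
    """Return True iff `assignment` (dict var -> bool) satisfies `cnf`."""
    for clause in cnf:
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            return False  # some clause has no true literal
    return True

# (x1 OR NOT x2) AND (x2 OR x3)
cnf = [[1, -2], [2, 3]]
print(check_assignment(cnf, {1: True, 2: True, 3: False}))  # True
```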
In a comment below, paulfchristiano makes the point that any theory of Friendliness at all should give us such a proof system, at least for some restricted class of programs. For example, Eliezer envisions a theory of how to let programs evolve without losing Friendliness. The corresponding class of proofs would have the form “the program under consideration can be derived from the known-friendly program X by the sequence Y of friendliness-preserving transformations”.
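As a minimal sketch of what a verifier for that proof class might look like, assuming a fixed whitelist of friendliness-preserving transformations (the transformations and the string-based program representation below are hypothetical stand-ins, not a real theory):

```python
# Hypothetical sketch: verify that a program is derived from a
# known-friendly base program X by a sequence Y of whitelisted,
# friendliness-preserving transformations.

# Whitelist: name -> transformation function (program -> program).
PRESERVING = {
    "inline_constant": lambda p: p.replace("TWO", "2"),
    "rename_var":      lambda p: p.replace("tmp", "scratch"),
}

def check_derivation(base_program, steps, claimed_program):
    """Replay `steps` (list of whitelist names) from `base_program`
    and confirm the result is exactly `claimed_program`."""
    prog = base_program
    for name in steps:
        if name not in PRESERVING:        # reject any non-whitelisted step
            return False
        prog = PRESERVING[name](prog)     # apply the transformation
    return prog == claimed_program

X = "out = tmp + TWO"
Y = ["inline_constant", "rename_var"]
print(check_derivation(X, Y, "out = scratch + 2"))  # True
```

The proof object here is just the pair (X, Y); the checker never has to search for a derivation, only replay the one it was handed.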
Actually, computers can mechanically check proofs in any formal system.
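For a concrete toy illustration: a minimal sketch of a mechanical proof checker, assuming a simplified system in which a step is either a premise or follows from earlier steps by modus ponens (the formula encoding and rule set are illustrative stand-ins, not a real Friendliness proof system):

```python
# Toy mechanical proof checker. Formulas are either an atom (a string)
# or an implication ("->", antecedent, consequent). Each proof step is
# (formula, justification); a justification is "premise" or
# ("mp", i, j), citing earlier step indices.

def check_proof(premises, proof, goal):
    derived = []
    for formula, just in proof:
        if just == "premise":
            if formula not in premises:
                return False
        else:
            kind, i, j = just
            if kind != "mp" or i >= len(derived) or j >= len(derived):
                return False
            # modus ponens: from A (step i) and A -> B (step j), conclude B
            if derived[j] != ("->", derived[i], formula):
                return False
        derived.append(formula)
    return bool(derived) and derived[-1] == goal

premises = ["A", ("->", "A", "B")]
proof = [
    ("A", "premise"),
    (("->", "A", "B"), "premise"),
    ("B", ("mp", 0, 1)),
]
print(check_proof(premises, proof, "B"))  # True
```

Every check the verifier makes is a finite syntactic comparison, which is why no human judgment is needed in the loop.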