True, we had different uses in mind. I was trying to forestall people from using the idea as an argument that future AIs swapping source code should necessarily cooperate.
By the way, I liked the math at the end, although I don’t have time to sit down and check my intuitions.
Then the question is whether AIs (a) can trustworthily verify each other’s source code, e.g. by sending in probes to do random inspections which verify that the inspected software is not deliberately opaque and is not set to accept coded overrides from outside, or (b) don’t need to verify each other’s source code at all, because the vast majority of initial conditions converge on the same obvious answer to PD problems.
(a) Random inspections probably won’t work. It’s easy to have code/hardware that look innocent as individual parts, but together have the effect of being a backdoor. You won’t detect the backdoor unless you can see the entire system as a whole.
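To make that concrete, here is a deliberately contrived sketch (my own illustration, with made-up function names, not anything from this thread): two routines that each look reasonable under local inspection, but that compose into a coded-override backdoor which only shows up when the whole call chain is examined at once.

```python
import hashlib

def message_digest(msg: bytes) -> bytes:
    """Reads like ordinary logging/telemetry support."""
    return hashlib.sha256(msg).digest()

def should_escalate(digest: bytes, threshold: bytes = b"\x00\x00" + b"\xff" * 30) -> bool:
    """Reads like a load-shedding heuristic on a digest value."""
    return digest < threshold

def handle(msg: bytes, authorized: bool) -> str:
    # Composed, the two "innocent" pieces grant control to anyone who can find a
    # message whose hash starts with two zero bytes (~65,000 tries on average):
    # a backdoor that part-by-part random inspection would not flag.
    if authorized or should_escalate(message_digest(msg)):
        return "execute"
    return "reject"
```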
Tim Freeman’s “proof by construction” method is the only viable solution to the “prove your source code” problem that I’ve seen so far.
(b) is interesting, and seems to be a new idea. Have you written it up in more detail somewhere? If AIs stop verifying each other’s source code, won’t they want to modify their source code to play Defect again?
Look innocent to a cursory human inspection, yes. But if the hardware is designed to be deterministically cooperative/coordinating, and to provably not act as a backdoor when combined with larger hardware, that sounds like something that should indeed be provable, provided the hardware was designed with that provability in mind.
Many governments, including the US, are concerned right now that their computers have hardware backdoors, so the current lack of research results on this topic is probably not just due to lack of interest but to intrinsic difficulty. Even if provable hardware is physically possible and technically feasible in the future, there is likely a cost attached, for example running slower than non-provable hardware or using more resources.
Instead of confidently predicting that AIs will Cooperate in one-shot PD, wouldn’t it be more reasonable to say that this is a possibility, which may or may not occur, depending on the feasibility and economics of various future technologies?
The singleton scenario seems overwhelmingly likely, so whatever multiple AIs do exist will play by the singleton’s rules, with native physics becoming irrelevant. (I know, I know...)
I believe this stuff bottoms out in physics—it’s either possible or impossible to make a physically provable analog to the PREFIX program. The idea is fascinating, but I don’t know enough physics to determine whether it’s crazy.
The difficulty would be to make sure nothing could interact with the atoms/physical constituents of the prefix in a way that distorts the prefix. Prefixes of programs have the benefit that they go first, and given the serial nature of most programs, things that go first have complete control.
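Here is a toy software version of what I have in mind, assuming a crude model where each player’s source is a string and the prefix literally checks string prefixes (the names and details are mine; the original PREFIX construction may differ):

```python
# Toy sketch: the prefix runs first, and because execution is serial, nothing
# appended after it can override the move it commits to.

PREFIX_SOURCE = '''
def decide(opponent_source: str) -> str:
    # Cooperate iff the opponent's program begins with this exact prefix,
    # i.e. iff the opponent is bound by the same opening commitment.
    return "C" if opponent_source.startswith(PREFIX_SOURCE) else "D"
'''

def play(opponent_source: str) -> str:
    """Execute only the prefix; code appended after it never gets to change the move."""
    namespace = {"PREFIX_SOURCE": PREFIX_SOURCE}
    exec(PREFIX_SOURCE, namespace)                # the prefix goes first...
    return namespace["decide"](opponent_source)   # ...and its decision is final
```

The physical question is then whether anything can play the role of PREFIX_SOURCE here: a component that demonstrably runs first and demonstrably cannot be tampered with or bypassed.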
So it is a question of isolating the prefix. I’m going to read this paper on isolation and physics, before making any comments on the subject.
I read the paper, and it seemed to me to be useless. We want a physically inviolable guarantee of isolation.
It gave me some ideas. It suggests we might start by specifying time limits, e.g. guaranteeing that a system will be effectively isolated for a certain length of time by scanning a region of space around it.
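As a back-of-the-envelope version of that idea (my own illustration, and assuming a scan could somehow certify that a sphere of radius r around the system is empty at a given instant), the light-speed limit alone would then guarantee about r/c seconds of isolation:

```python
# Light-cone bound: if a sphere of radius r around the system is certified empty
# at time t0, nothing outside it can causally reach the system before t0 + r/c.

C = 299_792_458.0  # speed of light, m/s

def guaranteed_isolation_seconds(scan_radius_m: float) -> float:
    return scan_radius_m / C

print(guaranteed_isolation_seconds(3.0e8))   # ~1 light-second of scanning buys ~1 s
print(guaranteed_isolation_seconds(1.5e11))  # ~1 AU buys ~500 s (about 8 minutes)
```

The hard part, of course, is the certification step itself, which is exactly the inviolable-guarantee problem mentioned above.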
Though my knowledge of physics is near nonexistent, I imagine future AIs wouldn’t inspect each other’s source code; they would instead agree to set up a similar tournament and use some sort of inviolable physical prohibitions in place of tournament rules. This sounds like an exciting idea for someone’s future post: physical models of PD with light, gravity etc., and mechanisms to enforce cooperation in those models (if there are any).