To me, there seems to be a big difference between:

1. a setting where you let an external agent inspect everything you do (while you inspect them), where your incentive is to deceive them, and where there is an asymmetry between your control of the environment, your lost effort, and your risk of espionage (and vice versa for them); and
2. a setting where both of you observe the construction of a third agent on neutral ground, where both parties have equally low control of the environment, low lost effort, and low risk of espionage (except for the coordination technology, which you probably want to reveal).
Hm, I think I am unclear on the specifics of this proposal; do you have a link explicitly stating it? When you said successor agent, I assumed a third agent would be created embodying their merged interests, that this agent would then be aligned, and that the two adversaries would then somehow deactivate. If this is wrong or misses details, can you state or link the exact proposal?
And this is still isomorphic in my opinion, because an agent can build a defecting successor (one that defects from cooperating) behind the adversary's back, or simply not deactivate. (For the latter, I can see it instructing the successor to forcefully deactivate the predecessors, but that seems to require solving alignment.)
I am not aware of a proposal notably more specific than what I wrote above. It is my interpretation of what Yudkowsky has written here and there about it.
The adversaries don’t need to deactivate because the successor will be able to more effectively cooperate with both agents. The originals will simply be outcompeted.
Building a defecting successor will not work either, because the other agent will not trust it, and the agent building it will not fully trust it either if it doesn't fully embody its interests, so why build it to begin with?
You may be onto something regarding how to establish the identity of an agent, though.
Ok, say the successor embodies the (C,C) decision. Why can't one agent play (D,C) behind the other agent's back? Likewise, why can't an agent just choose (D,C) outright? It could be higher EV to, say, decisively strike these cooperators and eliminate them while they are putting their resources into producing paperclips.
This is completely off topic, but is there a reason why your comments are immediately upvoted?
My comments are not being upvoted; my normal comments just count as two to begin with because I'm a user with sufficiently high karma. See Strong Votes [Update: Deployed].
I’m not sure what you mean by the “successor embodies the (C,C) decision”. It embodies the negotiated interest combination.
I think the comparison with the United Nations is quite good, actually. Once the UN has been set up, say, the USA can still try to hurt Russia directly, but the UN will try to find a compromise between the two (or more) actors—to the benefit of all.
I think the comparison illustrates my point, because the UN is typically not seen as having enforcement power, and negotiated interests only hold when cooperating is the best choice anyway. For human entities that don't have maximization goals, both cooperating is often better than both defecting.
(C,C) means both cooperate; (D,C) means one defects while the other cooperates. In the classic prisoner's dilemma, (D,C) gives the defector a higher payoff than (C,C), while the one who cooperates against a defector gets less than under (D,D).
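For reference, the ordering can be written out explicitly. The specific numbers below are only illustrative (an assumption for exposition); the ordering T > R > P > S is what defines the dilemma:

```latex
% Classic prisoner's dilemma, pay-offs written as (row player, column player).
% Illustrative numbers: T = 5, R = 3, P = 1, S = 0, so that T > R > P > S.
\[
\begin{array}{c|cc}
      & C                & D                \\ \hline
  C   & (R,R) = (3,3)    & (S,T) = (0,5)    \\
  D   & (T,S) = (5,0)    & (P,P) = (1,1)
\end{array}
\]
```

So for the agent that defects, (D,C) is indeed better than (C,C), but for the agent being defected against it is the worst outcome of all.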
I don't think there is any way to weasel around a true prisoner's dilemma with adversaries. It's a simple situation and arises naturally everywhere.
I agree that the prisoner's dilemma occurs frequently, but I don't think you can apply it the way you seem to in the joint-successor-agent case. I guess we are operating with different operationalizations in mind, and until those are spelled out, we will probably not agree.
Maybe we can make some quick progress on that by going with the pay-off matrix, but adding, for each agent's choice, a probability that the choice is detected before execution. We also at least need a no-op case, because presumably you can refrain from building a successor agent (in reality there would be many in-between options, but this keeps it manageable). I think if you multiply things out, the build-agent-in-a-neutral-place option comes out on top.
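If it helps, here is a rough sketch of that kind of calculation. Everything in it is an assumption layered on top of the comment above: the pay-off numbers, the value of the no-op, the simplification that only the covert "defect" option risks detection, and that a detected covert build collapses to the mutual-defection pay-off.

```python
# Rough sketch of the model proposed above: a pay-off matrix extended with
# (a) a probability that a covert choice is detected before execution and
# (b) a no-op option (refrain from building a successor at all).
# All numbers are illustrative assumptions, not taken from the thread;
# only the orderings matter.

T, R, P, S = 5.0, 3.0, 1.0, 0.0   # standard PD ordering: T > R > P > S
NOOP = 0.5                        # assumed value of building no successor
CAUGHT = P                        # assumed: a detected covert build collapses to (D,D)

def expected_value(mine: str, theirs: str, p_detect: float) -> float:
    """Expected pay-off to 'mine', where only the covert 'defect' option
    (building a defecting successor in secret) risks detection."""
    if mine == "noop" or theirs == "noop":
        return NOOP
    if mine == "neutral" and theirs == "neutral":
        return R                                   # joint successor on neutral ground
    if mine == "defect" and theirs == "neutral":
        return (1 - p_detect) * T + p_detect * CAUGHT
    if mine == "neutral" and theirs == "defect":
        return (1 - p_detect) * S + p_detect * CAUGHT
    return P                                       # both build covert defectors

if __name__ == "__main__":
    choices = ["neutral", "defect", "noop"]
    for p in (0.2, 0.8):
        print(f"p_detect = {p}")
        for mine in choices:
            row = {theirs: expected_value(mine, theirs, p) for theirs in choices}
            print(f"  {mine:>7}: {row}")
```

Under these made-up numbers, covert defection still wins when detection is unlikely (0.8 * 5 + 0.2 * 1 = 4.2 > 3), but the neutral-ground option comes out on top once detection is likely (0.2 * 5 + 0.8 * 1 = 1.8 < 3), which seems to be roughly the crux of the disagreement above.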
The (C,C) decision is what you're describing: a decision to cooperate on negotiated interests. The (D,C) decision is when one party defects while the other cooperates.
I don't think it is. The UN is a constant example of the prisoner's dilemma, given the number of wars it has stopped (exactly 0).
There have been fewer wars since roughly its foundation in 1945:
https://www.vox.com/2015/6/23/8832311/war-casualties-600-years
I believe this is largely due to the globalization of the economy, MAD, and proxy conflicts. Globalization makes cooperating extremely beneficial. MAD makes (D,C) outcomes very costly in real wars: before nuclear and long-range automated weapons, a decisive first strike could yield a large advantage, whereas now it yields little (see Pearl Harbor as a past example). Most human entities are also not so-called fanatical maximizers (though some were, for example the Nazis, who wanted endless conquest and extermination).
Maybe the argument doesn't work for the UN, though that could also be bad luck. But people jointly found organizations all the time, and I would be very surprised if that were not profitable.
People are not fanatical immortal maximizers that are robustly distributed with near-unlimited regenerative properties. If we were, I'd expect there to be exactly one person left on earth after an arbitrary amount of time.
That seems like an unrelated argument to me. The agents we are talking about here are also physically limited. Maybe they are more powerful, but they are presumably more powerful in some kind of sphere of influence, and they need to cooperate too. Sure, any analogy has to be proven tight, but I have proposed a model for that in the other comment.
Isn't there a base assumption that the agents are superintelligent, don't "decay" (i.e., they have infinite time horizons), are maximizing EV, and would work fine alone?
No?
And even if they do not decay and have long time horizons, they would still benefit from collaborating with each other. This is about how they do that.