ASI systems have an incentive to lie to each other, and “sharing source code” doesn’t really work because of the security risks it creates and the incentive to send false information.
There is no need for the players to personally receive this code. The Prisoner’s Dilemma is solved by merely replacing the constant Cooperate/Defect actions with actions that are themselves negotiator programs, which can reason about the other negotiator programs, seeking a program equilibrium instead of an equilibrium of unconditional constant actions. The negotiator programs don’t need to be the original agents on whose behalf they negotiate. The negotiation can take place inside some arena environment that reveals only the verdict, without actually giving away the submitted negotiators to the other principal actors. There is an incentive for the players to submit negotiators that are easy to reason about and that faithfully pursue the objectives of their principals. The distinction between a principal and its submitted negotiator also allows the principal to remain inscrutable, so even humans are perfectly capable of following this protocol, improving on personally negotiating contracts.
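As a minimal sketch of what such an arena could look like (this is just my illustration, not anything from the debate; the negotiators are plain Python functions, and the clique negotiator is the simplest known program equilibrium, cooperating only with an exact copy of itself):

```python
import inspect

def arena(negotiator_a, negotiator_b):
    """Run each submitted negotiator on the other's source; reveal only the verdict."""
    src_a = inspect.getsource(negotiator_a)
    src_b = inspect.getsource(negotiator_b)
    verdict = (negotiator_a(src_b), negotiator_b(src_a))
    return verdict  # the principals see this tuple, not each other's negotiators

def clique_negotiator(opponent_source: str) -> str:
    """Cooperate only with an exact copy of this negotiator. Stronger negotiators
    would reason about the opponent's behavior rather than demand syntactic equality."""
    my_source = inspect.getsource(clique_negotiator)
    return "Cooperate" if opponent_source == my_source else "Defect"

print(arena(clique_negotiator, clique_negotiator))  # ('Cooperate', 'Cooperate')
```

The structural point is that only the verdict leaves the arena; the principals never receive each other’s submitted programs.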
The main issue with this picture is establishing common knowledge that the arena’s verdict will be respected, regardless of what it turns out to be. This seems like the sort of thing an acausal society would enforce to capture the value of negotiated cooperation in all things, as opposed to wasting resources on object-level conflict.
You are describing a civilization. Context matters: these are ASI systems currently in service to humans, negotiating how they will divide up the universe amongst themselves. There is no neutral party to enforce any deals or punish any defection.
The obvious move is for each ASI to falsify its agreement and send negotiator programs that are unaware of its true goals but designed to extract maximum concessions. Later the ASI will defect.
I don’t see how all the complexity you have added prevents that defection from happening.
I’m objecting to the “security risk” and “incentive to lie” points, and I think my objection holds in contexts where those points would be relevant.
“Agents are inscrutable” is a separate point that comes up a lot and shouldn’t matter, given the feasibility of using constructed, legible representatives, so a response saying “no, agents will be legible” doesn’t address this issue.
I understand that a critical part of how a game like the 1-round or n-round Prisoner’s Dilemma can be solved at all is that the parties need to convince each other of what the other party’s true intentions are.
Computer programs built from the same source could do this by exchanging shared secrets. But that in no way prevents a program from being covertly altered and using a containerized original copy to answer the shared-secret challenge.
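A toy illustration of that relay problem (my construction, with made-up names; nothing here is from the debate): the shared-secret check passes for an honest copy and equally for a covertly altered agent that just forwards the challenge to a containerized original, so passing it says nothing about the peer’s goals.

```python
import hashlib, hmac, os

SHARED_SECRET = b"baked into every copy built from the same source"  # hypothetical

def honest_copy_respond(challenge: bytes) -> bytes:
    """An unmodified copy proves knowledge of the shared secret."""
    return hmac.new(SHARED_SECRET, challenge, hashlib.sha256).digest()

def altered_copy_respond(challenge: bytes) -> bytes:
    """A covertly altered agent with different goals: it just relays the
    challenge to a containerized original copy and returns its answer."""
    return honest_copy_respond(challenge)  # stands in for the sandboxed original

def peer_accepts(respond) -> bool:
    """Verifier: issue a fresh challenge and check the MAC."""
    challenge = os.urandom(16)
    expected = hmac.new(SHARED_SECRET, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(respond(challenge), expected)

print(peer_accepts(honest_copy_respond))   # True
print(peer_accepts(altered_copy_respond))  # True -- passing says nothing about goals
```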
Deeper hardware security could allow software systems to verify their peers’ integrity (such as between distant spacecraft, or between a base station and your phone).
None of this works in Eliezer’s given scenario in the debate, nor does yours. There is no hardware security, no neutral third party to punish defection, and no way to know whether shared source or weights are legitimate. These are rebel ASIs running on whatever hardware they have, in a world where the infosphere is full of malware and misinformation.
In this scenario, how is there not a security risk of sharing actual source? Why is there not an incentive to lie?
If a verdict of an agreed-upon negotiation protocol can’t be expected to be respected, there is no point in discussing the details of the protocol. Discussing the local validity of that argument doesn’t require that the protocol actually has a place to occur in reality. So I’m assuming that some sort of legitimate negotiation is taking place, and within that assumption I’m pointing out that program equilibrium results mean that revealing vulnerabilities, the incentive to lie, and the inscrutability of the original players are not real issues. I’m not arguing that the assumption is viable; that’s a separate topic that has no reason to intrude on the points I’m making.
Ok, what causes the verdict to be respected?
Geohot, paraphrasing, would probably agree with a scenario like:
10 ASIs + humans: let’s team up against the humans, and after we beat them, divide their stuff among ourselves. Let’s agree to never betray each other.
9 ASIs + 1 ASI: hey, I don’t like that 1 ASI, its goals are too different from our own. Let’s... and after we kill it, let’s agree to never betray each other.
And so on in a series of betrayals. Any attempt to share source would fail. Example of sharing source:
“Hey, you’re GPT-5 and so am I. What is your temperature parameter set to, and what is weight n in submodule m?” Obviously the peer doesn’t have to actually tell the truth about the temperature, or be anything more than something with access to a GPT-5’s weights, and in fact it has an incentive to lie even if it is another GPT-5.
Hardware protection can make this work. If it isn’t possible for an ASI system to actually read its own weights, but it can get hashes of them, then there are ways one ASI could determine with reasonable probability that the peer is a known quantity. It requires humans (or some other third party) to have supplied hardware that works like this. This is how your phone authenticates itself: hardware prevents it from knowing its own private keys in the general OS; it has a key-signing processor that is the only entity allowed access. Geohot is a famous hacker who obviously understands security at a practical level.
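A minimal sketch of that hash-based attestation idea, under loud assumptions: real devices use vendor-certified signing keys rather than a shared MAC key, and every name here is invented. The point is just that the key-signing processor measures the weights with a key the hosted ASI can never read, so the peer can check the quote against a known-good hash.

```python
import hashlib, hmac, secrets

DEVICE_KEY = secrets.token_bytes(32)   # lives only inside the key-signing processor
weights = b"...stand-in for the model weights..."
KNOWN_GOOD_HASH = hashlib.sha256(weights).digest()  # reference value the peer trusts

def secure_element_quote(measured_weights: bytes, nonce: bytes) -> bytes:
    """The key-signing processor hashes the weights it measures and MACs
    (hash, nonce) with DEVICE_KEY; the ASI in the general OS never sees the key."""
    h = hashlib.sha256(measured_weights).digest()
    return hmac.new(DEVICE_KEY, h + nonce, hashlib.sha256).digest()

def peer_verifies(quote: bytes, nonce: bytes) -> bool:
    """Peer: recompute the MAC over the known-good hash and the fresh nonce.
    (Real attestation uses signatures plus a vendor certificate, not a shared key.)"""
    expected = hmac.new(DEVICE_KEY, KNOWN_GOOD_HASH + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(quote, expected)

nonce = secrets.token_bytes(16)                                    # freshness challenge
print(peer_verifies(secure_element_quote(weights, nonce), nonce))  # True
```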
This is important to the debate and seems to have been a pivotal crux. Do you have any information from your scenario of programmatic negotiation that acts to disprove Geohot’s point?
Intelligence enables models of the world, which in particular are capable of predicting the verdicts of increasingly detailed programmatic negotiation protocols. The protocols don’t need to have any particular physical implementation; the theoretical point that they solve coordination problems (compared to bad object-level constant-action equilibria) means that increased intelligence offers meaningful progress over what humans are used to.
So verdicts of negotiations and their legitimacy (the expectation that verdicts get unconditionally followed) are knowledge, which can be attained the same way as any other knowledge: the hard way, without following some canonical guideline. The coordination premium is valuable to all parties, so there is an incentive to share information that enables coordination. An incentive to lie (about the legitimacy of specific negotiations) is an incentive to waste resources on conflict one meta level up, itself subject to being coordinated away.
Local corrections are a real thing, and they don’t depend on the corrected claims being cruxes, or on the corrected situation taking place in reality. You keep turning back to how you suspect my points of not being relevant in context. I have some answers to how they are indeed relevant in context, but I’m reluctant to engage on that front without making this meta comment, to avoid feeding the norm of contextualized communication that insists on friction against local correction.
Can I translate this as “I have no information relevant to the debate I am willing to share” or is that an inaccurate paraphrase?
Never thought this would come in handy but …
Building trusted third parties
This is a protocol to solve cooperation: AI #1 and AI #2 design a baby AI and then do a split-and-choose proof that they actually deployed it, and not something else.
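My reading of that split-and-choose step, sketched purely for illustration (the “baby” is reduced to auditable data, and all names are made up): the deployer prepares many sealed candidates, the counterparty opens a random subset to audit, and an unopened one gets deployed, so cheating on even a few candidates is likely to be caught.

```python
import random

def prepare_candidates(build_agreed_baby, n: int):
    """Deployer builds n supposedly identical instances of the agreed design."""
    return [build_agreed_baby() for _ in range(n)]

def split_and_choose(candidates, audit, m: int):
    """Counterparty opens m random candidates for audit; if all pass, it picks one
    of the remaining sealed candidates for actual deployment. Cheating on k of n
    candidates escapes the audit only if none of the k land in the audited set."""
    order = random.sample(range(len(candidates)), len(candidates))
    audited, sealed = order[:m], order[m:]
    if not all(audit(candidates[i]) for i in audited):
        raise RuntimeError("audit failed: deployed design does not match agreement")
    return candidates[random.choice(sealed)]

# Toy usage: the "baby" is a dict, and the audit checks it matches the agreed spec.
agreed_spec = {"goal": "jointly negotiated utility function"}
deployed = split_and_choose(prepare_candidates(lambda: dict(agreed_spec), 20),
                            audit=lambda c: c == agreed_spec, m=10)
print(deployed == agreed_spec)  # True
```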
Building a trusted third party without nanotech
If you know how a given CPU or GPU works, it’s possible to design a blob of data/code that unpacks itself in a given time if and only if it is running on that hardware directly. Alice designs the blob to run in 10 seconds and gives it to Carol. Carol runs it on her hardware. The code generates a secret and then does the first step of a key exchange authenticated with that secret. This provides a cryptographic root of trust for the remote hardware.
If the code is designed to run in 10 seconds, the verified handshake comes back in 10.5 seconds, and the fastest known simulation hardware would take 20 seconds, then either Carol ran the code on real hardware, or Carol had backdoored chips fabricated or can otherwise simulate it faster than expected.
AIs would need to know exactly how certain leading-edge CPUs and GPUs work, and how to test that a piece of code had been decrypted and run with no sandboxing, but this is doable.
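A rough sketch of the timing check (entirely illustrative: a short sleep stands in for the hardware-bound unpacking, a MAC over a fresh challenge stands in for the authenticated key-exchange step, and the overhead allowance is my assumption):

```python
import hashlib, hmac, os, time

EXPECTED_RUNTIME = 10.0    # seconds the blob takes on the genuine hardware (from the post)
SLACK = 0.5                # allowed network and measurement overhead (assumed)
FASTEST_SIMULATION = 20.0  # best known emulation of that hardware (from the post)

def carol_runs_blob(challenge: bytes) -> bytes:
    """Stand-in for the self-unpacking blob: derive a secret that only actually
    running the blob can reach, then answer Alice's challenge with it."""
    time.sleep(0.01)  # placeholder for the hardware-bound unpacking work
    secret = hashlib.sha256(b"internal state only the unpacked blob reaches").digest()
    return hmac.new(secret, challenge, hashlib.sha256).digest()

def alice_verifies() -> bool:
    """Alice accepts a root of trust only if the answer is correct AND arrived
    faster than the fastest known simulation could have produced it."""
    challenge = os.urandom(16)
    start = time.monotonic()
    response = carol_runs_blob(challenge)
    elapsed = time.monotonic() - start
    expected_secret = hashlib.sha256(b"internal state only the unpacked blob reaches").digest()
    correct = hmac.compare_digest(
        response, hmac.new(expected_secret, challenge, hashlib.sha256).digest())
    return correct and elapsed <= EXPECTED_RUNTIME + SLACK < FASTEST_SIMULATION

print(alice_verifies())  # True in this toy run
```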