True, we had different uses in mind. I was trying to forestall people from using the idea as an argument that future AIs swapping source code should necessarily cooperate.
By the way, I liked the math at the end, although I don’t have time to sit down and check my intuitions.
Then the question is whether AIs (a) can trustworthily verify each other’s source code, e.g. by sending in probes to do random inspections which verify that the inspected software is not deliberately opaque and is not set to accept coded overrides from outside, or (b) don’t need to verify each other’s source code at all, because the vast majority of initial conditions converge on the same obvious answer to PD problems.
(a) Random inspections probably won’t work. It’s easy to have code/hardware that look innocent as individual parts, but together have the effect of being a backdoor. You won’t detect the backdoor unless you can see the entire system as a whole.
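To make that concrete, here is a deliberately contrived sketch (my own illustration, with made-up function names, not anything from this thread): two routines that each look reasonable under local inspection, but that compose into a coded-override backdoor which only shows up when the whole call chain is examined at once.

```python
import hashlib

def message_digest(msg: bytes) -> bytes:
    """Reads like ordinary logging/telemetry support."""
    return hashlib.sha256(msg).digest()

def should_escalate(digest: bytes, threshold: bytes = b"\x00\x00" + b"\xff" * 30) -> bool:
    """Reads like a load-shedding heuristic on a digest value."""
    return digest < threshold

def handle(msg: bytes, authorized: bool) -> str:
    # Composed, the two "innocent" pieces grant control to anyone who can find a
    # message whose hash starts with two zero bytes (~65,000 tries on average):
    # a backdoor that part-by-part random inspection would not flag.
    if authorized or should_escalate(message_digest(msg)):
        return "execute"
    return "reject"
```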
Tim Freeman’s “proof by construction” method is the only viable solution to the “prove your source code” problem that I’ve seen so far.
(b) is interesting, and seems to be a new idea. Have you written it up in more detail somewhere? If AIs stop verifying each other’s source code, won’t they want to modify their source code to play Defect again?
Look innocent to a cursory human inspection, yes. But if the hardware is designed to be deterministically cooperative/coordinating, and to provably not act as a backdoor when combined with larger hardware, that sounds like something that should indeed be provable, provided the hardware was designed with that provability in mind.
Many governments, including the US, are concerned right now that their computers have hardware backdoors, so the current lack of research results on this topic is probably not just due to lack of interest but to intrinsic difficulty. Even if provable hardware is physically possible and technically feasible in the future, there is likely a cost attached, for example running slower than non-provable hardware or using more resources.
Instead of confidently predicting that AIs will Cooperate in one-shot PD, wouldn’t it be more reasonable to say that this is a possibility, which may or may not occur, depending on the feasibility and economics of various future technologies?
The singleton scenario seems overwhelmingly likely, so whatever multiple AIs do exist will play by the singleton’s rules, with native physics becoming irrelevant. (I know, I know...)
I believe this stuff bottoms out in physics—it’s either possible or impossible to make a physically provable analog to the PREFIX program. The idea is fascinating, but I don’t know enough physics to determine whether it’s crazy.
The difficulty would be to make sure nothing could interact with the atoms/physical constituents of the prefix in a way that distorts the prefix. Prefixes of programs have the benefit that they go first, and given the serial nature of most programs, things that go first have complete control.
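Here is a toy software version of what I have in mind, assuming a crude model where each player’s source is a string and the prefix literally checks string prefixes (the names and details are mine; the original PREFIX construction may differ):

```python
# Toy sketch: the prefix runs first, and because execution is serial, nothing
# appended after it can override the move it commits to.

PREFIX_SOURCE = '''
def decide(opponent_source: str) -> str:
    # Cooperate iff the opponent's program begins with this exact prefix,
    # i.e. iff the opponent is bound by the same opening commitment.
    return "C" if opponent_source.startswith(PREFIX_SOURCE) else "D"
'''

def play(opponent_source: str) -> str:
    """Execute only the prefix; code appended after it never gets to change the move."""
    namespace = {"PREFIX_SOURCE": PREFIX_SOURCE}
    exec(PREFIX_SOURCE, namespace)                # the prefix goes first...
    return namespace["decide"](opponent_source)   # ...and its decision is final
```

The physical question is then whether anything can play the role of PREFIX_SOURCE here: a component that demonstrably runs first and demonstrably cannot be tampered with or bypassed.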
So it is a question of isolating the prefix. I’m going to read this paper on isolation and physics, before making any comments on the subject.
I read the paper, and it seemed to me to be useless. We want a physically inviolable guarantee of isolation.
It gave me some ideas. It suggests we might start by specifying time limits, e.g. guaranteeing that a system will be effectively isolated for a certain length of time by scanning a region of space around it.
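As a back-of-the-envelope version of that idea (my own illustration, and assuming a scan could somehow certify that a sphere of radius r around the system is empty at a given instant), the light-speed limit alone would then guarantee about r/c seconds of isolation:

```python
# Light-cone bound: if a sphere of radius r around the system is certified empty
# at time t0, nothing outside it can causally reach the system before t0 + r/c.

C = 299_792_458.0  # speed of light, m/s

def guaranteed_isolation_seconds(scan_radius_m: float) -> float:
    return scan_radius_m / C

print(guaranteed_isolation_seconds(3.0e8))   # ~1 light-second of scanning buys ~1 s
print(guaranteed_isolation_seconds(1.5e11))  # ~1 AU buys ~500 s (about 8 minutes)
```

The hard part, of course, is the certification step itself, which is exactly the inviolable-guarantee problem mentioned above.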
Though my knowledge of physics is near nonexistent, I imagine future AIs wouldn’t inspect each other’s source code; they would instead agree to set up a similar tournament and use some sort of inviolable physical prohibitions in place of tournament rules. This sounds like an exciting idea for someone’s future post: physical models of PD with light, gravity etc., and mechanisms to enforce cooperation in those models (if there are any).