I don’t have any background in automated proofs; could someone please outline the practical implications of this, considering the algorithmic complexity involved?
How effective would a real PrudentBot be if written today? What amount of complexity in opponent programs could it handle? How good are proof-searching algorithms? How good are computer-assisted humans searching for proofs about source code they are given?
Are there formal languages designed so that, if an agent is written in them and follows certain rules intended to make its behavior obvious, it becomes significantly easier (or more likely) for a proof-finding algorithm to analyze that behavior, without, of course, restricting the range of behaviors that can be expressed?
Algorithmic proof search is at least NP-hard in general. Special cases can be simpler: e.g. if two modal agents are playing against each other, they can use Kripke semantics (which may also be NP-hard in the worst case, but in practice requires far less computation than searching through all proofs).
(Technically, you’d define a class of ModalPrime agents, which begin with code to check whether their opponent is syntactically a ModalPrime agent, then use Kripke semantics to work out the outcome of their own modal statement played against the opponent’s. Or you could have Omega accept modal statements as inputs, in which case Omega would only need Kripke semantics to resolve the matches in the tournament.)
Long story short, I wouldn’t submit a “search through proofs” modal algorithm into the PD tournament currently being run: it would always run out of time before finding Löbian proofs.
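To make the Kripke-semantics shortcut concrete, here is a toy sketch (my own formulation in Haskell, not the tournament’s actual machinery) for fully modalized agents: evaluate both agents’ formulas world by world on a finite linear GL frame, where □φ holds at a world iff φ holds at every earlier world. No proof search happens anywhere; the Formula type, the agents, and the cut-off of ten worlds are illustrative assumptions.

```haskell
-- Toy "modal combat" evaluator: settle a match between two fully modalized
-- agents by Kripke semantics on a finite linear GL frame, instead of
-- searching for Loebian proofs.

data Formula
  = Val Bool            -- constant (True = cooperate unconditionally)
  | Opp                 -- "my opponent cooperates with me"
  | Not Formula
  | And Formula Formula
  | Box Formula         -- "it is provable that ..."

-- Truth of a formula at world k, given the opponent's actions (True =
-- cooperate) at worlds 0..k-1.  Box g holds at world k iff g holds at every
-- world j < k.  Agents must be fully modalized (Opp only under a Box),
-- otherwise the lookup below runs past the end of the list.
evalAt :: [Bool] -> Int -> Formula -> Bool
evalAt oppActs k f = case f of
  Val b   -> b
  Opp     -> oppActs !! k
  Not g   -> not (evalAt oppActs k g)
  And g h -> evalAt oppActs k g && evalAt oppActs k h
  Box g   -> all (\j -> evalAt oppActs j g) [0 .. k - 1]

-- Compute both agents' actions world by world.  For agents as simple as the
-- ones below the values stabilise after a few worlds; the fixed cut-off is
-- a convenient heuristic for this sketch, not a general bound.
play :: Formula -> Formula -> (Bool, Bool)
play fa fb = go [] [] 0
  where
    levels = 10 :: Int
    go as bs k
      | k == levels = (last as, last bs)
      | otherwise   = go (as ++ [evalAt bs k fa]) (bs ++ [evalAt as k fb]) (k + 1)

fairBot, defectBot :: Formula
fairBot   = Box Opp     -- cooperate iff "the opponent cooperates" is provable
defectBot = Val False   -- never cooperate

main :: IO ()
main = do
  print (play fairBot fairBot)    -- (True,True): Loebian mutual cooperation
  print (play fairBot defectBot)  -- (False,False): FairBot is not exploited
```

(PrudentBot is also a modal agent, so in principle it gets the same treatment, though its formula needs more than this toy grammar offers.)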
If search is hard, perhaps there is another way.
Suppose an agent intends to cooperate (if the other agent cooperates, etc.). Then it will want the other agent to be able to prove that it will cooperate. If it knew of a proof about itself, it would advertise this proof, so the others wouldn’t have to search. (In the original tournament setup, it could embed the proof in its source code.)
An agent can’t spend much time searching for proofs about every agent it meets. Perhaps it meets new agents very often. Or perhaps there is a huge number of agents out there, it can communicate with them all, and the more agents it cooperates with, the more it benefits.
But an agent has incentive to spend a lot of time finding a proof about itself—once—and then it can give that proof to counterparties it wants to cooperate with.
(It’s been pointed out that this could give an advantage to families of similar agents that can more easily prove things about one another by using existing knowledge and proofs).
Finally, somewhere there may be someone who created this agent—a programming intelligence. If so, that someone with their understanding of the agent’s behavior may be able to create such a proof.
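What makes this division of labor work is the asymmetry between finding and checking: verifying a proof someone hands you is mechanical and cheap, however expensive the original search was. Here is a toy illustration of that asymmetry (my own example, a minimal Hilbert-style checker for the implicational fragment of propositional logic, nothing like the full machinery needed for claims about real source code): the prover searches once for a five-line derivation of p → p from the K and S axiom schemas, and any counterparty can then verify the certificate in time polynomial in its length.

```haskell
import Data.Maybe (isJust)

-- Implicational propositional formulas.
data F = Atom String | Imp F F deriving (Eq, Show)

-- Match an axiom schema (whose Atoms act as metavariables) against a
-- concrete formula, accumulating a consistent substitution.
matchWith :: [(String, F)] -> F -> F -> Maybe [(String, F)]
matchWith sub (Atom v) f = case lookup v sub of
  Nothing -> Just ((v, f) : sub)
  Just g  -> if g == f then Just sub else Nothing
matchWith sub (Imp a1 b1) (Imp a2 b2) = do
  sub' <- matchWith sub a1 a2
  matchWith sub' b1 b2
matchWith _ _ _ = Nothing

instanceOf :: F -> F -> Bool
instanceOf f schema = isJust (matchWith [] schema f)

-- Axiom schemas K and S.
axK, axS :: F
axK = Imp a (Imp b a)
  where a = Atom "A"; b = Atom "B"
axS = Imp (Imp a (Imp b c)) (Imp (Imp a b) (Imp a c))
  where a = Atom "A"; b = Atom "B"; c = Atom "C"

-- A proof is a list of formulas: each line must be an instance of K or S,
-- or follow from two earlier lines by modus ponens.  Checking is cheap
-- (quadratic in the number of lines here), however hard the proof was to find.
checkProof :: [F] -> Bool
checkProof lns = and [ ok k f | (k, f) <- zip [0 ..] lns ]
  where
    ok k f = any (f `instanceOf`) [axK, axS]
          || or [ lns !! i == Imp (lns !! j) f
                | i <- [0 .. k - 1], j <- [0 .. k - 1] ]

-- The certificate the "agent" publishes once: a derivation of p -> p.
p :: F
p = Atom "p"

proofOfPImpP :: [F]
proofOfPImpP =
  [ Imp (Imp p (Imp (Imp p p) p)) (Imp (Imp p (Imp p p)) (Imp p p))  -- S
  , Imp p (Imp (Imp p p) p)                                          -- K
  , Imp (Imp p (Imp p p)) (Imp p p)                                  -- MP 1,2
  , Imp p (Imp p p)                                                  -- K
  , Imp p p                                                          -- MP 3,4
  ]

main :: IO ()
main = print (checkProof proofOfPImpP)  -- True: the shipped certificate verifies
```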
Do these considerations help in practice?
We only need to perform proof search when we’re given some unknown blob of code. There’s no need to do a search when we’re the ones writing the code; we know it’s correct, because otherwise we wouldn’t have written it that way.
Admittedly, many languages let us be very sloppy: we may not have to justify our code to the compiler, and the language may not be powerful enough to express the properties we want. However, there are languages that can express and check such properties (Idris, Agda, Epigram, ATS, etc.). In such languages we don’t write the program directly at all; we write down a constructive proof that some program satisfies our properties, and the compiler derives that program from the proof. Such programs are known as “correct by construction”.
http://en.wikipedia.org/wiki/Program_derivation http://en.wikipedia.org/wiki/Proof-carrying_code
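To give a small taste of the style in a more mainstream setting, here is a sketch of my own in Haskell with GADTs (much weaker than full dependent types, and without the proof-to-program extraction described above, but the same spirit): the property is written into the type, and the compiler rejects any definition that violates it, so no after-the-fact proof search is needed.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

-- Natural numbers, promoted to the type level by DataKinds.
data Nat = Z | S Nat

-- Type-level addition.
type family Add (n :: Nat) (m :: Nat) :: Nat where
  Add 'Z     m = m
  Add ('S n) m = 'S (Add n m)

-- Vectors indexed by their length.
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- The type is the specification: appending an n-vector to an m-vector must
-- yield an (n + m)-vector.  A definition that dropped or duplicated elements
-- in a way that changed the length would simply not type-check.
append :: Vec n a -> Vec m a -> Vec (Add n m) a
append VNil         ys = ys
append (VCons x xs) ys = VCons x (append xs ys)

-- Head can only be applied to provably non-empty vectors, so there is no
-- "empty list" failure case left to handle at runtime.
safeHead :: Vec ('S n) a -> a
safeHead (VCons x _) = x

main :: IO ()
main = print (safeHead (append (VCons (1 :: Int) VNil) (VCons 2 VNil)))  -- 1
```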
The practical implications are probably pretty low. Most real-world cooperation involves genetic or memetic kin, repeated interaction, or reputations. Meeting a stranger and having reliable access to their source code is likely to be rare.
This is true for meat-based evolution. In a software-based world, it can be trivial to prove what your source code is, and sometimes it’s advantageous too.
For instance, software might be distributed with its code, advertised as doing something. The software benefits if it can convince you that it will do that something (“cooperate”), because in exchange, you run it.
An em that’s good at some kind of work might want to convince you to run copies of it: it gets to live, you get useful work done. But you’re afraid that it will spy on you or sabotage its work. In a world of ems, the assumption is that nobody knows how to write superhuman (superintelligent) AIs, so we emulate human brains. That means you can’t just inspect an em’s software to determine what it’s going to do, because you don’t really understand how it works.
It would be nice if the em could provide a formal proof that in a certain scenario it would behave in a certain way (cooperate), but for software of an em’s complexity that’s probably never going to be possible.
Yes. Open source software does derive some benefits from source code transparency. Today, though, reputations—and technical measures like sandboxes—are the main ways of dealing with the issue of running software from external sources.
Open source (as the term is usually used) is not technically necessary. You could have proofs about native binaries, too. The real distinction is between software whose code you have (the running code, not necessarily the source code) and software that runs outside your control, whose code you don’t know, and which you only communicate with remotely.
Sandboxes are good but suffer from the “boxed AI” problem, so it’s nice to have defense in depth.
So hide the malicious behavior in hardware or in the compiler.
I was thinking of a scenario where the verifier controls the hardware and the compiler.
E.g. it sends you its source code to run on your own hardware because it wants to work for you.
So hide it in some other computer you interact with.
Concerns about what I do on my own computer are separate from proofs about the software I send to you; those proofs might still be valid, and if so, trusted and influential.
Yes, I could achieve malicious ends by outsmarting you: I could give you software that does exactly what it says (and a formal proof of that), but through its seemingly innocent interactions with other servers (which are perfectly valid under that proof), actually create a malicious effect you haven’t foreseen. But that concern doesn’t invalidate the usefulness of the proof.