creating mathematical proofs based on the algorithm that allow generalizable conclusions about things that the other agent will or will not do.
It’s precisely this part which is impossible in a general case. You can reason only about a subset of algorithms which are compatible with your conclusion-making algorithm.
Proof:
1) In the general case, it is impossible to tell whether a given program will stop computing in finite time.
Proof by contradiction: suppose we have a method “Prophet.willStop(program)” that predicts whether a given program will stop. Then consider the following program, which behaves contrary to whatever the prediction says about it.
program Contrarian {
    if (Prophet.willStop(Contrarian)) {
        loop_forever();
    } else {
        // do nothing
    }
}
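(A minimal runnable sketch of the same construction, in Python; the names and the toy predictor are illustrative, not part of the original argument. Whatever halting predictor you claim to have, a program built this way defeats it.)

def make_contrarian(will_stop):
    """Given any claimed halting predictor, build a program it must misjudge."""
    def contrarian():
        if will_stop(contrarian):
            while True:          # the predictor said "stops", so loop forever
                pass
        # otherwise the predictor said "loops forever", so stop immediately
    return contrarian

# A toy predictor that claims every program halts; it is necessarily wrong here:
naive_prophet = lambda program: True
defeater = make_contrarian(naive_prophet)
# naive_prophet(defeater) returns True, yet calling defeater() would loop forever.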
2) For any behavior “B”, imagine a function “f” for which you cannot predict whether it will ever stop. Will the following program exhibit the behavior “B”?

program Mysterious {
    f();
    B();
}
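(The same reduction in runnable form, Python, with illustrative names: predicting whether the combined program exhibits B is at least as hard as predicting whether f halts.)

def make_mysterious(f, behavior_b):
    """The returned program exhibits behavior B if and only if f() ever returns."""
    def mysterious():
        f()             # nobody can say, in general, whether this returns
        behavior_b()    # reached only if f() halts
    return mysterious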
It’s precisely this part which is impossible in a general case. You can reason only about a subset of algorithms which are compatible with your conclusion-making algorithm.
Yes, which is why:
An agent that wishes to facilitate cooperation—or that wishes to prove credible threat—will actually prefer to structure their own code such that it is as easy as possible to make proofs and draw conclusions from that code.
Some agents really are impossible to cooperate with even when it would be mutually beneficial, either because they are irrational in an absolute sense or because their algorithm is intractable to you. That doesn’t prevent you from cooperating with the rest.
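(A toy illustration of that tractable subset, in Python; everything here, including the step-budget convention, is invented for the example. A restricted “Prophet” can give verdicts only about programs that visibly halt within its budget, and an agent that wants to be predictable can deliberately stay inside that class.)

def bounded_prophet(program, step_budget=10_000):
    """Answers only for programs that demonstrably halt within the budget;
    everything else is 'unknown' (intractable to this analyzer)."""
    steps = 0
    for _ in program():          # convention: the program yields once per step
        steps += 1
        if steps > step_budget:
            return "unknown"
    return "halts"

def legible_agent():
    # A deliberately simple, bounded loop: easy for the prophet to analyze.
    for i in range(3):
        yield i

print(bounded_prophet(legible_agent))   # -> "halts"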
Interesting. So a self-modifying agent might want to modify their own code to be easier to inspect, because this could make other agents trust them and cooperate with them. Two questions:
1) What would be the cost of such a modification? You cannot just rewrite any algorithm into a more legible form. If the agent modifies itself into, say, a regular expression (just joking), it will only be able to do what regular expressions can do, which may not be enough for a complex situation. Limiting one’s own cognitive abilities seems like a dangerous move.
2) Even if I want to reprogram myself to make myself more legible, I need to know what algorithm the other party will use to read my code. How can I guess it? Or is it enough to meet the other agent, explain our reading algorithms to each other, and only then self-modify to become compatible with them? I am suspicious about whether such a process can be iterated; my intuition is that by conforming to one agent’s code analysis routines, I lose part of my abilities, which may make me unable to conform to another agent’s code analysis routines.
my intuition is that by conforming to one agent’s code analysis routines, I lose part of my abilities, which may make me unable to conform to another agent’s code analysis routines.
Any decision restricts what happens, for all you knew before making the decision, but doesn’t necessarily make future decisions more difficult. Coordinating with other agents requires deciding some properties of your behavior, which may as well constrain only the actions that need to be coordinated with other agents.
For example, a strategy is a kind of generalized action, which could take the form of a straightforwardly represented algorithm chosen for a certain situation (to act in response to possible future observations). After the strategy is played out, or if some condition indicates that it’s no longer applicable, decision making may resume its normal, more general operation, so the mode of operation in which your behavior becomes more tractable may be temporary. If this strategy includes a procedure for deciding whether to cooperate with similarly chosen strategies of other agents, it will do the trick, without taking on much more responsibility than a single action. It will just be the kind of action that’s smart enough to cooperate with other agents’ actions.
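(A toy sketch of such a “cooperate with similarly chosen strategies” routine, in Python. Everything here is illustrative; the literal source-code-equality test stands in for the real thing, which would involve proving properties of the other agent’s strategy rather than comparing text.)

import inspect

def transparent_strategy(my_source, their_source, observation):
    """A deliberately simple, easy-to-inspect strategy: cooperate exactly
    when the other agent committed to this same strategy; otherwise defect.
    The special mode ends when a stop condition is observed."""
    if observation == "stop_condition":
        return "resume general decision making"
    if their_source == my_source:
        return "cooperate"
    return "defect"

# Usage sketch: both agents exchange the source of the strategy they committed to.
source = inspect.getsource(transparent_strategy)
print(transparent_strategy(source, source, observation=None))               # cooperate
print(transparent_strategy(source, "something opaque", observation=None))   # defect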
So it is not necessary to change my whole code, only to create a new, transparent “cooperation routine” and let it guide my behavior, with the possibility of ending the routine if the other agents stop cooperating or something unexpected happens. That makes sense.
(Though in real life I would be rather afraid to self-modify in this way, because an imperfection in the cooperation routine could be exploited. Even if other agents’ cooperation routines contain no bug exploits for my routine, maybe they have already created some hidden sub-agents that will try to find and exploit bugs in my routine.)
A real-life analogy is a contract, with a powerful government enforcing your precommitments.
Interesting. So a self-modifying agent might want to modify their own code to be easier to inspect, because this could make other agents trust them and cooperate with them.
Sometimes.
Even if I want to reprogram myself to make myself more legible, I need to know what algorithm the other party will use to read my code.
You could limit yourself to simply not actively obfuscating your own code.