Is it possible to change the definition of A() to something like this: Let UX(X) be the function that results from the source of U() if all calls “A()” were changed to “eval(X)”. For some action p, prove that for any possible agent X, UX(X) ⇐ UX(“return p”). return p.
Then the agent would be able to prove that it returns a specific action, and yet there would be no contradictions.
It seems to me this should be better than the “chicken rule” (it’s really funny and clever, though! :)). But I’m getting no response to similarcomments in other posts. Am I making some ridiculous mistake here?
So, one of the classical examples of what we want this decision algorithm to do is the following: if it is playing a Prisoner’s dilemma game with an identical algorithm, then it should cooperate. The informal reasoning (which this post is trying to formalize) is the following: the agent proves that the output of A() and of A’() are always equal, compares the values of “A() = A’() = cooperate” and “A() = A’() = defect” and decides to cooperate.
The key point is that the agent is not initially told which other agents are identical to it. So your suggestion fails to work because if we replace “A()” with “eval(cooperate)” or “eval(defect)” then we end up proving things about how the agent A’() plays against a CooperateBot or a DefectBot. We conclude that A’() defects against both of these, and end up defecting ourselves.
def A(): Forall i try to prove A()==A_i() Let UX(X) be the function that results from the source of U() if all calls “A_i()”, for which the system was able to prove A()==A_i() were changed to “eval(X)”. For some action p, prove that for any possible action q, UX(“return q”) ⇐ UX(“return p”). Return p.
What results should be equivalent to TDT, I think.
If every step in that code is valid, then that would work. In my opinion the shakiest step in your algorithm is the “Forall i” in the first line of A(). I’ll probably reason myself into a hole if I try to figure out whether or not it’s valid.
Well, at least in some cases (e.g., if A’s code is equal to A_i’s code) the proof is immediate. Proof is also possible for minor obvious modifications, like variable renaming, so the agent should still behave better than a quining agent for PD.
My goal for inventing this is to have a consistent agent that can know (can prove) its decision beforehand, at least sometimes, without getting into contradictions...
Is it possible to change the definition of A() to something like this:
Let UX(X) be the function that results from the source of U() if all calls “A()” were changed to “eval(X)”.
For some action p, prove that for any possible agent X, UX(X) ⇐ UX(“return p”).
return p.
Then the agent would be able to prove that it returns a specific action, and yet there would be no contradictions.
It seems to me this should be better than the “chicken rule” (it’s really funny and clever, though! :)). But I’m getting no response to similar comments in other posts. Am I making some ridiculous mistake here?
So, one of the classical examples of what we want this decision algorithm to do is the following: if it is playing a Prisoner’s dilemma game with an identical algorithm, then it should cooperate. The informal reasoning (which this post is trying to formalize) is the following: the agent proves that the output of A() and of A’() are always equal, compares the values of “A() = A’() = cooperate” and “A() = A’() = defect” and decides to cooperate.
The key point is that the agent is not initially told which other agents are identical to it. So your suggestion fails to work because if we replace “A()” with “eval(cooperate)” or “eval(defect)” then we end up proving things about how the agent A’() plays against a CooperateBot or a DefectBot. We conclude that A’() defects against both of these, and end up defecting ourselves.
Ok, modification:
def U():
box1 = 1000
box2 = (A_1() == 2) ? 0 : 1000000
return box2 + ((A_2() == 2) ? box1 : 0)
or
def U():
if(A_1() == ‘C’ and A_2() == ‘C’) return 2;
if(A_1() == ‘D’ and A_2() == ‘C’) return 3;
…
def A():
Forall i try to prove A()==A_i()
Let UX(X) be the function that results from the source of U() if all calls “A_i()”, for which the system was able to prove A()==A_i() were changed to “eval(X)”.
For some action p, prove that for any possible action q, UX(“return q”) ⇐ UX(“return p”).
Return p.
What results should be equivalent to TDT, I think.
If every step in that code is valid, then that would work. In my opinion the shakiest step in your algorithm is the “Forall i” in the first line of A(). I’ll probably reason myself into a hole if I try to figure out whether or not it’s valid.
Well, at least in some cases (e.g., if A’s code is equal to A_i’s code) the proof is immediate. Proof is also possible for minor obvious modifications, like variable renaming, so the agent should still behave better than a quining agent for PD.
My goal for inventing this is to have a consistent agent that can know (can prove) its decision beforehand, at least sometimes, without getting into contradictions...
No, it wouldn’t defect against itself, because UX will call the same eval(“return p”) twice:
UX(X) = { return PDPayoffMatrix[eval(X), eval(X)]; }
The payoff with p=cooperate is greater, so the agent will cooperate.
EDIT: Sorry, that was all wrong, and missed your point. Thinking now...