I still have doubts about the consistency of this architecture. What if the agent sees a copy of itself perform some action in a situation exactly identical to the one the agent now finds itself in? Would that not mean the agent can now prove that it will perform the same action? (There would be a difference between the agent and the copy, but only an “additive” difference: the agent would have additional knowledge that the copy lacks, so whatever the copy could prove, the agent must also be able to prove. [And this fact would itself be provable to the agent!])
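To spell out the “additive difference” step a bit (this formalization is my own, so take the setup as an assumption): suppose the copy reasons in a recursively axiomatized theory T and the agent reasons in T′, which is just T plus the agent’s extra observations. Then, for nice enough theories,

```latex
\[
(1)\quad T \vdash \varphi \;\Longrightarrow\; T' \vdash \varphi,
\qquad\qquad
(2)\quad T' \vdash \mathrm{Prov}_T(\ulcorner\varphi\urcorner)
          \rightarrow \mathrm{Prov}_{T'}(\ulcorner\varphi\urcorner),
\]
```

where (2) holds because T′ can verify that every axiom of T is among its own axioms. So if the agent can check that the copy proved “I perform action a” and acted on it, (2) lets it conclude, inside its own proof system, that the same statement is provable about itself.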
An example situation would be Parfit’s hitchhiker. If the system finds itself in town, saved by the driver, would it not be able to prove that it will cooperate?
In fact, the system has its own source code, so won’t it end up simulating itself directly in any case, as a side effect of running S? I guess this is just the standard proof of why such an S is impossible...
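To spell out that “standard proof”, here is a toy sketch in Python of the diagonalization (the name S, the signature S(source, situation) -> action, and the action labels are all just illustrative assumptions, not the post’s definitions):

```python
def diagonal_agent(S, my_source, situation):
    """An agent that asks the would-be predictor S what it will do,
    then does the opposite."""
    predicted = S(my_source, situation)  # a correct S would have to simulate this very call
    return "defect" if predicted == "cooperate" else "cooperate"


def naive_S(source, situation):
    """A stand-in 'predictor' that always guesses the same action.
    (A genuinely correct S can't exist: on diagonal_agent's own source it
    would have to return exactly the action diagonal_agent then avoids.)"""
    return "cooperate"


# Illustrative run (the source string and situation label are placeholders):
print(diagonal_agent(naive_S, "<diagonal_agent's own source>", "in town, saved by driver"))
# -> 'defect': naive_S's prediction is immediately falsified.
```

The same setup is why I expect the self-simulation regress: evaluating S on a world that contains the agent means running the agent, which runs S, which runs the agent, and so on.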