Truly logical counterfactuals really only make sense in the context of bounded rationality: that is, cases where a proposition is logically necessary, but the agent cannot determine this within their resource bounds. Essentially no aspect of bounded rationality has a satisfactory treatment as yet.
The prisoners’ dilemma question does not appear to require dealing with logical counterfactuals. It is not logically contradictory for two agents to make different choices in the same situation, or even for the same agent to make different decisions given the same situation, though the setup of some scenarios may make it very unlikely or even direct you to ignore such possibilities.
If two Logical Decision Theory agents with perfect knowledge of each other’s source code play the prisoners’ dilemma, then theoretically they should cooperate.
LDT uses logical counterfactuals in the decision making.
If the agents are CDT, then logical counterfactuals are not involved.
If they have source code, then they are not perfectly rational and cannot in general implement LDT. They can at best implement a boundedly rational subset of LDT, which will have flaws.
Assume the contrary: then each agent can verify that the other implements LDT, since perfect knowledge of the other’s source code includes the knowledge that it implements LDT. In particular, each can verify that the other’s code implements a consistent deduction system that includes arithmetic, and, by running the other on their own source, can thereby verify that they themselves implement a consistent system that includes arithmetic. By Gödel’s second incompleteness theorem, no consistent system that includes arithmetic can prove its own consistency.
The only way consistency can be preserved is that at least one of them cannot actually verify that the other has a consistent deduction system including arithmetic. So at least one of those agents is not an LDT agent with perfect knowledge of the other’s source code.
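Spelled out, the step this relies on is Gödel’s second incompleteness theorem (writing $S_A$ and $S_B$ for the two agents’ deduction systems, which is just my notation, not anything in the original comment):

```latex
% Goedel's second incompleteness theorem: a consistent, recursively
% axiomatized system S extending arithmetic cannot prove its own consistency.
S \text{ consistent},\; S \supseteq \mathrm{PA}
  \;\Longrightarrow\; S \nvdash \mathrm{Con}(S).

% The argument above: A's system verifies B's consistency, and by running
% B's verification of A on A's own source, A's system would also verify
% its own consistency,
S_A \vdash \mathrm{Con}(S_B)
  \quad\text{and}\quad
S_A \vdash \mathrm{Con}(S_A),

% which the theorem rules out if S_A is consistent.
```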
We can in principle assume perfectly rational agents that implement LDT, but they cannot be described by any algorithm and we should be extremely careful in making suppositions about what they can deduce about each other and themselves.
I get the impression that “has the agent’s source code” is some Yudkowskyism which people use without thinking.
Every time someone says that, I always wonder “are you claiming that the agent that reads the source code is able to solve the Halting Problem?”
The Halting Problem is a worst-case result. Most agents aren’t maximally ambiguous about whether or not they halt. And for those that are, well, it depends on what the rules are for agents that don’t halt.
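For instance, a bot like the following (a toy example of my own) obviously halts, and reading its source tells you everything about it without any general halting decider:

```python
# A trivially analyzable agent: it ignores the opponent's source entirely,
# so its halting behaviour is obvious by inspection even though halting is
# undecidable in general.
def cooperate_bot(their_source: str) -> str:
    return "C"  # always cooperates, always halts
```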
There are setups where each agent is using a nonphysically large but finite amount of compute. There was a paper I saw somewhere a while ago where both agents were doing a brute-force proof search for the statement “if I cooperate, then they cooperate” and cooperating if they found a proof.
(I.e. searching over all proofs containing fewer than 10^100 symbols.)
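Roughly, in code (the `bounded_proof_search` helper below is just a placeholder I made up for the brute-force proof enumeration, not anything from the paper):

```python
PROOF_BOUND = 10**100  # nonphysically large, but finite

def bounded_proof_search(statement: str, max_symbols: int) -> bool:
    """Hypothetical placeholder: would enumerate every proof of at most
    `max_symbols` symbols in the agents' shared formal system and return
    True iff one of them proves `statement`. Returning False here just
    stands in for 'no proof found'."""
    return False

def agent(my_source: str, their_source: str) -> str:
    # Search for a bounded proof of "if I cooperate, then they cooperate",
    # phrased about the two programs' behaviour on each other's source.
    statement = (
        f"output({my_source!r}, {their_source!r}) == 'C' -> "
        f"output({their_source!r}, {my_source!r}) == 'C'"
    )
    if bounded_proof_search(statement, PROOF_BOUND):
        return "C"  # cooperate if a proof was found
    return "D"      # otherwise defect
```

As far as I recall, the point of the construction is that when both players run a program of this shape, Löb’s theorem is what lets the proof search succeed, so they end up cooperating.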
In a situation where you are asking a question about an ideal reasoner, having the agents be finite means you are no longer asking about an ideal reasoner. If you put an ideal reasoner in a Newcomb problem, he may very well think “I’ll simulate Omega and act according to what I find.” (Or, more likely, run some more complicated algorithm that indirectly amounts to that.) If the agent can’t do this, he may not be able to solve the problem. Of course, real humans can’t, but this may just mean that real humans, because they are finite, are unable to solve some problems.
There is a model of bounded rationality, logical induction.
Can that be used to handle logical counterfactuals?