“By the way, in this view there is no problem with perfect predictors, since they are just equivalent to the agent and become one of the locations where the agent finds itself”—Well, this still runs into issues as the simulated agent encounters an impossible situation, so aren’t we still required to use the workaround (or another workaround if you’ve got one)?
“This shows that even equivalence of programs is too strong when searching for yourself in the world, or at least the proof of equivalence shouldn’t be irrelevant in the resulting dependence”—Hmm, agents may take multiple actions in a decision problem. So aren’t agents only equivalent to programs that take the same action in each situation? Anyway, I was talking about equivalence of worlds, not of agents, but this is still an interesting point that I need to think through. (Further, are you saying that agents should only be considered to have their behaviour linked to agents they are provably equivalent to, instead of all agents they are equivalent to?)
“A useful sense of an “impossible situation” won’t make it impossible to reason about”—That’s true. My first thought was to consider how the program represents its model of the world and to imagine running the program with impossible world-model representations. However, the nice thing about modelling the inputs and treating model representations as integers rather than specific structures is that it allows us to abstract away from these kinds of internal details. Is there a specific reason why you might want to avoid this abstraction?
UPDATE: I just re-read your comment and found that I significantly misunderstood it, so I’ve made some large edits to this comment. I’m still not completely sure that I understand what you were driving at.
Well, this still runs into issues as the simulated agent encounters an impossible situation
The simulated agent and the original agent are removed from the world to form a dependence, which is a world with holes (free variables). If we substitute the agent term for the variables in the dependence, the result is equivalent (not necessarily syntactically equal) to the world term as originally given. To test a possible action, the action is substituted for the variables in the dependence. The resulting term no longer includes instances of the agent; instead it includes an action, so there is no contradiction.
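This substitution story can be made concrete with a toy sketch (my own illustration, not the author’s formalism): model the world as a nested term containing the agent, cut the agent out to leave a hole, then fill the hole with either the agent (recovering the world) or a bare action (the counterfactual, with no agent left to contradict).

```python
AGENT = "agent_program"  # stands in for the agent's source code

# World term: the predictor simulates the agent, then the agent acts for real.
world = ("run", AGENT, ("simulate", AGENT))

def make_dependence(term):
    """Replace every instance of the agent with the hole 'X'."""
    if term == AGENT:
        return "X"
    if isinstance(term, tuple):
        return tuple(make_dependence(t) for t in term)
    return term

def substitute(term, value):
    """Fill the hole 'X' with a candidate action (or the agent itself)."""
    if term == "X":
        return value
    if isinstance(term, tuple):
        return tuple(substitute(t, value) for t in term)
    return term

dependence = make_dependence(world)

# Substituting the agent back recovers the original world term...
assert substitute(dependence, AGENT) == world

# ...while substituting a bare action yields a term with no agent in it,
# so evaluating it cannot put the agent in an impossible situation.
counterfactual = substitute(dependence, "cooperate")
assert counterfactual == ("run", "cooperate", ("simulate", "cooperate"))
```

The key point the sketch surfaces is that both occurrences of the agent (original and simulated) are replaced by the same hole, so every tested action is automatically consistent across them.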
Hmm, agents may take multiple actions in a decision problem. So aren’t agents only equivalent to programs that take the same action in each situation?
A protocol for interacting with the environment can be expressed with the type of the decision. So if an agent takes an action of type A depending on an observation of type O, we can instead consider (O->A) as the type of its decision, so that the only thing it needs to do is produce a decision in this way, with interaction being something that happens to the decision and not to the agent.
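A minimal sketch of this type-level move (the names here are illustrative, not from the original): the agent’s whole output is a single value of type O -> A, and the environment applies that function, so interaction happens to the decision rather than to the agent.

```python
from typing import Callable

Observation = str
Action = str
Decision = Callable[[Observation], Action]  # the type (O -> A)

def agent_decision(obs: Observation) -> Action:
    # The agent emits this one value and never interacts directly.
    return "cooperate" if obs == "simulation" else "defect"

def environment(decision: Decision) -> Action:
    # Interaction happens to the decision, not to the agent:
    # the environment supplies the observation and reads off the action.
    return decision("simulation")

assert environment(agent_decision) == "cooperate"
```

This is why multiple observation-dependent actions don’t break equivalence of agents: the single value being compared already encodes the response to every observation.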
Requiring that only programs completely equivalent to the agent are to be considered its instances may seem too strong, and it probably is, but the problem is that it’s also not strong enough, because even with this requirement there are spurious dependencies that say that an agent is equivalent to a piece of paper that happens to contain a decision that coincides with the agent’s own. So it’s a good simplification for focusing on logical counterfactuals (in the logical direction, which I believe is less hopeless than finding answers in probability).
Further, are you saying that agents should only be considered to have their behaviour linked to agents they are provably equivalent [to] instead of all agents they are equivalent to?
Not sure what the distinction you are making is. How would you define equivalence? By equivalence I meant equivalence of lambda terms, where one can be rewritten into the other with a sequence of alpha, reduction and expansion rules, or something like that. It’s judgemental/computational/reductional equality of type theory, as opposed to propositional equality, which can be weaker, but since judgemental equality is already too weak, it’s probably the wrong place to look for an improvement.
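The reduction-based notion of equality can be sketched concretely; this is my own toy illustration, which ignores alpha-renaming and capture-avoidance by assuming all bound variable names are distinct. Two terms count as judgementally equal when they reduce to the same normal form.

```python
def subst(t, name, v):
    """Substitute v for the variable `name` in term t.
    Terms: ("var", x), ("lam", x, body), ("app", f, arg).
    Assumes distinct variable names, so no capture handling is needed."""
    tag = t[0]
    if tag == "var":
        return v if t[1] == name else t
    if tag == "lam":
        return t if t[1] == name else ("lam", t[1], subst(t[2], name, v))
    return ("app", subst(t[1], name, v), subst(t[2], name, v))

def normalize(t):
    """Repeatedly beta-reduce until no redex remains."""
    if t[0] == "app":
        f = normalize(t[1])
        a = normalize(t[2])
        if f[0] == "lam":
            return normalize(subst(f[2], f[1], a))
        return ("app", f, a)
    if t[0] == "lam":
        return ("lam", t[1], normalize(t[2]))
    return t

# (\x. x) y and y are judgementally equal: same normal form.
ident_app = ("app", ("lam", "x", ("var", "x")), ("var", "y"))
assert normalize(ident_app) == ("var", "y")
```

The “piece of paper” problem is visible even here: the constant term ("var", "y") is a distinct program from ident_app, yet after reduction they are indistinguishable, which is exactly why equivalence alone produces spurious dependencies.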
The simulated agent and the original agent are removed from the world to form a dependence, which is a world with holes (free variables)
I’m still having difficulty understanding the process that you’re following, but let’s see if I can correctly guess this. Firstly you make a list of all potential situations that an agent may experience or for which an agent may be simulated. Decisions are included in this list, even if they might be incoherent for particular agents. In this example, these are:
Actual_Decision → Co-operate/Defect
Simulated_Decision → Co-operate/Defect
We then group all necessarily linked decisions together:
(Actual_Decision, Simulated_Decision) → (Co-operate, Co-operate)/(Defect, Defect)
You then consider the tuple (equivalent to an observation-action map) that leads to the best outcome.
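If I’ve guessed the procedure right, it can be sketched like this (the payoff numbers are my own illustrative choices for Parfit’s hitchhiker, not from the original): enumerate the linked tuples and pick the one with the best outcome.

```python
def outcome(actual, simulated):
    """Illustrative Parfit's-hitchhiker payoffs."""
    if simulated == "Defect":
        return -1000  # driver predicts defection: stranded in the desert
    # Driver predicts co-operation, so the hitchhiker is rescued;
    # actually paying in town costs $100.
    return -100 if actual == "Co-operate" else 0

# Only the necessarily linked tuples are candidates: a perfect simulation
# matches the actual decision, so (Defect, Co-operate) never appears,
# even though it would score 0.
linked = [("Co-operate", "Co-operate"), ("Defect", "Defect")]

best = max(linked, key=lambda t: outcome(*t))
assert best == ("Co-operate", "Co-operate")
```

The restriction to linked tuples is what does the work: optimizing over all four tuples would recommend the unreachable (Defect, Co-operate).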
I agree that this provides the correct outcome, but I’m not persuaded that the reasoning is particularly solid. At some point we’ll want to be able to tie these models back to the real world and explain exactly what kind of hitchhiker corresponds to a (Defect, Defect) tuple. A hitchhiker that doesn’t get a lift? Sure, but what property of the hitchhiker makes it not get a lift?
We can’t talk about any actions it chooses in the actual world history, as it is never given the chance to make this decision. Next we could try constructing a counterfactual as per CDT and consider what the hitchhiker does in the world model where we’ve performed model surgery to make the hitchhiker arrive in town. However, as this is an impossible situation, there’s no guarantee that this decision is connected to any decision the agent makes in a possible situation. TDT counterfactuals don’t help either as they are equivalent to these tuples.
Alternatively, we could take the approach that you seem to favour and say that the agent makes the decision to defect in a paraconsistent situation where it is in town. But this assumes that the agent has the ability to handle paraconsistent situations, when only some agents have this ability, and it’s not clear how to interpret this for other agents. However, inputs have neither of these problems—all real-world agents must do something given an input, even if that is doing nothing or crashing, and these are easy to interpret. So modelling inputs allows us to more rigorously justify the use of these maps. I’m beginning to think that there would be a whole post’s worth of material if I expanded upon this comment.
How would you define equivalence?
I think I was using the wrong term. I meant linked in the logical counterfactual sense, say two identical calculators. Is there a term for this? I was trying to understand whether you were saying that we only care about the provable linkages, rather than all such linkages.
Edit: Actually, after rereading over UDT, I can see that it is much more similar than I realised. For example, it also separates inputs from models. More detailed information is included at the bottom of the post.
Firstly you make a list of all potential situations that an agent may experience or for which an agent may be simulated. Decisions are included in this list, even if they might be incoherent for particular agents.
No? Situations are not evaluated; they contain instances of the agent, but when they are considered, it’s not yet known what the decision will be, so decisions are unknown, even if in principle determined by the (agents in the) situation. There is no matching or assignment of possible decisions when we identify instances of the agent. Next, the instances are removed from the situation. At this point, decisions are no longer determined in the situations-with-holes (dependencies), since there are no agents and no decisions remaining in them. So there won’t be a contradiction in putting in any decisions after that (without the agents!) and seeing what happens.
I meant linked in the logical counterfactual sense, say two identical calculators.
That doesn’t seem different from what I meant, if appropriately formulated.