If you’re in a situation, then it must be possible?
There is a sense in which you can’t conclude this. For a given notion of reasoning about potentially impossible situations, you can reason about such situations that contain agents, and you can see how these agents in the impossible situations think. If the situation doesn’t tell the agent whether it’s possible or impossible (say, through observations), the agent inside won’t be able to tell if it’s an impossible situation. Always concluding that the present situation is possible will result in error in the impossible situations (so it’s even worse than being unjustified). Errors in impossible situations may matter if something that matters depends on how you reason in impossible situations (for example, a “predictor” in a possible situation that asks what you would do in impossible situations).
We could look at “I commit to making <situation> impossible”, but that doesn’t mean anything either.
A useful sense of an “impossible situation” won’t make it impossible to reason about. There’s probably something wrong with it, but not to the extent that it can’t be considered. Maybe it falls apart if you look too closely, or maybe it has no moral worth and so should be discarded from decision making. But even in these cases it might be instrumentally valuable, because something in morally relevant worlds depends on this kind of reasoning. You might not approve of this kind of reasoning and call it meaningless, but other things in the world can perform it regardless of your judgement, and it’s useful to understand how that happens to be able to control them.
Finally, some notions of “impossible situation” will say that a “situation” is possible/impossible depending on what happens inside it, and there may be agents inside it. In that case, their decisions may affect whether a given situation is considered “possible” or “impossible”, and if these agents are familiar with this notion they can aim to make a given situation they find themselves in possible or impossible.
“There is a sense in which you can’t conclude this”—Well, this paragraph is pretty much an informal description of how my technique works, except that I differentiate between world models and representations of world models. Agents can’t operate on incoherent world models, but they can operate on representations of world models that are incoherent for this agent. It’s also the reason why I separated out observations from models.
“In that case, their decisions may affect whether a given situation is considered “possible” or “impossible”, and if these agents are familiar with this notion they can aim to make a given situation they find themselves in possible or impossible”—My answer to this question is that it is meaningless to ask what an agent does given an impossible situation, but meaningful to ask what it does given an impossible input (which ultimately represents an impossible situation).
I get the impression that you didn’t quite grasp the general point of this post. I suspect the reason may be that the formal description is less skippable than I originally thought.
I was replying specifically to those remarks, on their use of terminology, not to the thesis of the post. I disagree with the framing of “impossible situations” and “meaningless” for the reasons I described. I think it’s useful to let these words (in the context of decision theory) take their default meaning, under which the statements I quoted are misleading.
My answer to this question is that it is meaningless to ask what an agent does given an impossible situation, but meaningful to ask what it does given an impossible input (which ultimately represents an impossible situation).
That’s the thing: if this “impossible input” represents an “impossible situation”, and it’s possible to ask what happens for this input, that gives a way of reasoning about the “impossible situation”, in which case it’s misleading to say that “it is meaningless to ask what an agent does given an impossible situation”. I of course agree that you can make a technical distinction, but even then it’s not clear what you mean by calling an idea “meaningless” when you immediately proceed to give a way of reasoning about (a technical reformulation of) that idea.
If an idea is confused in some way, even significantly, that shouldn’t be enough to declare it “meaningless”. Perhaps “hopelessly confused” and “useless”, but not yet “meaningless”. Unless you are talking about a more specific sense of “meaning”, which you didn’t stipulate. My guess is that by “meaningless” you meant that you don’t see how it could ever be made clear in its original form, or that in the context of this post it’s not at all clear compared to the idea of “impossible input” that’s actually clarified. But that’s an unusual sense for that word.
I guess I saw those mainly as framing remarks, so I may have been less careful with my language than elsewhere. Maybe “meaningless” is a strong word, but I only meant it in a specific way that I hoped was clear enough from context.
I was using “situations” to refer to objects whose equivalence relation is logical equivalence, whilst I was using “representations” to refer to objects whose equivalence relation is identity of the specific formulation. My point was that all impossible situations are logically equivalent, so asking what an agent does in such a situation is of limited use. An agent that operates directly on such impossible situations can only have one such response to these situations, even across multiple problems. On the other hand, representations don’t have this limitation.
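As a minimal sketch of the difference (a toy propositional check, with truth-table equivalence standing in for logical equivalence; all names here are illustrative assumptions, not anything from the post):

```python
from itertools import product

# Formulas as nested tuples: ("var", name), ("not", f), ("and", f, g), ("or", f, g).

def variables(f):
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(sub) for sub in f[1:]))

def evaluate(f, env):
    op = f[0]
    if op == "var":
        return env[f[1]]
    if op == "not":
        return not evaluate(f[1], env)
    if op == "and":
        return evaluate(f[1], env) and evaluate(f[2], env)
    return evaluate(f[1], env) or evaluate(f[2], env)

def logically_equivalent(f, g):
    vs = sorted(variables(f) | variables(g))
    return all(
        evaluate(f, dict(zip(vs, vals))) == evaluate(g, dict(zip(vs, vals)))
        for vals in product([False, True], repeat=len(vs))
    )

# Two different descriptions of impossible situations.
situation_1 = ("and", ("var", "A"), ("not", ("var", "A")))
situation_2 = ("and", ("var", "B"), ("not", ("var", "B")))

assert logically_equivalent(situation_1, situation_2)  # the same situation up to logical equivalence
assert situation_1 != situation_2                      # but different representations
```

A response function that respects logical equivalence has to return the same thing for both, whereas one keyed on the representation is free to distinguish them.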
My point was that all impossible situations are logically equivalent
Yes, the way you are formulating this, as a theory that includes claims about the agent’s action (or other counterfactual things) together with things from the original setting that contradict them, such as the agent’s program. It’s also very natural to excise parts of a situation (just as you do in the post) and replace them with the alternatives you are considering. It’s what happens with causal surgery.
An agent that operates directly on such impossible situations can only have one such response to these situations, even across multiple problems.
If it respects equivalence of theories (which is in general impossible to decide) and doesn’t know where the theories came from, so that this essential data is somehow lost before that point. I think it’s useful to split this process into two phases, where the agent first looks for itself in the worlds it cares about, and only then considers the consequences of alternative actions. The first phase gives a world that has all discovered instances of the agent excised from it (a “dependence” of the world on the agent), so that in the second phase we can plug in alternative actions (or strategies, maps from observations to actions, as the type of the excised agent will be something like exponential if the agent expects input).
At that point the difficulty is mostly in the first phase, the formulation of the dependence. (By the way, in this view there is no problem with perfect predictors, since they are just equivalent to the agent and become one of the locations where the agent finds itself, no different from any other. It’s the imperfect predictors, such as the too-weak predictors of Agent-Simulates-Predictor (ASP) and other such things, that cause trouble.) The main difficulty here is spurious dependencies: in principle the agent is equivalent to its actual action, and so conversely the value of its actual action found anywhere in the world is equivalent to the agent. So the agent finds itself behind every answer “No” in the world (uttered by anyone and anything) if it turns out that its actual action is “No”, and the consequences of answering “Yes” then involve changing all answers “No” to “Yes” everywhere in the world. (When running the search, the agent won’t actually encounter spurious dependencies under certain circumstances, but that’s a bit flimsy.)
This shows that even equivalence of programs is too strong when searching for yourself in the world, or at least the proof of equivalence shouldn’t be irrelevant in the resulting dependence. So this framing doesn’t actually help with logical counterfactuals, but at least the second phase where we consider alternative actions is spared the trouble, if we somehow manage to find useful dependencies.
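Here is a toy illustration of a spurious dependence (the world description, the payoffs and the naive value-matching rule are all assumptions made up for this sketch):

```python
# A world containing the agent's answer and, separately, a sign that happens to read "No".
def world(agent_answer, sign_text="No"):
    # Assumed payoffs: answering "Yes" is worth 5, and the sign reading "No" is worth 10.
    return (5 if agent_answer == "Yes" else 0) + (10 if sign_text == "No" else 0)

actual_answer = "No"

def proper_dependence(candidate):
    # Only the agent's own slot is a hole; the sign is left alone.
    return world(candidate)

def spurious_dependence(candidate):
    # A search that excises every occurrence of the value equal to the actual answer
    # also carves a hole in the sign, since it happens to read "No" too.
    return world(candidate, sign_text=candidate)

print(proper_dependence("Yes"), proper_dependence("No"))      # 15 10 -> "Yes" is better
print(spurious_dependence("Yes"), spurious_dependence("No"))  # 5 10  -> "No" looks better
```

The spurious dependence attributes the sign’s text to the agent’s choice, which distorts the comparison between the alternatives.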
“By the way, in this view there is no problem with perfect predictors, since they are just equivalent to the agent and become one of the locations where the agent finds itself”—Well, this still runs into issues as the simulated agent encounters an impossible situation, so aren’t we still required to use the workaround (or another workaround if you’ve got one)?
“This shows that even equivalence of programs is too strong when searching for yourself in the world, or at least the proof of equivalence shouldn’t be irrelevant in the resulting dependence”—Hmm, agents may take multiple actions in a decision problem. So aren’t agents only equivalent to programs that take the same action in each situation? Anyway, I was talking about equivalence of worlds, not of agents, but this is still an interesting point that I need to think through. (Further, are you saying that agents should only be considered to have their behaviour linked to agents they are provably equivalent to, instead of all agents they are equivalent to?)
“A useful sense of an “impossible situation” won’t make it impossible to reason about”—That’s true. My first thought was to consider how the program represents its model of the world and to imagine running the program with impossible world-model representations. However, the nice thing about modelling the inputs and treating model representations as integers rather than as specific structures is that it allows us to abstract away from these kinds of internal details. Is there a specific reason why you might want to avoid this abstraction?
UPDATE: I just re-read your comment and found that I significantly misunderstood it, so I’ve made some large edits to this comment. I’m still not completely sure that I understand what you were driving at.
Well, this still runs into issues as the simulated agent encounters an impossible situation
The simulated agent, together with the original agent, are removed from the world to form a dependence, which is a world with holes (free variables). If we substitute the agent term for the variables in the dependence, the result is equivalent (not necessarily syntactically equal) to the world term as originally given. To test a possible action, this possible action is substituted for the variables in the dependence. The resulting term no longer includes instances of the agent; instead, it includes an action, so there is no contradiction.
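A minimal sketch of that substitution step, using a Newcomb-style setup as an assumed example (the payoffs and the agent below are placeholders, not anything from the post):

```python
def agent():
    # Whatever the agent actually computes.
    return "one-box"

def dependence(decision):
    # The world with both instances of the agent (the predictor's model and the agent
    # itself) replaced by the same hole, here the parameter `decision`.
    prediction = decision                       # the predictor's instance is the same hole
    box_b = 1_000_000 if prediction == "one-box" else 0
    return box_b if decision == "one-box" else box_b + 1_000

# Substituting the agent back in recovers (something equivalent to) the original world.
assert dependence(agent()) == 1_000_000

# Substituting a bare action leaves no instance of the agent in the term, so evaluating
# the alternative raises no contradiction.
print({a: dependence(a) for a in ["one-box", "two-box"]})   # {'one-box': 1000000, 'two-box': 1000}
```

In a term-level treatment the substitution happens syntactically rather than by calling the agent, but the shape is the same: the holes receive either the agent term or a candidate action, never both.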
Hmm, agents may take multiple actions in a decision problem. So aren’t agents only equivalent to programs that take the same action in each situation?
A protocol for interacting with the environment can be expressed in the type of the decision. So if an agent produces an action of type A depending on an observation of type O, we can instead consider (O->A) as the type of its decision, so that the only thing it needs to do is produce a decision of that type, with interaction being something that happens to the decision and not to the agent.
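In code, the shift is from an agent that interacts step by step to a decision that is itself a map from observations to actions (the types and the toy environment below are only illustrative assumptions):

```python
from typing import Callable

Observation = str
Action = str
Strategy = Callable[[Observation], Action]   # (O -> A): the whole decision

def decide() -> Strategy:
    # The agent's entire output is this one map; interaction happens to the map, not to the agent.
    def strategy(observation: Observation) -> Action:
        return "pay" if observation == "in town" else "wait"
    return strategy

def environment(decision: Strategy) -> list[Action]:
    # The environment applies the decision to whatever observations come up.
    return [decision(observation) for observation in ["in desert", "in town"]]

print(environment(decide()))   # ['wait', 'pay']
```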
Requiring that only programs completely equivalent to the agent are to be considered its instances may seem too strong, and it probably is, but the problem is that it’s also not strong enough: even with this requirement there are spurious dependencies that say the agent is equivalent to a piece of paper that happens to contain a decision coinciding with the agent’s own. So it’s a good simplification for focusing on logical counterfactuals (in the logical direction, which I believe is less hopeless than finding answers in probability).
Further, are you saying that agents should only be considered to have their behaviour linked to agents they are provably equivalent to, instead of all agents they are equivalent to?
Not sure what the distinction you are making is. How would you define equivalence? By equivalence I meant equivalence of lambda terms, where one can be rewritten into the other with a sequence of alpha, reduction and expansion rules, or something like that. It’s judgemental/computational/reductional equality of type theory, as opposed to propositional equality, which can be weaker, but since judgemental equality is already too weak, it’s probably the wrong place to look for an improvement.
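As a sketch of the notion of equivalence I have in mind, here is a tiny normaliser for untyped lambda terms with De Bruijn indices (so alpha-equivalence is just syntactic identity); two normalising terms count as equal when their normal forms coincide. Eta-expansion is left out, and this is only an illustration of the idea, not a serious implementation:

```python
# Terms: ("var", n) with De Bruijn index n, ("lam", body), ("app", fun, arg).

def shift(t, d, cutoff=0):
    # Add d to every free variable index at or above cutoff.
    if t[0] == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if t[0] == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, j, s):
    # Replace variable j in t with the term s.
    if t[0] == "var":
        return s if t[1] == j else t
    if t[0] == "lam":
        return ("lam", subst(t[1], j + 1, shift(s, 1)))
    return ("app", subst(t[1], j, s), subst(t[2], j, s))

def normalize(t):
    # Full beta-normalisation; loops forever on non-normalising terms.
    if t[0] == "lam":
        return ("lam", normalize(t[1]))
    if t[0] == "app":
        fun, arg = normalize(t[1]), normalize(t[2])
        if fun[0] == "lam":
            return normalize(shift(subst(fun[1], 0, shift(arg, 1)), -1))
        return ("app", fun, arg)
    return t

t1 = ("app", ("lam", ("lam", ("var", 1))), ("var", 0))   # (lambda x. lambda y. x) z
t2 = ("lam", ("var", 1))                                  # lambda y. z
assert normalize(t1) == normalize(t2)                     # judgementally equal
```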
The simulated agent, together with the original agent, are removed from the world to form a dependence, which is a world with holes (free variables)
I’m still having difficulty understanding the process that you’re following, but let’s see if I can correctly guess this. Firstly you make a list of all potential situations that an agent may experience or for which an agent may be simulated. Decisions are included in this list, even if they might be incoherent for particular agents. In this example, these are:
Actual_Decision → Co-operate/Defect
Simulated_Decision → Co-operate/Defect
We then group all necessarily linked decisions together:
(Actual_Decision, Simulated_Decision) → (Co-operate, Co-operate)/(Defect, Defect)
You then consider the tuple (equivalent to an observation-action map) that leads to the best outcome.
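For concreteness, a sketch of that last step as I understand it (the payoff numbers are my own assumptions for a rough Parfit’s-hitchhiker setup):

```python
# A perfect simulation forces Simulated_Decision == Actual_Decision, so only the
# linked tuples are on the table.
linked_tuples = [("Co-operate", "Co-operate"), ("Defect", "Defect")]

def outcome(actual, simulated):
    # Assumed payoffs: the driver gives a lift iff the simulation co-operates (pays);
    # paying in town costs 100, being left in the desert is catastrophic.
    if simulated != "Co-operate":
        return -1_000_000
    return -100 if actual == "Co-operate" else 0

print(max(linked_tuples, key=lambda t: outcome(*t)))   # ('Co-operate', 'Co-operate')
```

If unlinked tuples were allowed, (Defect, Co-operate) would score best, which is exactly why the grouping into necessarily linked decisions matters.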
I agree that this provides the correct outcome, but I’m not persuaded that the reasoning is particularly solid. At some point we’ll want to be able to tie these models back to the real world and explain exactly what kind of hitchhiker corresponds to a (Defect, Defect) tuple. A hitchhiker that doesn’t get a lift? Sure, but what property of the hitchhiker makes it not get a lift?
We can’t talk about any actions it chooses in the actual world history, as it is never given the chance to make this decision. Next we could try constructing a counterfactual as per CDT and consider what the hitchhiker does in the world model where we’ve performed model surgery to make the hitchhiker arrive in town. However, as this is an impossible situation, there’s no guarantee that this decision is connected to any decision the agent makes in a possible situation. TDT counterfactuals don’t help either as they are equivalent to these tuples.
Alternatively, we could take the approach that you seem to favour and say that the agent makes the decision to defect in a paraconsistent situation where it is in town. But this assumes that the agent has the ability to handle paraconsistent situations, when only some agents have this ability, and it’s not clear how to interpret this for other agents. Inputs have neither of these problems: all real-world agents must do something given an input, even if that something is doing nothing or crashing, and these responses are easy to interpret. So modelling inputs allows us to more rigorously justify the use of these maps. I’m beginning to think that there would be a whole post’s worth of material if I expanded upon this comment.
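To illustrate the point about inputs (the agent, the integer encoding and the specific code 42 below are hypothetical, just for the sketch):

```python
def agent(encoded_model: int) -> str:
    # Hypothetical agent: it rejects any encoded world model it proves inconsistent
    # with its own program.
    if encoded_model == 42:          # assumed encoding of "I arrived in town"
        raise ValueError("inconsistent with my own program")
    return "signal driver"

def behaviour_on(encoded_model: int) -> str:
    # Whatever the input encodes, possible or impossible, the program does *something*,
    # so "what does it do given this input?" always has a definite answer.
    try:
        return agent(encoded_model)
    except Exception:
        return "crash"

print(behaviour_on(0), behaviour_on(42))   # signal driver crash
```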
How would you define equivalence?
I think I was using the wrong term. I meant linked in the logical counterfactual sense, say two identical calculators. Is there a term for this? I was trying to understand whether you were saying that we only care about the provable linkages, rather than all such linkages.
Edit: Actually, after rereading UDT, I can see that it is much more similar to my approach than I realised. For example, it also separates inputs from models. More detailed information is included at the bottom of the post.
Firstly you make a list of all potential situations that an agent may experience or for which an agent may be simulated. Decisions are included in this list, even if they might be incoherent for particular agents.
No? Situations are not evaluated; they contain instances of the agent, but when they are considered it’s not yet known what the decision will be, so decisions are unknown, even if in principle determined by the (agents in the) situation. There is no matching or assignment of possible decisions when we identify instances of the agent. Next, the instances are removed from the situation. At this point, decisions are no longer determined in the situations-with-holes (dependencies), since there are no agents and no decisions remaining in them. So there won’t be a contradiction in putting in any decisions after that (without the agents!) and seeing what happens.
I meant linked in the logical counterfactual sense, say two identical calculators.
That doesn’t seem different from what I meant, if appropriately formulated.