The interpretation issue of a decision problem should be mostly gone when we formally specify it
In order to formally specify a problem, you will have already explicitly or implicitly expressed an interpretation of what decision theory problems are. But this doesn’t make the question, “Is this interpretation valid?” disappear. If we take my approach, we will need to provide a philosophical justification for the forgetting; if we take yours, we’ll need to provide a philosophical justification that we care about the results of these kinds of paraconsistent situations. Either way, there will be further work beyond the formalisation.
The decision algorithm considers each output from a given set… It’s a property of the formalism, but it doesn’t seem like a particularly concerning one
This ties into a point I’ll discuss later: I think that being able to ask an external observer to evaluate whether an actual, real agent took the optimal decision is the core problem in tying real-world decision theory problems to the more abstract theoretical ones. Further down you write:
The agent already considers what it considers (just like it already does what it does)
But I’m trying to find a way of evaluating an agent from the external perspective. Here, it is valid to criticise an agent for not selecting an action that it didn’t consider. Further, it isn’t always clear which actions are “considered”, as not all agents have a loop over all actions, and they may use shortcuts to avoid explicitly evaluating a certain action.
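To make the distinction concrete, here is a minimal sketch with hypothetical names and payoffs (the 5 and 10 only echo the 5&10 problem; none of this code comes from the discussion itself). It assumes the observer already has a payoff table for every action, which is exactly the step that still needs justifying, so treat it as an illustration of the external perspective rather than a solution:

```python
from typing import Callable, Dict, List

# Hypothetical payoff table. Assuming the observer simply has this table
# is the contested step; it is stipulated here for illustration only.
PAYOFFS: Dict[str, int] = {"take_5": 5, "take_10": 10}


def looping_agent(actions: List[str]) -> str:
    """Explicitly evaluates every action it is handed and picks the best."""
    return max(actions, key=lambda a: PAYOFFS[a])


def shortcut_agent(actions: List[str]) -> str:
    """Returns the first action without ever evaluating the others."""
    return actions[0]


def external_verdict(agent: Callable[[List[str]], str]) -> bool:
    """Observer's check: was the chosen action optimal over ALL actions,
    whether or not the agent internally considered them?"""
    chosen = agent(list(PAYOFFS))
    return PAYOFFS[chosen] == max(PAYOFFS.values())


print(external_verdict(looping_agent))   # True
print(external_verdict(shortcut_agent))  # False
```

The observer criticises `shortcut_agent` for failing to take `take_10` even though that action was never “considered” in any internal loop, which is the sense of evaluation I have in mind.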
I feel like I’m over-stating my position a bit in the following, but: this doesn’t seem any different from saying that if we provide a logical counterfactual, we solve decision theory for free
“Forgetting” has a large number of free parameters, but so does “deontology” or “virtue ethics”. I’ve provided some examples and key details about how this would proceed, but I don’t think you can expect too much more at this very preliminary stage. When I said that a forgetting criterion would solve the problem of logical counterfactuals for free, this was a slight exaggeration. We would still have to justify why we care about raw counterfactuals, but since these are actually consistent, this would seem to be a much easier task than arguing that we should care about what happens in the kind of inconsistent situations generated by paraconsistent approaches.
I disagree with your foundations foundations post in so far as it describes what I’m interested in as not being agent foundations foundations
Your version of the 5&10 problem… The agent takes some action, since it is fully defined, and the problem is that the decision theorist doesn’t know how to judge the agent’s decision.
That’s exactly how I’d put it. Except I would say I’m interested in the problem from the external perspective and the reflective perspective. I just see the external perspective as easier to understand first.
From the agent’s perspective, the 5&10 problem does not necessarily look like a problem of how to think about inconsistent actions
Sure. But the agent is thinking about inconsistent actions beneath the surface which is why we have to worry about spurious counterfactuals. And this is important for having a way of determining if it is doing what it should be doing. (This becomes more important in the edge cases like Troll Bridge—https://agentfoundations.org/item?id=1711)
My interest is in how to construct them from scratch
Consider the following types of situations:
1) A complete description of a world, with an agent identified
2) A theoretical decision theory problem viewed by an external observer
3) A theoretical decision theory problem viewed reflectively
I’m trying to get from 1->2, while you are trying to get from 2->3. Whatever formalisations we use need to ultimately relate to the real world in some way, which is why I believe that we need to understand the connection from 1->2. We could also try connecting 1->3 directly, although that seems much more challenging. If we ignore the link from 1->2 and focus solely on a link from 2->3, then we will end up implicitly assuming a link from 1->2 which could involve assumptions that we don’t actually want.
Sounds like the disagreement has mostly landed in the area of questions of what to investigate first, which is pretty firmly “you do you” territory—whatever most improves your own picture of what’s going on, that is very likely what you should be thinking about.
On the other hand, I’m still left feeling like your approach is not going to be embedded enough. You say that investigating 2->3 first risks implicitly assuming too much about 1->2. My sketchy response is that what we want in the end is not a picture which is necessarily even consistent with having any 1->2 view. Everything is embedded, and implicitly reflective, even the decision theorist thinking about what decision theory an agent should have. So, a firm 1->2 view can hurt rather than help, due to overly non-embedded assumptions which have to be discarded later.
Using some of the ideas from the embedded agency sequence: a decision theorist may, in the course of evaluating a decision theory, consider a lot of #1-type situations. However, since the decision theorist is embedded as well, the decision theorist does not want to assume realizability even with respect to their own ontology. So, ultimately, the decision theorist wants a decision theory to have “good behavior” on problems where no #1-type view is available (meaning some sort of optimality for non-realizable cases).
I actually included the Smoking Lesion Steelman (https://www.alignmentforum.org/s/fgHSwxFitysGKHH56/p/5bd75cc58225bf0670375452) as Foundations Foundations research. And CDT=EDT is pretty far along in this direction as well (https://www.alignmentforum.org/s/fgHSwxFitysGKHH56/p/x2wn2MWYSafDtm8Lf), although in my conception of what Foundations Foundations research should look like, more attention would have been paid to the possibility of the EDT graph being inconsistent, while the CDT graph was consistent.