Hmm, I’m still confused. I can’t figure out why we would need logical uncertainty in the typical case to figure out the consequences of “source code X outputs action/policy Y”. Is there a simple problem where this is necessary or is this just a result of trying to solve for the general case?
Agents need to consider multiple actions and choose the one with the best outcome. But we’re supposing that the code representing the agent’s decision has only one possible output. For example, suppose an agent is choosing between action A and action B and will end up choosing A. Then a sufficiently close examination of the agent’s source code will reveal that the scenario “the agent chooses B” is logically inconsistent. It is then unclear how the agent can assess the desirability of “the agent chooses B” when evaluating outcomes, except through some mechanism for reasoning nontrivially about the outcomes of logically inconsistent situations.
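To make the worry concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the payoff numbers and the names `world`, `agent`, `naive_counterfactual`, and `substituted_counterfactual` are hypothetical and do not come from any particular decision theory. It shows that supposing “the agent chooses B” as a fact about the actual code tells us nothing, because the premise is false, whereas swapping the action into the world model still gives an answer.

```python
ACTIONS = ("A", "B")

# Hypothetical payoffs, chosen only for illustration.
PAYOFF = {"A": 10, "B": 5}


def world(action):
    """Outcome of the world, given the action the agent's code outputs."""
    return PAYOFF[action]


def agent():
    """Deterministic decision procedure: it has exactly one output."""
    return max(ACTIONS, key=world)


def naive_counterfactual(action):
    """Treat 'agent() == action' as a claim about the actual source code.

    Returns the payoff if the supposition matches what the code outputs,
    or None if the supposition contradicts the code (a false premise
    constrains nothing, so no payoff can be read off from it).
    """
    if agent() != action:
        return None
    return world(action)


def substituted_counterfactual(action):
    """One possible workaround (an assumption, not the only option):
    ignore what the code actually outputs and plug the action directly
    into the world model."""
    return world(action)


if __name__ == "__main__":
    for a in ACTIONS:
        print(a, naive_counterfactual(a), substituted_counterfactual(a))
```

Substitution is easy here only because `world` takes the action as a plain input. If the world model itself contained copies or predictions of `agent`, it would no longer be obvious what to substitute, which is roughly where the question of logical counterfactuals comes in.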
Do we need the ability to reason about logically inconsistent situations? Perhaps we could instead transform the question of logical counterfactuals into a question about consistent situations, as I describe in this post? Or to put it another way, is the idea of logical counterfactuals an analogy, or is it supposed to be taken literally?
See “Example 1: Counterfactual Mugging” in Towards a New Decision Theory.