I think the important part is that the agent doesn’t know about it, not the negation. Throw in information we assume the agent can’t fully take into account and the agent fails when the information would have been important. Maybe your point isn’t that simple but I’m not seeing it.
Maybe that’s sufficient to take care of the issue, but I’m not sure I can prove that it is. I.e., my point was more: “As obvious as the pathology of this one was, can we be sure that we won’t run into analogous but subtler and harder-to-detect incompatibilities in, say, whatever next week’s souped-up extension of counterfactual mugging someone comes up with?”
My worry isn’t about this specific instance of a pathological decision problem. This specific instance was simply meant to be a trivial illustration of the issue. Maybe I’m being silly, and we’ll be able to detect any of these cases right away, but maybe not. I haven’t yet thought of a way of stating the problem precisely, but basically it’s a case of: “While we’re having all this fun figuring out how to make a decision algorithm that wins in these weird mind-bending cases, let’s make sure doing so doesn’t sacrifice its ability to win in more ordinary cases.”