If I’m interpreting things correctly, this is just because anything that’s upstream gets screened off, since the agent knows what action it’s going to take.
Not quite. The agent might play a mixed strategy if there is a predictor in the environment, e.g., when playing rock/paper/scissors with a similarly-intelligent friend you (more or less) want to predict what you’re going to do and then do something other than that. (This is especially obvious if you assume the friend is exactly as smart as you, IE, assigns the same probability to things if there’s no hidden information—we can model this by supposing both of you use the same logical inductor.) You don’t know what you’re going to do, because your deliberation process is unstable: if you were leaning in any direction, you would immediately lean in a different direction. This is what it means to be playing a mixed strategy.
In this situation, I’m nonetheless still claiming that what’s “downstream” should be what’s logically correlated with you. So what screens off everything else is knowledge of the state of your deliberation, not the action itself. In the case of a mixed strategy, you know that you are balanced on a razor’s edge, even though you don’t know exactly which action you’re taking. And you can give a calibrated probability for that action.
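To make the instability concrete, here is a toy sketch in Python with a simple frequency-count predictor standing in for the shared logical inductor; the move names, the tie-breaking rule, and the round count are illustrative choices, not part of the setup above.

```python
import random
from collections import Counter

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # BEATS[m] is the move that beats m

def predict_my_move(history):
    """Crude stand-in for the shared predictor: guess my next move from the
    empirical frequencies of my past moves (uniform random tie-break)."""
    if not history:
        return random.choice(MOVES)
    counts = Counter(history)
    top = max(counts.values())
    return random.choice([m for m in MOVES if counts[m] == top])

history = []
for _ in range(3000):
    guess = predict_my_move(history)   # both players share this prediction
    friend_move = BEATS[guess]         # the friend plays to beat the predicted move
    my_move = BEATS[friend_move]       # I play to beat the friend's move -- never the predicted move
    history.append(my_move)

print({m: round(history.count(m) / len(history), 3) for m in MOVES})
# Whatever the prediction leans toward, the actual move is something else, so no
# pure prediction is stable and the empirical play hovers near uniform (~1/3 each).
```

The toy shows the razor's edge in miniature: any lean is immediately self-defeating, and the long-run frequencies are the kind of calibrated probability-of-action described above.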
You assert that LICDT pays the blackmail in XOR blackmail because it follows this law of logical causality. Is this because, conditioned on the letter being sent, if there is a disaster the agent assigns p=0 to sending money, and if there isn’t a disaster the agent assigns p=1 to sending money, so the disaster must be causally downstream of the decision to send money if the agent is to know whether or not it sends money?
I don’t recall whether I’ve written the following up, but a while after I wrote the OP here, I realized that LICDT/LIEDT can succeed in XOR Blackmail (failing to send the money), but for an absolutely terrible reason.
Suppose that the disaster is sufficiently rare—much less probable than the exploration probability ϵ. Furthermore, suppose the exploration mechanism is p-chicken, IE “if you’re too sure of what you do, do something else.” (The story is more complicated for other exploration methods.)
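To fix ideas, here is a minimal sketch of that clause in Python; the function, the 0.05 value for ϵ, and the action names are illustrative assumptions, and the final line merely stands in for whatever LICDT/LIEDT's ordinary choice rule would be.

```python
import random

EPSILON = 0.05  # exploration probability (illustrative value)

def p_chicken(credences):
    """Sketch of the p-chicken clause described above. `credences` maps each
    available action to the agent's current probability of taking it. If the
    agent is too sure of some action (credence above 1 - EPSILON), it does
    something else instead."""
    too_sure = [a for a, p in credences.items() if p > 1 - EPSILON]
    if too_sure:
        return random.choice([a for a in credences if a not in too_sure])
    # Otherwise fall through to the agent's ordinary decision rule;
    # picking the most-probable action is just a placeholder for it.
    return max(credences, key=credences.get)

print(p_chicken({"respond": 0.97, "refuse": 0.03}))  # exploration triggers -> "refuse"
print(p_chicken({"respond": 0.60, "refuse": 0.40}))  # no trigger -> "respond"
```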
Now how often does the agent respond to the letter?
Suppose that, overall, the agent responds to the letter with frequency at least ϵ (counting rounds where the agent doesn’t receive the letter, i.e., the frequency with which it would respond). Then, conditional on the letter being sent, the agent is pretty sure it will respond to the letter: it believes this with probability >1−ϵ. The reason is that the response frequency is at least ϵ, which is already much more common than the disaster. Since the letter is only supposed to arrive when the disaster has happened or when the agent would respond (by hypothesis), it must be pretty likely that the agent is responding, and the agent should learn this after enough trials.
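As a rough back-of-the-envelope version of that step, write $r$ for the overall frequency with which the agent would respond and $\delta$ for the disaster frequency (both symbols introduced here for illustration), treat the disaster as independent of the agent's disposition, and recall that the letter is sent exactly when one of {disaster, would-respond} holds. Then

$$
\Pr(\text{respond} \mid \text{letter}) \;=\; \frac{r(1-\delta)}{r(1-\delta) + \delta(1-r)} \;\ge\; \frac{\epsilon(1-\delta)}{\epsilon(1-\delta) + \delta} \;>\; 1-\epsilon,
$$

where the first inequality uses $r \ge \epsilon$ and $1-r \le 1$, and the second holds once the disaster really is sufficiently rare, roughly $\delta < \epsilon^2$.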
But the agent is playing p-chicken. If the probability of responding to the letter is greater than 1−ϵ, then the agent will refuse to do so. If the agent refuses, then the letter won’t be sent except if the rare disaster is occurring. This contradicts the assumption that the agent responds to the letter with frequency at least ϵ.
So the agent receives and responds to the letter with frequency less than ϵ. On most rounds, the predictor simulates the agent and finds that the agent would have refused, had the letter been sent.
This is good. But the agent’s reason for refusing is bonkers. The agent refuses because it thinks it responds to the letter. Its own credence in its responding to the letter is always bumping up against its 1−ϵ ceiling.
A very similar thing can happen in Transparent Newcomb, except this time the analogous refusal (2-boxing) deprives the agent of the prize. In that problem, an agent only sees a full box in cases where it’ll 1-box. So if it sees a full box infinitely often, its credence that it will 1-box (upon seeing a full box) must approach 1. But this can’t be, since it must stay below 1−ϵ. So in fact, the agent only sees a full box finitely often before being relegated to empty boxes forever. Omega keeps checking whether the agent would 1-box, and keeps seeing that it would 2-box due to its exploration clause triggering.
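Here is a toy simulation of that dynamic, with a Laplace-style frequency estimate standing in for the inductor's conditional credence, Omega treated as an exact predictor, and updates happening only on full-box rounds; all constants and names are illustrative simplifications.

```python
EPSILON = 0.05   # exploration probability (illustrative value)
ROUNDS = 200

# Laplace-style stand-in for the conditional credence P(I 1-box | I see a full box),
# starting at 1/2. Only full-box rounds provide new conditional evidence here.
one_box_count, full_box_count = 1, 2

rounds_with_full_box = 0
for _ in range(ROUNDS):
    credence_1box = one_box_count / full_box_count

    # The agent's policy on seeing a full box, including the p-chicken clause:
    # if it is too sure it would 1-box, exploration forces it to 2-box instead.
    would_1box_on_full = credence_1box <= 1 - EPSILON

    # Omega (treated as an exact predictor) fills the box iff the agent
    # would 1-box upon seeing it full.
    if would_1box_on_full:
        rounds_with_full_box += 1
        one_box_count += 1      # the agent sees the full box and 1-boxes...
        full_box_count += 1     # ...so its conditional credence drifts upward
    # On empty-box rounds nothing updates, so the credence stays pinned
    # right at the 1 - EPSILON ceiling.

print(f"rounds with a full box: {rounds_with_full_box} / {ROUNDS}")
# After roughly 1/EPSILON full-box rounds the credence hits the ceiling; from then
# on Omega's check finds the exploration clause triggering (a 2-boxer) and the
# agent is relegated to empty boxes for the rest of the run.
```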
Moral of the story: p-chicken is pretty awful in perfect-predictor cases, particularly when the predictor is interested in what you do conditional on a particular observation.
Other exploration mechanisms only fare a little better.