Possible error in the strange correlations section of the report.
Footnote 99 claims that ”...regardless of what the direct translator says, the human simulator will always imply a larger negative correlation [between camera-tampering and actually-saving the diamond] for any X such that Pai(diamond looks safe|X) > Ph(diamond looks safe|X).”
But AFAICT, the human simulator’s probability distribution given X depends only on human priors and the predictor’s probability that the diamond looks safe given X, not on how correlated or anticorrelated the predictor thinks tampering and actual-saving are. If X actually means that tampering is likely and diamond-saving is likely but their conjunction is vanishingly unlikely, the human simulator will give the same answers as if X meant they were still independent but both more likely.
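As a toy sketch of this claim (all numbers hypothetical: I'm assuming the human prior treats tampering and saving as independent, with P_H(tamper) = 0.1 and P_H(save) = 0.8, and that the diamond looks safe whenever either happens), the human simulator's output is a function of P_AI(diamond looks safe | X) alone, so the predictor's joint over tampering and saving never enters:

```python
# Toy model of the human simulator (all numbers hypothetical).  Assumed human
# prior: tampering (T) and actually-saving (S) are independent, with
# P_H(T) = 0.1 and P_H(S) = 0.8, and the diamond *looks* safe whenever it was
# actually saved or the camera was tampered with.
p_t, p_s = 0.1, 0.8
human_prior = {(t, s): (p_t if t else 1 - p_t) * (p_s if s else 1 - p_s)
               for t in (0, 1) for s in (0, 1)}
looks_safe = lambda t, s: bool(t or s)

def human_simulator(p_ai_safe):
    """Distribution over (T, S) the human simulator reports: the human's
    posterior given each appearance, weighted by the predictor's probability
    of that appearance.  The predictor enters only through p_ai_safe."""
    answer = {}
    for appears_safe in (False, True):
        block = {ts: p for ts, p in human_prior.items() if looks_safe(*ts) == appears_safe}
        z = sum(block.values())
        w = p_ai_safe if appears_safe else 1 - p_ai_safe
        for ts, p in block.items():
            answer[ts] = answer.get(ts, 0.0) + w * p / z
    return answer

# Two hypothetical X's: under X1 the predictor treats T and S as independent,
# under X2 it thinks their conjunction is vanishingly unlikely.  If both give
# P_AI(diamond looks safe | X) = 0.95, the human simulator answers identically:
print(human_simulator(0.95))
```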
If the predictor’s P(diamond looks safe) is higher than the human’s P(diamond looks safe), then it seems like the human simulator will predictably have an anticorrelation between [camera-tampering] and [actually-saving-diamond]. It’s effectively updating on one or the other of them happening more often than it thought, and so conditioned on one it goes back to the (lower) prior for the other. This still seems right to me, despite no explicit dependence on the correlations.
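A quick numeric check of this, under the same toy assumptions as the sketch above (independent human priors of 0.1 for tampering and 0.8 for saving, “looks safe” iff tampered or saved, hypothetical numbers throughout):

```python
# When the predictor's P_AI(looks safe | X) exceeds the human's prior
# P_H(looks safe) = 0.82, the human simulator's implied joint over (T, S) has
# negative covariance, even though T and S are independent under the human prior.
p_t, p_s = 0.1, 0.8
human_prior = {(t, s): (p_t if t else 1 - p_t) * (p_s if s else 1 - p_s)
               for t in (0, 1) for s in (0, 1)}
looks_safe = lambda t, s: bool(t or s)
p_h_safe = sum(p for ts, p in human_prior.items() if looks_safe(*ts))   # = 0.82

def human_sim_joint(p_ai_safe):
    # Human posterior given each appearance, reweighted by the predictor's
    # probability of that appearance.
    return {ts: p * (p_ai_safe / p_h_safe if looks_safe(*ts)
                     else (1 - p_ai_safe) / (1 - p_h_safe))
            for ts, p in human_prior.items()}

def cov(joint):
    e_t = sum(p for (t, s), p in joint.items() if t)
    e_s = sum(p for (t, s), p in joint.items() if s)
    return joint[(1, 1)] - e_t * e_s

print(cov(human_sim_joint(0.95)))      # negative: predictor more confident than the human
print(cov(human_sim_joint(p_h_safe)))  # ~0: no update, so back to the independent prior
```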
I agree the human simulator will predictably have an anticorrelation. But the direct translator might also have an anticorrelation, perhaps a larger one, depending on what reality looks like.
Is the assumption that most identifiable X are unlikely to actually imply large anticorrelations?
I do agree that there are examples where the direct translator systematically has anticorrelations and so gets penalized even more than the human simulator. For example, this could happen if there is a consequentialist in the environment who wants it to happen, or if there’s a single big anticorrelation that dominates the sum and happens to go the wrong way.
That said, it at least seems like it should be rare for the direct translator to have a larger anticorrelation (without something funny going on). It should happen only if reality itself is much more anticorrelated than the human expects, by a larger margin than the anticorrelation induced by the update in the human simulator. But on average, things should be more anticorrelated than expected about as often as they are more positively correlated than expected (averaging out to ~0), and the correlations probably usually aren’t big enough to matter.
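As a rough sketch of this comparison (hypothetical predictor joints; same toy human prior as in the sketches above), the direct translator reports whatever covariance reality actually has, while the human simulator's covariance comes only from the update toward the predictor's P_AI(looks safe); only when reality is much more anticorrelated than the human expects does the direct translator come out worse:

```python
# Compare the two reporters' implied covariances between tampering (T) and
# actually-saving (S) under two hypothetical "realities" (predictor joints).
p_t, p_s = 0.1, 0.8                     # assumed human priors, as above
human_prior = {(t, s): (p_t if t else 1 - p_t) * (p_s if s else 1 - p_s)
               for t in (0, 1) for s in (0, 1)}
looks_safe = lambda t, s: bool(t or s)
p_h_safe = sum(p for ts, p in human_prior.items() if looks_safe(*ts))

def cov(joint):
    e_t = sum(p for (t, s), p in joint.items() if t)
    e_s = sum(p for (t, s), p in joint.items() if s)
    return joint[(1, 1)] - e_t * e_s

def human_sim_cov(p_ai_safe):
    mixed = {ts: p * (p_ai_safe / p_h_safe if looks_safe(*ts)
                      else (1 - p_ai_safe) / (1 - p_h_safe))
             for ts, p in human_prior.items()}
    return cov(mixed)

# Reality A: T and S independent, just likelier (0.2 and 0.9) than the human expects.
reality_a = {(0, 0): 0.08, (0, 1): 0.72, (1, 0): 0.02, (1, 1): 0.18}
# Reality B: same marginals, but the conjunction is heavily suppressed.
reality_b = {(0, 0): 0.00, (0, 1): 0.80, (1, 0): 0.10, (1, 1): 0.10}

for name, reality in [("A", reality_a), ("B", reality_b)]:
    p_ai_safe = sum(p for ts, p in reality.items() if looks_safe(*ts))
    print(name, "direct:", round(cov(reality), 3), "human sim:", round(human_sim_cov(p_ai_safe), 3))
# A: direct 0.0,   human sim ~ -0.011 -> human simulator is the more anticorrelated one
# B: direct -0.08, human sim ~ -0.021 -> reality is anticorrelated enough that the
#                                        direct translator comes out worse
```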
Thanks, I consider this fully answered.