A different way of phrasing Ajeya’s response, which I think is roughly accurate, is that if you have a reporter that gives consistent answers to questions, you’ve learned a fact about the predictor, namely “the predictor was such that, when it was paired with this reporter, it gave consistent answers to questions.” If there were 8 predictors for which this fact was true, then “it’s the [7th] predictor such that, when paired with this reporter, it gave consistent answers to questions” is enough information to uniquely determine the predictor, i.e. the previous fact plus 3 additional bits (log2(8) = 3) was enough. If the predictor took 1000 bits to specify, the fact that it was consistent with this reporter “saved” you 997 bits, compressing the predictor down to 3 bits.
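As a sanity check on the arithmetic, here’s a minimal sketch in Python of the bit accounting above (the sizes are just the ones from the example):

```python
import math

predictor_bits = 1000  # assumed size of the predictor's description (from the example)
n_consistent = 8       # predictors consistent with this reporter (from the example)

# "The predictor is consistent with this reporter" narrows the search space to
# n_consistent candidates; singling one out takes log2(n_consistent) bits.
index_bits = math.log2(n_consistent)      # 3.0
bits_saved = predictor_bits - index_bits  # 997.0

print(f"index bits: {index_bits:.0f}, bits saved: {bits_saved:.0f}")
```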
The hope is that maybe the honest reporter “depends” on larger parts of the predictor’s reasoning, so fewer predictors are consistent with it, and thus the fact that a predictor is consistent with the honest reporter lets you compress the predictor further. If so, searching for the reporter that most compresses the predictor would favor the honest reporter. However, the best way for a reporter to compress a predictor is simply to memorize the entire thing: if the predictor is simple enough, and the gap in complexity between the human-imitator and the direct translator is large enough, then the human-imitator plus a memorized predictor is the simplest reporter that maximally compresses the predictor.
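To make that failure mode concrete, here’s a toy comparison of total description lengths. All the numbers are hypothetical, chosen only to illustrate the “predictor simple enough, complexity gap large enough” case:

```python
import math

predictor_bits = 300          # a "simple enough" predictor (hypothetical)
direct_translator_bits = 400  # hypothetical complexity of the honest reporter
human_imitator_bits = 50      # hypothetical complexity of the human-imitator
n_consistent_with_honest = 8  # predictors consistent with the honest reporter

# Honest reporter: pay for the reporter itself, then a few index bits to pick
# out the predictor from the small set consistent with it.
honest_total = direct_translator_bits + math.log2(n_consistent_with_honest)

# Human-imitator with the predictor memorized inside it: exactly one predictor
# is consistent (0 index bits), but the memorized copy costs predictor_bits.
memorizer_total = human_imitator_bits + predictor_bits

print(f"honest: {honest_total:.0f} bits vs memorizer: {memorizer_total:.0f} bits")
# -> honest: 403 bits vs memorizer: 350 bits (memorizer wins under these assumptions)
```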