This is really interesting.
To understand this more thoroughly, I’m simplifying the high- and low-quality video feeds to lists of states that correspond to reality. (This simplification might be unfair, so I’m not sure this is a true break of your original proposal, but I think it helped me think about general breaking strategies.)
Ok, video feeds compressed to arrays:
We consider scenarios in a fixed order. If the diamond is present, we record a 1, and if not, a 0. The high-quality feed gives us a different array than the low-quality feed (otherwise the low-quality mode is not helpful). E.g., High reports: (1, 0, 1, 1, 0, …); Low reports: (1, 0, 1, ?, 0, …).
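To make the toy concrete, here’s a minimal sketch in Python (the names `high_feed`/`low_feed` and the use of `None` for the “?” entry are my own, purely illustrative):

```python
# Toy representation of the two feeds: 1 = diamond present, 0 = absent.
# None stands in for the "?" entry the low-quality feed can't resolve.
high_feed = [1, 0, 1, 1, 0]
low_feed = [1, 0, 1, None, 0]

# The scenarios where the two feeds differ (or the low feed is silent) are
# the only places the extra quality of the high feed actually matters.
gaps = [i for i, (h, l) in enumerate(zip(high_feed, low_feed)) if h != l]
print(gaps)  # [3]
```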
There are two possible ways that gap can get resolved.
In case one, the low-quality predictor has a powerful enough model of reality to effectively derive the high-quality data. (We might find this collapses to the original problem, because it has somehow reconstructed the high-quality stream from the low-quality stream and then proceeds as normal. You might argue that’s computationally expensive; okay, then let’s proceed to case two.)
In case two, the predictor on the low-quality data feed predicts wrongly.
(I know you are saying it predicts *uncertainly*, but we still have to have some framework to map uncertainty to a state; we have to round one way or the other. If uncertainty avoids loss, the predictor will be preferentially inconclusive all the time. If we round uncertainty up, we’re effectively in case one. If we round down, effectively case two.)
So we could sharpen case two and say that sometimes the AI’s camera intentionally lies to it on some random subset of scenarios. The AI then finds itself in a chaotic world where it is sometimes punished for predicting things it knows to be true.
In that case, although it’s easy to show how it would diverge from human simulation, it might not simulate reality very well either, since deriving the algorithm generating the lies might be too computationally complex. (Or maybe it can derive and counter the liar, in which case we’re back at case one, i.e., the original problem.) If liar simulation is impossible, then the optimal predictor might just hit a ceiling and accept some level of noise. Effectively this means we have a new problem: there is no direct translation possible, because the predictor is viewing a “different” world than the human.
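As a hedged illustration of that ceiling (the flip rate and the “knows reality but can’t model the liar” predictor are my own toy assumptions, not part of your setup): if the camera lies on a random fraction ε of scenarios and the predictor cannot simulate the liar, its accuracy against the reported labels tops out around 1 − ε.

```python
import random

random.seed(0)
eps = 0.1  # assumed fraction of scenarios where the camera lies
n = 100_000

truth = [random.randint(0, 1) for _ in range(n)]
# Reported labels are flipped on a random eps-subset the predictor can't model.
reported = [t ^ (random.random() < eps) for t in truth]

# A predictor that knows reality perfectly but can't simulate the liar
# just reports the truth, and its accuracy caps out near 1 - eps.
accuracy = sum(p == r for p, r in zip(truth, reported)) / n
print(accuracy)  # ~0.9
```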
I simplified your construct, possibly unfairly, and maybe that’s a way you can salvage your original build. But this was a really illuminating exercise for me to generalize the strategy.
I think there are some classes of builds (maybe yours escapes this) where if you overfit on preventing human simulation, you let direct translation slip away. And then if you rehabilitate direct translation, you have to reexamine if there’s an escape for human simulation. This sort of disjunctive analysis seems like an important strategy for adversarial breakers.
You may still be able to get the bedsheet over both corners, but I think other breakers will generally want to start with a disjunctive approach like this in other cases.
Happy to try to clarify, and this is helping me rethink my own thoughts, so I appreciate the prompts. I’m playing with new trains of thought here and have pretty low confidence in where I ended up, so I’d greatly appreciate any further clarifications or responses you have.
Yup, understood, that is how to score uncertainty effectively. I was very wrong to phrase this as “we still have to have some framework to map uncertainty to a state,” because you don’t strictly have to do anything; you can just use probabilities.
Restricting this to discrete, binary states allows us to simplify the comparison between models for this discussion. I will claim we can do so with no loss of fidelity (leaning heavily on Shannon, i.e., this is all just information, and encoding it to binary and back out again doesn’t mess anything up). Doing so isn’t obligatory, but it is useful.
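A toy round trip of the kind I have in mind (the three-state example and the two-bit code are mine, purely illustrative): as long as the encoding is invertible, pushing the states through bits and back loses nothing.

```python
# Hypothetical three-state example: encode each state as a fixed-width bit
# string and decode it again; an invertible code loses no information.
states = ["present", "absent", "occluded"]
encode = {s: format(i, "02b") for i, s in enumerate(states)}  # "present" -> "00", ...
decode = {bits: s for s, bits in encode.items()}

stream = ["present", "occluded", "absent", "present"]
bits = "".join(encode[s] for s in stream)
recovered = [decode[bits[i:i + 2]] for i in range(0, len(bits), 2)]
assert recovered == stream  # the round trip is lossless
```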
I really shouldn’t have said “you must X!” I should have said “it’s kind of handy if you X,” sorry for that confusion.
We have a high-quality information stream and a low-quality information stream, and they both gesture vaguely at the ultimate high-quality information stream, namely the true facts of the matter of the world itself. Say LQ < HQ < W.
LQ may be low quality because it is missing information that is in HQ; it may just be a subset of HQ, like a lower-resolution video. Or it may contain actual noise: false information.
If we have a powerful algorithm, we may be able to, at least asymptotically, convert LQ to HQ using processing power. So maybe in some cases LQ + processing = HQ exactly. But that makes the distinction uninteresting, and you would likely have to further degrade v′1 to get the effect you are looking for, so let’s discard that and consider only cases where v′1 is strictly worse.
You can now use an element-wise XOR to sort the outputs of LQ and HQ into two buckets:
1. A stream of outputs that all agree.
2. A stream of outputs that all disagree.
So for bucket 1, there are aspects of the world where there’s effectively no loss in quality. But comparing HQ with HQ is not useful, so let’s discard those cases, and examine the corners where LQ and HQ disagree.
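A minimal sketch of that bucketing (the particular readings are hypothetical; on binary outputs, “disagree” is just XOR, i.e., element-wise inequality):

```python
# Split the scenarios into the bucket where LQ and HQ agree and the bucket
# where they disagree; h != l is XOR on these 0/1 values.
hq = [1, 0, 1, 1, 0, 1]
lq = [1, 0, 1, 0, 0, 0]  # hypothetical low-quality readings

agree = [i for i, (h, l) in enumerate(zip(hq, lq)) if h == l]
disagree = [i for i, (h, l) in enumerate(zip(hq, lq)) if h != l]
print(agree)     # [0, 1, 2, 4] -- bucket 1: effectively no loss of quality
print(disagree)  # [3, 5]       -- bucket 2: where "LQ" actually means something
```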
LQ effectively has false information about some subset of reality there; that is, in a sense, what “LQ” means.
(Or it just has gaps, which after processing either resolve to approximately HQ or fail and resolve to noise; either way, the same analysis applies.)
Rereading, I think HoldenK started down this path: “once the predictor is good enough that it can get data points right despite missing crucial information, it is also (potentially) good enough that it can learn how to imitate ‘what the human would think had happened if they had more information.’”
So for your block: in a sense you’re giving the human some information the predictor lacks. You’re giving the human “hints,” in the form of higher-quality input, which helps get the human closer to perfectly representing the actual world. (Not completely; sometimes there’s still uncertainty, but closer than the predictor is likely to get.)
If that gets the human to “perfect”, then the best the predictor can do is asymptotically approach human prediction and direct translation at the same time.
My Weak Spots
I think one likely objection to what I wrote here is that I am abusing Shannon. I’ve considered that and would be happy to discuss it more and carefully consider objections along those lines, but I think toy examples would get us there. And this is without taking away from your note that “Sometimes the predictor’s probability is strictly between 0 and 1, so it gets some loss.” If p(I eat soup) is 0.6 for all days, let’s just ask ten discrete questions: “across n days, will the number of soups I eat converge to n/1? (T/F), to n/2? (T/F), …” I would definitely try to preserve performance and scoring; I just want to run the XOR.
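Here’s a toy version of what I mean by preserving performance and scoring (the “count ≥ k” questions and the binomial model are my own illustrative choices, a slightly different battery of questions than the n/1, n/2, … phrasing above, but the same spirit): a single forecast of p = 0.6 can be re-expressed as a set of discrete T/F questions about the count, each of which still carries a probability and so can still be scored.

```python
from math import comb

# Hypothetical setup: p(soup) = 0.6 on each of n independent days.
p, n = 0.6, 10

def prob_count_at_least(k: int) -> float:
    """P(total soup count >= k) under the binomial model."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Replace the single probabilistic forecast with n+1 discrete T/F questions
# ("will I eat at least k soups?"); each still carries a probability and can
# still be scored, so nothing about the original forecast is thrown away.
for k in range(n + 1):
    print(f"count >= {k}: p = {prob_count_at_least(k):.3f}")
```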
I think another likely objection is that when we apply models, trying to get m(HQ) ≈ W, they rely on interactions between states in complex ways, so that we can’t slice the states randomly into two groups without disrupting how the models work at a basic level. I think the response is simply to group these states into bigger subsets of outcomes and treat those as atomic.
I think the biggest and most important objection would be that I’ve misunderstood your block. I would welcome any clarifications, and would especially appreciate a toy example if you could give one, even if not involving diamonds, just to make sure I definitely get what you’re saying in that part.
I’d be interested in other objections or weak spots here; I appreciate your time helping me think this through more carefully and completely.