But also, I’d predict that a completion model trained on data where a very weak hash is followed by its pre-image will probably have learned to undo the hash, even though the source generation process never performed that operation (which is potentially much more complicated than the hashing function itself), which means it’s not really a simulator.
I’m saying that this won’t work with current systems, at least for a strong hash, because it’s hard: instead of learning to undo the hash, the model will learn to simulate, because that’s easier. You could then vary the strength of the hash to measure the degree of predictor-ness vs. simulator-ness and compare it with what you expect. Or do a similar thing with something other than a hash that also distinguishes the two frames.
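Concretely, the sort of setup I have in mind might look something like the sketch below; the truncated-SHA-256 “strength knob” and the exact formatting are just assumptions for illustration, not a worked-out protocol:

```python
import hashlib
import random
import string

def truncated_hash(preimage: str, strength_bits: int) -> str:
    """SHA-256 truncated to roughly `strength_bits` bits; fewer bits = weaker hash."""
    digest = hashlib.sha256(preimage.encode()).hexdigest()
    n_hex_chars = max(1, strength_bits // 4)
    return digest[:n_hex_chars]

def make_example(strength_bits: int, preimage_len: int = 8) -> str:
    # The pre-image is generated first (as in the real data-generation process),
    # then written down in hash -> pre-image order.
    preimage = "".join(random.choices(string.ascii_lowercase, k=preimage_len))
    return f"hash: {truncated_hash(preimage, strength_bits)}\npre-image: {preimage}\n"

# Train (or fine-tune) on examples at several strengths; a pure predictor should
# recover the true pre-image more often as the hash gets weaker, while a pure
# simulator should just emit a plausible-looking pre-image regardless of strength.
for bits in (8, 16, 64, 256):
    print(make_example(bits))
```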
The point is that without experiments like these, how have you come to believe in the predictor frame?
I don’t understand: how is “not predicting errors” either a thing we have observed, or something that has anything to do with simulation?
I guess it is less about simulation being the right frame and more about prediction being the wrong one. But I think we have definitely observed LLMs mispredicting things we wouldn’t want them to predict. Or is this actually a crux and you haven’t seen any evidence at all against the predictor frame?
You can’t learn to simulate an undo of a hash, or at least I have no idea what you would be “simulating” and why that would be “easier”. You are certainly not simulating the generation of the hash: going token by token forwards, you don’t have access to a pre-image at that point.
Of course, the reason hashes are sometimes followed by their pre-images in the training set is that they were generated in the opposite order and then simply pasted in hash -> pre-image order.
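i.e. something like this (the hash choice and formatting are just an assumption for illustration):

```python
import hashlib

preimage = "hello world"                                 # generated first
digest = hashlib.sha256(preimage.encode()).hexdigest()   # computed from the pre-image
training_text = f"{digest} {preimage}"                   # pasted in hash -> pre-image order
print(training_text)
```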
I’ve seen LLMs generate text backwards. Theoretically, an LLM could keep the pre-image in its activations, calculate the hash, and then output them in the order hash, pre-image.