What about all the times a system generalizes favourably instead of predicting errors? You can say it’s just a failure of prediction, but it’s not like these failures are random.
I don’t understand: how is “not predicting errors” either a thing we have observed, or something that has anything to do with simulation?
Yeah, I really don’t know what you are saying here. Like, if you prompt a completion model with badly written text, it will predict badly written text. But also, if you train a completion model on data where a very weak hash is followed by its pre-image, it will probably learn to undo the hash, even though the source generation process never performed that operation (which is potentially much more complicated than the hashing function itself), which means it’s not really a simulator.
> But also, if you train a completion model on data where a very weak hash is followed by its pre-image, it will probably learn to undo the hash, even though the source generation process never performed that operation (which is potentially much more complicated than the hashing function itself), which means it’s not really a simulator.
I’m saying that this won’t work with current systems, at least for a strong hash, because it’s hard; instead of learning to undo the hash, the model will learn to simulate, because that’s easier. Then you can vary the strength of the hash to measure the degree of predictor-ness/simulator-ness and compare it with what you expect. Or do a similar thing with something other than a hash that also distinguishes the two frames.
The point is that without experiments like these, how have you come to believe in the predictor frame?
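The proposed experiment could be sketched roughly as follows. This is a minimal sketch under assumed details not specified above: the “weak hash” is taken to be a Caesar shift, the strong hash is sha256, and pre-images are random 8-character strings.

```python
import hashlib
import random
import string

def weak_hash(s: str) -> str:
    # Trivially invertible "hash" (assumed stand-in): shift each character
    # forward by one. A model could plausibly learn to undo this.
    return "".join(chr(ord(c) + 1) for c in s)

def strong_hash(s: str) -> str:
    # Cryptographic hash: inverting it is infeasible, so a pure predictor
    # should fall back to emitting a merely plausible-looking pre-image.
    return hashlib.sha256(s.encode()).hexdigest()

def make_examples(n: int, hash_fn, seed: int = 0) -> list[str]:
    # Each training line is generated pre-image-first, then pasted in
    # hash -> pre-image order, as in the discussion above.
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        pre_image = "".join(rng.choices(string.ascii_lowercase, k=8))
        examples.append(f"{hash_fn(pre_image)} -> {pre_image}")
    return examples
```

One would then train a completion model on each corpus and check, on held-out hashes, whether it emits the true pre-image (undoing the hash, predictor-like) or only a plausible-looking one (simulator-like), interpolating by varying the hash strength.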
> I don’t understand: how is “not predicting errors” either a thing we have observed, or something that has anything to do with simulation?
I guess it is less about simulation being the right frame and more about prediction being the wrong one. But I think we have definitely observed LLMs mispredicting things we wouldn’t want them to predict. Or is this actually a crux and you haven’t seen any evidence at all against the predictor frame?
You can’t learn to simulate an undo of a hash, or at least I have no idea what you would be “simulating” and why that would be “easier”. You are certainly not simulating the generation of the hash: going token by token forwards, you don’t have access to the pre-image at that point.
Of course, the reason hashes are sometimes followed by their pre-images in the training set is that they were generated in the opposite order and then simply pasted in hash -> pre-image order.
I’ve seen LLMs generate text backwards. Theoretically, an LLM can keep the pre-image in its activations, calculate the hash, and then output them in the order hash, pre-image.
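That mechanism can be illustrated in miniature (plain Python, not an actual LLM; sha256 is an assumed stand-in for a strong hash): if the pre-image is fixed first, the hash can be computed forwards and still be emitted first, with no inversion anywhere.

```python
import hashlib

def emit_hash_first(pre_image: str) -> str:
    # The pre-image is decided first (analogous to holding it in
    # activations); the hash is then computed *forwards* and simply
    # emitted before it. No inversion of sha256 happens anywhere,
    # yet the output is in hash -> pre-image order.
    digest = hashlib.sha256(pre_image.encode()).hexdigest()
    return f"{digest} {pre_image}"
```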