Thanks! Very interesting point about the lack of this in image-GPT etc. I have no comment there, not understanding them on a technical level.
I totally think that humans would put lots of (conditional) probability on such sequences. It’s true that we’d predict the sequence would stop eventually, but that’s not relevant. What’s relevant is this: you have seen it repeat N times so far. What’s the probability that it goes on for at least N+1 repetitions in total? That probability goes up and up with N, not down and down, even though you are supremely confident that N is finite and even though your (unconditional) credence in the sequence as a whole goes down and down with N.
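To make this concrete, here is a toy calculation. The two-hypothesis mixture and every number in it (a one-in-a-million prior on "the text is stuck in a loop", continuation probabilities of 0.99 and 0.01) are made-up assumptions for illustration only, not estimates of anything real.

```python
def unconditional(n, prior_loop=1e-6, p_loop=0.99, p_normal=0.01):
    """P(the phrase repeats at least n times) under a two-hypothesis mixture.

    Hypothetical setup: with probability prior_loop the text is "stuck in a
    loop" and each further repetition happens with probability p_loop;
    otherwise it is ordinary text and each repetition has probability p_normal.
    """
    return prior_loop * p_loop**n + (1 - prior_loop) * p_normal**n


def conditional_next(n, **kwargs):
    """P(at least n+1 repetitions | at least n repetitions already seen)."""
    return unconditional(n + 1, **kwargs) / unconditional(n, **kwargs)


for n in (1, 2, 3, 5, 8):
    print(f"N={n}: P(>=N reps)={unconditional(n):.3e}, "
          f"P(>=N+1 | >=N)={conditional_next(n):.4f}")
```

Under these assumptions the unconditional probability of seeing N repetitions keeps shrinking as N grows, while the conditional probability of one more repetition climbs toward 0.99, because each repetition shifts the posterior toward the loop hypothesis.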
I think you may be correct that even humans would assign increasing probability to the repetition continuing as N grows, up to a point. The difference could be that humans use a much larger compressed historical context, so when reading something like Moby Dick, the prior for any serious repetition is absurdly low, and it never comes up.
Also, humans read fundamentally differently, through vision: even when the retina is focused on just a word or two at a time, you are also getting some bits of signal about the surrounding upcoming text, so big repetitions would be fairly obvious.