I don’t think my argument relies on the existence of a crisp boundary. Just on the existence of a part of the spectrum that clearly is just pattern recognition and not lookahead but still leads to the observations you made.
Here is one (thought) experiment to tease this apart: Imagine you train the model to predict both whether a position leads to a forced checkmate and the best move to make. You pick one tactical motif and erase it from the checkmate-prediction part of the training set, but not from the move-prediction part.
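To make the setup concrete, here is a rough sketch of how that split could be constructed (pure illustration; the `positions` objects and fields like `pos.motifs`, `pos.best_move`, and `pos.mate_in` are hypothetical):

```python
# Hypothetical sketch of the proposed data split: positions featuring the
# held-out motif are dropped from the checkmate-prediction training data
# but kept in the move-prediction training data.

HELD_OUT_MOTIF = "smothered_mate"  # any single tactical motif would do

def build_training_sets(positions):
    move_prediction_set = []       # (board, best_move) pairs
    checkmate_prediction_set = []  # (board, mate_in_x or None) pairs

    for pos in positions:
        # The move-prediction task still sees every position,
        # including those featuring the held-out motif.
        move_prediction_set.append((pos.board, pos.best_move))

        # The checkmate-prediction task never sees the held-out motif
        # paired with a mate-in-x label.
        if HELD_OUT_MOTIF not in pos.motifs:
            checkmate_prediction_set.append((pos.board, pos.mate_in))

    return move_prediction_set, checkmate_prediction_set
```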
Now the model still knows which moves are the right ones to make, i.e. it would play the checkmate variation in a game. But would it still be able to predict the checkmate?
If it relies on pattern recognition, it wouldn’t: it has never seen this pattern connected to mate-in-x. But if it relies on lookahead, where it leverages its ability to predict the correct moves and then assesses the final position, it would still be able to predict the mate.
The results of this experiment would also lie on a spectrum from 0% to 100% correct checkmate prediction for this tactical motif. But I think it would be fair to say that it hasn’t really learned lookahead if the result is 0% or a very low percentage, and that’s what I would expect.
I don’t think my argument relies on the existence of a crisp boundary. Just on the existence of a part of the spectrum that clearly is just pattern recognition and not lookahead but still leads to the observations you made.
Maybe I misunderstood you then, and tbc I agree that you don’t need a sharp boundary. That said, the rest of your message makes me think we might still be talking past each other a bit. (Feel free to disengage at any point obviously.)
For your thought experiment, my prediction would depend on the specifics of what this “tactical motif” looks like. For a very narrow motif, I expect the checkmate predictor will just generalize correctly. For a broader motif (like all back-rank mates), I’m much less sure. Still seems plausible that it would generalize if both predictors are just very simple heads on top of a shared network body. The less computational work is shared between the heads, the less likely generalization seems.
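For concreteness, here is a minimal PyTorch-style sketch of the “two simple heads on a shared body” setup I have in mind (layer sizes, the input encoding, and the move count are illustrative, not taken from any real chess model):

```python
import torch.nn as nn

class SharedBodyTwoHeads(nn.Module):
    """Illustrative sketch: almost all computation lives in the shared trunk,
    and each task only gets a thin linear head on top of it."""

    def __init__(self, board_dim=768, hidden_dim=512, num_moves=1968):
        super().__init__()
        # Shared trunk: both tasks reuse whatever features it computes.
        self.trunk = nn.Sequential(
            nn.Linear(board_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # Thin task-specific heads: the less work happens here, the more the
        # checkmate head can piggyback on features learned for move prediction.
        self.move_head = nn.Linear(hidden_dim, num_moves)  # best-move logits
        self.mate_head = nn.Linear(hidden_dim, 1)          # forced-mate logit

    def forward(self, board_features):
        shared = self.trunk(board_features)
        return self.move_head(shared), self.mate_head(shared)
```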
The results of this experiment would also lie on a spectrum from 0% to 100% correct checkmate prediction for this tactical motif. But I think it would be fair to say that it hasn’t really learned lookahead if the result is 0% or a very low percentage, and that’s what I would expect.
Note that 0% to 100% accuracy is not the main spectrum I’m thinking of (though I agree it’s also relevant). The main spectrum for me is the broadness of the motif (and, in this case, how much computation the heads share, but that’s more specific to this experiment).
Hmm, yeah, I think we are talking past each other.
Everything you describe is just pattern recognition to me. Lookahead or search does not depend on the broadness of the motif.
Lookahead, to me, is the ability to look ahead and see what is there. It allows very high certainty even for never-before-seen mating combinations.
If the line is forcing enough, it allows finding very deep combinations (which you will never ever find with pattern recognition, because the combinatorial explosion means that basically every deep combination has never been seen before).
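To put rough numbers on that explosion (assuming an average branching factor of about 30 legal moves per ply, a common rule-of-thumb figure for chess):

```python
# Back-of-the-envelope: with ~30 legal moves per ply, a 10-ply (5-move)
# combination lives in a space of roughly 30**10 distinct move sequences,
# so any particular deep line is overwhelmingly likely to be novel.
branching_factor = 30
plies = 10
print(branching_factor ** plies)  # 590490000000000, i.e. ~5.9e14
```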
In humans, it is clearly different from pattern recognition. Humans can see multi-move patterns at a glance. I would play the example in the post instantly in every blitz game: I would check the conditions of the pattern, but I wouldn’t have to “look ahead”.
Humans consider future moves even when intuitively assessing positions: “This should be winning, because I still have x, y, and z in the position.” But actually calculating is clearly different, because it is effortful. You have to force yourself to do it (or at least I usually have to). You manipulate the position sequentially in your mind and see what could happen. This allows you to see many things that you couldn’t predict from your past experience in similar positions.
I didn’t want to get hung up on whether there is a crisp boundary. Maybe you are right and you just keep generalising and generalising until there is a search algo in the limit. I very much doubt this is where the ability of humans to calculate ahead comes from. In transformers? Who knows.