You can call it a “gut claim” if that makes you feel better. But the actual reason is that I did some very simple math (about the required window size, given the quadratic scaling of transformer attention) and concluded that practically speaking it was impossible.
If you’re talking about this:
Now imagine trying to implement a serious backtracking algorithm. Stockfish checks millions of positions per turn of play. The attention window for your “backtracking transformer” is going to have to be at least {size of chess board state}*{number of positions evaluated}.
And because of quadratic attention, training it is going to take on the order of {number of parameters}*({chess board state size}*{number of positions evaluated})^2
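For what it's worth, the arithmetic in that quote is easy to reproduce. The numbers below are purely illustrative assumptions of mine (64 tokens per serialized position, a million positions per move), not anything agreed on in this thread:

```python
# Illustrative back-of-envelope numbers (my assumptions, not measurements):
tokens_per_position = 64         # assumed serialization cost of one board state
positions_evaluated = 1_000_000  # assumed Stockfish-scale search per move

# Window needed to hold the whole search trace in context
window = tokens_per_position * positions_evaluated
print(f"required context window: {window:,} tokens")  # 64,000,000 tokens

# Self-attention cost grows with the square of the window length,
# so the dominant per-layer attention term scales like window**2.
attention_cost = window ** 2
print(f"attention cost term: {attention_cost:.2e}")  # 4.10e+15
```

Whether those inputs are the right ones is exactly what's in dispute below; the quadratic blow-up itself is not.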
then that’s just irrelevant. You don’t need to evaluate millions of positions to backtrack (unless you think humans don’t backtrack) or to play chess.
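To make “backtracking doesn’t require millions of evaluations” concrete, here’s a textbook backtracking search with a node counter. I’m using N-queens rather than chess purely for brevity; the point is only that backtracking as an algorithmic pattern visits few states:

```python
def solve_queens(n):
    """Find one n-queens solution by backtracking; return (placement, nodes visited)."""
    nodes = 0
    placement = []  # placement[r] = column of the queen in row r

    def safe(col):
        row = len(placement)
        return all(c != col and abs(c - col) != row - r
                   for r, c in enumerate(placement))

    def search():
        nonlocal nodes
        nodes += 1
        if len(placement) == n:
            return True
        for col in range(n):
            if safe(col):
                placement.append(col)   # try a move
                if search():
                    return True
                placement.pop()         # dead end: backtrack
        return False

    search()
    return placement, nodes
```

Running `solve_queens(8)` finds a valid placement after visiting a few hundred states — nowhere near millions.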
My point was not that “a relatively simple architecture that contains a Transformer as the core” cannot solve problems via trial and error (in fact I think it’s likely such an architecture exists). My point was that transformers alone cannot do so.
There’s nothing the former can do that the latter can’t. “Architecture” is really overselling it, but I couldn’t think of a better word. It’s just function calling.
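In the “it’s just function calling” sense, the wrapper really can be this thin. The sketch below is hypothetical on my part: `propose` stands in for any transformer call, `check` for any verifier, and the loop keeps accepted steps in ordinary program state rather than in the attention window:

```python
# Hypothetical sketch of trial-and-error around a model call.
# `propose` and `check` are stand-ins, not a real API.
def trial_and_error(propose, check, candidates_per_step=3, max_steps=10):
    history = []                           # external memory, not context window
    for _ in range(max_steps):
        for attempt in range(candidates_per_step):
            candidate = propose(history, attempt)
            if check(candidate):
                history.append(candidate)  # keep the accepted step
                break
        else:
            if not history:
                return None                # stuck at the root
            history.pop()                  # all candidates failed: backtrack
    return history
```

The transformer only ever sees one step’s worth of context per call; the backtracking lives in the loop.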
Not really. The majority of your experiences and interactions are forgotten and discarded; the few that aren’t are recalled and triggered by the right input when necessary, not just sitting there in your awareness at all times. Those memories are also modified at every recall.
And that’s really just beside the point. However you want to spin it, evaluating that many positions is not necessary for backtracking or for playing chess. If that’s the basis of your “impossible” rhetoric, then it’s a poor one.