The solution is IMO just to treat the number of computations performed per generated token as some function of the model size. Once we’ve identified a suitable asymptotic order for that function, we can say intelligent things like “the smallest network capable of solving problems of size N in complexity class C has size X”.
Or if our asymptotic bounds are not tight enough:
“No economically feasible LLM can solve problems of size >= N in complexity class C.”
(Where “economically feasible” might be defined in terms of aggregate global economic resources or similar, depending on how tight you want the bound to be.)
Regardless, we can still obtain meaningful impossibility results.
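As a toy illustration of the kind of bound this yields (every constant below is a hypothetical assumption chosen for illustration, including the ~2·P FLOPs-per-token estimate, the Ω(2^N) lower bound standing in for class C, and the 1e26 FLOPs budget):

```python
# Toy sketch of the impossibility-result argument above.
# Every constant here is an illustrative assumption, not a measured value.

def flops_per_token(n_params: float) -> float:
    """Per-token forward-pass cost: ~2 FLOPs per parameter
    (standard back-of-the-envelope estimate for dense transformers)."""
    return 2.0 * n_params

def min_ops_required(problem_size: int) -> float:
    """Hypothetical Omega(2^N) lower bound standing in for
    'problems of size N in complexity class C'."""
    return 2.0 ** problem_size

def feasible(n_params: float, max_tokens: float, problem_size: int,
             budget_flops: float) -> bool:
    """Can a model of this size, within the compute budget, emit enough
    total computation to meet the problem's lower bound?"""
    total = flops_per_token(n_params) * max_tokens
    return total <= budget_flops and total >= min_ops_required(problem_size)

# ~1e26 FLOPs as a stand-in for "aggregate global economic resources".
print(feasible(n_params=1e12, max_tokens=1e9,
               problem_size=120, budget_flops=1e26))  # False: 2^120 ~ 1.3e36
```

The particular numbers don’t matter; the point is that once the per-token cost function and the problem’s lower bound are pinned down, the feasibility check is mechanical.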
Very big caveat: the LLM doesn’t actually perform O(1) computations per generated token.
The number of computational steps performed per generated token scales with network size: https://www.lesswrong.com/posts/XNBZPbxyYhmoqD87F/llms-and-computation-complexity?commentId=QWEwFcMLFQ678y5Jp
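To make that concrete, here’s a minimal sketch under standard dense-transformer approximations (the ~12·L·d² parameter count and ~2 FLOPs per parameter per token are textbook back-of-the-envelope estimates, not figures from the linked comment): total work per token is O(P) in parameter count, and serial depth per token is O(L) in layer count, not O(1).

```python
# Sketch: per-token compute for a dense transformer scales with model size.
# Uses the textbook ~12*L*d^2 parameter count and ~2 FLOPs/param/token
# forward-pass estimates -- assumptions, not figures from the linked comment.

def per_token_cost(n_layers: int, d_model: int) -> tuple[float, int]:
    n_params = 12 * n_layers * d_model ** 2  # rough dense-transformer count
    total_flops = 2.0 * n_params             # total work per token: O(P)
    serial_depth = n_layers                  # sequential steps per token: O(L)
    return total_flops, serial_depth

# Shapes roughly matching GPT-2 small, GPT-2 XL, and GPT-3.
for layers, width in [(12, 768), (48, 1600), (96, 12288)]:
    flops, depth = per_token_cost(layers, width)
    print(f"{layers:3d} layers, d_model={width:5d}: "
          f"~{flops:.1e} FLOPs/token, {depth} serial steps/token")
```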
Caveat to the caveat: