Okay, you raise a very good point. To analogize to my own brain: it’s like noticing that I can multiply integers 1-20 in my head in one step, but for larger numbers I need to write it out. Does that mean that my neural net can do multiplication? Well, as you say, it depends on n.
It’s easy to imagine a huge LLM capable of doing 500 iterations of SHA1 on small strings in one shot.
Nitpick: for SHA1 (and any other cryptographic hash function) I can’t fathom how an LLM could learn it through SGD, as opposed to having it hand coded. To compute SHA1 correctly you need to implement its internals correctly; being off by a little bit produces a completely incorrect output. It’s all or nothing: there’s no way to gradually approach a correct implementation, and hence no gradient to descend.
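To make the “all or nothing” point concrete, here’s a minimal Python sketch (my own illustration, not anything from the post) of SHA-1’s avalanche effect: flip a single input bit and roughly half of the 160 output bits change, so a near-miss implementation produces a digest that looks nothing like the right answer, and a loss function has nothing partial to reward.

```python
# Flipping one input bit sends the SHA-1 digest somewhere completely unrelated,
# so a model that is "almost right" about the internals gets no partial credit.
import hashlib

msg = b"hello world"                        # arbitrary example input
flipped = bytes([msg[0] ^ 0x01]) + msg[1:]  # flip the lowest bit of the first byte

d1 = hashlib.sha1(msg).digest()
d2 = hashlib.sha1(flipped).digest()

# Count how many of the 160 output bits differ; on average it's about half.
diff_bits = sum(bin(a ^ b).count("1") for a, b in zip(d1, d2))
print(f"{diff_bits} of 160 output bits changed")  # typically close to 80
```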
But your overall point still stands. It is theoretically possible for a transformer to learn any function, so this is not a fundamental upper bound, and you are therefore correct that a large enough LLM could do any of these for small n. I wonder whether this will become a real capability of SOTA LLMs, or whether it will be one of those “possible in theory, but many orders of magnitude off in practice” things.
Ultimately the question I’m working towards is whether an LLM could solve the truly important/scary problems. I care less whether an LLM can multiply two 1Mx1M (or 3x3) matrices, and more whether it can devise & execute a 50-step plan for world domination, or make important new discoveries in nanotech, or make billions of dollars in financial markets, etc.
I don’t know how to evaluate the computational complexity of these hard problems. I also don’t know whether exploring that question would help the capabilities side more than the alignment side, so I need to think carefully before answering.
This was actually my position when I started writing this post. My instincts told me that “thinking out loud” was a big enhancement to its capabilities. But then I started thinking about what I saw. I watched it spend tens of trillions of FLOPs to write out, in English, how to do a 3x3 matrix multiplication. It was so colossally inefficient, like building a humanoid robot and teaching it to use an abacus.
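For a rough sense of that inefficiency, here’s a back-of-envelope sketch (the model size and token count are my own assumed numbers, not measurements from the post): at tens of billions of parameters and roughly 2 FLOPs per parameter per generated token, narrating a 3x3 matrix multiplication over a few hundred tokens costs on the order of 10^13 FLOPs, versus ~50 FLOPs to just do the arithmetic.

```python
# Back-of-envelope sketch of the overhead (illustrative, assumed numbers).
params = 70e9              # assumed model size, ~70B parameters
flops_per_token = 2 * params
tokens_to_narrate = 300    # assumed length of the written-out working

cot_flops = flops_per_token * tokens_to_narrate  # ~4e13: tens of trillions of FLOPs
direct_flops = 2 * 3**3                          # a 3x3 matmul is ~54 multiply-adds

print(f"chain-of-thought: ~{cot_flops:.1e} FLOPs")
print(f"doing the arithmetic directly: ~{direct_flops} FLOPs")
print(f"overhead factor: ~{cot_flops / direct_flops:.1e}x")
```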
Then again, your analogy to humans is valid. We do a huge amount of processing internally, and then rely on this incredibly inefficient communication mechanism called writing, which we use to solve very hard problems!
So my instincts point both ways on this, and I have nothing resembling a rigorous proof either way, which leaves me pretty undecided.