Okay, you raise a very good point. To analogize to my own brain: it’s like noticing that I can multiply integers 1-20 in my head in one step, but for larger numbers I need to write it out. Does that mean that my neural net can do multiplication? Well, as you say, it depends on n.
> it’s easy to imagine a huge LLM capable of doing 500 iterations of SHA1 on small strings in one shot
Nitpick: for SHA1 (or any other cryptographic hash function) I can’t fathom how an LLM could learn it through SGD, as opposed to having it hand-coded. To compute SHA1 correctly you need to implement its internals exactly; being off by even a little produces a completely different output (the avalanche effect). It’s all or nothing, so there’s no way to gradually approach a correct implementation, and hence no gradient to descend.
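To make that “all or nothing” property concrete, here’s a minimal Python sketch using the standard hashlib module: flipping a single input bit yields an unrelated digest, and iterating the hash 500 times (as in your example) is trivial for ordinary code but would have to be reproduced bit-exactly by a network.

```python
import hashlib

msg = b"hello world"
# Flip the lowest bit of the first byte (b"hello..." becomes b"iello...").
flipped = bytes([msg[0] ^ 0x01]) + msg[1:]

print(hashlib.sha1(msg).hexdigest())      # 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
print(hashlib.sha1(flipped).hexdigest())  # unrelated digest: roughly half the bits differ

# Iterated hashing, as in "500 iterations of SHA1" on a small string:
digest = msg
for _ in range(500):
    digest = hashlib.sha1(digest).digest()
print(digest.hex())
```

There’s no notion of one digest being “close” to another, which is exactly why an almost-right implementation gives you no training signal.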
But your overall point still stands. A large enough transformer can in principle represent any function over bounded-size inputs, so this is not a fundamental upper bound, and you are therefore correct that a large enough LLM could do any of these for small n. I wonder whether this will become a real capability of SOTA LLMs, or whether it will be one of those “possible in theory, but many orders of magnitude off in practice” things.
Ultimately, the question I’m driving at is whether an LLM could solve the truly important/scary problems. I care less about whether an LLM can multiply two 1M×1M (or 3×3) matrices, and more about whether it can devise & execute a 50-step plan for world domination, make important new discoveries in nanotech, or make billions of dollars in financial markets, etc.
I don’t know how to evaluate the computational complexity of these hard problems. I also don’t know whether exploring that question would help the capabilities side more than the alignment side, so I need to think carefully before answering.