(I’ll note that by default I’m highly skeptical of any current-day human producing anything like a comprehensible, not-extremely-long ‘sketch of Python code’ of GPT-4 in a reasonable amount of time. For comparison, how hopeful would you be of producing the same for a smart human’s brain? And on some dimensions—e.g. knowledge—GPT-4 is vastly superhuman.)
I think OP just wanted some declarative code (I don’t think Python is the ideal choice of language, but basically anything that’s not a Turing tarpit is fine) that could speak fairly coherent English. I suspect that if you had a functional transformer decompiler, the result of applying it to a Tiny Stories-size model would be tens to hundreds of megabytes of spaghetti, so understanding that in detail would be a huge slog; but on the other hand, this is an actual operationalization of the Chinese Room argument (or in this case, English Room)! I agree it would be fascinating, if we can get a significant fraction of the model’s perplexity score. If it is, as people seem to suspect, mostly or entirely a pile of spaghetti, understanding even a representative (frequency-of-importance biased) statistical sample of it (say, enough for generating a few specific sentences) would still be fascinating.
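To make the “significant fraction of the model’s perplexity score” criterion concrete, here is a minimal sketch of how you would compare a decompiled surrogate against the original model: compute each one’s perplexity over the same held-out text from the per-token probabilities it assigns. The probability lists below are made-up illustrative numbers, not measurements from any real model.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability
    assigned to each observed token (lower is better)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical per-token probabilities that the original model and a
# decompiled surrogate assign to the same text (illustrative only).
model_probs     = [0.9, 0.7, 0.95, 0.6, 0.8]
surrogate_probs = [0.5, 0.4, 0.70, 0.3, 0.6]

ppl_model = perplexity(model_probs)
ppl_surrogate = perplexity(surrogate_probs)
print(f"model ppl: {ppl_model:.2f}, surrogate ppl: {ppl_surrogate:.2f}")
```

The closer the surrogate’s perplexity gets to the original model’s on held-out text, the larger the fraction of the model’s predictive behavior the decompiled spaghetti has actually captured.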