I really like the distinction you make between mathematical description and semantic description. It reminds me of something David Marr and Tomaso Poggio published back in the 1970s, where they argued that complex systems, such as computer programs or nervous systems, need to be described on multiple levels. The objects on an upper level are understood to be implemented by the objects and processes on the next lower level. Marr reprised the argument in his influential 1982 book on vision (Vision: A Computational Investigation into the Human Representation and Processing of Visual Information), where he talks about three levels: computational, algorithmic, and implementational/physical. Since then Marr’s formulation has been subject to considerable discussion and revision. What is important is the principle: higher levels of organization are implemented in lower levels.
In the case of LLMs we’ve got the transformer engine, the model, but also language itself. What we’re interested in is how the model implements linguistic structures and processes. To a first approximation, it seems to me that your mathematical description is about the model, while the semantic description is a property of language. I’ve got a paper where I investigate ChatGPT’s story-telling behavior from this POV: ChatGPT tells stories, and a note about reverse engineering. Here’s the abstract:
I examine a set of stories that are organized on three levels: 1) the entire story trajectory, 2) segments within the trajectory, and 3) sentences within individual segments. I conjecture that the probability distribution from which ChatGPT draws next tokens follows a hierarchy nested according to those three levels, and that this hierarchy is encoded in the weights of ChatGPT’s parameters. I arrived at this conjecture to account for the results of experiments in which I give ChatGPT a prompt with two components: 1) a story, and 2) instructions to create a new story based on that story but changing a key character: the protagonist or the antagonist. That one change ripples through the rest of the story. The pattern of differences between the old story and the new one indicates how ChatGPT maintains story coherence. The nature and extent of the differences between the original story and the new one depend roughly on the degree of difference between the original key character and the one substituted for it. I end with a methodological coda: ChatGPT’s behavior must be described and analyzed on three strata: 1) The experiments exhibit behavior at the phenomenal level. 2) The conjecture is about a middle stratum, the matrix, that generates the nested hierarchy of probability distributions. 3) The transformer virtual machine is the bottom, the code stratum.