I really appreciate the naturalistic experimentation approach – the fact that it tries to poke at the unknown unknowns, discovering new capabilities or failure modes of Large Language Models (LLMs).
I’m particularly excited by the idea of developing a framework to understand hidden variables and create a phenomenological model of LLM behavior. This seems like a promising way to “carve LLM abilities at their joint,” moving closer to enumeration rather than the current approach of 1) coming up with an idea, 2) asking, “Can the LLM do this?” and 3) testing it. We lack access to a comprehensive list of what LLMs can do inherently. I’m very interested in anything that moves us closer to this, where human creativity is no longer the bottleneck in understanding LLMs. A constrained psychological framework could be helpful in uncovering non-obvious areas to explore. It also offers a way to evaluate the frameworks we build: do they merely describe known data, or do they suggest experiments and point toward phenomena we wouldn’t have discovered on our own?
However, I believe there are unique challenges in LLM psychology that make it more complex:
Researchers are humans. We have an intrinsic understanding of what it’s like to be human, including interesting capabilities and phenomena to study. Researchers can draw upon all of human history and literature to find phenomena worth exploring. In many ways, the hundreds of years of stories, novels, poems, and movies pre-digest work for psychologists by drawing detailed pictures of feelings, characters, and behaviors and surfacing the interesting phenomenon to study. LLMs, however, are i) extremely recent, and ii) a type of non-localized intelligence we have no prior examples of. This means we should expect significant blind spots.
LLMs appear quite brittle. Findings might be highly sensitive to i) the base model, ii) fine-tuning, and iii) the pre-prompt. Studying LLMs might mean exploring all the personas they can instantiate, potentially a vastly more enormous space than the range of human brains.
There’s also the risk of being confused by results and not knowing how to proceed. For instance, if you find high sensitivity to the exact tokens used, affecting certain properties in ways that seem illogical, you might have a lot of data but no framework to make sense of it.
I really like the concept of species-specific experiments. However, you should be careful not to project too much of your prior models into these experiments. The ideas of latent patterns and shadows could already make implicit assumptions and constrain what we might imagine as experiments. I think this field requires epistemology on steroids because i) experiments are cheap, so most of our time is spent digesting data, which makes it easy to go off track and continually study our pet theories, and ii) our human priors are probably flawed to understand LLMs.
I really appreciate the naturalistic experimentation approach – the fact that it tries to poke at the unknown unknowns, discovering new capabilities or failure modes of Large Language Models (LLMs).
I’m particularly excited by the idea of developing a framework to understand hidden variables and create a phenomenological model of LLM behavior. This seems like a promising way to “carve LLM abilities at their joint,” moving closer to enumeration rather than the current approach of 1) coming up with an idea, 2) asking, “Can the LLM do this?” and 3) testing it. We lack access to a comprehensive list of what LLMs can do inherently. I’m very interested in anything that moves us closer to this, where human creativity is no longer the bottleneck in understanding LLMs. A constrained psychological framework could be helpful in uncovering non-obvious areas to explore. It also offers a way to evaluate the frameworks we build: do they merely describe known data, or do they suggest experiments and point toward phenomena we wouldn’t have discovered on our own?
However, I believe there are unique challenges in LLM psychology that make it more complex:
Researchers are humans. We have an intrinsic understanding of what it’s like to be human, including interesting capabilities and phenomena to study. Researchers can draw upon all of human history and literature to find phenomena worth exploring. In many ways, the hundreds of years of stories, novels, poems, and movies pre-digest work for psychologists by drawing detailed pictures of feelings, characters, and behaviors and surfacing the interesting phenomenon to study. LLMs, however, are i) extremely recent, and ii) a type of non-localized intelligence we have no prior examples of. This means we should expect significant blind spots.
LLMs appear quite brittle. Findings might be highly sensitive to i) the base model, ii) fine-tuning, and iii) the pre-prompt. Studying LLMs might mean exploring all the personas they can instantiate, potentially a vastly more enormous space than the range of human brains.
There’s also the risk of being confused by results and not knowing how to proceed. For instance, if you find high sensitivity to the exact tokens used, affecting certain properties in ways that seem illogical, you might have a lot of data but no framework to make sense of it.
I really like the concept of species-specific experiments. However, you should be careful not to project too much of your prior models into these experiments. The ideas of latent patterns and shadows could already make implicit assumptions and constrain what we might imagine as experiments. I think this field requires epistemology on steroids because i) experiments are cheap, so most of our time is spent digesting data, which makes it easy to go off track and continually study our pet theories, and ii) our human priors are probably flawed to understand LLMs.