I slowly look at brains to understand their internal algorithms, then write about my observations at aliceandbobinwanderland.substack.com. Staring at mechanistic interpretability and neuroscience has me intrigued and worried.
Alice Wanderland
Solving Newcomb’s Paradox In Real Life
What if AGI was already accidentally created in 2019? [Fictional story]
Thanks for investigating this! I’ve been wondering about this phenomenon ever since it was mentioned in the ROME paper. This “reversal curse” fits well with my working hypothesis that we should expect the basic associative network of LLMs to be most similar to System 1 in humans (without additional plugins or symbolic processing capabilities added on afterwards, which would be more similar to System 2), and that the auto-regressive masking in GPT-style models makes them most similar to the human sense of sound (humans don’t have a direct “sense” of language the way we have senses of sound and sight). I suspect the closest human equivalent to the “reversal curse” is the fact that we cannot, for example, “hear” music backwards. If we could, you would be able to “hear” what song this was before the “reveal”: https://youtube.com/shorts/C2C4ccId-W8?feature=shared
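(To make the “auto-regressive masking” bit concrete, here’s a minimal sketch in PyTorch of the causal mask used in GPT-style decoders; it’s my own illustration, not anything from the paper.)

```python
import torch

# Causal mask: token i may attend only to tokens 0..i, never to later ones.
# The model only ever learns "what comes next", not the reverse direction --
# one way to picture why training on "A is B" need not transfer to "B is A".
seq_len = 5
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Inside attention: disallowed positions get -inf before the softmax,
# so their attention weight becomes exactly zero.
scores = torch.randn(seq_len, seq_len)          # stand-in attention scores
masked_scores = scores.masked_fill(~causal_mask, float("-inf"))
attn_weights = torch.softmax(masked_scores, dim=-1)
print(causal_mask.int())
```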
I suspect we can only do backwards recall, and recite things like the alphabet backwards, when the material has been translated into a more... “visual” or 2D workspace. (E.g., can you spontaneously sing the tune of “Happy Birthday” backwards? What if you wrote the notes down on paper, or in your head, first? Now can you do it?)
I also wanted to point out that there are in fact some humans for whom reversibility is not trivial, who don’t automatically infer “B is A” when told or shown that “A is B”: humans before the age of ~7, on average. The capacity to recognise and apply symmetry needs to be developed over time in humans too, linguistically or non-linguistically. In developmental psychology there’s been a lot of theorizing and experimentation on why children develop cognitive capacity X at ages ~Y–Z, whether capacity A is always developed before B, and so on, and much of it seems to be playing out in LLMs (to my initially large, but diminishing, surprise as the evidence mounts).
For example, there are some well-known observations about when children do or don’t display the conservation capacity, which I would now expect to see in (some) LLMs. Things like acquiring conservation in the first place, the problem size effect, the screening effect, and the length bias effect* (I’m especially curious whether “smaller” vision-language models trained on smaller datasets will show this bias) should appear in some of the less complex LLMs, if you roughly think of GPT-2, GPT-3, GPT-3.5, GPT-4 (no vision), etc. as models of increasing developmental “age”. A toy sketch of the kind of probe I mean is after the list below.
*From the paper:
Acquisition of conservation: there is a shift from nonconservation to conservation beliefs regarding large quantities starting around the age of 6 to 7 years, and this shift can be rather abrupt.
Problem size effect: correct conservation judgments emerge for small quantities before larger quantities.
Length bias: nonconservers tend to choose the longer row as having more items than the shorter row.
Screening effect: younger children (3 to 6 years) conserve only until they actually see the results of the transformation (called the screening effect because the effects of the transformation are temporarily screened from view).
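For concreteness, here is a rough sketch of the kind of probe I have in mind: a Piagetian conservation-of-number task rendered as text. The row layout, the one-word answer format, and the `ask_model` placeholder are my own assumptions, not anything from the paper.

```python
# Toy conservation-of-number probe for an LLM (illustrative sketch only).
# `ask_model` is a placeholder for whatever chat/completion API you use.

def make_conservation_prompt(n_items: int, spread_second_row: bool) -> str:
    """Two rows with the same number of items; the second row can be spread
    out so it merely *looks* longer (the length-bias manipulation)."""
    row_a = " ".join(["o"] * n_items)
    gap = "   " if spread_second_row else " "
    row_b = gap.join(["o"] * n_items)
    return (
        "Here are two rows of coins:\n"
        f"Row A: {row_a}\n"
        f"Row B: {row_b}\n"
        "Does Row A have more coins, does Row B have more coins, "
        "or do they have the same number? Answer in one word."
    )

# Problem size effect: vary the quantity; length bias: vary the spacing.
for n in (3, 12):
    for spread in (False, True):
        prompt = make_conservation_prompt(n, spread)
        # answer = ask_model(prompt)  # placeholder API call
        print(prompt, "\n---")
```

The screening-effect analogue would need an extra step (e.g., describing the transformation before vs. after showing the resulting layout), but the same prompt skeleton should work.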
Fair! Though when the alternative is my own fiction writing skills… let’s just say I appreciated Claude’s version the most amongst the set of possible options available ^^;