I agree with the general thrust of your take and have recently spent some time trying to understand the phenomenon.
How is it that by learning purely probabilistic relations between tokens, the model can appear to understand deeper causal structures? What exactly is the cutoff for structure it can and can’t learn purely from text?
If you have a math background, you might enjoy this recent talk by Tai-Danae Bradley, where she outlines a possible angle of attack on the problem using category theory. The “semantic structures” described are still very primitive, though.
Paper is here:
https://deepai.org/publication/an-enriched-category-theory-of-language-from-syntax-to-semantics
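For anyone who doesn't want to watch the talk first, here's the core construction as I understand it (my own paraphrase from memory, so the notation is mine, not necessarily the authors'): texts form a category enriched over the unit interval, where the hom-object between two strings is the probability that one continues the other.

    % Sketch of the paper's central idea (my paraphrase; notation is mine).
    % Objects are strings; the hom-object is a continuation probability:
    \[ \mathcal{L}(x, y) = \pi(y \mid x) \qquad \text{(0 if $x$ does not occur in $y$)} \]
    % Enrichment is over the monoidal preorder $([0,1], \leq, \times)$,
    % so the composition law is the inequality
    \[ \pi(z \mid y)\,\pi(y \mid x) \;\leq\; \pi(z \mid x), \]
    % which is just the chain rule (with equality for nested continuations).
    % Semantics then enter via the enriched Yoneda embedding: the "meaning"
    % of $x$ is the copresheaf $\mathcal{L}(x, -)$, i.e. the whole network
    % of probabilities of every way $x$ can be extended.

The punchline, if I'm reading it right, is that nothing beyond next-token-style conditional probabilities goes in, yet the Yoneda perspective says those probabilities determine each expression up to isomorphism, which is at least a precise way of posing your first question.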