Some related cool-seeming recent results: ‘All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling’, ‘Cross-Entropy Is All You Need To Invert the Data Generating Process’.
Fragment of a chat with Claude on ‘All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling’ and ‘Shared computational principles for language processing in humans and deep language models’ (my prompt first):
‘in ‘All or None...’ they mention ‘the general form of next-token predictors used in our analysis’; if we assume that the language network (in biological brains) is indeed a next-token predictor (as e.g. the next-word prediction results in ‘Shared Principles...’ suggest), shouldn’t this suggest that the linear properties should then also apply to biological brains, not just LLMs?
This is a very insightful observation! Let’s analyze this:
The key logical chain would be:
1. The “All or None” paper’s results apply to next-token predictors of the form:
```
p(x_{t+1} | x_{1:t}) = exp(f(x_{1:t})ᵀ g(x_{t+1})) / Z(x_{1:t})
```
where f maps context sequences to embedding vectors and g maps candidate tokens to unembedding vectors.
2. The “Shared Computational Principles” paper provides evidence that the human brain:
- Engages in next-word prediction before word onset
- Uses context to predict next tokens
- Shows surprise signals consistent with prediction error
- Represents words in ways that can be mapped to vector spaces
3. If the brain’s language network is implementing a next-token predictor of this general form, then theoretically:
- The same mathematical framework should apply
- The same identifiability results should hold
- The same “all or none” properties of linear features should be present
However, there are some important considerations:
1. Implementation Level:
- While the brain might implement this type of predictor at a computational level, the neural implementation could add constraints or properties not captured by the mathematical framework
- The brain might use a more complex or different form of next-token prediction
2. Verification Challenges:
- It’s harder to verify if the brain’s implementation exactly matches this mathematical form
- We can’t directly observe all components like f and g in biological systems
3. Scale/Scope:
- The brain might implement multiple prediction systems operating at different scales
- The mathematical framework might apply to some but not all aspects of neural language processing
So while your reasoning seems sound—if the brain implements this type of next-token predictor, the mathematical results should apply—verifying this empirically would be challenging. It’s a fascinating theoretical bridge between the mathematical analysis and biological implementation.’
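The predictor form discussed in the transcript, and the linear-identifiability point behind the ‘all or none’ result, can be sketched numerically. This is a toy illustration under my own assumptions (the mean-pooling encoder `f` and all names are hypothetical, not from either paper): any invertible linear reparameterization of f and g leaves the predicted distribution unchanged, which is why such predictors are identifiable only up to linear transformations.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 6, 4

# g as a matrix: one unembedding row per token
G = rng.normal(size=(vocab_size, dim))

def f(context):
    # Toy context encoder (hypothetical): mean of the context tokens'
    # unembeddings. The analyzed form only assumes SOME sequence-to-vector map.
    return G[list(context)].mean(axis=0)

def softmax(logits):
    z = np.exp(logits - logits.max())  # shift for numerical stability
    return z / z.sum()

def next_token_probs(context, emb=f, unemb=G):
    # p(x_{t+1} | x_{1:t}) = exp(f(x_{1:t})^T g(x_{t+1})) / Z(x_{1:t})
    return softmax(unemb @ emb(context))

context = [0, 2, 5]
p = next_token_probs(context)

# Reparameterize with an invertible linear map A: f' = A f, g' = A^{-T} g.
# The logits f'(x)^T g'(x') = f(x)^T A^T A^{-T} g(x') are unchanged,
# so the two predictors define the same distribution.
A = rng.normal(size=(dim, dim))
f_prime = lambda c: A @ f(c)
G_prime = G @ np.linalg.inv(A)  # rows are g'(x) = A^{-T} g(x)

p_prime = next_token_probs(context, emb=f_prime, unemb=G_prime)
assert np.allclose(p, p_prime)
```

The equivalence holds for any invertible A, which is exactly the degree of freedom the identifiability results pin down: properties preserved by all such linear maps are the ones shared across every predictor of this form.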