Agree. If GPT-4 can solve 3-dim matrix multiplication with chain-of-thought, then doesn’t that mean you could just take the last layer’s output (before you generate a single token from it) and send it into other instances of GPT-4, and then chain together their outputs? That should by definition be enough “internal state-keeping” that you wouldn’t need it to do the “note-keeping” of chain-of-thought. And that’s precisely bayesed’s point—because from the outside, that kind of construct would just look like a bigger LLM. I think this is a clever post, but the bottlenecking created by token generation is too arbitrary a way to assess LLM complexity.
An LLM’s final-layer outputs are out of distribution for its input layer, so you can’t just pipe them back in. There is some research happening on deep model communication, but it has not yielded fruit yet AFAIK.
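To make the mismatch concrete, here’s a toy sketch (in no way GPT-4’s actual architecture; all names and sizes are made up for illustration). Token embeddings are typically initialized at a small scale, while a layer’s output hidden states live at a different, larger scale, so feeding one model’s last-layer states straight into another model’s input path hands it vectors unlike anything it was trained on:

```python
# Toy illustration of why chaining model A's final hidden states into
# model B's input layer is not free: B's input layer was trained on
# token embeddings, and A's hidden states have a different distribution.
# Everything here (sizes, init scales, the one-layer "model") is a
# simplifying assumption, not a claim about any real LLM.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 100, 16

def make_model():
    # Hypothetical minimal "model": an embedding table plus one layer.
    return {
        "embed": rng.normal(0.0, 0.02, size=(VOCAB, DIM)),  # small init, as token embeddings often are
        "layer": rng.normal(0.0, 1.0, size=(DIM, DIM)),
    }

def forward_from_tokens(model, tokens):
    h = model["embed"][tokens]          # (seq, DIM) token embeddings
    return np.tanh(h @ model["layer"])  # final hidden states

def forward_from_hidden(model, hidden):
    # Skip the embedding lookup entirely: feed another model's
    # hidden states straight into this model's layer.
    return np.tanh(hidden @ model["layer"])

model_a, model_b = make_model(), make_model()
tokens = rng.integers(0, VOCAB, size=8)

h_a = forward_from_tokens(model_a, tokens)   # model A's "last layer" output
chained = forward_from_hidden(model_b, h_a)  # piped into B with no token bottleneck

# The distribution mismatch: what B's input path expects (tiny embedding-scale
# vectors) vs. what it actually receives (much larger hidden-state-scale vectors).
emb_scale = np.abs(model_b["embed"][tokens]).mean()
hid_scale = np.abs(h_a).mean()
print(f"embedding scale ~{emb_scale:.3f}, hidden-state scale ~{hid_scale:.3f}")
```

Scale is only the simplest symptom; the direction and correlation structure of the vectors differ too, which is roughly what the "deep model communication" research would have to bridge.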