What are the rumors? I’m only aware of MoE.
Yes, the main rumor is that it’s a mixture-of-experts. This is already quite a difference from a single Transformer.
We presume that these experts are mostly built from Transformer components (with possible additions and modifications we don’t know about), but we don’t know how independent those experts are: whether they share a sizeable common initial computation and then branch off from it, or whether it’s something else entirely, with some kind of dynamic sparse routing through a single network, and so on… I think it’s unlikely to be “just take a bunch of GPT-3s, run an appropriate subset of them in parallel, and combine the results”.
There is a huge diversity of techniques combining MoE motifs with motifs associated with Transformers; see, e.g., this collection of references: https://github.com/XueFuzhao/awesome-mixture-of-experts
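To make the “dynamic sparse routing through a single network” idea concrete, here is a minimal sketch of the standard top-k routed MoE layer (in the GShard/Switch-Transformer style) in PyTorch. All the names, sizes, and the dense dispatch loop are my own illustrative choices; none of this is a claim about how GPT-4 is actually built.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """A sparsely-routed mixture-of-experts layer (GShard/Switch style).

    Each "expert" is a Transformer-style feed-forward block; a learned router
    sends every token to its top-k experts. Purely illustrative; not a claim
    about GPT-4's actual architecture.
    """
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                        # x: (batch, seq, d_model)
        logits = self.router(x)                  # (batch, seq, num_experts)
        weights, indices = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the chosen k
        out = torch.zeros_like(x)
        # Dense loop over experts for readability; production systems dispatch
        # tokens to experts in parallel and add load-balancing losses,
        # capacity limits, etc.
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = indices[..., slot] == e   # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = TopKMoELayer()
tokens = torch.randn(2, 16, 512)
print(layer(tokens).shape)  # torch.Size([2, 16, 512])
```

Note that in this sketch the experts share everything outside the layer (attention, embeddings, the router itself), which is one way “experts” can be far from independent models.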
So we really don’t know; these rumors are only enough to make some partial guesses.
If we survive for a while, all this will eventually become public knowledge, and we’ll probably eventually understand how the magic of GPT-4 is possible.