This new rumor about GPT-4's architecture is just that, a rumor, and should be taken with a massive grain of salt...
That said, it would explain OpenAI's recent comments about the difficulty of training a model better than GPT-3. IIRC, OA spent a full year unable to substantially improve on GPT-3. Perhaps the scaling laws stopped holding, or they ran out of usable data, and this new architecture was deployed as a workaround. If true, it supports my suspicion that AI progress is slowing and that a lot of the low-hanging fruit has already been picked.
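To put a rough number on the "ran out of usable data" possibility, here is a minimal back-of-the-envelope sketch. It assumes the Chinchilla-style compute-optimal heuristic of roughly 20 training tokens per parameter; the model sizes used are purely hypothetical examples, not claims about GPT-3 or GPT-4:

```python
# Back-of-the-envelope check on training-data needs under compute-optimal scaling.
# Assumption: ~20 training tokens per parameter (the Chinchilla rule of thumb,
# Hoffmann et al. 2022). Model sizes below are hypothetical examples.

TOKENS_PER_PARAM = 20

for params in (175e9, 500e9, 1e12):
    tokens = params * TOKENS_PER_PARAM
    print(f"{params / 1e9:>6.0f}B params -> ~{tokens / 1e12:.1f}T compute-optimal tokens")

# Output:
#    175B params -> ~3.5T compute-optimal tokens
#    500B params -> ~10.0T compute-optimal tokens
#   1000B params -> ~20.0T compute-optimal tokens
```

At the trillion-parameter end, the required token counts are on the order of common estimates of the total stock of high-quality public text, which is the sense in which "running out of usable data" could push toward a different architecture rather than simply scaling up.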
Sam's comments a few months ago would also make sense given this context:

[...] further progress will not come from making models bigger. "I think we're at the end of the era where it's going to be these, like, giant, giant models," he told an audience at an event held at MIT late last week. "We'll make them better in other ways." [...] Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them. [...] At MIT last week, Altman confirmed that his company is not currently developing GPT-5. "An earlier version of the letter claimed OpenAI is training GPT-5 right now," he said. "We are not, and won't for some time."
https://www.lesswrong.com/posts/ndzqjR8z8X99TEa4E/?commentId=XNucY4a3wuynPPywb