Surely transformer-based architectures are not what superintelligences will be running on. Transformers have many limitations. Take the context window: can it be made large enough for what a superintelligence would need? What about learning and self-improvement after training? Scaling and improving transformers might be a path to superintelligence, but it seems like a very inefficient route.
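The context-window concern has a concrete root: vanilla self-attention materializes an L×L score matrix per head, so cost grows quadratically with sequence length. A minimal sketch of that scaling, using illustrative numbers (the head count and precision here are assumptions, not any specific model's):

```python
# Why long contexts are expensive: each attention head builds an
# L x L score matrix, so memory grows with the square of sequence
# length L. Head count and bytes-per-element are illustrative.

def attention_score_bytes(seq_len: int, num_heads: int = 16,
                          bytes_per_elem: int = 2) -> int:
    """Bytes needed for one layer's L x L attention score matrices."""
    return num_heads * seq_len * seq_len * bytes_per_elem

for L in (1_024, 32_768, 1_048_576):
    gib = attention_score_bytes(L) / 2**30
    print(f"L = {L:>9,}: {gib:,.2f} GiB per layer")
```

Going from a 1k to a 1M token context multiplies this term by a million, which is why naive scaling of the window looks so inefficient and why sub-quadratic attention variants are an active research area.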
We’ve demonstrated that roughly human-level intelligence can, in many ways, be achieved by the Transformer architecture. But what if there’s something far better than Transformers, just as Transformers are superior to what we were using before? We shouldn’t rule out someone publishing a landmark paper with a better architecture. The last landmark architecture paper, “Attention Is All You Need,” came out in 2017!
And there might well be discontinuities in performance. Pre-Stable Diffusion AI art was pretty awful, especially faces. It went from awful to artful in a matter of months, not years.