that paper is one of many claiming some linear attention mechanism that's as good as full self-attention. in practice they're all enough worse that nobody uses them except the original authors in the original paper, and usually not even the original authors in their subsequent papers.
the one exception is flash attention, which is basically just a very fancy fused kernel for the same computation (actually the same, up to numerical error, unlike all these “linear attention” papers).
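to make the distinction concrete, here's a minimal sketch (assuming PyTorch 2.x, where F.scaled_dot_product_attention exists and dispatches to a FlashAttention-style fused kernel on supported GPUs, falling back to a plain math backend on CPU). the linear-attention variant uses the elu(x)+1 feature map as one common choice; the shapes and tolerances are just illustrative:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, H, T, D = 2, 4, 128, 64                     # batch, heads, seq len, head dim
q, k, v = (torch.randn(B, H, T, D) for _ in range(3))

# naive O(T^2) softmax attention
scores = q @ k.transpose(-2, -1) / D**0.5
naive = torch.softmax(scores, dim=-1) @ v

# fused kernel path: same math, just computed blockwise without materializing
# the T x T attention matrix
fused = F.scaled_dot_product_attention(q, k, v)

# "linear attention": replace softmax(qk^T) with phi(q) phi(k)^T so the
# (k, v) product can be computed first -- a genuinely different function
phi = lambda x: F.elu(x) + 1
qp, kp = phi(q), phi(k)
num = qp @ (kp.transpose(-2, -1) @ v)                       # O(T * D^2)
den = qp @ kp.sum(dim=-2, keepdim=True).transpose(-2, -1)   # normalizer
linear = num / den

print(torch.allclose(naive, fused, atol=1e-5))   # True: identical up to numerical error
print(torch.allclose(naive, linear, atol=1e-5))  # False: an approximation, not the same computation
```

the point of the sketch: flash attention changes how the softmax attention is scheduled on hardware, while linear attention changes what function is being computed.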