notrishi

Karma: 0

ML research eng
Looking for new a new position!!

notrishi 17 Nov 2024 22:29 UTC
1 point
0
on: notrishi’s Shortform
If blocks of text could be properly encoded and decoded vae style, this would not only improve compute requirements for transformers (since you would only have to predict latents), but it might also offer a sort of “idea” autoregression, where a thought is slowly formulated, then decoded into a sequence of tokens. Alas, it seems there is a large chunk of research left unresolved here; I doubt that researchers are willing to allocate time to this.

notrishi’s Shortform

notrishi31 Oct 2024 17:43 UTC

1 point

notrishi 31 Oct 2024 3:05 UTC
1 point
0
on: notrishi’s Shortform
There has been a lot of recent talk on diffusion models implicitly autoregressing in the frequency domain (from low-high, coarse to fine features). I find absolutely no reason that we cannot explicitly autoregress on frequencies using fft and casual attention in the frequency domain for the batched loss. I’ll probably attempt this at some point.