I think this is real, in the sense that they got the results they are reporting and this is a meaningful advance. Too early to say if this will scale to real-world problems, but it seems super promising, and I would hope and expect that Waymo and competitors are seriously investigating this, or will be soon.
Having said that, it’s totally unclear how you might apply this to LLMs, the AI du jour. One of the main innovations in liquid networks is that they are continuous rather than discrete, which is good for very high-bandwidth tasks like vision. Our eyes are technically discrete in that retinal cells fire discretely, but I think the best interpretation of them at scale is much more like a continuous system. Similar for hearing, the AI analog being speech recognition.
But language is not really like that. Words are mostly discrete: you usually want to process text at the token level (~= words), or sometimes at the level of wordpieces or even letters, and it’s not that sensible to think of text as continuous. So it’s not obvious how to apply liquid NNs to text understanding or generation.
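To make that discreteness concrete, here’s a toy sketch (the tiny vocabulary and whitespace tokenizer are made up for illustration, not any real LLM’s): text becomes a short sequence of discrete integer ids, while even one second of audio is thousands of continuous-valued samples.

```python
import math

# Hypothetical toy vocabulary and whitespace tokenizer (illustration only).
vocab = {"liquid": 0, "networks": 1, "are": 2, "continuous": 3, "<unk>": 4}

def tokenize(text):
    """Map text to a sequence of discrete token ids."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

ids = tokenize("Liquid networks are continuous")  # four discrete symbols

# By contrast, one second of 16 kHz audio is 16,000 real-valued samples:
audio = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
```

Four symbols versus sixteen thousand real numbers: that density gap is roughly why continuous-time models feel more natural for audio and vision than for text.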
Research opportunity!
But it’ll be a while, if ever, before continuous networks work for language.
I didn’t know about the continuous nature of LNNs; I would have thought you’d need different hardware (maybe an analog computer?) to handle continuous values.
Maybe it could work for generative networks for images or music; those seem less discrete than written language.
I mean, computers aren’t technically continuous, and neither are neural networks, but if your time step is small enough they’re continuous-ish. It’s interesting that that’s enough.
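That’s essentially numerical ODE integration. Here’s a minimal sketch, assuming a simplified LTC-style neuron (the equation form and constants are my guesses, loosely after the liquid time-constant idea, not the paper’s exact model), stepped with explicit Euler: the smaller the step, the closer the discrete simulation tracks the continuous trajectory.

```python
import math

def ltc_step(x, I, dt, tau=1.0, A=1.0, w=1.0, b=0.0):
    """One explicit-Euler step of a simplified LTC-style neuron:
       dx/dt = -x/tau + f(I) * (A - x), with f a sigmoid gate.
    (Hypothetical toy form; constants are assumptions.)"""
    f = 1.0 / (1.0 + math.exp(-(w * I + b)))  # input-dependent gate in (0, 1)
    dxdt = -x / tau + f * (A - x)
    return x + dt * dxdt

def simulate(dt, T=1.0, x0=0.0, I=1.0):
    """Integrate for T seconds; smaller dt -> closer to the continuous solution."""
    x = x0
    for _ in range(int(T / dt)):
        x = ltc_step(x, I, dt)
    return x

coarse = simulate(dt=0.5)   # 2 big steps
fine = simulate(dt=0.001)   # ~1000 small steps
```

With a large step the two-step trajectory deviates noticeably from the continuous solution; by dt=0.001 the result has essentially converged, which is the “small enough time step” point above.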
I agree music would be a good application for this approach.
Then again... the output of an LLM is a stream of tokens (yeah?). I wonder what applications LTCs could have as a post-processor for LLM output? No idea what I’m really talking about, though.
Not quite. The actual output is a map from tokens to probabilities, and only then does one sample a token from that distribution.
So LLMs are more continuous in this sense than is apparent at first, but time is discrete in LLMs (each discrete step produces the next map from tokens to probabilities, and then samples from it).
Of course, when one thinks about spoken language, time is continuous for audio, so there is still some temptation to use continuous models in connection with language :-) who knows… :-)
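That two-stage view (a continuous distribution first, discreteness only at the sampling step) can be sketched in a few lines; the four-token vocabulary and logit values below are made up:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    """Turn per-token logits into a probability map, then sample one token.

    The continuous part is the softmax distribution; discreteness only
    enters at the final sampling step.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the discrete distribution.
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i, probs
    return len(probs) - 1, probs

# Made-up logits for a toy 4-token vocabulary.
idx, probs = sample_next_token([2.0, 1.0, 0.1, -1.0])
```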
Thanks for your answer! Very interesting.
Ah aha! Thank you for that clarification!