Agree on lower-depth models being possible; a few other possibilities:
Smaller models with lower latency could be used, possibly distilled down from larger ones.
Compute improvements might make it practical to run onboard (like Tesla’s self-driving hardware inside the chest of their android).
New architectures could work on more than one time scale, kind of like humans do. E.g., when we walk, not all of the processing is done in the brain; your spinal cord can handle a tonne of it autonomously. (Will find a source tomorrow.)
LLM-type models could do the parts that can accept higher latency, leaving lower-level processes to handle themselves. Imagine, for a household cleaning robot, that an LLM-based agent puts out high-level thoughts like “Scan the room for dirty clothes. … Fold them. … Put them in the third drawer”, and existing low-level systems actually carry out the instructions. That’s an exaggerated example, but you get the idea: it doesn’t have to replace the PID controller! (A sketch of that low-level piece is just below.)
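To make that split concrete, here is a minimal sketch of the conventional low-level loop that keeps running underneath the LLM's instructions. The gains and the usage numbers are illustrative, not tuned values:

```python
# Minimal PID controller sketch: the fast low-level loop that keeps
# running regardless of what the high-level planner is doing.
# Gains and setpoints below are made-up illustrative values.

class PID:
    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        """One control step; called at e.g. 100-1000 Hz."""
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# The LLM planner only ever changes the setpoint (slow, high latency);
# the PID loop itself never waits on the model.
elbow = PID(kp=2.0, ki=0.1, kd=0.05)
torque = elbow.update(setpoint=1.2, measurement=1.15, dt=0.001)
```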
I wrote this late at night, so to clarify and expand a little bit:
- “Work on more than one time scale” is, I think, actually an interesting idea to dwell on for a second. When a person is trying to solve a problem, they will often pace back and forth, talk to themselves, etc. They don’t have to do everything in one pass: somehow the complex computation that lets them see and move around works on a very fast time scale, while other problem solving goes on simultaneously and only starts to affect motor outputs later on. That’s interesting. The spinal cord doing processing independently of the brain, which I mentioned above, is evident in this older series of (rather horrible) experiments on cats: https://www.jstor.org/stable/24945006 (A toy sketch of this two-time-scale structure follows after this list.)
- On the ‘smaller models with lower latency’: we already see models like Mistral-7B outperforming 30B-parameter models because of improvements in data, architecture, and training, and I expect this trend to continue. If the largest models are capable of operating a robot out of the box, I think you could take those outputs and use them to train a smaller model (or otherwise distill the larger model down) to a more manageable size, more specialised for the task. (A sketch of that distillation step is also below.)
- On the ‘LLMs could do the parts with higher latency’: just yesterday I saw somebody do something like this with GPT-4V, where they periodically uploaded a photograph of what was in front of them and got GPT-4V to output instructions on how to find the supermarket (walk further forward, turn right, etc.). It kind of worked, and that’s the sort of thing I was picturing here: leaving much more responsive systems to handle the low-latency work, like balance, gripping, etc. (A sketch of that loop is the last one below.)
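Here is the promised toy sketch of the two-time-scale structure. `expensive_planning_step` and `reflex_step` are hypothetical stand-ins for the slow deliberation (e.g. an LLM call) and the fast controller; nothing here is from a real robot stack:

```python
# Two time scales in one program: a fast loop that never blocks on a
# slow one, and a slow "deliberation" step in a background thread that
# only affects behaviour once it finishes. All names are hypothetical.

import threading
import time

current_goal = "stand still"   # shared state the slow loop updates
goal_lock = threading.Lock()

def expensive_planning_step() -> str:
    """Stand-in for a slow deliberation step (e.g. an LLM call)."""
    time.sleep(3.0)            # simulate seconds of model latency
    return "walk to the laundry basket"

def deliberate() -> None:
    """Slow loop: one decision every few seconds."""
    global current_goal
    while True:
        new_goal = expensive_planning_step()
        with goal_lock:
            current_goal = new_goal

def reflex_step(goal: str) -> None:
    """Stand-in for fast, low-level control (balance, gait, ...)."""
    pass

threading.Thread(target=deliberate, daemon=True).start()

# Fast loop at ~100 Hz, analogous to spinal-cord-level control: it reads
# whatever goal is current but never blocks waiting for deliberation.
while True:
    with goal_lock:
        goal = current_goal
    reflex_step(goal)
    time.sleep(0.01)
```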
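For the distillation point, a minimal sketch of the standard temperature-scaled approach in PyTorch. The temperature value and the teacher/student framing are textbook choices, not anything specific to robotics:

```python
# Sketch of vanilla knowledge distillation: train a small "student" to
# match the output distribution of a large "teacher".

import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Temperature-scaled KL divergence between teacher and student."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # batchmean + T^2 keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature**2

# Usage: the teacher runs under no_grad; only the student is trained.
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits)
```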
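And for the GPT-4V-style navigation loop, a sketch of the snapshot → instruction pattern using the OpenAI Python client. The model name, prompt, and `capture_camera_frame` are my assumptions for illustration; I haven’t seen the original demo’s code:

```python
# Sketch of the periodic snapshot -> instruction loop described above.
# Model name, prompt, and the camera hook are illustrative assumptions.

import base64
import time
from openai import OpenAI

client = OpenAI()

def next_instruction(jpeg_bytes: bytes) -> str:
    b64 = base64.b64encode(jpeg_bytes).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "I'm walking to the supermarket. Based on this "
                         "photo, give one short navigation instruction."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

# High-latency outer loop: one instruction every few seconds. Balance,
# gait, grip reflexes etc. would live in much faster inner loops.
while True:
    frame = capture_camera_frame()  # hypothetical camera hook
    print(next_instruction(frame))
    time.sleep(5.0)
```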