Could you clarify what you mean by that? The largest publicly known model (that was actually trained for a useful amount of time) is, I believe, PaLM, at around 540 billion parameters. Much bigger models have been trained, but for much less time.
Real-time robotic control systems can have latency requirements as low as a few milliseconds, and a ~1 billion parameter model is probably going to take at the very least a few tens of milliseconds of end-to-end latency, which is probably why that limit was chosen.
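To put a rough number on that intuition, here is a back-of-envelope sketch (the bandwidth figures are illustrative assumptions, not measurements): if decoding is memory-bandwidth bound, per-step latency is at least the model's weight bytes divided by the available memory bandwidth.

```python
# Back-of-envelope only: if autoregressive decoding is memory-bandwidth bound,
# a lower bound on per-step latency is (weight bytes read per step) / bandwidth.
# The bandwidth figures below are assumptions for illustration, not measurements.

def min_step_latency_ms(n_params: float, bytes_per_param: float, bandwidth_gb_per_s: float) -> float:
    """Optimistic per-forward-pass latency in ms, ignoring compute,
    activation traffic, sensor I/O, and any pre/post-processing."""
    weight_bytes = n_params * bytes_per_param
    return weight_bytes / (bandwidth_gb_per_s * 1e9) * 1e3

# ~1B parameters in fp16 on a hypothetical embedded accelerator (~50 GB/s):
print(min_step_latency_ms(1e9, 2, 50))    # ~40 ms per step
# The same model on a hypothetical datacenter GPU (~1500 GB/s):
print(min_step_latency_ms(1e9, 2, 1500))  # ~1.3 ms per step
```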
A system incorporating a higher-parameter model with a total latency of 1 second is unusable for real-time robotic control in situations where a few hundred milliseconds of extra delay could cause catastrophic damage.
I’m not sure what an end-to-end latency of 1 second corresponds to in terms of model parameter count in the current best implementations, but for mobile systems it probably cannot be improved any faster than transistor scaling, due to the power/weight/size constraints of onboard computing. For fixed systems the cost will scale super-linearly, due to the increasingly expensive interconnect (InfiniBand, etc.) needed to reduce latency.
Other, less demanding applications that are unaffected by 100 milliseconds of latency may still be affected by 1 second of it. Thus a wider range of applications, beyond robotic control tasks, becomes unsuitable for these larger models at the present technological level.
I am not sure how many other common tasks have latency requirements within an order of magnitude of robotics control.
That’s what I mean when I say more general latency problems may appear beyond 10 billion parameters.
e.g. If PaLM requires 10 seconds to go from input → output, I can think of many applications where it is unsuitable. The set of such unsuitable applications grows as a function of latency.
How big is the intermediate space between having too many parameters for real-time robotic control and having a low enough parameter count that the latency is acceptable for all other common tasks?
Is it really just the range from 100 milliseconds to 1 second, or is the range smaller, or bigger?
That’s what I mean by ‘the highest parameter count that is still feasible’.
e.g. There may be a practical upper limit on how many parameters a StarCraft 2-playing model such as AlphaStar can have, because user actions can happen multiple times a second at the highest competitive levels, requiring AlphaStar, or an equivalent system, to evaluate and issue an action within a few hundred milliseconds. A hypothetical AlphaStar 2 with 10x the parameters, 10x the per-action performance when actions per unit time are limited, and 10x the latency may in fact play worse in a real match, due to the massive disadvantage of being limited in the number of actions per unit time against an unbounded human opponent.
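As a deliberately crude toy model of that tradeoff (not a description of how AlphaStar actually works, and with made-up constants), suppose match strength is roughly per-action decision quality times the achievable action rate:

```python
# Crude toy model of the quality-vs-action-rate tradeoff (not how AlphaStar
# actually works; all constants are made up for illustration).

def effective_strength(per_action_quality: float,
                       latency_s: float,
                       useful_actions_per_s: float = 4.0) -> float:
    # Action rate is capped both by model latency and by how many actions
    # per second are actually useful in a match (an assumed figure).
    achievable_rate = min(useful_actions_per_s, 1.0 / latency_s)
    return per_action_quality * achievable_rate

base = effective_strength(per_action_quality=1.0, latency_s=0.1)  # 1.0 * 4 = 4.0
# Hypothetical "AlphaStar 2" with 10x the latency:
much_better = effective_strength(10.0, latency_s=1.0)             # 10.0 * 1 = 10.0 (still ahead)
only_2x_better = effective_strength(2.0, latency_s=1.0)           #  2.0 * 1 = 2.0  (now worse)
```

Under this toy assumption, the slower model only comes out ahead if its per-action quality gain outweighs the drop in action rate.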
My guess based on how fast GPT-3 feels is that one OOM bigger will lead to noticeable latency (you’ll have to watch as the text appears on your screen, just like it does when you type) and that an OOM on top of that will result in annoying latency (you’ll press GO and then go browse other tabs while you wait for it to finish).
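Rough arithmetic behind that guess, assuming per-token latency scales roughly linearly with parameter count, and taking a purely illustrative 20 ms/token baseline for a GPT-3-sized model (not a measured figure):

```python
# Rough arithmetic only: assume per-token latency scales linearly with
# parameter count; the 20 ms/token baseline for a GPT-3-sized model is a
# made-up illustrative figure, not a measurement.

base_params = 175e9
base_ms_per_token = 20.0
tokens = 200  # roughly a long paragraph

for ooms in range(3):
    params = base_params * 10 ** ooms
    total_s = base_ms_per_token * (params / base_params) * tokens / 1000
    print(f"{params:.0e} params: ~{total_s:.0f} s for {tokens} tokens")
# 2e+11 params: ~4 s    (barely noticeable)
# 2e+12 params: ~40 s   (you watch the text appear)
# 2e+13 params: ~400 s  (press GO and browse other tabs)
```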
That sounds reasonable; GPT-4 or 5 may be latency-constrained for real-time applications. They may even come in multiple variants for either case.