Plucked from thin air, to represent the (I think?) reasonably defensible claim that a neural net intended to predict/synthesise the next state (or short time series of states) of an operating system would need to be vastly larger and require vastly more training than even the most sophisticated LLM or diffusion model.
The corresponding ‘context window’ for AIOS would need to be its entire 400MB+ input, a linear scaling factor of 1.25 x 10^4 from 32KB, but the increase in complexity is likely to be much faster than linear, say quadratic
AIOS needs to output as many of the 2 ^ (200 x 8 x 10 ^ 6) output states as apply in its (intentionally suspect) definition of ‘reasonable circumstances’. This is a lot lot lot bigger than an LLM’s output space
(3.23 x 10 ^ 23) x (input scaling factor of 1.56 x 10 ^ 8) x (output scaling factor of a lot lot lot) = conservatively, 3.4 x 10 ^ 44
Current (September 2023) estimate of global compute capacity is 3.98 x 10 ^ 21 FLOPS. So if every microprocessor on earth were devoted to training AIOS, it would take about 10 ^ 23 seconds = about 30000000000000000 years. Too long, I suspect.
I’m fully willing to have any of this, and the original post’s argument, laughed out of court given sufficient evidence. I’m not particularly attached to it, but haven’t yet been convinced it’s wrong.
Plucked from thin air, to represent the (I think?) reasonably defensible claim that a neural net intended to predict/synthesise the next state (or short time series of states) of an operating system would need to be vastly larger and require vastly more training than even the most sophisticated LLM or diffusion model.
To clarify: I didn’t just pick the figures entirely at random. They were based on the below real-world data points and handwavy guesses.
ChatGPT took 3.23 x 10^23 FPOPs to train
ChatGPT has a context window of 8K tokens
Each token is roughly equivalent to four 8-bit characters = 4 bytes, so the context window is roughly equivalent to 4 x 8192 = 32KB
The corresponding ‘context window’ for AIOS would need to be its entire 400MB+ input, a linear scaling factor of 1.25 x 10^4 from 32KB, but the increase in complexity is likely to be much faster than linear, say quadratic
AIOS needs to output as many of the 2 ^ (200 x 8 x 10 ^ 6) output states as apply in its (intentionally suspect) definition of ‘reasonable circumstances’. This is a lot lot lot bigger than an LLM’s output space
(3.23 x 10 ^ 23) x (input scaling factor of 1.56 x 10 ^ 8) x (output scaling factor of a lot lot lot) = conservatively, 3.4 x 10 ^ 44
Current (September 2023) estimate of global compute capacity is 3.98 x 10 ^ 21 FLOPS. So if every microprocessor on earth were devoted to training AIOS, it would take about 10 ^ 23 seconds = about 30000000000000000 years. Too long, I suspect.
I’m fully willing to have any of this, and the original post’s argument, laughed out of court given sufficient evidence. I’m not particularly attached to it, but haven’t yet been convinced it’s wrong.