“we need to build AGI because it’s the only way to advance humanity’s technology from now on” is one thing (debatable, IMO, but at least an argument one can make)
It’s not a sane argument in favor of advancing now vs. later, when it’s less likely to kill everyone (because there was more time to figure out how to advance safely). The same holds for any argument in the “think of the enormous upside” reference class: the upside isn’t going anywhere; it’s still there in 20 years.
Instead, there is talk about scaling to 10 times GPT-4 compute in 2024 and many dozens of times GPT-4 compute in 2025 (billions of dollars in compute). Nobody knows what amount of compute is sufficient for AGI, in the sense of capability for mostly autonomous research, especially with some algorithmic improvements. Any significant scaling poses a significant risk of reaching AGI. And once there is AGI, pausing before superintelligence becomes much less plausible than it is now.
It’s not a sane argument in favor of advancing now vs. later, when it’s less likely to kill everyone (because there was more time to figure out how to advance safely). The same holds for any argument in the “think of the enormous upside” reference class: the upside isn’t going anywhere; it’s still there in 20 years.
Oh, I mean, I do agree. Unless you apply some really severe discount rate to those upsides, there’s no way they can outweigh a major risk of extinction (and if you are applying a really severe discount rate because you think you, personally, will die before seeing them, then that’s again just being really selfish). But I’m saying it is at least an argument we should try to reckon with at the societal level. A petty private desire for immortality, by contrast, should not even be entertained. If you want to risk humanity for the sake of your own life, you’re literally taking the sort of insane bet you’d expect a villainous fantasy video game necromancer to take. Not only is it evil, it’s not even particularly well-written evil.
Nobody knows what amount of compute is sufficient for AGI, in the sense of capability for mostly autonomous research, especially with some algorithmic improvements.
This is what I find really puzzling. The human brain, which only crossed the sapience threshold a quarter-million years of evolution ago, has O(10^14) synapses, and presumably a lot of evolved, genetically-determined inductive biases. Synapses have very sparse connectivity, so synapse counts should presumably be compared to parameter counts after sparsification, which tends to reduce them by 1-2 orders of magnitude. GPT-4 is believed to have O(10^12) parameters; it’s an MoE model, so it has some sparsity and some duplication, so call that O(10^10 or 10^11) for a comparable number. So GPT-4 is showing “sparks of AGI” something like 3 or 4 orders of magnitude before we would expect AGI from a biological parallel. I find that astonishingly low. Bear in mind also that a human brain only needs to implement one human mind, whereas an LLM is trying to learn to simulate every human who’s ever written material on the Internet in any high/medium-resource language, a clearly harder problem.
I don’t know if this is evidence that AGI is a lot easier than humans make it look, or a lot harder than GPT-4 makes it look. Maybe controlling a real human body is an incredibly compute-intensive task (but then I’m pretty sure that less than 90% of the human brain’s synapses are devoted to motor control and controlling the internal organs, and more than 10% are used for language/visual processing, reasoning, memory, and executive function, and even 10% of 10^14 synapses is ~10^13, still well above GPT-4’s effective parameter count). Possibly we’re mostly still fine-tuned for something other than being an AGI? Given the implications for timelines, I’d really like to know.
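To put rough numbers on the comparison above, here’s a back-of-the-envelope sketch; the synapse count, the sparsification factor, the GPT-4 parameter estimate, and the 10% cognitive share are just the assumptions stated above, not measured facts.

```python
import math

# Back-of-the-envelope comparison. All inputs are rough assumptions from the
# discussion above, not measured facts.

brain_synapses        = 1e14       # assumed human synapse count, O(10^14)
cognitive_fraction    = 0.10       # assumed share used for language/reasoning/memory
sparsification_factor = 10**1.5    # "1-2 orders of magnitude"; take the midpoint

gpt4_dense_params     = 1e12       # assumed GPT-4 parameter count, O(10^12)
gpt4_effective_params = gpt4_dense_params / sparsification_factor  # MoE sparsity/duplication discount

def oom(x: float) -> float:
    """Order of magnitude, base 10."""
    return math.log10(x)

gap_whole_brain = oom(brain_synapses) - oom(gpt4_effective_params)
gap_cognitive   = oom(brain_synapses * cognitive_fraction) - oom(gpt4_effective_params)

print(f"GPT-4 effective params: ~10^{oom(gpt4_effective_params):.1f}")
print(f"Gap vs whole brain:     ~{gap_whole_brain:.1f} orders of magnitude")
print(f"Gap vs cognitive share: ~{gap_cognitive:.1f} orders of magnitude")
```

Under those assumptions the gap comes out at ~3.5 orders of magnitude against the whole brain, and still ~2.5 even if only the ~10% cognitive share counts, which is the puzzle.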
Maybe controlling a real human body is an incredibly compute-intensive task
I broadly suspect that this is the actual answer. More specifically, the reason here is that the latency requirements are on the order of milliseconds, which is also a hard constraint, and that means you need more compute specifically for motor processing.
I had a thought. When comparing parameter counts of LLMs to synapse counts, for parity the parameter count of each attention head should be multiplied by the number of locations it can attend to, or at least by the logarithm of that number. That would account for about an order of magnitude of the disparity, so make that 2-3 orders of magnitude. That sounds rather more plausible for the gap between “sparks of AGI” and full AGI.
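As a quick sanity check on that adjustment (again only a sketch: the context length, and the simplifying assumption that attention heads hold most of the parameters, are illustrative rather than known GPT-4 figures):

```python
import math

# Sketch of the proposed correction: weight attention-head parameters by log2 of
# the number of attendable positions. Context length is an illustrative assumption.

context_length = 8192                      # assumed attendable positions per head
log_weight = math.log2(context_length)     # ~13x

effective_params = 10**10.5                # effective count from the earlier estimate
adjusted_params  = effective_params * log_weight  # upper bound: treat all params as attention

print(f"log2 weight: ~{log_weight:.0f}x (~10^{math.log10(log_weight):.1f})")
print(f"adjusted effective params: ~10^{math.log10(adjusted_params):.1f}")
print(f"remaining gap to 10^14 synapses: ~{14 - math.log10(adjusted_params):.1f} orders of magnitude")
```

With those numbers the log-weighting buys roughly one order of magnitude, leaving a remaining gap of about 2.4, consistent with the “make that 2-3 orders of magnitude” estimate.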