I agree, but with a caveat: I think we do have enough evidence to rule out extreme importance of algorithms
Mm, I think the “algorithms vs. compute” distinction here doesn’t quite cleave reality at its joints. As much as I talked about interpolation before, it’s a pretty abstract kind of interpolation: LLMs don’t literally memorize the data points; their interpolation relies on compact generative algorithms they learn (which, I argue, are still basically bounded by the variance in the data points they’ve been shown). The problem of machine learning, then, lies in finding some architecture + training-loop setup that would, over the course of training, move the ML model towards implementing some high-performance cognitive algorithms.
It’s dramatically easier than hard-coding the algorithms by hand, yes, and the learning algorithms we do code are very simple. But you still need to figure out in which direction to “push” your model first. (I’m pretty sure that if you threw 2023 levels of compute at a Very Deep fully-connected NN, it wouldn’t match a modern LLM’s performance; it wouldn’t even come close.)
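To gesture at what I mean by that parenthetical, here’s a toy PyTorch sketch (mine, purely illustrative; the class names and sizes are made up): a “Very Deep” fully-connected net reading a flattened context window next to a small transformer over the same tokens. It doesn’t demonstrate the performance gap by itself; it just makes concrete that the fully-connected version burns an enormous parameter budget merely on reading the flattened context, with no weight sharing across positions, i.e. that the same pile of compute gets spent on very different hypothesis classes depending on the architecture:

```python
# Purely illustrative sketch: two ways of spending compute on the same
# next-token-prediction setup. Positional encodings, causal masking, and
# training are all omitted; this only compares the shape of the hypotheses.
import torch
import torch.nn as nn

VOCAB, CTX, D = 8192, 256, 512  # toy vocabulary size, context length, embedding width

class FlatMLP(nn.Module):
    """A 'Very Deep' fully-connected net over the flattened context window."""
    def __init__(self, hidden=2048, depth=24):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D)
        layers = [nn.Flatten(), nn.Linear(CTX * D, hidden), nn.ReLU()]
        for _ in range(depth):
            layers += [nn.Linear(hidden, hidden), nn.ReLU()]
        layers.append(nn.Linear(hidden, VOCAB))
        self.net = nn.Sequential(*layers)

    def forward(self, tokens):                # tokens: (batch, CTX)
        return self.net(self.embed(tokens))   # next-token logits: (batch, VOCAB)

class TinyTransformer(nn.Module):
    """A small transformer over the same context."""
    def __init__(self, depth=24, heads=8):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D)
        block = nn.TransformerEncoderLayer(D, heads, dim_feedforward=4 * D,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=depth)
        self.head = nn.Linear(D, VOCAB)

    def forward(self, tokens):
        h = self.blocks(self.embed(tokens))   # (batch, CTX, D)
        return self.head(h[:, -1])            # next-token logits: (batch, VOCAB)

def n_params(model):
    return sum(p.numel() for p in model.parameters())

# The first Linear of FlatMLP alone (CTX * D * hidden weights) dwarfs the
# per-layer cost of the transformer, and none of it is shared across positions.
print(f"FlatMLP:         {n_params(FlatMLP()):>12,} parameters")
print(f"TinyTransformer: {n_params(TinyTransformer()):>12,} parameters")
```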
So algorithms do matter. It’s just that our way of picking the right algorithms consists of figuring out the right search procedure for them, then throwing as much compute as we can at it.
So that’s where, I would argue, the sharp left turn would lie. Not in training, when a model’s loss suddenly drops as it “groks” general intelligence. (Although that too might happen.) It would happen when the distributed optimization process of ML researchers tinkering with training loops stumbles upon a training setup that actually pushes the ML model toward the basin of general intelligence. And then that model, once scaled up enough, would suddenly generalize far off-distribution. (Indeed, that’s basically what happened in the human case: the distributed optimization process of evolution searched over training architectures, and eventually stumbled upon one that was able to bootstrap itself into taking off. The “main” sharp left turn happens during the architecture search, not during the training.)
And I’m reasonably sure we’re in an agency overhang, meaning that the newborn AGI would pass human intelligence in an eye-blink. (And if it doesn’t, it’ll likely stall at incredibly unimpressive sub-human levels, so ML researchers will keep tinkering with the training setups until they find one that does send it over the edge. And there’s no reason whatsoever to expect it to stall again at the human level instead of way overshooting it.)
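To make the shape of that claim explicit, here’s a deliberately crude, deterministic cartoon of the two nested searches (nothing here is real ML, and it hard-codes the very assumptions I’m arguing for rather than providing evidence for them). The `train` function stands in for one full training run; the last few lines stand in for the research community piloting tweaks and scaling up whichever looks most promising. The point is only structural: the jump shows up between training runs, once the outer search happens onto the right setup and scales it, not as a kink inside one run.

```python
# Deterministic cartoon of the argument above. Every number and name is made
# up; the assumptions (most setups plateau, the "right" one keeps converting
# scale into capability) are baked in by hand, not demonstrated.

def train(setup, scale):
    """Stand-in for one full training run, returning a crude 'capability' score."""
    if setup["in_right_basin"]:
        return setup["returns"] * scale          # keeps converting scale into capability
    return min(setup["returns"] * scale, 1.0)    # plateaus at unimpressive levels

# The outer, distributed search: researchers pilot tweaks at small scale,
# then scale up whichever looks most promising.
candidate_setups = [
    {"name": "tweak-A", "returns": 0.8, "in_right_basin": False},
    {"name": "tweak-B", "returns": 0.9, "in_right_basin": False},
    {"name": "tweak-C", "returns": 1.0, "in_right_basin": True},   # the lucky stumble
]
PILOT_SCALE, FRONTIER_SCALE, HUMAN_LEVEL = 1.0, 100.0, 10.0

best = max(candidate_setups, key=lambda setup: train(setup, PILOT_SCALE))
capability = train(best, FRONTIER_SCALE)
print(f"scaled up {best['name']}: capability {capability:.0f}, "
      f"vs. human level ~{HUMAN_LEVEL:.0f}")     # the overshoot happens on the first full-scale try
```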
we have a lot of data on human values
Which human’s values? IMO, “the AI will fall into the basin of human values” is kind of a weird reassurance, given the sheer diversity of human values – diversity that very much includes xenophobia, genocide, and petty vengeance scaled up to the geopolitical level. And stuff like RLHF designed to fit the aesthetics of modern corporations doesn’t result in deeply thoughtful cosmopolitan philosophers – it results in sycophants concerned with PR as much as with human lives, and sometimes (presumably when not properly adapted to a new model’s scale) in high-strung yanderes.
Let’s grant the premise that the AGI’s values will be restricted to the human range (which I don’t really buy). If the sample from that range we end up picking is only as good as what GPT-4/Sydney’s masks appeared to be? Yeah, I don’t expect humans to stick around for long afterwards.
Indeed, that’s basically what happened in the human case: the distributed optimization process of evolution searched over training architectures, and eventually stumbled upon one that was able to bootstrap itself into taking off.
Actually, I think the evidence is fairly conclusive that the human brain is a standard primate brain, with the only changes being a few compute-scale dials turned up (the number of distinct gene changes is tiny—something like 12, from what I recall). There is really nothing special about the human brain other than 1) a size 3x larger than expected, and 2) extended neoteny (a longer training cycle). Neuroscientists have looked extensively for other ‘secret sauce’, and we now have some confidence in a null result: no secret sauce, just much more training compute.
Yes, but: whales and elephants have brains several times the size of human brains, and they’ve yet to build an industrial civilization. I agree that hitting upon the right architecture isn’t sufficient (you also need to scale it up) – but scale alone doesn’t suffice either. You need a combination of scale, and an architecture + training process that would actually transmute the greater scale into more powerful cognitive algorithms.
Evolution stumbled upon the human/primate template brain. One of the forks of that template somehow “took off” in the sense of starting to furiously select for larger brain size. Then, once a certain compute threshold was reached, it took a sharp left turn and started a civilization.
The ML-paradigm analogue would, likewise, involve researchers stumbling upon an architecture that works well at small scales and has good returns on compute. They’d then scale it up as far as it’ll go, as they’re wont to. And that training run would spit out an AGI, not a mere bundle of sophisticated heuristics.
And we have no guarantees that the practical capabilities of that AGI would be human-level, as opposed to vastly superhuman.
(Or vastly subhuman. But if the maximum-scale training run produces a vastly subhuman AGI, the researchers would presumably go back to the drawing board and tinker with the architectures until they’ve selected for algorithms with better returns on intelligence per FLOP. There are likewise no guarantees that this higher-level selection process would somehow result in an AGI of around human level, rather than vastly overshooting it the first time they properly scale it up.)
Yes, but: whales and elephants have brains several times the size of human brains, and they’ve yet to build an industrial civilization.
Size/capacity isn’t all, but in terms of the capacity that actually matters (synaptic count and upper cortical neuron count): from what I recall, elephants are at great-ape cortical capacity, not human capacity. A few specific species of whales may be at or above human cortical neuron capacity, but synaptic density was still somewhat unresolved last I looked.
Then, once a certain compute threshold was reached, it took a sharp left turn and started a civilization.
Human language/culture is more the cause of our brain expansion than just its consequence. The human brain is impressive because of its relative size and its oversized cost to the human body. Elephants/whales are huge, and their brains are comparatively much smaller and cheaper. Our brains grew 3x too large/expensive because it was valuable to do so. Evolution didn’t suddenly discover some new brain architecture or trick (it already had that long ago). Instead, there were a number of simultaneous whole-body coadaptations required for larger brains and linguistic technoculture to take off: opposable thumbs, expressive vocal cords, externalized fermentation (the gut is as energetically expensive as brain tissue—something had to go), and yes, larger brains, etc.
Language enabled a metasystems transition similar to the origin of multicellular life. Tribes formed as new organisms by linking brains through language/culture. This is not entirely unprecedented—insects are also social organisms, of course, but their tiny brains aren’t large enough for interesting world models. The resulting new human social organisms had intergenerational memory that grew nearly unbounded with time, and creative search capacity that scaled with tribe size.
You can separate intelligence into world-model knowledge (crystallized intelligence) and search/planning/creativity (fluid intelligence). Humans are absolutely not special in our fluid intelligence—it is just what you’d expect for a large primate brain. Humans raised completely without language are not especially more intelligent than animals. All of our intellectual superpowers are cultural. Just as each cell can store the DNA knowledge of the entire organism, each human mind ‘cell’ can store a compressed version of much of human knowledge and gains the benefits thereof.
The cultural metasystems transition, which is solely responsible for our intellectual capability, is a one-time qualitative shift that will never recur. AI will not undergo the same transition; that isn’t how these transitions work. The main advantage of digital minds is just speed and, to a lesser extent, copying.