ANNs and BNNs operate on the same core principles; the scaling laws apply to both, and IQ in either is mostly a function of net effective training compute and data quality. Genes determine a brain’s architectural prior just as a small amount of python code determines an ANN’s architectural prior, but the capabilities come only from scaling with compute and data (quantity and quality).
So you absolutely cannot take datasets of gene-IQ correlations and assume those correlations would somehow transfer to gene interventions on adults (post-training in DL lingo). The genetic contribution to IQ is almost all developmental/training factors (architectural prior, learning algorithm hyperparams, value/attention function tweaks, etc.) which snowball during training. Unfortunately, developmental windows close and learning rates slow down as the brain literally carves/prunes out its structure, so to the extent this could work at all, it is mostly limited to interventions on children and younger adults who still have significant learning rate reserves.
But it ultimately doesn’t matter, because the brain just learns too slowly. We will soon be past the point at which human learning matters much.
ANNs and BNNs operate on the same core principles; the scaling laws apply to both, and IQ in either is mostly a function of net effective training compute and data quality.
How do you know this?
Genes determine a brain’s architectural prior just as a small amount of python code determines an ANN’s architectural prior, but the capabilities come only from scaling with compute and data (quantity and quality).
In comparing human brains to DL, training seems more analogous to natural selection than to brain development. Much simpler “architectural prior”, vastly more compute and data.
So you absolutely cannot take datasets of gene-IQ correlations and assume those correlations would somehow transfer to gene interventions on adults
We’re really uncertain about how much would transfer! It would probably affect some aspects of intelligence more than others, and I’m afraid it might just not work at all if g is determined by the shape of structures that are ~fixed in adults (e.g. long range white matter connectome). But it’s plausible to me that the more plastic local structures and the properties of individual neurons matter a lot for at least some aspects of intelligence (e.g. see this).
so to the extent this could work at all, it is mostly limited to interventions on children and younger adults who still have significant learning rate reserves
There’s a lot more to intelligence than learning. Combinatorial search, unrolling the consequences of your beliefs, noticing things, forming new abstractions. One might consider forming new abstractions as an important part of learning, which it is, but it seems possible to come up with new abstractions ‘on the spot’ in a way that doesn’t obviously depend on plasticity that much; plasticity would more determine whether the new ideas ‘stick’. I’m bottlenecked by the ability to find new abstractions that usefully simplify reality, not having them stick when I find them.
But it ultimately doesn’t matter, because the brain just learns too slowly. We will soon be past the point at which human learning matters much.
My model is there’s this thing lurking in the distance, I’m not sure how far out: dangerously capable AI (call it DCAI). If our current civilization manages to cough up one of those, we’re all dead, essentially by definition (if DCAI doesn’t kill everyone, it’s because technical alignment was solved, which our current civilization looks very unlikely to accomplish). We look to be on a trajectory to cough one of those up, but it isn’t at all obvious to me that it’s just around the corner: so stuff like this seems worth trying, since humans qualitatively smarter than any current humans might have a shot at thinking of a way out that we didn’t think of (or just having the mental horsepower to quickly get working something we have thought of, e.g. getting mind uploading working).
ANNs and BNNs operate on the same core principles; the scaling laws apply to both, and IQ in either is mostly a function of net effective training compute and data quality.
How do you know this?
From study of DL and neuroscience, of course. I’ve also written on this for LW in some reasonably well known posts: starting with The Brain as a Universal Learning Machine, and continuing in Brain Efficiency and AI Timelines (specifically, see the Cultural Scaling Criticality section on the source of human intelligence), or the DL section of simboxes. Or you could see Steven Byrnes’s extensive LW writings on the brain—we are mostly in agreement on the current consensus from computational/systems neuroscience.
The scaling laws are extremely well established in DL and there are strong theoretical reasons (and increasingly experimental neurosci evidence) that they are universal to all NNs, and we have good theoretical models of why they arise. Strong performance arises from search (bayesian inference) over a large circuit space. Strong general performance is strong performance on many, many diverse subtasks, which require many, many specific circuits built on top of compressed/shared base circuits down a hierarchy. The strongest quantitative predictor of performance is the volume of search space explored, which is the product C * T (capacity and data/time). Data quality matters in the sense that the quantitative relationship between search volume and predictive loss only applies to tasks similar enough to the training data distribution.
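To make the quantitative claim a bit more concrete, here is a minimal synthetic sketch (my own toy numbers, not data from any real training run) of what such a power-law fit looks like, assuming loss falls as a power of the search-volume proxy V = C * T:

```python
import numpy as np

# Toy illustration with synthetic data (made-up constants, not real measurements):
# if loss falls as a power law in the "search volume" proxy V = C * T, then a
# simple log-log fit recovers the exponent.
rng = np.random.default_rng(0)
C = np.array([1e6, 1e7, 1e8, 1e9])      # capacity (e.g. params), arbitrary units
T = np.array([1e8, 1e9, 1e10, 1e11])    # data/steps, arbitrary units
V = np.outer(C, T).ravel()              # search volume proxy: every C * T combination
true_alpha, k = 0.07, 20.0
loss = k * V ** (-true_alpha) * np.exp(rng.normal(0.0, 0.02, V.size))

slope, intercept = np.polyfit(np.log(V), np.log(loss), 1)
print(f"fitted exponent ~= {-slope:.3f} (true value {true_alpha})")
```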
In comparing human brains to DL, training seems more analogous to natural selection than to brain development. Much simpler “architectural prior”, vastly more compute and data.
No—biological evolution via natural selection is very similar to technological evolution via engineering. Both brains and DL systems have fairly simple architectural priors in comparison to the emergent learned complexity (remember, whenever I use the term learning, I use it in a technical sense, not a colloquial sense); see my first early ULM post for a review of the extensive evidence (greatly substantiated now by my scaling hypothesis predictions coming true with the scaling of transformers, which are similar to the architectures I discussed in that post).
so to the extent this could work at all, it is mostly limited to interventions on children and younger adults who still have significant learning rate reserves
There’s a lot more to intelligence than learning.
Whenever I use the word learning, without further clarification, I mean learning as in bayesian learning or deep learning, not in the colloquial sense. My definition/sense of learning encompasses all significant changes to synapses/weights.
Combinatorial search, unrolling the consequences of your beliefs, noticing things, forming new abstractions.
Brains are very slow, so they have limited combinatorial search, and our search/planning is just short-term learning (short/medium-term plasticity). Again, it’s nearly all learning (synaptic updates).
if DCAI doesn’t kill everyone, it’s because technical alignment was solved, which our current civilization looks very unlikely to accomplish)
I find the standard arguments for doom implausible—they rely on many assumptions contradicted by deep knowledge of computational neuroscience and DL.
I was at the WBE2 workshop with Davidad but haven’t yet had time to write about progress (or lack thereof); I think we probably mostly agree that the type of uploading moonshot he discusses there is enormously expensive (not just in initial R&D, but also in recurring per-scan costs). I am actually more optimistic that purer DL-based approaches will scale to much lower cost, but “much lower cost” is still on the order of GPT4’s training cost just to process the scan data through a simple vision ANN—for a single upload.
The scaling laws are extremely well established in DL and there are strong theoretical reasons (and increasingly experimental neurosci evidence) that they are universal to all NNs, and we have good theoretical models of why they arise.
I’m not aware of these—do you have any references?
Both brains and DL systems have fairly simple architectural priors in comparison to the emergent learned complexity
True but misleading? Isn’t the brain’s “architectural prior” a heckuva lot more complex than the things used in DL?
Brains are very slow, so they have limited combinatorial search, and our search/planning is just short-term learning (short/medium-term plasticity). Again, it’s nearly all learning (synaptic updates).
Sure. The big crux here is whether plasticity of stuff which is normally “locked down” in adulthood is needed to significantly increase “fluid intelligence” (by which I mean, something like, whatever allows people to invent useful new concepts and find ingenious applications of existing concepts). I’m not convinced these DL analogies are useful—what properties do brains and deepnets share that render the analogies useful here? DL is a pretty specific thing, so by default I’d strongly expect brains to differ in important ways. E.g. what if the structures whose shapes determine the strength of fluid intelligence aren’t actually “locked down”, but reach a genetically-influenced equilibrium by adulthood, and changing the genes changes the equilibrium? E.g. what if working memory capacity is limited by the noisiness of neural transmission, and we can reduce the noisiness through gene edits?
I find the standard arguments for doom implausible—they rely on many assumptions contradicted by deep knowledge of computational neuroscience and DL
FOOM isn’t necessary for doom—the convergent endpoint is that you have dangerously capable minds around: minds which can think much faster and figure out things we can’t. FOOM is one way to get there.
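I’m not aware of these—do you have any references?
Sure, here are a few: the quantization model, scaling laws from the data manifold, and a statistical model.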
True but misleading? Isn’t the brain’s “architectural prior” a heckuva lot more complex than the things used in DL?
The full specification of the DL system includes the microcode, OS, etc. Likewise much of the brain complexity is in the smaller ‘oldbrain’ structures that are the equivalent of a base robot OS. The architectural prior I speak of is the complexity on top of that, which separates us from some ancient earlier vertebrate brain. But again see the brain as a ULM post, which covers the extensive evidence for emergent learned complexity from simple architecture/algorithms (now the dominant hypothesis in neuroscience).
I’m not convinced these DL analogies are useful—what properties do brains and deepnets share that render the analogies useful here?
Most everything above the hardware substrate—but I’ve already provided links to sections of my articles addressing the convergence of DL and neurosci with many dozens of references. So it’d probably be better to focus on exactly which specific key analogies/properties you believe diverge.
DL is a pretty specific thing
DL is extremely general—it’s just efficient approximate bayesian inference over circuit spaces. It doesn’t imply any specific architecture, and doesn’t even strongly imply any specific approx inference/learning algorithm (as 1st and approx 2nd order methods are both common).
E.g. what if working memory capacity is limited by the noisiness of neural transmission, and we can reduce the noisiness through gene edits?
Training to increase working memory capacity has near zero effect on IQ or downstream intellectual capabilities—see gwern’s reviews and experiments. Working memory capacity is important in both brains and ANNs (transformers), but it comes from large fast weight synaptic capacity, not simple hacks.
Noise is important for sampling—adequate noise is a feature, not a bug.
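As a toy ML-side illustration of the “adequate noise is a feature” point (an analogy only, not a claim about neural biophysics): adding Gumbel noise to a set of logits and taking the argmax is exactly sampling from the corresponding softmax distribution, whereas the noiseless argmax always returns the same answer.

```python
import numpy as np

# Gumbel-max trick: argmax(logits + Gumbel noise) is an exact sample from
# softmax(logits); without the noise, argmax is deterministic.
rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5])
softmax = np.exp(logits) / np.exp(logits).sum()

samples = [np.argmax(logits + rng.gumbel(size=logits.size)) for _ in range(100_000)]
print("softmax probs:   ", softmax.round(3))
print("empirical freqs: ", (np.bincount(samples, minlength=3) / len(samples)).round(3))
```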
So I agree with your general point that genetic interventions made in adults would have a lesser effect than those same interventions made in embryos, which is why our model assumes that the average genetic change would have just half the normal effect. The exact relative effect of edits made in the adult brain vs an embryo IS one of the major unknown factors in this project, but like… if brain size were the only thing affecting intelligence, we’d expect a near-perfect correlation between it and intelligence. But that’s not what we see.
Brain size only correlates with intelligence at 0.3-0.4.
So there’s obviously a lot more going on.
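For a rough sense of scale (just the standard variance-explained arithmetic, not anything specific to these studies): a correlation of 0.3-0.4 means brain size accounts for only about 9-16% of the variance in IQ.

```python
# Variance explained by a correlation r is r**2, so r = 0.3-0.4 leaves
# roughly 84-91% of IQ variance unaccounted for by brain size alone.
for r in (0.3, 0.4):
    print(f"r = {r}: explained ~ {r**2:.0%}, unexplained ~ {1 - r**2:.0%}")
```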
post-training in DL lingo
It’s not post-training. Brains are constantly evolving and adapting throughout the lifespan.
But it ultimately doesn’t matter, because the brain just learns too slowly. We will soon be past the point at which human learning matters much.
If this was actually the case then none of the stuff people are doing in AI safety or anything else would matter. That’s clearly not true.
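We can roughly bin brain tissue into 3 developmental states:
juvenile: macro structure formation—brain expanding, neural tissue morphogenesis, migration, etc.
maturing: micro synaptic structure formation, irreversible pruning and myelination
mature: fully myelinated, limited remaining plasticity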
Maturation proceeds inside out with the regions closest to the world (lower sensory/motor) maturing first, proceeding up the processing hierarchy, and ending with maturation of the highest levels (some prefrontal, associative etc) around age ~20.
The human brain’s most prized intellectual capabilities are constrained (but not fully determined) mostly by the upper brain regions. Having larger V1 synaptic capacity may make for a better fighter pilot through greater visual acuity, but STEM capability is mostly determined by capacity & efficiency of upper brain regions (prefrontal, associative, etc and their cerebellar partners).
I say constrains rather than determines because training data quantity/quality also obviously constrains. Genius level STEM capability requires not only winning the genetic lottery, but also winning the training run lottery.
Brain size only correlates with intelligence at 0.3-0.4.
IQ itself only correlates with STEM potential (and less so as you move away from the mean), but sure, there are many ways to make a brain larger that do not specifically increase the synaptic capacity & efficiency of the specific brain regions most important for STEM capability: making neurons larger, increasing the space between them, increasing glial size or counts, etc. But some brain size increase methods will increase the size of STEM-linked brain regions, so 0.3-0.4 seems about right.
The capacity & efficiency of the most important brain regions is mostly determined by genes affecting the earliest stage 1 (the architectural prior). These regions won’t be fully used until ~20 years later due to how the brain trains/matures modules over time, but most of the genetic influence is in stage 1: I’d guess 75%.
I’d guess the next 20% of genetic influence is on stage 2 factors that affect synaptic efficiency and learning efficiency, and only 5% influence on stage 3 via fully mature/trained modules.
Yes, a few brain regions (hippocampus, BG, etc.) need to maintain high plasticity (with some neurogenesis) even well into adulthood—they never fully mature to stage 3. But that is the exception, not the rule.
Brains are constantly evolving and adapting throughout the lifespan.
Not really—see above. At 45, most of my brain potential is now fully spent. I’m very unlikely to ever be a highly successful chess player or physicist or poet etc. Even learning a new human language is very slow and ineffective compared to a child. It’s all about the depletion of synaptic learning potential reserves.
The colloquial use of the word ‘learning’ as in ‘learning’ new factual information is not at all what I mean and is not relevant to STEM capability. I am using ‘learning’ in the more deep learning sense of learning deep complex circuits important for algorithmic creativity, etc.
As a concrete specific example, most humans learn to multiply large numbers by memorizing lookup tables for multiplication of individual digits and memorizing a slow serial mental program built on that. But that isn’t the only way! It is possible to learn more complex circuits which actually do large-number addition & multiplication directly—and some mentats/savants do acquire these circuits (with von Neumann being a famous likely example).
STEM capability is determined by deep learning many such circuits, not ‘learning’ factual knowledge.
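A toy sketch of that contrast (hypothetical code of my own, just to make the “memorized digit table plus slow serial program” route explicit): single-digit products come from a lookup table, and everything else is a serial shift-and-carry procedure run on top of it.

```python
# The "memorized table + slow serial program" route: single-digit products are
# looked up, and the rest is the serial carry/shift procedure built on top.
TIMES_TABLE = {(a, b): a * b for a in range(10) for b in range(10)}

def long_multiply(x: int, y: int) -> int:
    total = 0
    for i, dx in enumerate(reversed(str(x))):        # digits of x, least significant first
        carry, partial = 0, 0
        for j, dy in enumerate(reversed(str(y))):    # multiply dx by each digit of y
            p = TIMES_TABLE[(int(dx), int(dy))] + carry
            partial += (p % 10) * 10 ** j
            carry = p // 10
        partial += carry * 10 ** len(str(y))
        total += partial * 10 ** i                   # shift and accumulate
    return total

assert long_multiply(374, 58) == 374 * 58
```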
Now it is likely that one of the key factors for high intelligence is a slower and more efficient maturation cycle that maintains greater synaptic learning reserves far into adulthood—a la enhanced neoteny—but that is also an example of genetic influence that only matters in stages 1 and 2. Maturation is largely irreversible—once most connections are pruned and the few survivors are strengthened/myelinated, you can’t go back to the earlier immature state of high potential.
But it ultimately doesn’t matter, because the brain just learns too slowly. We will soon be past the point at which human learning matters much.
If this was actually the case then none of the stuff people are doing in AI safety or anything else would matter.
Huh? Oh—by learning there I meant full learning in the training sense—stages 1 and 2. Of course things adults do now matter, they just don’t matter through the process of training new improved human brains.
Couldn’t there be genetic effects on things that can improve the brain even once its NN structure is mostly fixed? Maybe it’s possible to have neurons work faster, or for the brain to tire less from abstract thinking, or to need less sleep.
This kind of thing is not a full intelligence improvement, because it does not allow you to notice more patterns or to think with new schemas.
So maybe yes, it won’t make a difference for AI timelines, though it would still be a very big deal.
Sure—I’m not saying no improvement is possible. I expect that the enhancements from adult gene editing should encompass most all of the brain tweaks you can get from drugs/diet. But those interventions will not convert an average brain into an Einstein.
The brain—or more specifically the brain of a very intelligent person—is already very efficient, so I’m also just skeptical in general that there are many remaining small tweaks that take you past the current “very intelligent”. Biological brains beyond the human limit are of course possible, but probably require further significant size expansion amongst other infeasible changes.
Sleep is very important; less isn’t really better—most of the critical cortical learning/training happens during sleep, through episodic replay, SWRs, REM, etc.
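See Would edits to the adult brain even do anything?
(Not endorsing the post or that section, just noticing that it seems relevant to your complaint.)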
It does not. Despite the title of that section it is focused on adult expression factors. The post in general lacks a realistic mechanistic model of how tweaking genes affects intelligence.
genes are likely to have an effect if edited in adults: the size of the effect of a given gene at any given time is likely proportional to its level of expression
This is similar to expecting that a tweak to the hyperparams (learning rate, etc.) of a trained GPT4 could boost its IQ (yes, LLMs have their own IQ or g factor). Almost all variables that affect adult/trained performance do so only through changing the learning trajectory. The low-hanging fruit or free energy in hyperparams with immediate effect is insignificant.
Of course if you combine gene edits with other interventions to rejuvenate older brains or otherwise restore youthful learning rate more is probably possible, but again it doesn’t really matter much as this all takes far too long. Brains are too slow.
Of course if you combine gene edits with other interventions to rejuvenate older brains or otherwise restore youthful learning rate more is probably possible
We thought a bit about this, though it didn’t make the post. Agree that it increases the chance of the editing having a big effect.
Maybe it’s the lack of sleep for me, but is “Brains are too slow.” referring to something like the growth/formation of structures that support some level of intelligence, or to the human brain just being slower than an AGI?
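Too slow to matter now, due to the slow speed of neurons and bio learning combined with where we are in AI.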
But it ultimately doesn’t matter, because the brain just learns too slowly.
Why think the brain learns too slowly? If I can boost my sample efficiency I can learn new subjects quicker, remember more facts, and do better thought-result attribution. All these seem short-term beneficial. Unless you think there’s major critical period barriers here, these all seem likely results.
Though I do agree that a person with the genes of a genius for 2 years will be far less a genius than a person with the genes of a genius for 25 years. It seems a stretch to say the first change can be rounded off as not mattering, though.
It would matter in a world without AI, but that is not the world we live in. Yes if you condition on some indefinite AI pause or something then perhaps, but that seems extremely unlikely. It takes about 30 years to train a new brain—so the next generation of humans won’t reach their prime until around the singularity, long after AGI.
Though I do agree that a person with the genes of a genius for 2 years
Most genius is determined prenatally and during ‘training’ when cortex/cerebellum modules irreversibly mature, just as the capabilities of GPT4 are determined by the initial code and the training run.
I think I agree with everything that you said except that it won’t matter. It seems like it’d very much matter if in the most successful case we make people learn skills, facts, and new fields 6x faster. Maybe you think 6x is too high, and it’s more like 1.05x, enough to notice a difference over 30 years, but not over 2-5.
So other than the medical issues which make this idea unviable (off-target edits causing cancer, major brain firmware edits causing uncontrollable seizures), we can also bound the possible performance increases.
We haven’t increased brain volume meaningfully; the patient’s skull plates are fixed. And they still have one set of eyes and one set of hands. Nerve transmission velocities haven’t been improved either.
In terms of real performance, is 6x achievable? Does it mean anything? Even the most difficult tasks humans have ever accomplished require taking in the latest data from the last round of testing and then choosing what to try next based on this information. This “innovation feedback cycle” has several steps, and the “think about it” step, even if we posit we can make it infinitely fast, would be limited by the “view the data” and “communicate with other people” steps.
That is, I am taking a toy model in which viewing the data, thinking about the next experiment, and telling someone/something else to do the experiment are the only 3 steps.
If the view/tell steps take more than 1/6 of the total time, a 6x performance increase is impossible. This is Amdahl’s law.
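A minimal sketch of that bound (toy numbers of my own, assuming only the “think” step gets sped up):

```python
def amdahl_speedup(think_fraction: float, think_speedup: float) -> float:
    """Overall speedup when only the 'think' step is accelerated (Amdahl's law)."""
    io_fraction = 1.0 - think_fraction              # time share of the view + tell steps
    return 1.0 / (io_fraction + think_fraction / think_speedup)

# If view + tell take 1/6 of the time, even an infinitely fast thinker caps out near 6x:
print(amdahl_speedup(think_fraction=5/6, think_speedup=1e9))  # ~6.0
print(amdahl_speedup(think_fraction=5/6, think_speedup=6))    # ~3.3 with a merely 6x-faster thinker
```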
Viewing and telling are themselves learned skills that can be sped up. Most people are far from their maximal physically possible reading/listening speed, or their maximal physically possible writing speed. For example, just after high school I tried to read Sutton & Barto and spent a week reading each chapter. Later I read it & spent a day on each chapter. That’s a 7x improvement just from meta-learning!
You’re still i/o limited though. And the optimizations you mention come with tradeoffs; skipping words by speed reading, for example, won’t work at all when the content has high information density.
I still read every word, I just knew better what to think about, recall was faster, etc. I was reading at a leisurely pace as well. If you want to call learning what to pay attention to & how to pay attention to it not an i/o problem, just the physical limits, then I do think i/o is very very fast, taking <<1/6 of the time.
Depends on what it is. Experimenting with AI? Fixing cars? Working on particle physics at CERN? Developing the atomic bomb? Discovering the mass of the electron? Performing surgery? Developing new surgery methods?
I/O is at least 90 percent of each of those tasks.
I don’t know how to think about i/o in the tasks you mention, so I don’t think the question is very useful. Definitely on an individual level, much time is spent on i/o, but that’s beside the point, as I said above people can do more efficient i/o than they currently do, and generally smarter people are able to do more efficient i/o. When I ask myself why we aren’t better at the tasks you mention, mostly I think firstly we are coordination constrained, and secondly we are intelligence constrained. Maybe we’re also interface constrained, which seems related to i/o, but generally we make advancements in interfaces, and this improves productivity in all the tasks you mention, and smarter people can use & make smarter interfaces if that is indeed the problem.
A good motivator: There exist 10x software engineers, who are generally regarded as being 10x better programmers than regular engineers. If i/o was the limiter for programming ability, then such people would be expected to simply have better keyboards, finger dexterity, and eyesight. Possibly they have these to some extent, but I expect their main edge over other 1x engineers is greater sample efficiency when generalizing from one programming task to another. We can thus conclude that i/o takes up <1/10 the time in programming. Probably <<1/10.
There also probably exist 10x surgeons, experimentalists, and mechanics. Perhaps there also exist 10x particle physicists at CERN, though there are fewer of them, and it may be less obvious.
So if 10x software engineers exist, they develop architecture and interfaces and use patterns where, over time, 1/10 the total amount of human time per feature is used. Bad code consumes enormous amounts of time to deal with, and bad architecture that blocks adding new features or makes localizing a bug difficult would be the worst.
But being this good mostly comes from knowledge, learned either in school or over a lot of time from doing it the wrong way and learning how it fails.
It’s not an intelligence thing. A genius SWE can locate a bug on a hunch; a 10x SWE would write code where the bug doesn’t exist or is obvious every time.
For a lot of the other examples I gave, I have the impression that no, I/O is everything. Finding the mass of the electron was done with laborious effort over many hours, most of it dealing with the equipment. Nobody can cut 10 times faster in surgery; hands can’t move that quickly. Same with fixing a car. CERN scientists obviously are limited by all sorts of equipment issues. Same with AI research—the limiting factor has always been equipment from the start. “Equipment issues” mean either you get your hands dirty fixing it yourself—that’s I/O or spare-parts bound—or you tell someone else to do it and their time to fix it is bound the same way.
Some of the best scientists in history could fix equipment issues themselves; this likely broadened their skill base and made their later discoveries feasible.
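You are operating on the wrong level of analysis here. The question is about skill improvement, not execution.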
They aren’t the same thing? I mean for the topics of interest, AI alignment, there is nothing to learn from other humans or improve on past a certain baseline level of knowledge. Past a certain point reading papers on it, I suspect your learning curve would go negative, because you’re just learning from the errors the people before you made.
Improving past that point has to be designing and executing high knowledge-gain experiments, and that’s I/O and funding bound.
I would argue that the above is the rule for anything humans cannot already do.
Were you thinking of skills where it’s a confined objective task? Like StarCraft 2 or Go? The former being strongly I/O bound.
I’m very confident we’re talking past each other, and I’m not in the mood to figure out what we actually disagree on. I think we’re using “i/o” differently, and I claim your use permits improvements to the process, which contradicts your argument.
I am not so sure about that. I am thinking back to the Minnesota Twin Study here, and the related fact that heritability of IQ increases with age (up until age 20, at least). Now, it might be that we’re just not great at measuring childhood IQ, or that childhood IQ and adult IQ are two subtly different things.
But it certainly looks as if there are factors related to adult brain plasticity, motivation (curiosity, love of reading, something) that continue to affect IQ development at least until the age of 18.
heritability of IQ increases with age (up until age 20, at least)
Straightforward result of how the brain learns. Cortical/cerebellar modules start out empty and mature from the inside out—starting with the lowest sensory/motor levels closest to the world and proceeding up the hierarchy, ending with the highest/deepest modules like prefrontal and associative cortex. Maturation is physically irreversible as it involves pruning most long-range connections and myelinating & strengthening the select few survivors. Your intelligence potential is constrained prenatally by genes influencing synaptic density/connectivity/efficiency in these higher regions, but those higher regions aren’t (mostly) finished training until ~20 years of age.
Is it true? We need to pour lifetimes of information into SOTA models to get moderate expert-level performance. I have no significant doubt that we can overcome this via scaling, but after correcting for available compute, brains seem to be decent learners.
In addition, I would say that there is a difference between learning a capability and eliciting it: current models seem to be very sensitive to prompts, wrappings, and other conditions. It’s possible that intelligence gains could come from easier eliciting of already-learned capabilities that are currently blocked by, say, social RLHF.
Current AI is less sample efficient, but that is mostly irrelevant, as the effective speed is 1000x to 10000x greater.
By the time current human infants finish their ~30-year biological training, we’ll be long past AGI and approaching the singularity (in hyperexponential models).