Meta: I don’t want this comment to be taken as “I disagree with everything you (Thomas) said.” I do think the question of what to do when you have an opaque, potentially intractable problem is not obvious, and I don’t want to come across as saying that I have the definitive answer, or anything like that. It’s tricky to know what to do, here, and I certainly think it makes sense to focus on more concrete problems if deconfusion work didn’t seem that useful to you.
That said, at a high-level I feel pretty strongly about investing in early-stage deconfusion work, and I disagree with many of the object-level claims you made suggesting otherwise. For instance:
> The neuroscientists I’ve talked to say that a new scanning technology that could measure individual neurons would revolutionize neuroscience, much more than a theoretical breakthrough. But in interpretability we already have this, and we’re just missing the software.
It seems to me like the history of neuroscience should inspire the opposite conclusion: a hundred years of collecting ever more data at finer and finer resolution, and yet we still have a field that even many neuroscientists agree barely understands anything. I did undergrad and grad school in neuroscience and can at the very least say that this was also my conclusion. The main problem, in my opinion, is that theory usually tells us which facts to collect. Without it—without even a proto-theory or a rough guess, as with “model-free data collection” approaches—you are basically taking shots in the dark and hoping that if you collect a truly massive amount of data and somehow search over it for regularities, theory will emerge. This seems pretty hopeless to me, and entirely backwards from how science has historically progressed.
It seems similarly pretty hopeless to me to expect a “revolution” out of tabulating features of the brain at a fine-enough resolution. Like, I certainly buy that it gets us some cool insights, much like every other imaging advance has gotten us some cool insights. But I don’t think the history of neuroscience really predicts a “revolution” here. Aside from the computational costs of “understanding” an object in such a way, I just don’t buy that you’re guaranteed to find all the relevant regularities. You can never collect *all* the data; you have to make choices and tradeoffs when you measure the world, and without a theory to tell you which features are meaningfully irrelevant and can be ignored, it’s hard to know that you’re ultimately looking at the right thing.
I ran into this problem, for instance, when I was researching cortical uniformity. Academia has amassed a truly gargantuan collection of papers on the structural properties of the human neocortex. What on Earth do any of these papers say about how algorithmically uniform the brain is? As far as I can tell, pretty close to zero, because we have no idea how the structural properties of the cortex relate to the functional ones, and so who’s to say whether “neuron subtype A is more dense in the frontal cortex relative to the visual cortex” is a meaningful finding or not? I worry that other “shot in the dark” data collection methods will suffer similar setbacks.
> Eliezer has written about how Einstein cleverly used very limited data to discover relativity. But we could have discovered relativity easily if we observed not only the precession of Mercury, but also the drifting of GPS clocks, gravitational lensing of distant galaxies, gravitational waves, etc.
It’s of course difficult to say how science might have progressed counterfactually, but I find it pretty hard to believe that relativity would have been “discovered easily” had we had a bunch of data staring us in the face. In general, I think it’s very easy to underestimate how difficult it is to come up with new concepts. I felt this way when I was reading about Darwin and how it took him over a year to go from realizing that “artificial selection is the means by which breeders introduce changes” to realizing that “natural selection is the means by which changes are introduced in the wild.” But then I spent a long time in his shoes, so to speak, operating from within the concepts he had available to him at the time, and I came away more humble. For instance, among other things, it seems like a leap to go from “a human uses their intellect to actively select” to “nature ends up acting like a selector, in the sense that its conditions favor some traits for survival over others.” These feel like quite different “types” of things, in some ways.
In general, I suspect it’s easy to take the concepts we already have, look over past data, and assume it would have been obvious. But I think the history of science again speaks to the contrary: scientific breakthroughs are rare, and I don’t think it’s usually the case that they’re rare because of a lack of data, but because they require looking at that data differently. Perhaps data on gravitational lensing would have roused scientists to notice that there were anomalies, and may eventually have led to general relativity. But the actual process of taking the anomalies and turning them into a theory is, I think, really hard. Theories don’t just pop out wholesale when you have enough data; they take serious work.
> I heard David Bau say something interesting at the ICML safety workshop: in the 1940s and 1950s lots of people were trying to unlock the basic mysteries of life from first principles. How was hereditary information transmitted? Von Neumann designed a universal constructor in a cellular automaton, and even managed to reason that hereditary information was transmitted digitally for error correction, but didn’t get further. But it was Crick, Franklin, and Watson who used crystallography data to discover the structure of DNA, unraveling far more mysteries. Since then basically all advances in biochemistry have been empirical. Biochemistry is a case study where philosophy and theory failed to solve the problems but empirical work succeeded, and maybe interpretability and intelligence are similar.
This story misses some pretty important pieces. For instance, Schrödinger predicted basic features of DNA—that it was an aperiodic crystal—from first principles in his book What Is Life?, published in 1944. The basic reasoning is that in order to stably encode genetic information, the molecule should itself be stable, i.e., a crystal. But to encode a variety of information, rather than the same thing repeated indefinitely, it needs to be aperiodic. An aperiodic crystal is a molecule that can use a few primitives to encode near-infinite possibilities, in a stable way. His book was very influential, and Watson and Crick both credited Schrödinger with the theoretical ideas that guided their search. I also suspect their search went much faster than it would have otherwise; many biologists at the time thought that the hereditary molecule was a protein, of which there are tens of millions in a typical cell.
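To put rough numbers on “a few primitives, near-infinite possibilities” (my own illustration, not Schrödinger’s): an alphabet of $k$ stable primitives in a chain of length $n$ yields $k^n$ distinct messages, e.g.

$$4^{100} \approx 1.6 \times 10^{60},$$

so even a 100-unit stretch of a four-letter alphabet encodes astronomically many possibilities while remaining a single stable molecule.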
But, more importantly, I would certainly not say that biochemistry is an area where empirical work has succeeded to nearly the extent that we might hope it to. Like, we still can’t cure cancer, or aging, or any of the myriad medical problems people have to endure; we still can’t even define “life” in a reasonable way, or answer basic questions like “why do arms come out basically the same size?” The discovery of DNA was certainly huge, and helpful, but I would say that we’re still quite far from a major success story with biology.
My guess is that it is precisely because we lack theory that we are unable to answer these basic questions, and to advance medicine as much as we want. Certainly the “tabulate indefinitely” approach will continue pushing the needle on biological research, but I doubt it is going to get us anywhere near the gains that, e.g., “the hereditary molecule is an aperiodic crystal” did.
And while it’s certainly possible that biology, intelligence, agency, and so on are just not amenable to the cleave-reality-at-its-joints type of clarity one gets from scientific inquiry, I’m pretty skeptical that this is the world we in fact live in, for a few reasons.
For one, it seems to me that practically no one is trying to find theories in biology. It is common for biologists (even bright-eyed, young PhDs at elite universities) to say things like (and in some cases this exact sentence): “there are no general theories in biology because biology is just chemistry which is just physics.” These are people at the beginning of their careers, throwing in the towel before they’ve even started! Needless to say, this take is clearly not true in all generality, because it would anti-predict natural selection. It would also, I think, anti-predict Newtonian mechanics (“there are no general theories of motion because motion is just the motion of chemicals which is just the motion of particles which is just physics”).
Secondly, I think that practically all scientific disciplines look messy, ad hoc, and empirical before we get theories that tie them together, and that this does not on its own suggest biology is a theoretically bankrupt field. E.g., we had steam engines before we knew about thermodynamics, but they were kind of ad hoc, messy contraptions, because we didn’t really understand which variables were causing the “work.” Likewise, naturalism before Darwin was largely compendiums upon compendiums of people being like “I saw this [animal/fossil/plant/rock] here, doing this!” Science before theory often looks like this, I think.
Third: I’m just like, look guys, I don’t really know what to tell you, but when I look at the world and I see intelligences doing stuff, I sense deep principles. It’s a hunch, to be sure, and kind of hard to justify, but it feels very obvious to me. And if there are deep principles to be had, then I sure as hell want to find them. Because it’s embarrassing that at this point we don’t even know what intelligence is, nor agency, nor abstractions: how to measure any of it, predict when it will increase or not. These are the gears that are going to move our world, for better or for worse, and I at least want my hands on the steering wheel when they do.
I think that sometimes people don’t really know what to envision with theoretical work on alignment, or “agent foundations”-style work. My own vision is quite simple: I want to do great science, as great science has historically been done, and to figure out what in god’s name any of these phenomena are. I want to be able to measure that which threatens our existence, such that we may learn to control it. And even though I am of course not certain this approach is workable, it feels very important to me to try. I think there is a strong case for there being a shot, here, and I want us to take it.
> I did undergrad and grad school in neuroscience and can at the very least say that this was also my conclusion.
I remember the introductory lecture for the Cognitive Neuroscience course I took at Oxford. I won’t mention the professor’s name, because he’s got his own lab and is all senior and stuff, and might not want his blunt view to be public—but his take was “this field is 95% nonsense. I’ll try to talk about the 5% that isn’t.” Here’s a lecture slide:

[lecture slide not reproduced]
Lol, possibly someone should try to make this professor work for Steven Byrnes / on his agenda.

Top level blog post, do it.
Thanks, I really like this comment. Here are some points about metascience I agree and disagree with, and how this fits into my framework for thinking about deconfusion vs data collection in AI alignment.
I tentatively think you’re right about relativity, though I also feel out of my depth here. [1]
David Bau must have mentioned the Schrödinger book but I forgot about it, thanks for the correction. The fact that ideas like this told Watson and Crick where to look definitely seems important.
Overall, I agree that a good theoretical understanding guides further experiment and discovery early on in a field.
However, I don’t think curing cancer or defining life are bottlenecked on deconfusion. [2]
For curing cancer, we know the basic mechanisms behind cancer and understand that they’re varied and complex. We have categorized dozens of oncogenes of about 7 different types, and equally many ways that organisms defend against cancer. It seems unlikely that the cure for cancer will depend on some unified theory of cancer, and much more plausible that it’ll be due to investments in experiment and engineering. It was mostly engineering that gave us mRNA vaccines, and a mix of all three that allowed CRISPR.
For defining life, we already have edge cases like viruses and endosymbiotic organisms, and understand pretty well which things can maintain homeostasis, reproduce, etc. in what circumstances. It also seems unlikely that someone will draw a much sharper boundary around life, especially without lots more useful data.
My model of the basic process of science currently looks like this:

[diagram not reproduced]
Note that I distinguish deconfusion (what you can invest in) from theory (the output). At any time, there are various returns to investing in data collection tech, experiments, and deconfusion, and returns diminish with the amount invested. I claim that in both physics and biology, we went through an early phase where the bottleneck was theory and returns to deconfusion were high, and currently the fields are relatively mature, such that the bottlenecks have shifted to experiment and engineering, but with theory still playing a role.
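To make this allocation picture concrete, here’s a toy sketch in Python. It is entirely my own illustration (the curves and constants are made up, not anything anyone in this thread has endorsed): each activity has concave returns, effort flows to whichever activity currently has the highest marginal return, and so the “bottleneck” shifts as a field matures.

```python
# Toy sketch of the diminishing-returns framing above.
# The curves and numbers are invented purely for illustration.
import math

# Concave "progress from investment" curves for each activity.
returns = {
    "data collection": lambda x: 2.0 * math.sqrt(x),
    "experiments":     lambda x: 1.5 * math.sqrt(x),
    "deconfusion":     lambda x: 4.0 * math.sqrt(x),  # high early returns
}

invested = {name: 0.0 for name in returns}
for _ in range(30):  # spend 30 units of effort, one at a time
    # Marginal return of one more unit in each activity.
    marginal = {name: f(invested[name] + 1) - f(invested[name])
                for name, f in returns.items()}
    best = max(marginal, key=marginal.get)  # fund the current bottleneck
    invested[best] += 1

print(invested)
# Early units go mostly to deconfusion; as its returns diminish, the
# bottleneck shifts and later units flow to data and experiments.
```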
In AI, we’re in a weird situation:
- We feel pretty confused about basic concepts like agency, suggesting that we’re early and deconfusion is valuable.
- Machine learning is a huge field, so a lot has already been invested in it; if there are diminishing returns to deconfusion, experiment and data collection are now more valuable at the margin.
- Machine learning is already doing impressive things primarily on the back of engineering, without much reliance on the type of theory that deconfusion generates (deep, simple relationships between things).
- But even if engineering alone is enough to build AGI, we need theory for alignment.
- In biology, we know cancer is complex and unlikely to be understood via a deep, simple theory; in AI, we don’t know whether intelligence is complex.
I’m not sure what to make of all this, and this comment is too long already, but hopefully I’ve laid out a frame that we can roughly agree on.
[1] When writing the dialogue I thought the hard part of special relativity was discovering the Lorentz transformations (which GPS clock drift observations would make obvious), but Lorentz did this between 1892 and 1904, and it still took until 1905 for Einstein to discover special relativity. I also missed the point about theory guiding experiment earlier: without relativity we would not have built gravitational wave detectors. I’m not sure whether this also applies to gravitational lensing or redshift.
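For concreteness, here’s a back-of-the-envelope version of the GPS observation in [1] (my own arithmetic with standard textbook constants, not something from the dialogue); the drift is large and systematic, exactly the kind of anomaly that would be staring everyone in the face:

```python
# Back-of-envelope check of the GPS clock drift mentioned in [1].
MU = 3.986e14      # Earth's gravitational parameter, m^3/s^2
C = 2.998e8        # speed of light, m/s
R_EARTH = 6.371e6  # Earth's radius, m
R_GPS = 2.66e7     # GPS orbital radius, m
DAY = 86400.0      # seconds per day

# General relativity: satellite clocks tick faster in weaker gravity.
gr_gain = MU * (1 / R_EARTH - 1 / R_GPS) / C**2 * DAY

# Special relativity: orbital velocity makes satellite clocks tick slower.
v_squared = MU / R_GPS  # v^2 for a circular orbit
sr_loss = v_squared / (2 * C**2) * DAY

print(f"GR: +{gr_gain * 1e6:.0f} us/day, SR: -{sr_loss * 1e6:.0f} us/day, "
      f"net: +{(gr_gain - sr_loss) * 1e6:.0f} us/day")
# Roughly +45 - 7 = +38 microseconds/day, which corresponds to ranging
# errors on the order of kilometers per day if left uncorrected.
```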
[2] I also disagree with the idea that “practically no one is trying to find theories in biology”. Theoretical biology seems like a decently large field (probably much larger than it was in 1950), and biologists use mathematical models all the time.
> we still can’t even define “life” in a reasonable way, or answer basic questions like “why do arms come out basically the same size?”
Such a definition seems futile (I recommend the rest of the Words sequence too). Biology already does a great job explaining which things are alive and why. We are not going around thinking a rock is “alive”. Or what exactly did you have in mind there?
Same quote, emphasis on the basic question.

What’s wrong with “Left and right limbs come out basically the same size because it’s the same construction plan”?

A sufficiently mechanistic answer lets us engineer useful things, e.g. constructing an animal with left limbs 2 inches longer than its right limbs.

Oops. Yeah, I forgot to address that one. I was just astonished to hear no one knows the answer to that one.
I don’t have the energy to contribute actual thoughts, but here are a few links that may be relevant to this conversation:
- Sequencing is the new microscope, by Laura Deming
- On whether neuroscience is primarily data-bottlenecked rather than theory-bottlenecked:
  - Could a neuroscientist understand a microprocessor?, by Eric Jonas
  - This footnote on computational neuroscience, by Jascha Sohl-Dickstein