Goodheart is about generalization, not approximation.
This seems true, and I agree that there’s a philosophical hangup along the lines of “but everything above the lowest level is fuzzy and subjective, so all of our concepts are inevitably approximate.” I think there are a few things going on with this, but my guess is that part of what this is tracking—at least in the case of biology and AI—is something like “tons of tiny causal factors.”
Like, I think one of the reasons that optimizing hard for good grades as a proxy for doing well at a job fails is because “getting good grades” has many possible causes. One way is by being intelligent; others include cheating, memorizing a bunch of stuff, etc. I.e., there are many pathways to the target, and if you optimize hard for the target, then you get people putting more effort into the pathways they can meaningfully influence.
Approximate measures are bad, I think, when the measure is not in one-to-one lock step with the “real thing.” This happens with grades, but it isn’t true, for instance, in science when we understand the causal underpinnings of phenomena. I know exactly how to increase pressure in a box (increase the number of molecules, make it hotter, etc.), and there is no other secret causal factor which might achieve the same end.
The question, to me, is whether or not these sorts of clean causal structures (as we have with, e.g. “pressure”) exist in biological systems or AI. I suspect they do, and that we just have to find the right way of understanding them. But I think part of the reason people expect these domains to be hopelessly messy is because their behavior seems to be mostly determined by tons of tiny casual factors (chemical signals, sequence of specific neurons firing, etc.)… Like, I read this biology paper once where the authors hilariously lamented that their chart of signaling cascades was a “horror graph” where “everything does everything to everything.”
Indeed, the underlying structure “everything does everything to everything” does seem less conducive to being precisely conceptualized. Especially when (imo) ideal scientific understanding consists of finding the few, isolated causal factors which produce an effect (as in the case of “pressure”). But if everything we care about—abstractions, agency, goals, etc.—are coarse-grainings over billions of semi-independent causal factors, produced in path-dependent ways, then the idea of finding concepts as precise as “pressure” starts looking kind of hopeless, I think.
Like you, I suspect that this is mostly an ontology issue (i.e., that neurons/etc are not necessarily the natural units), and I do think that precise concepts of agency/abstractions/etc both exist and are findable. Partially this is an empirical claim—as you note, we do actually find surprisingly clean structures in biological systems (such as modularity). But also, my proposed angle against “fractal complexity” is something like: I expect life to be understandable because it needs to control itself. I’m not going to give the full argument here, but to speak in my own mentalese for a paragraph:
When you get large, directed systems—(e.g., we are composed of 40 trillion cells, each containing tens of millions of proteins)—I think you basically need some level of modularity if there’s any hope of steering the whole thing. E.g., I think one of the reasons we see modular structure across all levels in biology is because modular structures are easier to control (the input/output structure is much cleaner than if it were some messy clump where “everything does everything to everything”). If my brain had to figure out how to navigate some hyper complex horror graph with a gazillion tiny causal factors every time I wanted to move my hand, I wouldn’t get very far. There’s more argument here to totally spell this out—like why that can’t all be pre-computed, etc.—but I think the fact that biological organisms can reliably coordinate their activity in flexible/general ways is evidence of there being clean structure.
I agree there’s a core principle somewhere around the idea of “controllable implies understandable”. But when I think about this with respect to humans studying biology, then there’s another thought that comes to my mind; the things we want to control are not necessarily the things the system itself is controlling. For example, we would like to control the obesity crisis (and weight loss in general) but it’s not clear that the biological system itself is controlling that. It almost certainly was successfully controlling it in the ancestral environment (and therefore it was understandable within that environment) but perhaps the environment has changed enough that it is now uncontrollable (and potentially not understandable). Cancer manages to successfully control the system in the sense of causing itself to happen, but that doesn’t mean that our goal, “reliably stopping cancer” is understandable, since it is not a way that the system is controlling itself.
This mismatch seems pretty evidently applicable to AI alignment.
And perhaps the “environment” part is critical. A system being controllable in one environment doesn’t imply it being controllable in a different (or broader) environment, and thus guaranteed understandability is also lost. This feels like an expression of misgeneralization.
Yeah, I think I misspoke a bit. I do think that controllability is related to understanding, but also I’m trying to gesture at something more like “controllability implies simpleness.”
Where, I think what I’m tracking with “controllability implies simpleness” is that the ease with which we can control things is a function of how many causal factors there are in creating it, i.e., “conjunctive things are less likely” in some Occam’s Razor sense, but also conjunctive things “cost more.” At the very least, they cost more from the agents point of view. Like, if I have to control every step in some horror graph in order to get the outcome I want, that’s a lot of stuff that has to go perfectly, and that’s hard. If “I,” on the other hand, as a virus, or cancer, or my brain, or whatever, only have to trigger a particular action to set off a cascade which reliably results in what I want, this is easier.
There is a sense, of course, in which the spaghetti code thing is still happening underneath it all, but I think it matters what level of abstraction you’re taking with respect to the system. Like, with enough zoom, even very simple activities start looking complex. If you looked at a ball following a parabolic arc at the particle level, it’d be pretty wild. And yet, one might reasonably assume that the regularity with which the ball follows the arc is meaningful. I am suspicious that much of the “it’s all ad-hoc” intuitions about biology/neuroscience/etc are making a zoom error (also typing errors, but that’s another problem). Even simple things can look complicated if you don’t look at them right, and I suspect that the ease with which we can operate in our world should be some evidence that this simpleness “exists,” much like the movement of the ball is “simple” even though a whole world of particles underlies its behavior.
To your point, I do think that the ability to control something is related to understanding it, but not exactly the same. Like, there indeed might be things in the environment that we don’t have much control over, even if we can clearly see what their effects are. Although, I’d be a little surprised if there were true for, e.g., obesity. Like, it seems likely that hormones (semaglutide) can control weight loss, which makes sense to me. A bunch of global stuff in bodies is regulated by hormones, in a similar way, I think, to how viruses “hook into” the cells machinery, i.e., it reliably triggers the right cascades. And controlling something doesn’t necessarily mean that it’s possible to understand it completely. But I do suspect that the ability to control implies the existence of some level of analysis upon which the activity is pretty simple.
When you get large, directed systems—(e.g., we are composed of 40 trillion cells, each containing tens of millions of proteins)—I think you basically need some level of modularity if there’s any hope of steering the whole thing.
This seems basically right to me. That said, while it is predictable that the systems in question will be modular, what exact form that modularity takes is both environment-dependent and also path-dependent. Even in cases where the environmental pressures form a very strong attractor for a particular shape of solution, the “module divisions” can differ between species. For example, the pectoral fins of fish and the pectoral flippers of dolphins both fulfill similar roles. However, fish fins are basically a collection of straight, parallel fin rays made of bone or cartilage and connected to a base inside the body of the fish, and the muscles to control the movement of the fin are located within the body of the fish. By contrast, a dolphin’s flipper is derived from the foreleg of its tetrapod ancestor, and contains “fingers” which can be moved by muscles within the flipper.
So I think approaches that look like “find a structure that does a particular thing, and try to shape that structure in the way you want” are somewhat (though not necessarily entirely) doomed, because the pressures that determine which particular structure does a thing are not nearly so strong as the pressures that determine that some structure does the thing.
This seems true, and I agree that there’s a philosophical hangup along the lines of “but everything above the lowest level is fuzzy and subjective, so all of our concepts are inevitably approximate.” I think there are a few things going on with this, but my guess is that part of what this is tracking—at least in the case of biology and AI—is something like “tons of tiny causal factors.”
Like, I think one of the reasons that optimizing hard for good grades as a proxy for doing well at a job fails is because “getting good grades” has many possible causes. One way is by being intelligent; others include cheating, memorizing a bunch of stuff, etc. I.e., there are many pathways to the target, and if you optimize hard for the target, then you get people putting more effort into the pathways they can meaningfully influence.
Approximate measures are bad, I think, when the measure is not in one-to-one lock step with the “real thing.” This happens with grades, but it isn’t true, for instance, in science when we understand the causal underpinnings of phenomena. I know exactly how to increase pressure in a box (increase the number of molecules, make it hotter, etc.), and there is no other secret causal factor which might achieve the same end.
The question, to me, is whether or not these sorts of clean causal structures (as we have with, e.g. “pressure”) exist in biological systems or AI. I suspect they do, and that we just have to find the right way of understanding them. But I think part of the reason people expect these domains to be hopelessly messy is because their behavior seems to be mostly determined by tons of tiny casual factors (chemical signals, sequence of specific neurons firing, etc.)… Like, I read this biology paper once where the authors hilariously lamented that their chart of signaling cascades was a “horror graph” where “everything does everything to everything.”
Indeed, the underlying structure “everything does everything to everything” does seem less conducive to being precisely conceptualized. Especially when (imo) ideal scientific understanding consists of finding the few, isolated causal factors which produce an effect (as in the case of “pressure”). But if everything we care about—abstractions, agency, goals, etc.—are coarse-grainings over billions of semi-independent causal factors, produced in path-dependent ways, then the idea of finding concepts as precise as “pressure” starts looking kind of hopeless, I think.
Like you, I suspect that this is mostly an ontology issue (i.e., that neurons/etc are not necessarily the natural units), and I do think that precise concepts of agency/abstractions/etc both exist and are findable. Partially this is an empirical claim—as you note, we do actually find surprisingly clean structures in biological systems (such as modularity). But also, my proposed angle against “fractal complexity” is something like: I expect life to be understandable because it needs to control itself. I’m not going to give the full argument here, but to speak in my own mentalese for a paragraph:
When you get large, directed systems—(e.g., we are composed of 40 trillion cells, each containing tens of millions of proteins)—I think you basically need some level of modularity if there’s any hope of steering the whole thing. E.g., I think one of the reasons we see modular structure across all levels in biology is because modular structures are easier to control (the input/output structure is much cleaner than if it were some messy clump where “everything does everything to everything”). If my brain had to figure out how to navigate some hyper complex horror graph with a gazillion tiny causal factors every time I wanted to move my hand, I wouldn’t get very far. There’s more argument here to totally spell this out—like why that can’t all be pre-computed, etc.—but I think the fact that biological organisms can reliably coordinate their activity in flexible/general ways is evidence of there being clean structure.
I agree there’s a core principle somewhere around the idea of “controllable implies understandable”. But when I think about this with respect to humans studying biology, then there’s another thought that comes to my mind; the things we want to control are not necessarily the things the system itself is controlling. For example, we would like to control the obesity crisis (and weight loss in general) but it’s not clear that the biological system itself is controlling that. It almost certainly was successfully controlling it in the ancestral environment (and therefore it was understandable within that environment) but perhaps the environment has changed enough that it is now uncontrollable (and potentially not understandable). Cancer manages to successfully control the system in the sense of causing itself to happen, but that doesn’t mean that our goal, “reliably stopping cancer” is understandable, since it is not a way that the system is controlling itself.
This mismatch seems pretty evidently applicable to AI alignment.
And perhaps the “environment” part is critical. A system being controllable in one environment doesn’t imply it being controllable in a different (or broader) environment, and thus guaranteed understandability is also lost. This feels like an expression of misgeneralization.
Yeah, I think I misspoke a bit. I do think that controllability is related to understanding, but also I’m trying to gesture at something more like “controllability implies simpleness.”
Where, I think what I’m tracking with “controllability implies simpleness” is that the ease with which we can control things is a function of how many causal factors there are in creating it, i.e., “conjunctive things are less likely” in some Occam’s Razor sense, but also conjunctive things “cost more.” At the very least, they cost more from the agents point of view. Like, if I have to control every step in some horror graph in order to get the outcome I want, that’s a lot of stuff that has to go perfectly, and that’s hard. If “I,” on the other hand, as a virus, or cancer, or my brain, or whatever, only have to trigger a particular action to set off a cascade which reliably results in what I want, this is easier.
There is a sense, of course, in which the spaghetti code thing is still happening underneath it all, but I think it matters what level of abstraction you’re taking with respect to the system. Like, with enough zoom, even very simple activities start looking complex. If you looked at a ball following a parabolic arc at the particle level, it’d be pretty wild. And yet, one might reasonably assume that the regularity with which the ball follows the arc is meaningful. I am suspicious that much of the “it’s all ad-hoc” intuitions about biology/neuroscience/etc are making a zoom error (also typing errors, but that’s another problem). Even simple things can look complicated if you don’t look at them right, and I suspect that the ease with which we can operate in our world should be some evidence that this simpleness “exists,” much like the movement of the ball is “simple” even though a whole world of particles underlies its behavior.
To your point, I do think that the ability to control something is related to understanding it, but not exactly the same. Like, there indeed might be things in the environment that we don’t have much control over, even if we can clearly see what their effects are. Although, I’d be a little surprised if there were true for, e.g., obesity. Like, it seems likely that hormones (semaglutide) can control weight loss, which makes sense to me. A bunch of global stuff in bodies is regulated by hormones, in a similar way, I think, to how viruses “hook into” the cells machinery, i.e., it reliably triggers the right cascades. And controlling something doesn’t necessarily mean that it’s possible to understand it completely. But I do suspect that the ability to control implies the existence of some level of analysis upon which the activity is pretty simple.
This seems basically right to me. That said, while it is predictable that the systems in question will be modular, what exact form that modularity takes is both environment-dependent and also path-dependent. Even in cases where the environmental pressures form a very strong attractor for a particular shape of solution, the “module divisions” can differ between species. For example, the pectoral fins of fish and the pectoral flippers of dolphins both fulfill similar roles. However, fish fins are basically a collection of straight, parallel fin rays made of bone or cartilage and connected to a base inside the body of the fish, and the muscles to control the movement of the fin are located within the body of the fish. By contrast, a dolphin’s flipper is derived from the foreleg of its tetrapod ancestor, and contains “fingers” which can be moved by muscles within the flipper.
So I think approaches that look like “find a structure that does a particular thing, and try to shape that structure in the way you want” are somewhat (though not necessarily entirely) doomed, because the pressures that determine which particular structure does a thing are not nearly so strong as the pressures that determine that some structure does the thing.