Not OP, but relevant—I spent the last ~6 months going to meetings with [biggest name at a top-20 ML university]’s group. He seems to me like a clearly very smart guy (and very generous in allowing me to join), but I thought it was quite striking that almost all his interests were questions of the form “I wonder if we can get a model to do x”, or “if we modify the training in way y, what will happen?” A few times I proposed projects along the lines of “maybe if we try z, we can figure out why b happens”, and he was never very interested—a near-exact quote of his response was “even if we figured that out successfully, I don’t see anything new we could get [the model] to do”.
At one point I explicitly asked him about his lack of interest in a more general theory of what neural nets are good at and why—his response was roughly that he’s thought about it and the problem is too hard, comparing it to P vs. NP.
To be clear, I think he’s an exceptionally good ML researcher, but his vision of the field looks to me more like a naturalist studying behavior than a biologist studying anatomy, which is very different from what I expected (and from the standard my shoulder-John is holding people to).
EDITED—removed identity of Professor.
This is mostly a gestalt sense from years of interacting with people in the space, so unrolling the full belief-production process into something legible would be a lot of work. But I can try a few sub-queries and give some initial answers.
Zeroth query: let’s try to query my intuition and articulate a little more clearly the kind of models which I think the median ML researcher doesn’t have. I think the core thing here is gears. Like, here’s a simple (not necessarily correct/incorrect) mental model of training of some random net:
We’re doing high dimensional optimization via gradient descent. The high dimensionality will typically make globally-suboptimal local minima rare, but high condition numbers quite common, so the main failure mode of the training process (other than fundamental limitations of the data or architecture) will be very slow convergence to minima along the bottom of long, thin “valleys” in the loss landscape.
That mental model immediately exposes a lot of gears. If that’s my mental model, and my training process is failing somehow, then I can go test that hypothesis via e.g. estimating the local condition number of the Hessian (this can be done in roughly linear time via Hessian-vector products, unlike calculation of the full Hessian), or by trying a type of optimizer suited to poor condition numbers (maybe conjugate gradient), or by looking for a “back-and-forth” pattern in the update steps; the model predicts that all those measurements will have highly correlated results. And if I do such measurements in a few contexts and find that the condition number is generally reasonable, or that it’s uncorrelated with how well training is going, then that would in turn update a bunch of related things, like e.g. which aspects of the NTK model are likely to hold, or how directions in weight-space which embed human-intelligible concepts are likely to correspond to loss basin geometry. So we’ve got a mental model which involves lots of different measurements and related phenomena being tightly coupled epistemically. It makes a bunch of predictions about different things going on inside the black box of the training process and the network itself. That’s gearsiness.
(In analogy to the “dark room” example from the OP: for the person who “models the room as containing walls”, there’s tight coupling between a whole bunch of predictions involving running into something along a particular line where they expect a wall to be. If they reach toward a spot where they expect a wall, and feel nothing, then that’s a big update; maybe the wall ended! That, in turn, updates a bunch of other predictions about where the person will/won’t run into things. The model projects a bunch of internal structure into the literal black box of the room. That’s gearsiness. Contrast to the person who doesn’t model the room as containing walls: they don’t make a bunch of tightly coupled predictions, so they don’t update a bunch of related things when they hit a surprise.)
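To make the “estimate the local condition number” test concrete, here’s a minimal sketch of the sort of measurement I have in mind: get the extreme Hessian eigenvalues from Hessian-vector products (never forming the full Hessian) and use their ratio as a rough conditioning proxy. The toy model and data below are placeholders, not anything from a real project; a real diagnostic would run this on the actual training loss.

```python
# Minimal sketch: estimate extreme Hessian eigenvalues of a training loss via
# Hessian-vector products (no full Hessian), and use their ratio as a rough
# conditioning proxy. Model/data here are made-up placeholders.
import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
x, y = torch.randn(256, 10), torch.randn(256, 1)
params = [p for p in model.parameters() if p.requires_grad]
n = sum(p.numel() for p in params)

loss = torch.nn.functional.mse_loss(model(x), y)
# Gradient with create_graph=True so we can differentiate through it again.
grad = torch.autograd.grad(loss, params, create_graph=True)
flat_grad = torch.cat([g.reshape(-1) for g in grad])

def hvp(v_np):
    """Hessian-vector product H @ v via a second backward pass."""
    v = torch.from_numpy(v_np).float()
    hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv]).detach().numpy().astype(np.float64)

H = LinearOperator((n, n), matvec=hvp, dtype=np.float64)
top = eigsh(H, k=1, which="LA", return_eigenvectors=False)[0]     # largest eigenvalue
bottom = eigsh(H, k=1, which="SA", return_eigenvectors=False)[0]   # smallest (may be <= 0 away from a minimum)
print(f"lambda_max={top:.3g}, lambda_min={bottom:.3g}, ratio={top / max(abs(bottom), 1e-12):.3g}")
```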
Now contrast the “high condition numbers” mental model to another (not necessarily correct/incorrect) mental model:
We’re doing optimization via gradient descent, so the main failure mode of the training process (other than fundamental limitations of the data or architecture) will be getting stuck in local minima which are not global minima (or close to them in performance).
This mental model exposes fewer gears. It allows basically one way to test the hypothesis: randomize to a new start location many times (or otherwise jump to a random location many times, as in e.g. simulated annealing), and see if training goes better. Based on this mental model in isolation, I don’t have a bunch of qualitatively different tests to run which I expect to yield highly correlated results. I don’t have a bunch of related things which update based on how the tests turn out. I don’t have predictions about what’s going on inside the magic box—there’s nothing analogous to e.g. “check the condition number of the Hessian”. So not much in terms of gears. (This “local minima” model could still be a component of a larger model with more gears in it, but few of those gears are in this model itself.)
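For contrast, here’s the one test this model does suggest, sketched on the same kind of toy setup: re-run training from several random initializations and compare the losses reached. Again the model, data, and hyperparameters are placeholders for illustration.

```python
# Toy sketch of the one test the "local minima" model licenses: re-run training
# from several random initializations and see whether the achieved loss varies.
import torch

x, y = torch.randn(256, 10), torch.randn(256, 1)
final_losses = []
for seed in range(5):
    torch.manual_seed(seed)
    model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    for _ in range(2000):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    final_losses.append(loss.item())

# A wide spread across restarts is (weak) evidence for bad local minima;
# near-identical losses says little about *why* training stalls.
print(final_losses)
```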
So that’s the sort of thing I’m gesturing at. Again, note that it’s not about whether the model is true or false. It’s also not about how mathematically principled/justified the model is, though that does tend to correlate with gearsiness in practice.
Ok, on to the main question. First query: what are the general types of observations which served as input to my belief? Also maybe some concrete examples...
Taking ML courses back in the day, as well as occasionally looking at syllabi for more recent courses, gives some idea of both what noobs are learning and what more experienced people are trying to teach.
Let’s take this Udacity course as a prototypical example (it was the first one I saw; not super up-to-date or particularly advanced but I expect the points I make about it to generalize). Looks like it walks students through implementing and training some standard net types; pretty typical for a course IIUC. The closest thing to that course which I expect would install the sorts of models the OP talks about would be a project-based course in which students have to make up a new architecture, or a new training method, or some such, and figure out things like e.g. normalization, preprocessing, performance bottlenecks, how to distinguish different failure modes, etc—and the course would provide the background mental models people use for such things. That’s pretty centrally the kind of skill behind the example of Bengio et al’s paper from the OP, and it’s not something I’d expect someone to get from Udacity’s course based on the syllabus.
Reading papers and especially blog posts from people working in ML gives some sense of what mental models are common.
For instance, both the “local minima” and “high condition numbers” examples above are mental models which at least some people use.
Talking to people working on ML projects, e.g. in the Lightcone office or during the MATS program.
Again, I often see people’s mental models in conversation.
Looking at people’s resumes/CVs.
By looking at people’s backgrounds, I can often rule out some common mental models—e.g. someone who doesn’t have much-if-any linear algebra background probably won’t understand-well-enough-to-measure gradient explosion, poorly conditioned basins in the loss landscape, low-rank components in a net, NTK-style approximations, etc (not that all of those are necessarily correct models). That doesn’t mean the person doesn’t have any gearsy mental models of nets, but it sure does rule a lot out and the remainder are much more limited.
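As a rough illustration of the kind of measurements I mean (my own toy example, with placeholder model and data): per-layer gradient norms to check for gradient explosion/vanishing, and singular-value spectra of weight matrices to check for approximate low-rank structure.

```python
# Sketch of cheap linear-algebra diagnostics: per-layer gradient norms
# (explosion/vanishing) and singular-value spectra of weight matrices
# (approximate low-rank structure). Model and data are placeholders.
import torch

model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
x, y = torch.randn(256, 10), torch.randn(256, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

for name, p in model.named_parameters():
    if p.grad is not None:
        print(f"{name}: grad-norm {p.grad.norm().item():.3e}")
    if p.dim() == 2:  # weight matrices only
        s = torch.linalg.svdvals(p.detach())          # sorted descending
        effective_rank = (s > 0.01 * s[0]).sum().item()
        print(f"{name}: top singular value {s[0]:.3e}, effective rank ~{effective_rank}/{min(p.shape)}")
```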
Second query: any patterns which occasionally come up and can be especially revealing when they do?
If something unexpected happens, does the person typically have a hypothesis for why it happened, with some gears in it? Do they have the kind of hypotheses with sub-claims which can be tested by looking at internals of a system? If some of a model’s hypotheses turn out to be wrong, does that induce confusion about a bunch of other stuff?
Does the person’s model only engage with salient externally-visible knobs/features? A gearsy model typically points to specific internal structures as interesting things to examine (e.g. the Hessian condition number in the example earlier), which are not readily visible “externally”. If a model’s ontology only involves externally-visible behavior, then that usually means that it lacks gears.
Does the model sound like shallow babble? “Only engaging with salient externally-visible knobs/features” is one articulable sign of shallow babble, but probably my/your intuitions pick up on lots of other signs that we don’t yet know how to articulate.
These are all very much the kinds of patterns which come up in conversation and papers/blog posts.
Ok, that’s all the answer I have time for now. Not really a full answer to the question, but hopefully it gave some sense of where the intuition comes from.
Same, but I’m more skeptical. At ICML there were many papers that seemed well motivated and had deep models, probably well over 5%. So the skill of having deep models is not limited to visionaries like Bengio. Also I’d guess that a lot of why the field is so empirical is less that nobody is able to form models and more that people have models but rationally put more trust in empirical research methods than in their inside-view models. When I talked to the average ICML presenter they generally had some reason they expected their research to work, even if it was kind of fake.
Sometimes the less well-justified method even wins. TRPO is very principled if you want to “not update too far” from a known good policy, as it enforces a KL-divergence trust-region constraint (handled in practice via a Taylor approximation). PPO is less principled but works better. It’s not clear to me that in ML capabilities one should try to be more like Bengio in having better models, rather than just getting really fast at running experiments and iterating.
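For readers who haven’t seen the two objectives side by side, here is a minimal sketch of the contrast being drawn: a KL-penalized surrogate in the spirit of TRPO’s trust region versus PPO’s clipped surrogate, on dummy tensors. (TRPO proper solves a constrained problem with a second-order approximation; the penalty form below is only illustrative, and the tensors stand in for real rollout data.)

```python
# Side-by-side sketch of the two surrogate objectives: a KL-penalized surrogate
# in the spirit of TRPO's trust region, and PPO's clipped surrogate.
# All tensors are dummy stand-ins for real rollout data.
import torch

logp_new = torch.randn(64, requires_grad=True)   # log pi_theta(a|s)
logp_old = torch.randn(64)                       # log pi_theta_old(a|s), fixed
advantages = torch.randn(64)

ratio = torch.exp(logp_new - logp_old)

# TRPO-flavored: maximize surrogate advantage while penalizing divergence from
# the old policy (TRPO actually enforces this as a constraint; penalty form
# shown only for illustration).
beta = 0.01
approx_kl = (logp_old - logp_new).mean()          # simple sample-based KL estimate
trpo_like_loss = -(ratio * advantages).mean() + beta * approx_kl

# PPO: clip the ratio so a single update can't move the policy too far.
eps = 0.2
clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
ppo_loss = -torch.min(ratio * advantages, clipped * advantages).mean()

print(trpo_like_loss.item(), ppo_loss.item())
```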
At ICML there were many papers that seemed well motivated and had deep models, probably well over 5%. So the skill of having deep models is not limited to visionaries like Bengio.
To be clear, I would also expect “well over 5%”. 10-20% feels about right. When I said in the OP that the median researcher lacks deep models, I really did mean the median; I was not trying to claim 90%+.
Re: the TRPO vs PPO example, I don’t think this is getting at the thing the OP is intended to be about. It’s not about how “well-justified” a technique is mathematically. It’s about models of what’s going wrong—in this case, something to do with large update steps messing things up. Like, imagine someone who sees their training run mysteriously failing and starts babbling random things like “well, maybe it’s getting stuck in local minima”, “maybe the network needs to be bigger”, “maybe I should adjust some hyperparameters”, and they try all these random things but they don’t have any way to go figure out what’s causing the problem, they just fiddle with whatever knobs are salient and available. That person probably never figures out TRPO or PPO, because they don’t figure out that too-large update steps are causing problems.
Sometimes the less well-justified method even wins. TRPO is very principled if you want to “not update too far” from a known good policy, as it enforces a KL-divergence trust-region constraint (handled in practice via a Taylor approximation). PPO is less principled but works better. It’s not clear to me that in ML capabilities one should try to be more like Bengio in having better models, rather than just getting really fast at running experiments and iterating.
This seems to also have happened in alignment, and I especially count RLHF here, and all the efforts to make AI nice, which I think show a pretty important point: Less justified/principled methods can and arguably do win over more principled methods like the embedded agency research, or a lot of decision theory research from MIRI, or the modern OAA plan from Davidad, or arguably ~all of the research that LessWrong did before 2014-2016.
If you were to be less charitable than I would be, this would explain a lot about why AI safety wants to regulate AI companies so much: the companies are offering at least a partial solution, if not a full solution, to the alignment and safety problem, one that doesn’t require much slowdown in AI progress, doesn’t require donations to MIRI or classic AI safety organizations, and doesn’t require much coordination. That threatens AI safety funding sources and feeds the fear that their preferred solution, slowing down AI, won’t be implemented.
Cf this tweet and the text below:
https://twitter.com/Rocketeer_99/status/1706057953524977740
It’s like degrowth or dieting or veganism; people come up with a solution that makes things better but requires personal sacrifice and then make that solution a cornerstone of personal moral virtue. Once that’s your identity, any other solutions to the original problem are evil.
If you were to be less charitable than I would be, this would explain a lot about why AI safety wants to regulate AI companies so much: the companies are offering at least a partial solution, if not a full solution, to the alignment and safety problem, one that doesn’t require much slowdown in AI progress, doesn’t require donations to MIRI or classic AI safety organizations, and doesn’t require much coordination. That threatens AI safety funding sources and feeds the fear that their preferred solution, slowing down AI, won’t be implemented.
I think this is kind of a non-sequitur and also wrong in multiple ways. Slowdown can give more time either for work like Davidad’s or improvements to RLHF-like techniques. Most of the AI safety people I know have actual models of why RLHF will stop working based on reasonable assumptions.
A basic fact about EA is that it’s super consequentialist and thus less susceptible to this “personal sacrifice = good” mistake than most other groups, and the AI alignment researchers who are not EAs are just normal ML researchers. Just look at the focus on cage-free campaigns over veganism, or earning-to-give. Not saying it’s impossible for AI safety researchers to make this mistake, but you have no reason to believe they are.
Placeholder response: this is mostly a gestalt sense from years of interacting with people in the space, so unrolling the full belief-production process into something legible would be a lot of work. I’ve started to write enough to give the general flavor, but it will probably be a few days before even that is ready. I will post another response-comment when it is.
Can I query you for the observations which produced this belief? (Not particularly skeptical, but would appreciate knowing why you think this.)