Epistemic status: hasty, first pass
First of all thanks for writing this.
I think this letter is “just wrong” in a number of frustrating ways.
A few points:
“Engineering doesn’t help unless one wants to do mechanistic interpretability.” This seems incredibly wrong. Engineering disciplines provide reasonable intuitions for how to reason about complex systems. Almost all engineering disciplines require their practitioners to think concretely. Software engineering in particular also lets you run experiments incredibly quickly, which makes it harder to be wrong.
ML theory in particular is in fact useful for reasoning about minds. This is not to say that cognitive science is not also useful. Further, being able to solve alignment in the current paradigm would mean we have excellent practice when encountering future paradigms.
It seems ridiculous to me to confidently claim that labs won’t care to implement a solution to alignment.
I think you should’ve spent more time developing some of these obvious cruxes, before implying that SERI MATS should change its behavior based on your conclusions. Implementing these changes would obviously have some costs for SERI MATS, and I suspect that SERI MATS organizers do not share your views on a number of these cruxes.
“Engineering doesn’t help unless one wants to do mechanistic interpretability.” This seems incredibly wrong. Engineering disciplines provide reasonable intuitions for how to reason about complex systems. Almost all engineering disciplines require their practitioners to think concretely. Software engineering in particular also lets you run experiments incredibly quickly, which makes it harder to be wrong.
I should have written “ML engineering” (I think it was not entirely clear from the context; fixed now). Knowing general engineering methodology and the typical challenges of engineering systems for robustness and resilience is, of course, useful, as is having visceral experience of these (e.g., building distributed systems, coding bugs into them oneself and seeing how they may fail in unexpected ways). But I would claim that learning this through practice, i.e., learning “from one’s own mistakes”, is again inefficient. Smart people learn from others’ mistakes. Just going through some of the materials from here would give alignment researchers much more useful insights than years of hands-on engineering practice[1]. Again, it’s an important qualification that we are talking about what’s effective for theoretical-ish alignment research, not the actual engineering of (AGI) systems!
ML theory in particular is in fact useful for reasoning about minds. This is not to say that cognitive science is not also useful. Further, being able to solve alignment in the current paradigm would mean we have excellent practice when encountering future paradigms.
I don’t argue that ML theory is useless. I argue that going through ML courses that spend too much time on building basic MLP networks or random forests (and understanding the theory of these, though it’s minimal) is ineffective. I personally stay abreast of ML research by following the MLST podcast (e.g., episodes on spiking NNs and deep RL, Domingos on neurosymbolic AI and lots of other stuff, a series of interviews with people at Cohere: Hooker, Lewis, Grefenstette, etc.).
It seems ridiculous to me to confidently claim that labs won’t care to implement a solution to alignment.
This is not what I wrote. I wrote that they are not planning to “solve alignment once and forever” before deploying the first AGI that will help them actually develop alignment and adjacent sciences. This might sound ridiculous to you, but that’s what OpenAI and Conjecture say absolutely directly, and I suspect other labs are thinking about it too, though they don’t say so outright.
[1] I did develop several databases and distributed systems over my 10-year engineering career, and I was also interested in resilience research and read about it, so I know what I’m talking about and can compare.
Short on time. Will respond to last point.
I wrote that they are not planning to “solve alignment once and forever” before deploying the first AGI that will help them actually develop alignment and adjacent sciences.
Surely this is because alignment is hard! Surely if alignment researchers really did find the ultimate solution to alignment and present it on a silver platter, the labs would use it.
Also: An explicit part of SERI MATS’ mission is to put alumni in orgs like Redwood and Anthropic, AFAICT. (To the extent your post does this) it’s plausibly a mistake to treat SERI MATS like an independent alignment research incubator.
MATS aims to find and accelerate alignment research talent, including:
Developing scholar research ability through curriculum elements focused on breadth, depth, and originality (the “T-model of research”);
Assisting scholars in producing impactful research through research mentorship, a community of collaborative peers, dedicated 1-1 support, and educational seminars;
Aiding the creation of impactful new alignment organizations (e.g., Jessica Rumbelow’s Leap Labs and Marius Hobbhahn’s Apollo Research);
Preparing scholars for impactful alignment research roles in existing organizations.
Not all alumni will end up in existing alignment research organizations immediately; some return to academia, pursue independent research, or potentially skill-up in industry (to eventually aid alignment research efforts). We generally aim to find talent with existing research ability and empower it to work on alignment, not necessarily through existing initiatives (though we certainly endorse many).
Yes, admittedly, there is much less pressure to be very good at the philosophy of science if you are going to work within a team with a clear agenda, particularly within an AGI lab, where the research agendas tend to be much more empirical than in “academic” orgs like MIRI or ARC. And thinking about research strategy is not the job of non-leading researchers at these orgs either, whereas independent researchers or researchers at more boutique labs have to think about their strategies by themselves. Founders of new orgs and labs have to think about their strategies very hard, too.
But preparing employees for OpenAI, Anthropic, or DeepMind is clearly not the singular focus of SERI MATS.