A few points here (all with respect to a target of “find new approaches to core problems in AGI alignment”):
It’s not clear to me what the upside of the PhD structure is supposed to be here (beyond respectability). If the aim is to avoid being influenced by most of the incentives and environment, that’s more easily achieved by not doing a PhD. (To the extent that the development of research ‘taste’/skill serves a publish-or-perish constraint, it’s likely to be harmful.)
This is not to say that there’s nothing useful about an academic context—only that the sensible approach seems to be [create environments with some of the same upsides, but fewer downsides].
I can see a more persuasive upside where the PhD environment gives:
Access to deep expertise in some relevant field.
The freedom to explore openly (without any “publish or perish” constraint).
This combination seems likely to be rare, and more likely for professors not doing ML. I note here that ML professors are currently not solving fundamental alignment problems: we’re not in a [Newtonian physics looking for Einstein] situation, but more of an [Aristotelian physics looking for Einstein] one. I can more easily imagine a mathematics PhD environment being useful than an ML one (though I’d expect this to be rare too).
This is also not to say that a PhD environment can’t be useful in various other ways. For example, I think David Krueger’s lab has done and is doing a bunch of useful stuff, but it’s highly unlikely to uncover new approaches to core problems.
For example, of the 213 concrete problems posed here, how many would lead us to think [it’s plausible that a good answer to this question leads to meaningful progress on core AGI alignment problems]? 5? 10? (Many more can be a bit helpful for short-term safety.)
There are a few where sufficiently general answers would be useful, but I don’t expect such generality, both because it’s hard and because incentives constantly push towards [publish something on this local pattern] rather than [don’t waste time running and writing up experiments on this local pattern; instead investigate the underlying structure].
I note that David’s probably at the top of my list for [would be a good supervisor for this kind of thing, conditional on having agreed on the exploratory aims at the outset], but the environment still seems likely to be not-close-to-optimal, since you’d be surrounded by people not doing such exploratory work.
I do think category theory professors or similar would be reasonable advisors for certain types of MIRI research.
I broadly agree with this. (And David was like .7 out of the 1.5 profs on the list who I guessed might genuinely want to grant the needed freedom.)
I do think that people might do good related work in math (specifically, probability/information theory, logic, etc.: stuff about formalized reasoning), philosophy (of mind), and possibly in other places such as theoretical linguistics. But this would require the academic context to be conducive to good novel work in the field, a lower bar that is probably far from universally met; it would also require the researcher to have good taste. And this is “related” in the sense of “might write a paper which leads to another paper which would be cited by [the alignment textbook from the future] for proofs/analogies/evidence about minds”.