Ok I want to just lay out what I’m trying to do here, and why, because it could be based on false assumptions.
A main assumption I’m making, which totally could be false, is that your paragraph
“Funders of independent researchers we’ve interviewed think that there are plenty of talented applicants, but would prefer more research proposals focused on relatively few existing promising research directions (e.g., Open Phil RFPs, MATS mentors’ agendas), rather than a profusion of speculative new agendas.”
is generally representative of the entire landscape, with a few small-ish exceptions. In other words, I’m assuming that it’s pretty difficult for a young smart person to show up and say “hey, I want to spend 3 whole years thinking about this problem de novo, can I have one year’s salary and a reevaluation after 1 year for a renewal”.
Another main assumption, which motivates what I’m doing here and which also could be false, is:
Funders make decisions mostly by some combination of recommendations from people they trust. The trust might be personal, or might be based on accomplishments, or might be based on some arguments made by the trusted person to the funder—and, centrally, the trust is actually derived from a loose diffuse array of impressions coming from the community, broadly.
To make the assumption slightly clearer: The assumption says that it’s actually quite common, maybe even the single dominant way funders make decisions, for the causality of a decision to flow through literally thousands of little interactions, where the little interactions communicate “I think XYZ is Important/Unimportant”. And these aggregate up into a general sense of importance/unimportance, or something. And then funding decisions work with two filters:
The explicit reasoning about the details—is this person qualified, how much funding, what’s the feedback, who endorses it, etc etc.
The implicit filter of Un/Importance. This doesn’t get raised to attention usually. It’s just in the background.
And “fund a smart motivated youngster without a plan for 3 years with little evaluation” is “unimportant”. And this unimportance is implicitly but strongly reinforced by everyone talking about in-paradigm stuff. And the situation is self-reinforcing because youngsters mostly don’t try to do the thing, because there’s no narrative and no funding, and so it is actually true that there aren’t many smart motivated youngsters just waiting for some funding to do trailblazing.
If my assumptions are true, then IDK what to do about this, but I would say that at least
people should be aware of this situation, and
people should keep talking about this situation, especially in contexts where they are contributing to the loose, diffuse array of impressions by shaping the framing of what AGI alignment needs.
An interesting note: I don’t necessarily want to start a debate about the merits of academia, but “fund a smart motivated youngster without a plan for 3 years with little evaluation” sounds a lot like “fund more exploratory AI safety PhDs” to me. If anyone wants to do an AI safety PhD (e.g., with these supervisors) and needs funding, I’m happy to evaluate these with my Manifund Regrantor hat on.
That would only work for people with the capacity to not give a fuck what anyone around them thinks, especially including the person funding and advising them. And that’s arguably unethical depending on context.
I like Adam’s description of an exploratory AI safety PhD:
“You’ll also have an unusual degree of autonomy: You’re basically guaranteed funding and a moderately supportive environment for 3-5 years, and if you have a hands-off advisor you can work on pretty much any research topic. This is enough time to try two or more ambitious and risky agendas.”
Ex ante funding guarantees, like the Vitalik Buterin PhD Fellowship in AI Existential Safety, Manifund, or other funders, mitigate my concerns about overly steering exploratory research. Also, if one is worried about culture/priority drift, there are several AI safety offices in Berkeley, Boston, London, etc. where one could complete a PhD while surrounded by AI safety professionals (which I believe was one of the main benefits of the late Lightcone office).
From the section you linked:
“Moreover, the program guarantees at least some mentorship from your supervisor. Your advisor’s incentives are reasonably aligned with yours: they get judged by your success in general, so want to see you publish well-recognized first-author research, land a top research job after graduation and generally make a name for yourself (and by extension, them).”
“Doing a PhD also pushes you to learn how to communicate with the broader ML research community. The ‘publish or perish’ imperative means you’ll get good at writing conference papers and defending your work.”
These would be exactly the “anyone around them” about whose opinion they would have to not give a fuck.
I don’t know a good way to do this, but maybe a pointer would be: funders should explicitly state something to the effect of:
“The purpose of this PhD funding is to find new approaches to core problems in AGI alignment. Success in this goal can’t be judged by an existing academic structure (journals, conferences, peer-review, professors) because there does not exist such a structure aimed at the core problems in AGI alignment. You may if you wish make it a major goal of yours to produce output that is well-received by some group in academia, but be aware that this goal would be non-overlapping with the purpose of this PhD funding.”
The Vitalik fellowship says:
“To be eligible, applicants should either be graduate students or be applying to PhD programs. Funding is conditional on being accepted to a PhD program, working on AI existential safety research, and having an advisor who can confirm to us that they will support the student’s work on AI existential safety research.”
This is an extremely reasonable (even necessary) requirement, but in my view it is already a major problem. The problem is that (IIUC—not sure) academics are incentivized to, basically, be dishonest, if it gets them funding for projects / students. Of the ~dozen professors here (https://futureoflife.org/about-us/our-people/ai-existential-safety-community/) who I’m at least a tiny bit familiar with, I think maybe 1.5ish are actually going to happily support actually-exploratory PhD students. I could be wrong about this though—curious for more data either way. And how many of them will successfully communicate, to the sort of person who would take a real shot at exploratory conceptual research if given the opportunity, that they would in fact support such research? I don’t know. Zero? One? And how would someone sent to the FLI page know of the existence of that professor?
“Fellows are expected to participate in annual workshops and other activities that will be organized to help them interact and network with other researchers in the field.”
“Continued funding is contingent on continued eligibility, demonstrated by submitting a brief (~1 page) progress report by July 1st of each year.”
Again, reasonable, but… Needs more clarity on what is expected, and what is not expected.
“a technical specification of the proposed research”
What does this even mean? This webpage doesn’t get it. We’re trying to buy something that nobody can yet write a technical specification of.
I want to sidestep critique of “more exploratory AI safety PhDs” for a moment and ask: why doesn’t MIRI sponsor high-calibre young researchers with a 1-3 year basic stipend and mentorship? And why did MIRI let Vivek’s team go?
I don’t speak for MIRI, but broadly I think MIRI thinks that roughly no existing research is hopeworthy, and that this isn’t likely to change soon. I think that, anyway.
In discussions like this one, I’m conditioning on something like “it’s worth it, these days, to directly try to solve AGI alignment”. That seems assumed in the post, seems assumed in lots of these discussions, seems assumed by lots of funders, and it’s why above I wrote “the main direct help we can give to AGI alignment” rather than something stronger like “the main help (simpliciter) we can give to AGI alignment” or “the main way we can decrease X-risk”.
I’m reading this as you saying something like “I’m trying to build a practical org that successfully onramps people into doing useful work. I can’t actually do that for arbitrary domains that people aren’t providing funding for. I’m trying to solve one particular part of the problem and that’s hard enough as it is.”
Is that roughly right?
Fwiw I appreciate your Manifund regrantor Request for Proposals announcement.
I’ll probably have more thoughts later.
Yes to all this, but also I’ll go one level deeper. Even if I had tons more Manifund money to give out (and assuming all the talent needs discussed in the report are saturated with funding), it’s not immediately clear to me that “giving 1-3 year stipends to high-calibre young researchers, no questions asked” is the right play if they don’t have adequate mentorship, the ability to generate useful feedback loops, researcher support systems, access to frontier models if necessary, etc.
A few points here (all with respect to a target of “find new approaches to core problems in AGI alignment”):
It’s not clear to me what the upside of the PhD structure is supposed to be here (beyond respectability). If the aim is to avoid being influenced by most of the incentives and environment, that’s more easily achieved by not doing a PhD. (To the extent that developing research ‘taste’/skill serves a publish-or-perish constraint, that’s likely to be harmful.)
This is not to say that there’s nothing useful about an academic context—only that the sensible approach seems to be [create environments with some of the same upsides, but fewer downsides].
I can see a more persuasive upside where the PhD environment gives:
Access to deep expertise in some relevant field.
The freedom to explore openly (without any “publish or perish” constraint).
This seems likely to be both rare and more likely for professors not doing ML. I note here that ML professors are currently not solving fundamental alignment problems—we’re not in a [Newtonian physics looking for Einstein] situation; more [Aristotelian physics looking for Einstein]. I can more easily imagine a mathematics PhD environment being useful than an ML one (though I’d expect this to be rare too).
This is also not to say that a PhD environment might not be useful in various other ways. For example, I think David Krueger’s lab has done and is doing a bunch of useful stuff—but it’s highly unlikely to uncover new approaches to core problems.
For example, of the 213 concrete problems posed here, how many would lead us to think [it’s plausible that a good answer to this question leads to meaningful progress on core AGI alignment problems]? 5? 10? (Many more can be a bit helpful for short-term safety.)
There are a few where sufficiently general answers would be useful, but I don’t expect such generality—both since it’s hard, and because incentives constantly push towards [publish something on this local pattern], rather than [don’t waste time running and writing up experiments on this local pattern, but instead investigate underlying structure].
I note that David’s probably at the top of my list for [would be a good supervisor for this kind of thing, conditional on having agreed the exploratory aims at the outset], but the environment still seems likely to be not-close-to-optimal, since you’d be surrounded by people not doing such exploratory work.
I do think category theory professors or similar would be reasonable advisors for certain types of MIRI research.
I broadly agree with this. (And David was like .7 out of the 1.5 profs on the list who I guessed might genuinely want to grant the needed freedom.)
I do think that people might do good related work in math (specifically probability/information theory, logic, etc., i.e. stuff about formalized reasoning), philosophy (of mind), and possibly in other places such as theoretical linguistics. But this would require that the academic context be conducive to good novel work in the field (a lower bar, but probably far from universally met), and it would require the researcher to have good taste. And this is “related” in the sense of “might write a paper which leads to another paper which would be cited by [the alignment textbook from the future] for proofs/analogies/evidence about minds”.
Have you looked through the FLI faculty listed there?
How many seem useful supervisors for this kind of thing? Why?
If we’re sticking to the [generate new approaches to core problems] aim, I can see three or four I’d be happy to recommend, conditional on their agreeing upfront to the exploratory goals and to publication not being necessary (or to a very low, concrete number of publications).
There are about ten more that seem not-obviously-a-terrible-idea, but probably not great (e.g. those who I expect have a decent understanding of the core problems, but basically aren’t working on them).
The majority don’t write anything that suggests they know what the core problems are.
For almost all of these supervisors, doing a PhD would seem to come with quite a few constraints, undesirable incentives, and a poor environment.
From an individual’s point of view this can still make sense, if it’s one of the only ways to get stable medium-term funding.
From a funder’s point of view, it seems nuts.
(again, less nuts if the goal were [incremental progress on prosaic approaches, and generation of a respectable publication record])