Yes, there is more than unwieldiness at play here. If we retitled the post “Hiring needs in on-paradigm technical AI safety” (which does seem unwieldy and introduces an unneeded concept, IMO), this seems like it would work at cross purposes to our (now explicit) claim, “there are few opportunities for those pursuing non-prosaic, theoretical AI safety research.” I think it benefits no one to make false or misleading claims about the current job market for non-prosaic, theoretical AI safety research (not that I think you are doing this; I just want our report to be clear). If anyone doesn’t like this fact about the world, I encourage them to do something about it! (E.g., found organizations, support mentees, publish concrete agendas, petition funders to change priorities.)
As indicated by MATS’ portfolio over research agendas, our revealed preferences largely disagree with point 1 (we definitely want to continue supporting novel ideas too, constraints permitting, but we aren’t Refine). Among other objectives, this report aims to show a flaw in the plan for point 2: high-caliber newcomers have few mentorship, job, or funding opportunities to mature as non-prosaic, theoretical technical AI safety researchers and the lead time for impactful Connectors is long. We welcome discussion on how to improve paths-to-impact for the many aspiring Connectors and theoretical AI safety researchers.
I agree with Tsvi here (as I’m sure will shock you :)).
I’d make a few points:
“our revealed preferences largely disagree with point 1”—this isn’t clear at all. We know MATS’ [preferences, given the incentives and constraints under which MATS operates]. We don’t know what you’d do absent such incentives and constraints.
I note also that “but we aren’t Refine” has the form [but we’re not doing x], rather than [but we have good reasons not to do x]. (I don’t think MATS should be Refine, but “we’re not currently 20% Refine-on-ramp” is no argument that it wouldn’t be a good idea)
MATS is in a stronger position than most to exert influence on the funding landscape. Sure, others should make this case too, but MATS should be actively making a case for what seems most important (to you, that is), not only catering to the current market.
Granted, this is complicated by MATS’ own funding constraints—you have more to lose too (and I do think this is a serious factor, undesirable as it might be).
If you believe that the current direction of the field isn’t great, then “ensure that our program continues to meet the talent needs of safety teams” is simply the wrong goal.
Of course the right goal isn’t diametrically opposed to that—but still, not that.
There’s little reason to expect the current direction of the field to be close to ideal:
At best, the accuracy of the field’s collective direction will tend to correspond to its collective understanding—which is low.
There are huge commercial incentives exerting influence.
There’s no clarity on what constitutes (progress towards) genuine impact.
There are many incentives to work on what’s already not neglected (e.g. things with easily located “tight empirical feedback loops”). The desirable properties of the non-neglected directions are a large part of the reason they’re not neglected.
Similar arguments apply to [field-level self-correction mechanisms].
Given the previous point (that there’s little reason to expect the field’s current direction to be close to ideal), there’s an inherent sampling bias in taking [needs of current field] as [what MATS should provide]. Of course there’s still an efficiency upside in catering to [needs of current field] to a large extent, but efficiently heading in a poor direction still sucks.
I think it’s instructive to consider extreme-field-composition thought experiments: suppose the field were composed of [10,000 researchers doing mech interp] and [10 researchers doing agent foundations].
Where would there be most jobs? Most funding? Most concrete ideas for further work? Does it follow that MATS would focus almost entirely on meeting the needs of all the mech interp orgs? (I expect that almost all the researchers in that scenario would claim mech interp is the most promising direction)
If you think that feedback loops along the lines of [[fast legible work on x] --> [x seems productive] --> [more people fund and work on x]] lead to desirable field dynamics in an AIS context, then it may make sense to cater to the current market. (personally, I expect this to give a systematically poor signal, but it’s not as though it’s easy to find good signals)
If you don’t expect such dynamics to end well, it’s worth considering to what extent MATS can be a field-level self-correction mechanism, rather than a contributor to predictably undesirable dynamics.
I’m not claiming this is easy!!
I’m claiming that it should be tried.
Detailing what job and funding opportunities should exist in the technical AI safety field is beyond the scope of this report.
Understandable, but do you know anyone who’s considering this? As the core of their job, I mean—not on a [something they occasionally think/talk about for a couple of hours] level. It’s non-obvious to me that anyone at OpenPhil has time for this.
It seems to me that the collective ‘decision’ we’ve made here is something like:
Any person/team doing this job would need:
Extremely good AIS understanding.
To be broadly respected.
To have a lot of time.
Nobody like this exists.
We’ll just hope things work out okay using a passive distributed approach.
To my eye this leads to a load of narrow optimization according to often-not-particularly-enlightened metrics—lots of common incentives, common metrics, and correlated failure.
Oh and I still think MATS is great :) - and that most of these issues are only solvable with appropriate downstream funding landscape alterations. That said, I remain hopeful that MATS can nudge things in a helpful direction.
I plan to respond regarding MATS’ future priorities when I’m able (I can’t speak on behalf of MATS alone here and we are currently examining priorities in the lead up to our Winter 2024-25 Program), but in the meantime I’ve added some requests for proposals to my Manifund Regrantor profile.
RFPs seem a good tool here for sure. Other coordination mechanisms too. (And perhaps RFPs for RFPs, where sketching out high-level desiderata is easier than specifying parameters for [type of concrete project you’d like to see])
Oh and I think the MATS Winter Retrospective seems great from the [measure a whole load of stuff] perspective. I think it’s non-obvious what conclusions to draw, but more data is a good starting point. It’s on my to-do list to read it carefully and share some thoughts.
Ok I want to just lay out what I’m trying to do here, and why, because it could be based on false assumptions.
A main assumption I’m making, which totally could be false, is that your paragraph
Funders of independent researchers we’ve interviewed think that there are plenty of talented applicants, but would prefer more research proposals focused on relatively few existing promising research directions (e.g., Open Phil RFPs, MATS mentors’ agendas), rather than a profusion of speculative new agendas.
is generally representative of the entire landscape, with a few small-ish exceptions. In other words, I’m assuming that it’s pretty difficult for a young smart person to show up and say “hey, I want to spend 3 whole years thinking about this problem de novo, can I have one year’s salary and a reevaluation after 1 year for a renewal”.
A main assumption that motivates what I’m doing here, and that could be false, is:
Funders make decisions mostly by some combination of recommendations from people they trust. The trust might be personal, or might be based on accomplishments, or might be based on some arguments made by the trusted person to the funder—and, centrally, the trust is actually derived from a loose diffuse array of impressions coming from the community, broadly.
To make the assumption slightly more clear: The assumption says that it’s actually quite common, maybe even the single dominant way funders make decisions, for the causality of a decision to flow through literally thousands of little interactions, where the little interactions communicate “I think XYZ is Important/Unimportant”. And these aggregate up into a general sense of importance/unimportance, or something. And then funding decisions work with two filters:
The explicit reasoning about the details—is this person qualified, how much funding, what’s the feedback, who endorses it, etc etc.
The implicit filter of Un/Importance. This doesn’t get raised to attention usually. It’s just in the background.
And “fund a smart motivated youngster without a plan for 3 years with little evaluation” is “unimportant”. And this unimportance is implicitly but strongly reinforced by everyone talking about in-paradigm stuff. And the situation is self-reinforcing because youngsters mostly don’t try to do the thing, because there’s no narrative and no funding, and so it is actually true that there aren’t many smart motivated youngsters just waiting for some funding to do trailblazing.
If my assumptions are true, then IDK what to do about this but would say that at least
people should be aware of this situation, and
people should keep talking about this situation, especially in contexts where they are contributing to the loose diffuse array of impressions by contributing to framing about what AGI alignment needs.
An interesting note: I don’t necessarily want to start a debate about the merits of academia, but “fund a smart motivated youngster without a plan for 3 years with little evaluation” sounds a lot like “fund more exploratory AI safety PhDs” to me. If anyone wants to do an AI safety PhD (e.g., with these supervisors) and needs funding, I’m happy to evaluate these with my Manifund Regrantor hat on.
That would only work for people with the capacity to not give a fuck what anyone around them thinks, especially including the person funding and advising them. And that’s arguably unethical depending on context.
I like Adam’s description of an exploratory AI safety PhD:
You’ll also have an unusual degree of autonomy: You’re basically guaranteed funding and a moderately supportive environment for 3-5 years, and if you have a hands-off advisor you can work on pretty much any research topic. This is enough time to try two or more ambitious and risky agendas.
Ex ante funding guarantees, like The Vitalik Buterin PhD Fellowship in AI Existential Safety or Manifund or other funders, mitigate my concerns around overly steering exploratory research. Also, if one is worried about culture/priority drift, there are several AI safety offices in Berkeley, Boston, London, etc. where one could complete their PhD while surrounded by AI safety professionals (which I believe was one of the main benefits of the late Lightcone office).
From the section you linked:
Moreover, the program guarantees at least some mentorship from your supervisor. Your advisor’s incentives are reasonably aligned with yours: they get judged by your success in general, so want to see you publish well-recognized first-author research, land a top research job after graduation and generally make a name for yourself (and by extension, them).
Doing a PhD also pushes you to learn how to communicate with the broader ML research community. The “publish or perish” imperative means you’ll get good at writing conference papers and defending your work.
These would be exactly the “anyone around them” about whose opinion they would have to not give a fuck.
I don’t know a good way to do this, but maybe a pointer would be: funders should explicitly state something to the effect of:
“The purpose of this PhD funding is to find new approaches to core problems in AGI alignment. Success in this goal can’t be judged by an existing academic structure (journals, conferences, peer-review, professors) because there does not exist such a structure aimed at the core problems in AGI alignment. You may if you wish make it a major goal of yours to produce output that is well-received by some group in academia, but be aware that this goal would be non-overlapping with the purpose of this PhD funding.”
The Vitalik fellowship says:
To be eligible, applicants should either be graduate students or be applying to PhD programs. Funding is conditional on being accepted to a PhD program, working on AI existential safety research, and having an advisor who can confirm to us that they will support the student’s work on AI existential safety research.
Despite being an extremely reasonable (even necessary) requirement, this is already a major problem according to me. The problem is that (IIUC, not sure) academics are incentivized to, basically, be dishonest, if it gets them funding for projects / students. Of the ~dozen professors here (https://futureoflife.org/about-us/our-people/ai-existential-safety-community/) who I’m at least a tiny bit familiar with, I think maybe 1.5ish are actually going to happily support actually-exploratory PhD students. I could be wrong about this though, and I’m curious for more data either way. And how many will successfully communicate that support to the sort of person who would take a real shot at exploratory conceptual research if given the opportunity? I don’t know. Zero? One? And how would someone sent to the FLI page know of the existence of that professor?
Fellows are expected to participate in annual workshops and other activities that will be organized to help them interact and network with other researchers in the field.
Continued funding is contingent on continued eligibility, demonstrated by submitting a brief (~1 page) progress report by July 1st of each year.
Again, reasonable, but… Needs more clarity on what is expected, and what is not expected.
a technical specification of the proposed research
What does this even mean? This webpage doesn’t get it. We’re trying to buy something that isn’t something someone can already write a technical specification of.
I want to sidestep critique of “more exploratory AI safety PhDs” for a moment and ask: why doesn’t MIRI sponsor high-calibre young researchers with a 1-3 year basic stipend and mentorship? And why did MIRI let Vivek’s team go?
I don’t speak for MIRI, but broadly I think MIRI thinks that roughly no existing research is hopeworthy, and that this isn’t likely to change soon. I think that, anyway.
In discussions like this one, I’m conditioning on something like “it’s worth it, these days, to directly try to solve AGI alignment”. That seems assumed in the post, seems assumed in lots of these discussions, seems assumed by lots of funders, and it’s why above I wrote “the main direct help we can give to AGI alignment” rather than something stronger like “the main help (simpliciter) we can give to AGI alignment” or “the main way we can decrease X-risk”.
I’m reading this as you saying something like “I’m trying to build a practical org that successfully onramps people into doing useful work. I can’t actually do that for arbitrary domains that people aren’t providing funding for. I’m trying to solve one particular part of the problem and that’s hard enough as it is.”
Yes to all this, but also I’ll go one level deeper. Even if I had tons more Manifund money to give out (and assuming all the talent needs discussed in the report are saturated with funding), it’s not immediately clear to me that “giving 1-3 year stipends to high-calibre young researchers, no questions asked” is the right play if they don’t have adequate mentorship, the ability to generate useful feedback loops, researcher support systems, access to frontier models if necessary, etc.
A few points here (all with respect to a target of “find new approaches to core problems in AGI alignment”):
It’s not clear to me what the upside of the PhD structure is supposed to be here (beyond respectability). If the aim is to avoid being influenced by most of the incentives and environment, that’s more easily achieved by not doing a PhD. (to the extent that development of research ‘taste’/skill acts to service a publish-or-perish constraint, that’s likely to be harmful)
This is not to say that there’s nothing useful about an academic context—only that the sensible approach seems to be [create environments with some of the same upsides, but fewer downsides].
I can see a more persuasive upside where the PhD environment gives:
Access to deep expertise in some relevant field.
The freedom to explore openly (without any “publish or perish” constraint).
This seems likely to be both rare, and more likely for professors not doing ML. I note here that ML professors are currently not solving fundamental alignment problems—we’re not in a [Newtonian physics looking for Einstein] situation; more [Aristotelian physics looking for Einstein]. I can more easily imagine a mathematics PhD environment being useful than an ML one (though I’d expect this to be rare too).
This is also not to say that a PhD environment might not be useful in various other ways. For example, I think David Krueger’s lab has done and is doing a bunch of useful stuff—but it’s highly unlikely to uncover new approaches to core problems.
For example, of the 213 concrete problems posed here, how many would lead us to think [it’s plausible that a good answer to this question leads to meaningful progress on core AGI alignment problems]? 5? 10? (Many more can be a bit helpful for short-term safety.)
There are a few where sufficiently general answers would be useful, but I don’t expect such generality—both since it’s hard, and because incentives constantly push towards [publish something on this local pattern], rather than [don’t waste time running and writing up experiments on this local pattern, but instead investigate underlying structure].
I note that David’s probably at the top of my list for [would be a good supervisor for this kind of thing, conditional on having agreed the exploratory aims at the outset], but the environment still seems likely to be not-close-to-optimal, since you’d be surrounded by people not doing such exploratory work.
I broadly agree with this. (And David was like .7 out of the 1.5 profs on the list who I guessed might genuinely want to grant the needed freedom.)
I do think that people might do good related work in math (specifically, probability/information theory, logic, etc.--stuff about formalized reasoning), philosophy (of mind), and possibly in other places such as theoretical linguistics. But this would require that the academic context is conducive to good novel work in the field, which lower bar is probably far from universally met; and would require the researcher to have good taste. And this is “related” in the sense of “might write a paper which leads to another paper which would be cited by [the alignment textbook from the future] for proofs/analogies/evidence about minds”.
Have you looked through the FLI faculty listed there? How many seem useful supervisors for this kind of thing? Why?
If we’re sticking to the [generate new approaches to core problems] aim, I can see three or four I’d be happy to recommend, conditional on their agreeing upfront to the exploratory goals, and that publication would not be necessary (or a very low concrete number agreed upon).
There are about ten more that seem not-obviously-a-terrible-idea, but probably not great (e.g. those who I expect have a decent understanding of the core problems, but basically aren’t working on them).
The majority don’t write anything that suggests they know what the core problems are.
For almost all of these supervisors, doing a PhD would seem to provide quite a few constraints, undesirable incentives, and an environment that’s poor. From an individual’s point of view this can still make sense, if it’s one of the only ways to get stable medium-term funding. From a funder’s point of view, it seems nuts. (again, less nuts if the goal were [incremental progress on prosaic approaches, and generation of a respectable publication record])
Yeah that looks good, except that it takes an order of magnitude longer to get going on conceptual alignment directions. I’ll message Adam to hear what happened with that.
For reference there’s this: “What I learned running Refine”. When I talked to Adam about this (over 12 months ago), he didn’t think there was much to say beyond what’s in that post. Perhaps he’s updated since.
My sense is that I view it as more of a success than Adam does. In particular, I think it’s a bit harsh to solely apply the [genuinely new directions discovered] metric. Even when doing everything right, I expect the hit rate to be very low there, with [variation on current framing/approach] being the most common type of success.
Agreed that Refine’s timescale is clearly too short. However, a much longer program would set a high bar for whoever’s running it. Personally, I’d only be comfortable doing so if the setup were flexible enough that it didn’t seem likely to limit the potential of participants (by being less productive-in-the-sense-desired than counterfactual environments).
In particular, I think it’s a bit harsh to solely apply the [genuinely new directions discovered] metric. Even when doing everything right, I expect the hit rate to be very low there, with [variation on current framing/approach] being the most common type of success.
Mhm. In fact I’d want to apply a bar that’s even lower, or at least different: [the extent to which the participants (as judged by more established alignment thinkers) seem to be well on the way to developing new promising directions—e.g. being relentlessly resourceful including at the meta-level; having both appropriate Babble and appropriate Prune; not shying away from the hard parts].
the setup were flexible enough that it didn’t seem likely to limit the potential of participants (by being less productive-in-the-sense-desired than counterfactual environments).
Agree that this is an issue, but I think it can be addressed—certainly at least well enough that there’d be worthwhile value-of-info in running such a thing.
I’d be happy to contribute a bit of effort, if someone else is taking the lead. I think most of my efforts will be directed elsewhere, but for example I’d be happy to think through what such a program should look like; help write justificatory parts of grant applications; and maybe mentor / similar.
Yes, there is more than unwieldiness at play here. If we retitled the post “Hiring needs in on-paradigm technical AI safety,” (which does seem unwieldy and introduces an unneeded concept, IMO) this seems like it would work at cross purposes to our (now explicit) claim, “there are few opportunities for those pursuing non-prosaic, theoretical AI safety research.” I think it benefits no-one to make false or misleading claims about the current job market for non-prosaic, theoretical AI safety research (not that I think you are doing this; I just want our report to be clear). If anyone doesn’t like this fact about the world, I encourage them to do something about it! (E.g., found organizations, support mentees, publish concrete agendas, petition funders to change priorities.)
As indicated by MATS’ portfolio over research agendas, our revealed preferences largely disagree with point 1 (we definitely want to continue supporting novel ideas too, constraints permitting, but we aren’t Refine). Among other objectives, this report aims to show a flaw in the plan for point 2: high-caliber newcomers have few mentorship, job, or funding opportunities to mature as non-prosaic, theoretical technical AI safety researchers and the lead time for impactful Connectors is long. We welcome discussion on how to improve paths-to-impact for the many aspiring Connectors and theoretical AI safety researchers.
I agree with Tsvi here (as I’m sure will shock you :)).
I’d make a few points:
“our revealed preferences largely disagree with point 1”—this isn’t clear at all. We know MATS’ [preferences, given the incentives and constraints under which MATS operates]. We don’t know what you’d do absent such incentives and constraints.
I note also that “but we aren’t Refine” has the form [but we’re not doing x], rather than [but we have good reasons not to do x]. (I don’t think MATS should be Refine, but “we’re not currently 20% Refine-on-ramp” is no argument that it wouldn’t be a good idea)
MATS is in a stronger position than most to exert influence on the funding landscape. Sure, others should make this case too, but MATS should be actively making a case for what seems most important (to you, that is), not only catering to the current market.
Granted, this is complicated by MATS’ own funding constraints—you have more to lose too (and I do think this is a serious factor, undesirable as it might be).
If you believe that the current direction of the field isn’t great, then “ensure that our program continues to meet the talent needs of safety teams” is simply the wrong goal.
Of course the right goal isn’t diametrically opposed to that—but still, not that.
There’s little reason to expect the current direction of the field to be close to ideal:
At best, the accuracy of the field’s collective direction will tend to correspond to its collective understanding—which is low.
There are huge commercial incentives exerting influence.
There’s no clarity on what constitutes (progress towards) genuine impact.
There are many incentives to work on what’s already not neglected (e.g. things with easily located “tight empirical feedback loops”). The desirable properties of the non-neglected directions are a large part of the reason they’re not neglected.
Similar arguments apply to [field-level self-correction mechanisms].
Given (4), there’s an inherent sampling bias in taking [needs of current field] as [what MATS should provide]. Of course there’s still an efficiency upside in catering to [needs of current field] to a large extent—but efficiently heading in a poor direction still sucks.
I think it’s instructive to consider extreme-field-composition thought experiments: suppose the field were composed of [10,000 researchers doing mech interp] [10 researchers doing agent foundations].
Where would there be most jobs? Most funding? Most concrete ideas for further work? Does it follow that MATS would focus almost entirely on meeting the needs of all the mech interp orgs? (I expect that almost all the researchers in that scenario would claim mech interp is the most promising direction)
If you think that feedback loops along the lines of [[fast legible work on x] --> [x seems productive] --> [more people fund and work on x]] lead to desirable field dynamics in an AIS context, then it may make sense to cater to the current market. (personally, I expect this to give a systematically poor signal, but it’s not as though it’s easy to find good signals)
If you don’t expect such dynamics to end well, it’s worth considering to what extent MATS can be a field-level self-correction mechanism, rather than a contributor to predictably undesirable dynamics.
I’m not claiming this is easy!!
I’m claiming that it should be tried.
Understandable, but do you know anyone who’s considering this? As the core of their job, I mean—not on a [something they occasionally think/talk about for a couple of hours] level. It’s non-obvious to me that anyone at OpenPhil has time for this.
It seems to me that the collective ‘decision’ we’ve made here is something like:
Any person/team doing this job would need:
Extremely good AIS understanding.
To be broadly respected.
Have a lot of time.
Nobody like this exists.
We’ll just hope things work out okay using a passive distributed approach.
To my eye this leads to a load of narrow optimization according to often-not-particularly-enlightened metrics—lots of common incentives, common metrics, and correlated failure.
Oh and I still think MATS is great :) - and that most of these issues are only solvable with appropriate downstream funding landscape alterations. That said, I remain hopeful that MATS can nudge things in a helpful direction.
I plan to respond regarding MATS’ future priorities when I’m able (I can’t speak on behalf of MATS alone here and we are currently examining priorities in the lead up to our Winter 2024-25 Program), but in the meantime I’ve added some requests for proposals to my Manifund Regrantor profile.
RFPs seem a good tool here for sure. Other coordination mechanisms too.
(And perhaps RFPs for RFPs, where sketching out high-level desiderata is easier than specifying parameters for [type of concrete project you’d like to see])
Oh and I think the MATS Winter Retrospective seems great from the [measure a whole load of stuff] perspective. I think it’s non-obvious what conclusions to draw, but more data is a good starting point. It’s on my to-do-list to read it carefully and share some thoughts.
Ok I want to just lay out what I’m trying to do here, and why, because it could be based on false assumptions.
A main assumption I’m making, which totally could be false, is that your paragraph
is generally representative of the entire landscape, with a few small-ish exceptions. In other words, I’m assuming that it’s pretty difficult for a young smart person to show up and say “hey, I want to spend 3 whole years thinking about this problem de novo, can I have one year’s salary and a reevaluation after 1 year for a renewal”.
A main assumption that motivates what I’m doing here, and that could be false, is:
To make the assumption slightly more clear: The assumption says that it’s actually quite common, maybe even the single dominant way funders make decisions, for the causality of a decision to flow through literally thousands of little interactions, where the little interactions communicate “I think XYZ is Important/Unimportant”. And these aggregate up into a general sense of importance/unimportance, or something. And then funding decisions work with two filters:
The explicit reasoning about the details—is this person qualified, how much funding, what’s the feedback, who endorses it, etc etc.
The implicit filter of Un/Importance. This doesn’t get raised to attention usually. It’s just in the background.
And “fund a smart motivated youngster without a plan for 3 years with little evaluation” is “unimportant”. And this unimportance is implicitly but strongly reinforced by everyone talking about in-paradigm stuff. And the situation is self-reinforcing because youngsters mostly don’t try to do the thing, because there’s no narrative and no funding, and so it is actually true that there aren’t many smart motivated youngsters just waiting for some funding to do trailblazing.
If my assumptions are true, then IDK what to do about this but would say that at least
people should be aware of this situation, and
people should keep talking about this situation, especially in contexts where they are contributing to the loose diffuse array of impressions by contributing to framing about what AGI alignment needs.
An interesting note: I don’t necessarily want to start a debate about the merits of academia, but “fund a smart motivated youngster without a plan for 3 years with little evaluation” sounds a lot like “fund more exploratory AI safety PhDs” to me. If anyone wants to do an AI safety PhD (e.g., with these supervisors) and needs funding, I’m happy to evaluate these with my Manifund Regrantor hat on.
That would only work for people with the capacity to not give a fuck what anyone around them thinks, especially including the person funding and advising them. And that’s arguably unethical depending on context.
I like Adam’s description of an exploratory AI safety PhD:
Ex ante funding guarantees, like The Vitalik Buterin PhD Fellowship in AI Existential Safety or Manifund or other funders, mitigate my concerns around overly steering exploratory research. Also, if one is worried about culture/priority drift, there are several AI safety offices in Berkeley, Boston, London, etc. where one could complete their PhD while surrounded by AI safety professionals (which I believe was one of the main benefits of the late Lightcone office).
From the section you linked:
These would be exactly the “anyone around them” about whose opinion they would have to not give a fuck.
I don’t know a good way to do this, but maybe a pointer would be: funders should explicitly state something to the effect of:
“The purpose of this PhD funding is to find new approaches to core problems in AGI alignment. Success in this goal can’t be judged by an existing academic structure (journals, conferences, peer-review, professors) because there does not exist such a structure aimed at the core problems in AGI alignment. You may if you wish make it a major goal of yours to produce output that is well-received by some group in academia, but be aware that this goal would be non-overlapping with the purpose of this PhD funding.”
The Vitalik fellowship says:
Despite being an extremely reasonable (even necessary) requirement, this is already a major problem according to me. The problem is that (IIUC; not sure) academics are incentivized to, basically, be dishonest, if it gets them funding for projects / students. Of the ~dozen professors here (https://futureoflife.org/about-us/our-people/ai-existential-safety-community/) who I’m at least a tiny bit familiar with, I think maybe 1.5ish are actually going to happily support actually-exploratory PhD students. I could be wrong about this, though; curious for more data either way. And how many will successfully communicate, to the sort of person who would take a real shot at exploratory conceptual research if given the opportunity, that they would in fact support that? I don’t know. Zero? One? And how would someone sent to the FLI page know of the existence of that professor?
Again, reasonable, but… Needs more clarity on what is expected, and what is not expected.
What does this even mean? This webpage doesn’t get it. We’re trying to buy something that isn’t something someone can already write a technical specification of.
I want to sidestep critique of “more exploratory AI safety PhDs” for a moment and ask: why doesn’t MIRI sponsor high-calibre young researchers with a 1-3 year basic stipend and mentorship? And why did MIRI let Vivek’s team go?
I don’t speak for MIRI, but broadly I think MIRI thinks that roughly no existing research is hopeworthy, and that this isn’t likely to change soon. I think that, anyway.
In discussions like this one, I’m conditioning on something like “it’s worth it, these days, to directly try to solve AGI alignment”. That seems assumed in the post, seems assumed in lots of these discussions, seems assumed by lots of funders, and it’s why above I wrote “the main direct help we can give to AGI alignment” rather than something stronger like “the main help (simpliciter) we can give to AGI alignment” or “the main way we can decrease X-risk”.
I’m reading this as you saying something like “I’m trying to build a practical org that successfully onramps people into doing useful work. I can’t actually do that for arbitrary domains that people aren’t providing funding for. I’m trying to solve one particular part of the problem and that’s hard enough as it is.”
Is that roughly right?
Fwiw I appreciate your Manifund regrantor Request for Proposals announcement.
I’ll probably have more thoughts later.
Yes to all this, but also I’ll go one level deeper. Even if I had tons more Manifund money to give out (and assuming all the talent needs discussed in the report are saturated with funding), it’s not immediately clear to me that “giving 1-3 year stipends to high-calibre young researchers, no questions asked” is the right play if they don’t have adequate mentorship, the ability to generate useful feedback loops, researcher support systems, access to frontier models if necessary, etc.
A few points here (all with respect to a target of “find new approaches to core problems in AGI alignment”):
It’s not clear to me what the upside of the PhD structure is supposed to be here (beyond respectability). If the aim is to avoid being influenced by most of the incentives and environment, that’s more easily achieved by not doing a PhD. (to the extent that development of research ‘taste’/skill acts to service a publish-or-perish constraint, that’s likely to be harmful)
This is not to say that there’s nothing useful about an academic context—only that the sensible approach seems to be [create environments with some of the same upsides, but fewer downsides].
I can see a more persuasive upside where the PhD environment gives:
Access to deep expertise in some relevant field.
The freedom to explore openly (without any “publish or perish” constraint).
This seems likely to be both rare, and more likely for professors not doing ML. I note here that ML professors are currently not solving fundamental alignment problems—we’re not in a [Newtonian physics looking for Einstein] situation; more [Aristotelian physics looking for Einstein]. I can more easily imagine a mathematics PhD environment being useful than an ML one (though I’d expect this to be rare too).
This is also not to say that a PhD environment might not be useful in various other ways. For example, I think David Krueger’s lab has done and is doing a bunch of useful stuff—but it’s highly unlikely to uncover new approaches to core problems.
For example, of the 213 concrete problems posed here, how many would lead us to think [it’s plausible that a good answer to this question leads to meaningful progress on core AGI alignment problems]? 5? 10? (Many more could be a bit helpful for short-term safety.)
There are a few where sufficiently general answers would be useful, but I don’t expect such generality—both since it’s hard, and because incentives constantly push towards [publish something on this local pattern], rather than [don’t waste time running and writing up experiments on this local pattern, but instead investigate underlying structure].
I note that David’s probably at the top of my list for [would be a good supervisor for this kind of thing, conditional on having agreed the exploratory aims at the outset], but the environment still seems likely to be not-close-to-optimal, since you’d be surrounded by people not doing such exploratory work.
I do think category theory professors or similar would be reasonable advisors for certain types of MIRI research.
I broadly agree with this. (And David was like .7 out of the 1.5 profs on the list who I guessed might genuinely want to grant the needed freedom.)
I do think that people might do good related work in math (specifically, probability/information theory, logic, etc.: stuff about formalized reasoning), philosophy (of mind), and possibly in other places such as theoretical linguistics. But this would require that the academic context is conducive to good novel work in the field, a lower bar that is nonetheless probably far from universally met; and it would require the researcher to have good taste. And this is “related” in the sense of “might write a paper which leads to another paper which would be cited by [the alignment textbook from the future] for proofs/analogies/evidence about minds”.
Have you looked through the FLI faculty listed there?
How many seem useful supervisors for this kind of thing? Why?
If we’re sticking to the [generate new approaches to core problems] aim, I can see three or four I’d be happy to recommend, conditional on their agreeing upfront to the exploratory goals, and agreeing that publication would not be necessary (or that only a very low, concrete number of publications would be expected).
There are about ten more that seem not-obviously-a-terrible-idea, but probably not great (e.g. those who I expect have a decent understanding of the core problems, but basically aren’t working on them).
The majority don’t write anything that suggests they know what the core problems are.
For almost all of these supervisors, doing a PhD would seem to provide quite a few constraints, undesirable incentives, and an environment that’s poor.
From an individual’s point of view this can still make sense, if it’s one of the only ways to get stable medium-term funding.
From a funder’s point of view, it seems nuts.
(again, less nuts if the goal were [incremental progress on prosaic approaches, and generation of a respectable publication record])
As a concrete proposal, if anyone wants to reboot Refine or similar, I’d be interested to consider that while wearing my Manifund Regrantor hat.
Yeah that looks good, except that it takes an order of magnitude longer to get going on conceptual alignment directions. I’ll message Adam to hear what happened with that.
For reference there’s this: What I learned running Refine
When I talked to Adam about this (over 12 months ago), he didn’t think there was much to say beyond what’s in that post. Perhaps he’s updated since.
My sense is that I view it as more of a success than Adam does. In particular, I think it’s a bit harsh to solely apply the [genuinely new directions discovered] metric. Even when doing everything right, I expect the hit rate to be very low there, with [variation on current framing/approach] being the most common type of success.
Agreed that Refine’s timescale is clearly too short.
However, a much longer program would set a high bar for whoever’s running it.
Personally, I’d only be comfortable doing so if the setup were flexible enough that it didn’t seem likely to limit the potential of participants (by being less productive-in-the-sense-desired than counterfactual environments).
Ah thanks!
Mhm. In fact I’d want to apply a bar that’s even lower, or at least different: [the extent to which the participants (as judged by more established alignment thinkers) seem to be well on the way to developing new promising directions—e.g. being relentlessly resourceful including at the meta-level; having both appropriate Babble and appropriate Prune; not shying away from the hard parts].
Agree that this is an issue, but I think it can be addressed—certainly at least well enough that there’d be worthwhile value-of-info in running such a thing.
I’d be happy to contribute a bit of effort, if someone else is taking the lead. I think most of my efforts will be directed elsewhere, but for example I’d be happy to think through what such a program should look like; help write justificatory parts of grant applications; and maybe mentor / similar.
Report back if you get details, I’m curious.
See Joe’s sibling comment
https://www.lesswrong.com/posts/QzQQvGJYDeaDE4Cfg/talent-needs-in-technical-ai-safety#JP5LA9cNgqxgdAz8Z
I have, and I also remember seeing Adam’s original retrospective, but I always found it unsatisfying. Thanks anyway!