It has stimulated a lot of good work during its year-long run, but participation has been slowing down from round to round, and we don’t think it’s worth continuing in its current form.
Any guesses why that’s happening? For future prizes, I wonder if it would make sense to accept nominations instead of requiring authors to submit their own work.
Prizes are something people have suggested as providing better incentives than most current forms of funding, so it’s disappointing to see existing prizes shut down. (If the upcoming write-up of lessons learned covers this, I can wait for that.)
I think our prize was relatively small in terms of both money and prestige. Offering more money was possible, but people usually won’t work for the mere chance of money unless you offer a stupidly large sum, which would lead to other problems. The real solution is offering more prestige, but that’s hard, unless you have a stash of it somewhere.
That’s true, but given that, the results don’t seem so bad? What would have counted as a success for this experiment in your view? Is there another way to spend money that seems clearly more cost-effective at this point, and if so, what? If someone wanted to spend, say, 1x to 10x the amount that was given out by this prize per year, do you think prizes are still worth trying (maybe with some design changes), or should they look for something else?
Also this doesn’t seem to explain why participation declined over time, so I’m still curious about that.
I think maybe there’s a tipping point where prizes could work if they collectively gave out enough money on a regular basis that someone who is sufficiently productive in AI safety research could expect to make a living from prizes alone. (I’m thinking that instead of having fixed periods, just give out a prize whenever a new advance comes in that meets a certain subjective threshold.) Would you consider that a “stupidly large sum” and if so what kind of problems do you think it leads to?
More prestige certainly helps, but I feel that more money hasn’t been tried hard enough yet, unless you know something that I don’t.
To be clear, I still think this is a good way to spend money. I think the main cost is time.
Whose time do you mean? The judges? Your own time? The participants’ time?
Is there another way to spend money that seems clearly more cost-effective at this point, and if so what? In my opinion, AI safety camps, for example, were significantly more effective. I have maybe 2-3 ideas that would likely be more effective (sorry, shareable only in private).
One possible factor is that there was initially a pool of people who wouldn’t otherwise try to contribute to alignment research (~30 people, judging by the number of submissions to contest 1 minus the number of submissions to this contest) who tried their hand early on, but then became discouraged because the winners’ entries seemed more polished and productive than anything they felt they could realistically hope to produce. In fact, I felt this way in round two. I imagine I probably would’ve stopped if the alignment prize had been my sole motivation (i.e., totally ignoring how I feel about the necessity of work on this problem).
This and cousin_it’s suggested novelty effect both make sense, but to me it just means that the prize givers got more than they bargained for in the first rounds and maybe it set people’s expectations too high for what such a prize can accomplish. I failed to pay much attention to the first two rounds (should probably go back and look at them again) and to me the latter two rounds seem like a reasonable steady state result of the prize given the amount of money/prestige involved.
I wonder if another thing that discouraged people was the feeling that they had to compete with experienced professional researchers who already have funding from other sources. If I were to design a new prize with the experience of this one in mind, I’d split it into two prizes: one optimized for increasing the prestige of the field, and one for funding people who otherwise couldn’t get funding, or for providing an alternative funding source with better incentives. The former would look like conventional prestigious prizes in other fields. The latter would run continuously and pay out as soon as an entry/nomination met a certain subjective threshold of quality (which could be adjusted over time depending on the prize budget and the quality of submissions), and the prize money would subtract out the amount of formal funding the recipient had already received for the work (such as salaries, grants, or other prizes).
I agree with this point. Looking at the things that won over time, it eventually came to feel like it wasn’t worth bothering to submit anything, because the winners were mostly going to be folks who would have done their work anyway and who already had a certain level of prestige. In this way I do sort of feel like the prize failed: it was set up to reward work that would have happened anyway, and failed to motivate work that wouldn’t have happened otherwise. Maybe it’s only in my mind that the value of a prize like this is to increase work on the margin rather than to recognize outstanding work that would have been done regardless, but beyond the first round it felt like a prize of the form “here’s money for the best stuff on AI alignment in the last x months” rather than “here’s money to make AI alignment research happen that otherwise wouldn’t have”. That made me much less interested in it, to the point that I put the prize out of my mind until I saw this post reminding me of it today.
I disagree with the view that it’s bad to spend the first few months prizing top researchers who would have done the work anyway. This _in and of itself_ is clearly burning cash, but the point is to change incentives over a longer time-frame.
If you think research output is heavy-tailed, what you should expect to observe is something like this happening for a while, until promising tail-end researchers realise there’s a stable stream of value to be had here and put in the effort required to level up and contribute themselves. It’s not implausible to me that this would take more than a year of prizes.
Expecting lots of important counterfactual work that beats the current best work to come out of the woodwork within ~6 months seems to assume that (a) making progress on alignment is quite tractable, and (b) the ability to do so is fairly widely distributed across people; both to a seemingly unjustified extent.
I personally think prizes should be announced together with precommitments to keep delivering them for a non-trivial amount of time. I believe this because changing incentives involves changing expectations, in a way that changes medium-term planning. I expect people to have qualitatively different thoughts if their S1 reliably believes that fleshing out the-kinds-of-thoughts-that-take-6-months-to-flesh-out will be rewarded after those 6 months.
That’s expensive, in terms of both money and trust.
As an anecdata point, it seems probable that I would not have written the essay about the learning-theoretic research agenda without the prize, or at least it would have been significantly delayed. This is because I am usually reluctant to publish anything that doesn’t contain non-trivial theorems, but it felt like an essay would be suitable for this prize (this preference is partly for objective reasons, and partly due to entirely subjective motivation issues). In hindsight, I think that spending the time to write that essay was the right decision regardless of the prize.
As another anecdata point, I considered writing more to pursue the prize pool but ultimately didn’t do any more (counterfactual) work!
fwiw, thirding this perception (although my take is less relevant since I didn’t feel like I was in the target reference class in the first place)
I observe that, of the 16 awards of money from the AI alignment prize, as far as I can see none of the winners had a full-time commitment that wasn’t working on AI alignment (i.e., they either worked on alignment full-time, or were financially supported in a way that gave them the space to devote their attention to it fully for the purposes of the prize). Introspecting just now on why I didn’t apply: I didn’t S1-expect to be able to produce anything prize-winning without ~1 month of work, and I have to work on LessWrong. This suggests some natural interventions (e.g. giving out smaller prizes for good efforts even if they weren’t successful).
In round three, I was working on computational molecule design research and completing coursework; whitelisting was developed in my spare time.
In fact, I don’t currently have research funding during the school year, so I spend some of my time as a teaching assistant.
Interesting. Can you talk a bit more about how much time you actually devoted to thinking about whitelisting in the lead-up to the work that was awarded, and whether you considered it your top priority at the time?
Added: Was it the top idea in your mind for any substantial period of time?
Yes, it was the top idea on and off over a few months. I considered it my secret research and thought about it on my twice-daily walks, in the shower, and in class when bored. I developed it for my CHAI application and extended it as my final Bayesian stats project. Probably 5-10 hours a week, plus more top-idea time. However, the core idea came within the first hour of thinking about Concrete Problems.
The second piece, Overcoming Clinginess, was provoked by Abram’s comment that clinginess seemed like the most damning failure of whitelisting; at the time, I thought just finding a way to overcome clinginess would be an extremely productive use of my entire summer (lol). On an AMS-PDX flight, I put on some music and spent hours running through different scenarios to dissolve my confusion. I hit the solution after about 5 hours of work, then spent 3 hours formalizing it a bit and 5 more making it look nice.
Yeah, this is similar to how I got into the game. Just thinking about it in my spare time for fun.
From your and others’ comments, it sounds like a prize for best work isn’t the best use of money; it’s better to spend it on getting more people into the game. In that case it probably shouldn’t be a competition: beginners need gradual rewards, not one-shot high stakes. Something like a flatter subsidy for studying and mentoring could work better. Thank you for making me realize that! I’ll try to talk about it with folks.
I also think surveying applicants might be a good idea, since my experience may not be representative.
The first couple of rounds attracted many people due to the novelty effect, but then it tapered off and we had no good ideas for how to make it grow. Maybe offering 10x the money would solve that, but I think it would mostly attract bad entries.
Could there be some kind of mentorship incentive? Another problem at large in alignment research seems to be the lack of mentors, since most of the people skilled enough to fill this role are desperately working against the clock. A naïve solution could be to offer a smaller prize to the mentor of a newer researcher if the newbie’s submission credits a significant amount of help on their part. Obviously, dishonest people could throw a friend’s name on the submission because “why not”, but I’m not sure how serious a problem this would be.
What would be nice would be some incentive for high quality mentorship / for bringing new people into the contest and research field, in a way that encourages the mentors to get their friends in the contest, even though that might end up increasing the amount of competition they have for their own proposal.
This might also modestly improve social incentives for mentors, since people like being associated with success and being seen as helpful / altruistic.
ETA: What about a flat prize (a few thousand dollars) you can only win once, after which you can mentor others and receive a slightly more modest sum for prizes they win? If sufficiently selective, it might help kickstart people’s alignment careers / give them the confidence to continue work. We’d have to work out the details of what counts as mentorship, depending on how much we think people would try to game it.
As Raemon noted, the mentorship bottleneck actually is a bottleneck: senior researchers who could mentor are the most constrained resource in the field, and the problem is unlikely to be solved by financial or similar incentives. Incentivizing mentorship too strongly is probably wrong, because mentoring competes with time to do research, evaluate grants, etc. What can be done:
- improve the utilization of mentors’ time (e.g. mentoring teams of people instead of individuals)
- do what can be done on a peer-to-peer basis
- use mentors from other fields to teach people generic skills, e.g. how to do research
- prepare better materials for onboarding
Some bits from Critch’s blog relevant to “use mentors from other fields for generic skills” include:
- Leverage Academia
- Using “Get into UC Berkeley” as a screening filter
- Deliberate Grad School
I can probably spend some time (perhaps around 4 hours / week) on mentoring, especially for new researchers that want to contribute to the learning-theoretic research agenda or its vicinity. However, I am not sure how to make this known to the relevant people. Should I write a post that says “hey, who wants a mentor?” Is there a better way?
It’s important not to let the perfect be the enemy of the good. There’s almost certainly a better way to find mentors, but this would be far better than doing nothing, so I’d say that if you can’t find an actionable better option within (let’s say) a month, you should just do it. Or just do it now and replace it with a better method when you find one.
Off-the-cuff: I think making that post is probably good. In the long term, hopefully we can come up with a more enduring solution.
I think the mentorship bottleneck is quite important, but my sense is that it actually is a bottleneck, i.e. most people with the capacity to mentor already are doing so.