One possible factor is that there was initially a pool of people who wouldn’t otherwise try to contribute to alignment research (~30 people, going from # of submissions to contest 1 - # of submissions to this contest) who tried their hand early on, but then became discouraged because the winners’ entries seemed more polished and productive than they felt they could realistically hope for. In fact, I felt this way in round two. I imagine that I probably would’ve stopped if the alignment prize had been my sole motivation (i.e., totally ignoring how I feel about the necessity of work on this problem).
This and cousin_it’s suggested novelty effect both make sense, but to me it just means that the prize givers got more than they bargained for in the first rounds and maybe it set people’s expectations too high for what such a prize can accomplish. I failed to pay much attention to the first two rounds (should probably go back and look at them again) and to me the latter two rounds seem like a reasonable steady state result of the prize given the amount of money/prestige involved.
I wonder if another thing that discouraged people was a feeling that they had to compete with experienced professional researchers who already have funding from other sources. I think if I were to design a new prize with the experience of this one in mind, I’d split it into two prizes, one optimized for increasing prestige of the field, and one for funding people who otherwise couldn’t get funding or to provide an alternative source of funding with better incentives. The former would look like conventional prestigious prizes in other fields, and the latter would run continuously and pay out as soon as some entry/nomination meets a certain subjective threshold of quality (which can be adjusted over time depending on the prize budget and quality of submissions), and the prize money would subtract out the amount of formal funding the recipient already received for the work (such as salaries, grants, or other prizes).
I agree with this point. Looking at the things that have won over time, it eventually came to feel like it wasn’t worth bothering to submit anything, because the winners were mostly going to be folks who would have done their work anyway and who already had a certain level of prestige. In this way I do sort of feel like the prize failed, because it was set up in a way that rewarded work that would have happened anyway and failed to motivate work that wouldn’t have happened otherwise. Maybe it’s only in my mind that the value of a prize like this is to increase work on the margin rather than recognize outstanding work that would have been done regardless, but I feel like beyond the first round it’s been a prize of the form “here’s money for the best stuff on AI alignment in the last x months” rather than “here’s money to make AI alignment research happen that would otherwise not have happened”. That made me much less interested in it, to the point that I put the prize out of my mind until I saw this post reminding me of it today.
I disagree with the view that it’s bad to spend the first few months prizing top researchers who would have done the work anyway. This _in and of itself_ is clearly burning cash, but the point is to change incentives over a longer timeframe.
If you think research output is heavy-tailed, what you should expect to observe is something like this happening for a while, until promising tail-end researchers realise there’s a stable stream of value to be had here and put in the effort required to level up and contribute themselves. It’s not implausible to me that this would take more than a year of prizes.
Expecting lots of important counterfactual work, work that beats the current best, to come out of the woodwork within ~6 months seems to assume that A) making progress on alignment is quite tractable, and B) the ability to do so is fairly widely distributed across people; both to a seemingly unjustified extent.
I personally think prizes should be announced together with precommitments to keep delivering them for a non-trivial amount of time. I believe this because I think changing incentives involves changing expectations, in a way that changes medium-term planning. I expect people to have qualitatively different thoughts if their S1 reliably believes that fleshing out the-kinds-of-thoughts-that-take-6-months-to-flesh-out will be rewarded after those 6 months.
That’s expensive, in terms of both money and trust.
As an anecdata point, it seems probable that I would not have written the essay about the learning-theoretic research agenda without the prize, or at least that it would have been significantly delayed. This is because I am usually reluctant to publish anything that doesn’t contain non-trivial theorems, but it felt like it would be suitable for this prize (this preference is partly for objective reasons, but partly due to entirely subjective motivation issues). In hindsight, I think that spending the time to write that essay was the right decision regardless of the prize.
As another anecdata point, I considered writing more to pursue the prize pool but ultimately didn’t do any more (counterfactual) work!
fwiw, thirding this perception (although my take is less relevant since I didn’t feel like I was in the target reference class in the first place)
I observe that, of the 16 awards of money from the AI Alignment Prize, as far as I can see none of the winners had a full-time commitment that wasn’t working on AI alignment (i.e., they either worked on alignment full-time, or else were financially supported in a way that gave them the space to devote their attention to it fully for the purposes of the prize). I myself, just now introspecting on why I didn’t apply, didn’t S1-expect to be able to produce anything I expected to win a prize without ~1 month of work, and I have to work on LessWrong. This suggests some natural interventions (e.g., giving out smaller prizes for good efforts even if they weren’t successful).
In round three, I was working on computational molecule design research and completing coursework; whitelisting was developed in my spare time.
In fact, I don’t presently have research funding during the school year, so I spend some of my time as a teaching assistant.
Interesting. Can you talk a bit more about how much time you actually devoted to thinking about whitelisting in the lead up to the work that was awarded, and whether you considered it your top priority at the time?
Added: Was it the top idea in your mind for any substantial period of time?
Yes, it was the top idea on/off over a few months. I considered it my secret research and thought on my twice daily walks, in the shower, and in class when bored. I developed it for my CHAI application and extended it as my final Bayesian stats project. Probably 5-10 hours a week, plus more top idea time. However, the core idea came within the first hour of thinking about Concrete Problems.
The second piece, Overcoming Clinginess, was prompted by Abram’s comment that clinginess seemed like the most damning failure of whitelisting; at the time, I thought just finding a way to overcome clinginess would be an extremely productive use of my entire summer (lol). On an AMS-PDX flight, I put on some music and spent hours running through different scenarios to dissolve my confusion. I hit the solution after about 5 hours of work, then spent 3 hours formalizing it a bit and 5 more making it look nice.
Yeah, this is similar to how I got into the game. Just thinking about it in my spare time for fun.
From your and others’ comments, it sounds like a prize for best work isn’t the best use of money. It’s better to spend money on getting more people into the game. In that case it probably shouldn’t be a competition: beginners need gradual rewards, not one-shot high stakes. Something like a more flat subsidy for studying and mentoring could work better. Thank you for making me realize that! I’ll try to talk about it with folks.
I also think surveying applicants might be a good idea, since my experience may not be representative.