I think the contest idea is great and aimed at two absolute core alignment problems. I’d be surprised if much comes out of it, as these are really hard problems and I’m not sure contests are a good way to solve really hard problems. But it’s worth trying!
Now, a bit of a rant:
Submissions will be judged on a rolling basis by Richard Ngo, Lauro Langosco, Nate Soares, and John Wentworth.
I think this panel looks very weird to ML people. Quickly skimming their Google Scholar profiles, it looks like these four people have published a total of one first-author paper at a top ML conference (Goal Misgeneralisation, by Lauro et al.). The person with the most legible ML credentials is Lauro, who’s an early-year PhD student with 10 citations.
Look, I know Richard and he’s brilliant. I love many of his papers. I bet that these people are great researchers and can judge this contest well. But if I put myself into the shoes of an ML researcher who’s not part of the alignment community, this panel sends a message: “wow, the alignment community has hundreds of thousands of dollars, but can’t even find a single senior ML researcher crazy enough to entertain their ideas”.
There are plenty of people who understand the alignment problem very well and who also have more ML credentials. I can suggest some, if you want.
(Probably disregard this comment if ML researchers are not the target audience for the contests.)
+1. The combination of the high dollar amount, the subjective criteria, and the panel drawn from the relatively small/insular ‘core’ AI safety research community means that I expect this to look pretty fishy to established researchers. Even if the judgments are fair (I think they probably will be!) and the contest yields good work (it might!), I expect the benefit of that to be offset to a pretty significant degree by the red flags this raises about how the AI safety scene deals with money and its connection to mainstream ML research.
(To be fair, I think the Inverse Scaling Prize, which I’m helping with, raises some of these concerns, but the more precise/partially-quantifiable prize rubric, bigger/more diverse panel, and use of additional reviewers outside the panel mitigate them at least partially.)
Hastily written; may edit later
Thanks for mentioning this, Jan! We’d be happy to hear suggestions for additional judges. Feel free to email us at akash@alignmentawards.com and olivia@alignmentawards.com.
Some additional thoughts:
We chose judges primarily based on their expertise and (our perception of) their ability to evaluate submissions about goal misgeneralization and corrigibility. Lauro, Richard, Nate, and John are some of the few researchers who have thought substantially about these problems. In particular, Lauro first-authored the first paper about goal misgeneralization and Nate first-authored a foundational paper about corrigibility.
We think the judges do have some reasonable credentials (e.g., Richard works at OpenAI, Lauro is a PhD student at the University of Cambridge, and Nate Soares is the Executive Director of a research organization and has an h-index of 12 with 500+ citations). I think the contest meets the bar of “having reasonably well-credentialed judges” but doesn’t meet the bar of “having extremely well-credentialed judges” (e.g., well-established professors with thousands of citations). I think that’s fine.
We got feedback from several ML people before launching. We didn’t get feedback that this looks “extremely weird” (though I’ll note that research competitions in general are pretty unusual).
I think it’s plausible that some people will find this extremely weird (especially people who judge things primarily based on the cumulative prestige of the associated parties & don’t think that OpenAI/500 citations/Cambridge are enough), but I don’t expect this to be a common reaction.
Some clarifications + quick thoughts on Sam’s points:
The contest isn’t aimed primarily/exclusively at established ML researchers (though we are excited to receive submissions from any ML researchers who wish to participate).
We didn’t optimize our contest to attract established researchers. Our contests are optimized to take questions that we think are at the core of alignment research and present them in a (somewhat less vague) format that gets more people to think about them.
We’re excited that other groups are running contests that are designed to attract established researchers & present different research questions.
All else equal, we think that precise/quantifiable grading criteria & a diverse panel of reviewers are preferable. However, in our view, many of the core problems in alignment (including goal misgeneralization and corrigibility) have not been sufficiently well-operationalized to have precise/quantifiable grading criteria at this stage.
This response does not convince me.
Concretely, if I showed the prize to people in my lab and they actually looked at the judges (and I had some way of eliciting honest responses from them), I think >60% would react along the lines of what Sam and I described (i.e. seeing this prize as evidence that AI alignment concerns are mostly endorsed by (sometimes rich) people who have no clue about ML; or that the alignment community is dismissive of academia/peer-reviewed publishing/mainstream ML/default ways of doing science; or …).
Your point 3.) about the feedback from ML researchers could convince me that I’m wrong, depending on whom exactly you got feedback from and what that feedback looked like.
By the way, I’m highlighting this point in particular not because it’s highly critical (I haven’t thought much about how critical it is), but because it seems relatively easy to fix.