This is why we introduced X-Risk Sheets, a questionnaire that researchers should include in their paper if they’re claiming that their paper reduces AI x-risk. This way researchers need to explain their thinking and collect evidence that they’re not just advancing capabilities.
We now include these x-risk sheets in our papers. For example, here is an example x-risk sheet included in an arXiv paper we put up yesterday.
On first seeing this, I’m reminded of the safety questionnaires I had to fill out to run studies in my undergraduate experimental psychology classes. They were mostly an annoyance and a box-ticking exercise. Everyone largely did what they wanted to do anyway, then hurriedly gerrymandered the questionnaire right before the deadline so the faculty would let them proceed. The exception was the very conscientious students, who saw this as an excellent opportunity to prove their box-ticking diligence.
As a case in point, take the arXiv paper you cite, which has an x-risk sheet at the end.
In short, this could help resolve matters of fact that influence policies and decisions made by political leaders in an increasingly complex modern world, putting humanity in a better place to deal with the global turbulence and uncertainty created by AI systems when they rapidly reshape society. A fuller motivation for “ML for Improving Epistemics” is described in Hendrycks and Mazeika (2022).
I think that for most AI papers that come out, you can find a corner case at this level of abstraction where the paper helps with x-risk, basically regardless of whether it actually does. For what it’s worth, I spent about two years of my life working full-time on building a platform where top forecasters attacked AI questions, and wrote what I believed at the time to be the best-written database of forecasting questions on AI. I eventually stopped working on that, because I came to believe the work didn’t really matter for alignment.
(So, overall, I’m pretty worried that, if used frequently, these kinds of sheets will mostly increase safetywashing: they’ll push lots of researchers to talk about safety even when they have nothing to say, and create pressure toward claiming that their work helps with safety.)
[I work for Dan Hendrycks but he hasn’t reviewed this.]
It seems to me that your comment roughly boils down to “people will exploit safety questionnaires.” I agree with that. However, I think they are much more likely to exploit social influence, blog posts, and vagueness than specific questionnaires. The biggest strengths of the x-risk sheet, in my view, are:
(1) It requires a specific explanation of how the paper is relevant to x-risk, one that cannot be tuned depending on the audience one is talking to. You give the example from the forecasting paper and suggest it’s unconvincing. The counterfactual is that the forecasting paper is released, the authors tell people and funders that it’s relevant to safety, and there isn’t even anything explicitly written for you to find unconvincing and argue against. The sheets can help resolve this problem (though in this case, you haven’t really said why you find it unconvincing). Part of the reason I was motivated to write Pragmatic AI Safety (which covers many of these topics) was so that the ideas in it would be staked out clearly. That way people have something concrete to criticize, and their criticisms are forced to be more specific.
(2) There is a clear trend of claiming that papers which are mostly about capabilities are about safety. The sheet forces authors to address this directly in their paper, and either admit that they are doing capabilities work or attempt to construct a contorted, but at least falsifiable, argument otherwise.
(3) The standardized form lets people challenge the specific points made in the x-risk sheet, rather than whatever cherry-picked claims the authors feel like mentioning in conversation or blog posts.
Your picture of faculty simply seeing that the boxes are checked and approving is, I hope, not how funders in the AI safety space actually operate (if it is, then yes, no x-risk sheet can save them). I would hope that reviewers and evaluators of papers directly engage with the evidence for each item in the x-risk sheet and challenge incorrect assertions.
I’d be a bit worried if x-risk sheets were required at every conference, but if you instead make them a requirement only for “all papers that want AI safety money” or “all papers that claim to be about AI safety”, I’m not that worried that the sheets themselves would push researchers to talk about safety who were not already talking about it.