I am really annoyed by the Twitter thread about this paper. I doubt it will hold up, and it’s been seen 450k times. Hendrycks had ample opportunity after initial skepticism to remove it, but chose not to. I expect this to have reputational costs to him and to AI safety in general. If people think he (and by association some of us) are charlatans for saying one thing and doing another in terms of being careful with the truth, I will have some sympathy with their position.
I sympathize with the annoyance, but I think the response from the broader safety crowd (e.g., your Manifold market, substantive critiques and general ill-reception on LessWrong) has actually been pretty healthy overall; I think it’s rare that peer review or other forms of community assessment work as well or quickly.
Hendrycks had ample opportunity after initial skepticism to remove it, but chose not to.
IMO, this seems to demand a very immediate/sudden/urgent reaction. If Hendrycks ends up being wrong, I think he should issue some sort of retraction (and I think it would be reasonable to be annoyed if he doesn’t).
But I don’t think the standard should be “you need to react to criticism within ~24 hours” for this kind of thing. If you write a research paper and people raise important concerns about it, I think you have a duty to investigate them and respond to them, but I don’t think you need to fundamentally change your mind within the first few hours/days.
I think we should afford researchers the time to seriously evaluate claims/criticisms, reflect on them, and issue a polished statement (and potential retraction).
(Caveat that there are some cases where immediate action is needed, e.g., if a company releases a product that is imminently dangerous, but I don’t think “making an intellectual claim about LLM capabilities that turns out to be wrong” would meet my bar.)
What is reasonable here? 2 weeks? 2 months?
Hm, good question. I think it should be proportional to the amount of time it would take to investigate the concern(s).
For this, I think 1-2 weeks seems reasonable, at least for an initial response.
Under peer review, this never would have been seen by the public. It would have incentivized CAIS to actually think about the potential flaws in their work before blasting it to the public.