That’s very interesting, thanks!
However, people seem to have strong objections. Danny Halawi (Anthropic) says the results don't seem to be correct: they don't appear to generalize to sufficiently recent questions, and he points to many other issues with the paper:
https://x.com/dannyhalawi15/status/1833295067764953397 (Twitter thread)
It would be nice to have a follow-up here at some point, addressing this controversy...