As someone who writes these kinds of papers, I make an effort to cite the original inspirations when possible. And although I broadly agree with Robin’s theory, there are also some mechanical reasons why Yudkowsky in particular is hard to cite.
The most valuable things about the academic paper style, from a reader’s perspective, are:
1) Having a clear, short summary (the abstract)
2) Stating the claimed contributions explicitly
3) Using standard jargon, or if not, noting so explicitly
4) A related work section that contrasts one’s own position against others’
5) Being explicit about what evidence you’re marshalling and where it comes from
6) Listing main claims explicitly
7) The best papers include a “limitations” or “why I might be wrong” section.
Yudkowsky mostly doesn’t do these things. That doesn’t mean he doesn’t deserve credit for making a clear and accessible case for many foundational aspects of AI safety. It’s just that in any particular context, it’s hard to say what, exactly, his claims or contributions were.
In this setting, maybe the most appropriate citation would be something like “as illustrated in many thought experiments by Yudkowsky [cite particular sections of the Sequences and HPMOR], it’s dangerous to rely on any protocol for detecting scheming by agents more intelligent than oneself.” But that’s a pretty broad claim. Maybe I’m being unfair, but it’s not clear to me what exactly Yudkowsky’s work says about the workability of these schemes other than “there be dragons here”.