I found the description of warning fatigue interesting. Do you have takes on the warning fatigue concern?
Warning Fatigue
The playbook for politicians trying to avoid scandals is to release everything piecemeal. You want something like:
Rumor Says Politician Involved In Impropriety. Whatever, this is barely a headline, tell me when we know what he did.
Recent Rumor Revealed To Be About Possible Affair. Well, okay, but it’s still a rumor, there’s no evidence.
New Documents Lend Credence To Affair Rumor. Okay, fine, but we’re not sure those documents are true.
Politician Admits To Affair. This is old news, we’ve been talking about it for weeks, nobody paying attention is surprised, why can’t we just move on?
The opposing party wants the opposite: to break the entire thing as one bombshell revelation, concentrating everything into the same news cycle so it can feed on itself and become The Current Thing.
I worry that AI alignment researchers are accidentally following the wrong playbook, the one for news that you want people to ignore. They’re very gradually proving the alignment case an inch at a time. Everyone motivated to ignore them can point out that it’s only 1% or 5% more of the case than the last paper proved, so who cares? Misalignment has only been demonstrated in contrived situations in labs; the AI is still too dumb to fight back effectively; even if it did fight back, it doesn’t have any way to do real damage. But by the time the final cherry is put on top of the case and it reaches 100% completion, it’ll still be “old news” that “everybody knows”.
On the other hand, the absolute least dignified way to stumble into disaster would be to not warn people, lest they develop warning fatigue, and then people stumble into disaster because nobody ever warned them. Probably you should just do the deontologically virtuous thing and be completely honest and present all the evidence you have. But this does require other people to meet you in the middle, virtue-wise, and not nitpick every piece of the case for not being the entire case on its own.
Scott Alexander wrote a post on our paper (linked) that people might be interested in reading.
I don’t currently have a strong view. I also found it interesting. It updated me a bit toward working on other types of projects.