I anticipate that most academia-based AI safety research will be:
1. Safety but not alignment.
2. Near-term but not extinction-preventing.
3. Flawed in a way that a LessWrong Sequences reader would quickly notice, but random academics (even at top places) might not.
4. Accidentally speeding up capabilities.
5. Heavy on tractability but light on neglectedness and importance.
6. ...buuuuuut 0.05% of it could turn out to be absolutely critical for “hardcore” AI alignment anyway. This is simply due to the sheer size of academia giving it a higher absolute number of [people who will come up with good ideas] than the EA/rationalist/LessWrong-sphere.
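A minimal sketch of the arithmetic behind prediction (6), with purely illustrative symbols (only the 0.05% figure comes from the prediction itself): write $N_{\text{acad}}$ and $N_{\text{LW}}$ for the number of relevant papers each sphere produces, and $p_{\text{acad}}$, $p_{\text{LW}}$ for the chance that any given paper turns out to be critical. Academia supplies more critical papers in absolute terms whenever

$$N_{\text{acad}}\,p_{\text{acad}} > N_{\text{LW}}\,p_{\text{LW}} \quad\Longleftrightarrow\quad \frac{N_{\text{acad}}}{N_{\text{LW}}} > \frac{p_{\text{LW}}}{p_{\text{acad}}}.$$

With $p_{\text{acad}} \approx 0.05\%$, even a hypothetical LW-sphere hit rate of 5% (i.e. 100x higher) is outweighed once academia produces more than roughly 100x as many relevant papers, which its sheer size makes plausible.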
Four recommendations I have (in addition to posting on arXiv!):
1. It’d be interesting to have a prediction market / some other measurement on the predictions above, somehow.
2. If something’s already pretty likely to get a grant within academia, EA should probably not fund it, modulo whatever considerations might override that on a case-by-case basis (e.g. promoting an EA grantor via proximity to a promising mainstream project…). As you can imagine, I don’t think such considerations move the needle in the overwhelming majority of cases.
3. Related to (2): Explicitly look to fund projects that are “uncool” by mainstream standards. This is partly due to neglectedness, and partly because of the “worlds where iterative design fails” line of AI-risk reasoning.
4. The community should investigate and/or set up better infrastructure to filter for the small number of crucial papers noted in prediction (6).