Karl, thanks for the very good summary and interesting analysis. There is one factual error, though, that I would appreciate you fixing: my 10 to 50% estimate (2nd row in the table) was not for x-risk but for superhuman AI. FYI, it was obtained by polling (through a show of hands) a group of RL researchers at a workshop, most of whom had little or no exposure to AI safety. Another (mild) error: although I have been a reader of (a few) AI safety papers for about a decade, it is only recently that I started writing about the topic.
Yoshua Bengio
I am not convinced at all that this is true. Consider an AI whose training objective simply makes it want to model how the world works as well as possible, like a pure scientist who is not trying to acquire more knowledge via experiments but only reasons about and explores explanatory hypotheses to build a distribution over theories of the observed data. It is agency and utilities or rewards that induce a preference over certain states of the world.
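To make the contrast concrete, here is one purely illustrative way to write down the two kinds of objectives (the notation is mine, not from Karl's post):

```latex
% Pure world-modelling ("scientist") objective: a posterior over theories
% \theta given the observed data \mathcal{D}. No actions, no preferred world states.
p(\theta \mid \mathcal{D}) \;\propto\; p(\mathcal{D} \mid \theta)\, p(\theta)

% Agentic RL objective: the reward r defines a preference ordering over the
% states s_t that the policy \pi can bring about.
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t} \gamma^{t}\, r(s_t, a_t) \right]
```

The first objective only scores theories by how well they explain the data; only the second says anything about which states of the world are better than others.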