I think overall this is a well-written blogpost. His previous blogpost already indicated that he took the arguments seriously, so this is not too much of a surprise. That previous blogpost was discussed and partially criticized on LessWrong. As for the current blogpost, I also find it noteworthy that active LW user David Scott Krueger is in the acknowledgements.
This blogpost might even be a good introduction to AI x-risk for some people.
I hope he engages further with the issues. For example, I feel like inner misalignment is still sort of missing from the arguments.
It seems like inner misalignment is a subset of “we don’t know how to make aligned AI”. Maybe he could’ve fit that in neatly, but adding more would be at odds with the post’s function as an introduction to AI risk.
Yoshua Bengio was on David Krueger’s PhD thesis committee, according to David’s CV.
David had many conversations with Bengio about alignment during his PhD, and deserves a lot of credit for Bengio taking AI risk seriously.
Bengio was his Master’s thesis advisor too.