I think this sort of counts?
https://www.lesswrong.com/posts/WCX3EwnWAx7eyucqH/corrigibility-can-be-vnm-incoherent
I’ve also derived some informal arguments myself in the same vein, though I haven’t published them anywhere.
Basically, nearly all of the focus is on creating/aligning a consequentialist utility maximizer, but consequentialist utility maximizers don’t like being corrected, will tend to want to change your preferences, etc., all of which seems bad for alignment.
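For concreteness, here’s a toy version of the “doesn’t like being corrected” part as I’d sketch it (my own gloss, not taken from the linked post): let $U$ be the agent’s current utility function, $U'$ the corrected one the overseer wants to install, and assume resisting correction is itself costless. Then

$$\mathbb{E}[U \mid \text{resist}] \;=\; \max_{\pi} \mathbb{E}_{\pi}[U] \;\ge\; \mathbb{E}_{\pi^{*}_{U'}}[U] \;=\; \mathbb{E}[U \mid \text{allow correction}],$$

with strict inequality whenever the $U'$-optimal policy differs from the $U$-optimal one. So a pure $U$-maximizer weakly, and usually strictly, prefers to resist correction, which is the corrigibility problem in miniature.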