A slightly misspecified reward function can lead to anything from perfectly aligned behavior to catastrophic failure. So I think we need much stronger and more formal arguments than EY’s genie post provides before concluding that catastrophe is almost inevitable.
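To make the first sentence concrete, here is a toy sketch of my own (not from EY’s post): a five-state chain MDP where the start state sits between an intended goal and an unintended one. Two reward perturbations of the same size have very different effects: one leaves the optimal policy aligned, the other flips it toward the unintended outcome. All the specifics (the reward values, the 0.15 perturbation, the discount factor) are invented purely for illustration.

```python
# Toy chain MDP: states 0..4, agent starts at 2.
# State 0 is the intended ("aligned") terminal outcome,
# state 4 is the unintended ("catastrophic") terminal outcome.
# All numbers here are illustrative assumptions, not from the original post.

import numpy as np

N_STATES = 5          # states 0..4; 0 and 4 are terminal
GAMMA = 0.95          # discount factor (arbitrary choice)
ACTIONS = (-1, +1)    # move left or move right


def optimal_policy(reward_at):
    """Value iteration on the chain; returns the greedy action for each non-terminal state."""
    V = np.zeros(N_STATES)
    for _ in range(1000):
        V_new = V.copy()
        for s in range(1, N_STATES - 1):
            # Reward is received on entering a state; terminal states contribute no future value.
            V_new[s] = max(reward_at[s + a] + GAMMA * V[s + a] * (0 < s + a < N_STATES - 1)
                           for a in ACTIONS)
        if np.max(np.abs(V_new - V)) < 1e-9:
            break
        V = V_new
    return {s: max(ACTIONS,
                   key=lambda a: reward_at[s + a] + GAMMA * V[s + a] * (0 < s + a < N_STATES - 1))
            for s in range(1, N_STATES - 1)}


# "True" reward: the intended goal (state 0) is worth 1.0,
# the unintended outcome (state 4) is worth 0.9.
true_reward = np.array([1.0, 0.0, 0.0, 0.0, 0.9])

# Two proxies, each misspecified by the same amount (0.15), in different places:
harmless_proxy = true_reward + np.array([0.15, 0.0, 0.0, 0.0, 0.0])  # policy unchanged: still go left
harmful_proxy  = true_reward + np.array([0.0, 0.0, 0.0, 0.0, 0.15])  # policy flips: agent heads right

for name, r in [("true reward", true_reward),
                ("harmless proxy", harmless_proxy),
                ("harmful proxy", harmful_proxy)]:
    print(name, optimal_policy(r))
```

Under the true reward and the first proxy, the optimal policy from every non-terminal state is to move left toward the intended goal; under the second proxy, the same-sized error makes the agent head right to the unintended outcome. The point is only that the mapping from "size of misspecification" to "badness of outcome" is not monotone or obvious, which is why I think the inevitability claim needs a more careful argument.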