It seems like you are referring to daemons.

To the extent that daemons result from an AI actually doing a good job of optimizing the right reward function, I think we should just accept that as the best possible outcome.
To the extent that daemons result from an AI doing a bad job of optimizing the right reward function, that can be viewed as a problem with capabilities, not alignment. That doesn’t mean we should ignore such problems; it’s just out of scope.
Indeed, most people at MIRI seem to think that most of the difficulty of alignment is getting from “has X as explicit terminal goal” to “is actually trying to achieve X.”
That seems like the wrong way of phrasing it to me. I would put it like “MIRI wants to figure out how to build properly ‘consequentialist’ agents, a capability they view us as currently lacking”.