Wei Dai comments on Impact measurement and value-neutrality verification

Wei Dai 18 Oct 2019 18:47 UTC
LW: 13 AF: 7
Some clarifications I got from Evan (evhub) on MIRIxDiscord:
1. AI not being value-neutral is one way that the strategy-stealing assumption might be false, and therefore one thing we can work on if we want to make the strategy-stealing assumption true (or true to the extent possible).
2. It’s not clear whether “AI not being value-neutral” falls into one of Paul’s 11 failure scenarios for strategy-stealing. The closest seems to be failure #1, “AI alignment,” but one could also argue that an AI can be aligned but still not value-neutral.
3. The “neutrality” measure formally defined in this post is meant to be a starting point for people to work on, and not necessarily close to the final solution.
4. “Strategy-stealing” was originally defined in terms of maintaining a baseline resource distribution, but it’s not clear if that’s the right concept, and in this post Evan “moved somewhat towards maintaining a value distribution.”
I think Evan has incorporated or will incorporate some of these clarifications into the post itself, but this may still be helpful for people who read the original post.