Some clarifications I got from Evan (evhub) on MIRIxDiscord:
AI not being value-neutral is one way that the strategy-stealing assumption might be false, and therefore one thing we can work on if we want to make the strategy-stealing assumption true (or true to the extent possible).
It’s not clear whether “AI not being value-neutral” falls into one of Paul’s 11 failure scenarios for strategy-stealing. The closest seems to be failure #1, “AI alignment,” but one could also argue that an AI can be aligned and still not be value-neutral.
The “neutrality” measure formally defined in this post is meant to be a starting point for people to work on, and not necessarily close to the final solution.
“Strategy-stealing” was originally defined in terms of maintaining a baseline resource distribution, but it’s not clear if that’s the right concept, and in this post Evan “moved somewhat towards maintaining a value distribution.”
I think Evan has incorporated or will incorporate some of these clarifications into the post itself, but this may still be helpful for people who read the original post.