This seems too glib, if “long-term preferences” are in some sense the “right” preferences, e.g., if under reflective equilibrium we would wish that we currently put a lot more weight on long-term preferences. Even if we only give unaligned AIs a one-time advantage (which I’m not sure about), that could still cause us to lose much of the potential value of the universe.
To be clear, I am worried about people not understanding or caring about the long-term future, and AI giving them new opportunities to mess it up.
I’m particularly concerned about things like people handing their resources to some unaligned AI that seemed like a good idea at the time, rather than merely opting out of the competition and thereby letting unaligned AIs represent a larger share of future-influencers. This is another failure of strategy-stealing that probably belongs in the post: even if we understand alignment, there may be plenty of people who aren’t trying to solve alignment and are doing something else instead, and the values generated by that “something else” will get a natural boost.