I mentioned this construction on the Agent Foundations forum last year. (The idea that which worlds an agent cares about is itself an aspect of preference is folklore by now. This naturally allows an agent not to care about particular worlds, provided nothing in the worlds it does care about depends on them.)
This happens automatically in the more tractable decision theory setups where we don't let the agent potentially care about everything (no universal priors, and so perhaps also no optimization daemons). It's a desirable property for the theory, but probably incompatible with following human values.