To you, is the golden rule about values (utility function, input to decision theory) or about policy (output of decision theory)? Reading the linked post, it sounds like it’s the former, but if you value aliens directly in your utility function, and then you also use UDT, are you not worried about double counting the relevant intuitions, and ending up being too “nice”, or being too certain that you should be “nice”?
Overall, I think the question “which AIs are good successors?” is both neglected and time-sensitive, and is my best guess for the highest impact question in moral philosophy right now.
It seems we can divide good successors into “directly good successors” (what the AI ends up doing in our future light cone is good according to our own values) and “indirectly good successors” (handing control to the unaligned AI “causes” aligned AI to be given control of some remote part of the universe/multiverse). Does this make sense, and if so, do you have an intuition about which one is more fruitful to investigate?
The golden rule is an intuition prior to detailed judgments about decision theory / metaethics / values, and also prior to the separation between them (which seems super murky).
In my view, learning that decision theory captured this intuition would partly “screen off” the evidence it provides about values (and conversely). There might also be some forms of double counting I’d endorse.
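To picture the double-counting worry and the “screening off” point a bit more concretely, here is a toy sketch (the symbols U_self, U_other, alpha, and beta are invented for illustration; this is not a model either commenter commits to). The idea is that the golden-rule intuition could show up once as a welfare weight inside the utility function and again as a cooperation weight coming out of the decision theory, and an agent who simply adds the two counts the same intuition twice.

```latex
% Toy sketch of the double-counting worry (illustrative notation only).
% U_self:  the agent's own welfare;  U_other: the other party's welfare.
% alpha:   weight the golden-rule intuition contributes via *values*.
% beta:    extra weight the same intuition yields via the *decision theory*
%          (e.g. UDT treating the other party's policy as correlated with ours).
\[
  \underbrace{U_{\mathrm{self}} + \alpha\, U_{\mathrm{other}}}_{\text{values}}
  \;+\;
  \underbrace{\beta\, U_{\mathrm{other}}}_{\text{decision theory}}
  \;=\;
  U_{\mathrm{self}} + (\alpha + \beta)\, U_{\mathrm{other}} .
\]
% "Screening off": if one intuition is the evidence for both alpha > 0 and
% beta > 0, then learning that the decision theory captures it should lower
% the estimate of alpha (and conversely), rather than leaving alpha + beta
% inflated.
```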
That division makes sense. I’m weakly inclined to care more about indirectly good successors, but the distinction is muddled by (a) complex decision-theoretic issues (e.g. how far back behind the veil of ignorance do you go? do you call that being updateless about values, or having your values but exerting acausal control over agents with other values?), which may end up being only semantic distinctions, and (b) ethical intuitions that might actually be captured by decision theory rather than by values.
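To make the direct/indirect split slightly more concrete, here is a similarly hedged sketch (V_direct, V_indirect, p_i, and Delta V_i are invented labels, not terms from the thread): the value of handing over control decomposes into what the AI does in our own future light cone, scored by our values, plus whatever control by aligned AI elsewhere that decision “buys” via acausal trade or updateless bargaining.

```latex
% Illustrative decomposition of "how good is this successor?" (invented notation).
% V_direct:   value, by our lights, of what the AI does in our future light cone.
% V_indirect: value of aligned AI being given control elsewhere in the
%             universe/multiverse "because" we handed over control here.
\[
  V(\text{hand over control}) \;\approx\; V_{\mathrm{direct}} + V_{\mathrm{indirect}},
  \qquad
  V_{\mathrm{indirect}} \;=\; \sum_{i} p_i \, \Delta V_i ,
\]
% where p_i is the credence that agent i's decision elsewhere is correlated with
% ours, and Delta V_i is how much better (by our values) agent i's region goes
% if it reciprocates.
```

On this reading, point (a) above is that moving the veil of ignorance around shifts value between the two terms without obviously changing the total, which is one way the distinction could turn out to be merely semantic.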
I’m curious: how does this work out for fellow animals?
I.e., if you value (human and non-human) animals directly in your utility function, and then you also use UDT, are you not worried about double counting the relevant intuitions, and ending up being too “nice”, or being too certain that you should be “nice”?
Perhaps it’s arguable that this is precisely what’s going on when we end up caring more for our friends and family?