but there’s a great difference between an AI that would be friendly on its own and one that would choose only friendly compromises with a powerful other AI that has human-relevant preferences.
An AI must compromise with the universe: it can only implement what is physically possible. If it’s going to make sure that compromise is friendly, why wouldn’t it also make a friendly compromise with an FFAI?
Because that’s an extra constraint: the universe AND the FFAI. The class of AIs that would be an FAI given only the universe is larger than the class that would be an FAI given the universe plus an FFAI to deal with.
To pick a somewhat crude example, imagine an AI that maximises the soft minimum of two quantities: human happiness and human preference satisfaction. Suppose each quantity turns out to be roughly as hard to satisfy as the other (i.e. not too many orders of magnitude between them), so this AI is an FAI in our universe.
However, add an FFAI that hates human preference satisfaction and loves human happiness. The negotiated compromise might then be very high happiness with somewhat reduced preference satisfaction, which the previous FAI can live with (it was only maximising a soft minimum, not a hard minimum).
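To make the arithmetic concrete, here is a minimal Python sketch. The particular soft minimum (a log-sum-exp with temperature t) and all the numbers are made up for illustration; the point is only that a soft-min agent has some exchange rate at which it will trade a little preference satisfaction for a lot of happiness, while a hard-min agent will never accept lowering whichever quantity is currently the minimum.

```python
import math

def soft_min(h, p, t=1.0):
    """Smooth lower bound on min(h, p); approaches the hard minimum as t -> 0."""
    return -t * math.log(math.exp(-h / t) + math.exp(-p / t))

def hard_min(h, p):
    return min(h, p)

# Illustrative, made-up numbers (happiness, preference satisfaction):
# what the FAI could secure on its own, versus the deal pushed by a
# happiness-loving, preference-hating FFAI.
balanced = (10.0, 10.0)
ffai_deal = (1000.0, 9.5)   # enormous happiness, slightly degraded preferences

for name, utility in [("soft-min agent", soft_min), ("hard-min agent", hard_min)]:
    accepts = utility(*ffai_deal) > utility(*balanced)
    print(f"{name} accepts the FFAI's deal: {accepts}")

# soft-min agent accepts the FFAI's deal: True   (9.5 beats 10 - ln 2 ~= 9.31)
# hard-min agent accepts the FFAI's deal: False  (9.5 is below the old minimum of 10)
```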
Or maybe a better way of formulating it: there are FAIs, and there are AIs that merely act as FAIs given the expected conditions of the universe. It’s the second category that might be very problematic in negotiations.