But it seems increasingly plausible that AIs will not have explicit utility functions, so on its own that argument doesn’t seem much better than saying humans could merge their utility functions.
There are a couple of ways to extend the argument:
1. Having a utility function (or some other stable, explicit representation of values) is a likely eventual outcome of recursive self-improvement, since it makes you less vulnerable to value drift and manipulation, and makes coordination easier.
2. Even without utility functions, AIs can try to merge, i.e., negotiate and jointly build successors whose values represent a compromise of their individual values (a rough sketch follows below). It seems likely this will be much easier, less costly, more scalable, and more effective for them than the analogous thing is for humans (to the extent that there is an analogy, perhaps having and raising kids together).
I think AIs with simpler values (e.g., paperclip maximizers) have an advantage with both 1 and 2, which seems like bad news for AI risk.
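To make the merging in 2 a bit more concrete, here is a minimal sketch. It assumes, purely for illustration, that each AI's values can be summarized by an explicit utility function over outcomes and that the negotiated compromise is a weighted sum of the parents' utilities; the `make_merged_utility` helper, the example utilities, and the 50/50 weights are all hypothetical, not anything specified above (real weights would presumably fall out of the negotiation, e.g., relative bargaining power).

```python
# Illustrative sketch only: merging two AIs' values into a successor's values,
# under the (strong) assumption that values are explicit utility functions and
# that the compromise is a weighted sum. Weights are arbitrary here.

def make_merged_utility(utilities, weights):
    """Return a successor utility function: a weighted sum of the parents' utilities."""
    if len(utilities) != len(weights):
        raise ValueError("need one weight per parent utility")

    def merged(outcome):
        return sum(w * u(outcome) for u, w in zip(utilities, weights))

    return merged


# Hypothetical parent values: one AI cares about paperclips, the other about flourishing.
u_paperclipper = lambda outcome: outcome.get("paperclips", 0)
u_humane = lambda outcome: outcome.get("flourishing", 0)

# The 0.5/0.5 split stands in for whatever the negotiation would actually produce.
u_successor = make_merged_utility([u_paperclipper, u_humane], [0.5, 0.5])

print(u_successor({"paperclips": 10, "flourishing": 4}))  # 7.0
```

Note that nothing like this is straightforwardly available to humans, which is part of why the analogy to human coordination is weak.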