I’ve posted on the theoretical difficulties of aggregating the utilities of different agents. But doing it in practice is much more feasible (scale the utilities to some not-too-unreasonable scale, add them, maximise the sum).
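To make the "scale, add, maximise" recipe concrete, here is a minimal sketch. It assumes one particular choice of scale that the comment above does not specify: each agent's utility is rescaled to [0, 1] over a finite set of options (range normalisation). The function names and the toy numbers are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def normalise(utility: Dict[str, float]) -> Dict[str, float]:
    """Rescale one agent's utilities to [0, 1] (assumed scale: range normalisation)."""
    lo, hi = min(utility.values()), max(utility.values())
    if hi == lo:
        # An indifferent agent contributes nothing to the comparison.
        return {option: 0.0 for option in utility}
    return {option: (u - lo) / (hi - lo) for option, u in utility.items()}


def aggregate_and_choose(utilities: List[Dict[str, float]]) -> str:
    """Scale every agent's utility, add them up, and return the option with the largest sum."""
    scaled = [normalise(u) for u in utilities]
    options = scaled[0].keys()
    totals = {option: sum(s[option] for s in scaled) for option in options}
    return max(totals, key=totals.get)


# Hypothetical agents whose raw utilities live on very different scales.
agents = [
    {"a": 0, "b": 10, "c": 4},
    {"a": 100, "b": 0, "c": 90},
    {"a": 1, "b": 2, "c": 3},
]
print(aggregate_and_choose(agents))
```

Other scalings (for example, fixing the variance instead of the range) would slot into `normalise` without changing the rest, and can change which option wins; that sensitivity is where the theoretical difficulties show up.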
But value extrapolation is different from human value aggregation; for example, low-power (or low-impact) AIs can be defined with value extrapolation, and that doesn’t need human value aggregation.
I suspect that many of the problems with aggregation both apply to actual individual human values once extrapolated and generalize to AIs with closely related values, but I’d need to lay out the case for that more clearly. (I did discuss the difficulty of cooperation even given compatible goals a bit in this paper, but it’s nowhere near complete in addressing this issue.)
It’s worth writing up your point and posting it; that tends to clarify the issue, for yourself as well as for others.