So this is 1) a single utility function, not a utility function for each human, and 2) being an aggregate of everything humanity wants, it naturally includes information about what each human wants.
Okay, but it still aggregates a utility function-like thing for each human. I don’t care what you call it.
I want to know what assumptions you’re making.
For the case of aggregating two people’s preferences, only that 1) both people and the aggregation are VNM utility agents, 2) whenever both people prefer A to B, the aggregation prefers A to B, and 3) the previous assumption is non-vacuous, i.e. there is at least one pair A, B that both people rank the same way. Given those, the aggregation must maximize a weighted sum of their utility functions. For the many-person case, I was using analogous assumptions, but I think there might be a flaw in my induction, so I’ll get back to you when I have a proof that actually works.
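To make the two-person claim concrete, here is a minimal sketch in Python. It does not prove the theorem. It only checks the easy direction on a small grid of lotteries: a weighted sum with nonnegative weights respects assumption 2, while a negative weight can violate it. The outcomes, the utility values in `U1` and `U2`, and all function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical outcomes and utilities, chosen only for illustration.
OUTCOMES = ["x", "y", "z"]
U1 = {"x": Fraction(1), "y": Fraction(0), "z": Fraction(1, 2)}
U2 = {"x": Fraction(0), "y": Fraction(1), "z": Fraction(4, 5)}


def expected_utility(u, lottery):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * u[o] for o, p in lottery.items())


def aggregate(w1, w2):
    """Weighted-sum utility function w1*U1 + w2*U2."""
    return {o: w1 * U1[o] + w2 * U2[o] for o in OUTCOMES}


def lotteries(n=4):
    """All lotteries over OUTCOMES whose probabilities are multiples of 1/n."""
    for i, j in product(range(n + 1), repeat=2):
        if i + j <= n:
            yield {
                "x": Fraction(i, n),
                "y": Fraction(j, n),
                "z": Fraction(n - i - j, n),
            }


def respects_pareto(w1, w2):
    """True if, on the grid, the aggregate strictly prefers A to B
    whenever both people strictly prefer A to B."""
    u = aggregate(w1, w2)
    grid = list(lotteries())
    for a, b in product(grid, repeat=2):
        both_prefer = (
            expected_utility(U1, a) > expected_utility(U1, b)
            and expected_utility(U2, a) > expected_utility(U2, b)
        )
        if both_prefer and not expected_utility(u, a) > expected_utility(u, b):
            return False
    return True
```

With these made-up numbers, `respects_pareto(1, 2)` holds, while `respects_pareto(-1, 1)` fails: both people prefer a sure `z` to the lottery with 1/4 on `x` and 3/4 on `y`, but the aggregate `U2 - U1` ranks them the other way. Exact `Fraction` arithmetic is used so that strict comparisons are not disturbed by floating-point rounding.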
I don’t see how a weighted sum captures a friendly AI that has preferences about the utility functions that humans use.
We currently have preferences about the utility functions that future humans use. So any linear aggregation of our current utility functions will also have preferences about the utility functions that future humans use.
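A toy illustration of that point, as a sketch under made-up assumptions: if each current person's utility function takes the values held by future humans as one feature of an outcome, then a weighted sum of those functions is sensitive to that feature too. The `Outcome` fields, the two people, and their numeric utilities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    wealth: int         # some ordinary feature of the world
    future_values: str  # hypothetical label for the utility function future humans use


def u_alice(o):
    # Alice cares about wealth and, more strongly, about future humans being "kind".
    return o.wealth + (2 if o.future_values == "kind" else 0)


def u_bob(o):
    # Bob weighs wealth more heavily but still cares about future values.
    return 2 * o.wealth + (1 if o.future_values == "kind" else 0)


def aggregate(o, w_alice=1.0, w_bob=1.0):
    """Weighted sum of the two current utility functions."""
    return w_alice * u_alice(o) + w_bob * u_bob(o)


# Two outcomes that differ only in what future humans value.
a = Outcome(wealth=5, future_values="kind")
b = Outcome(wealth=5, future_values="indifferent")
```

Here `aggregate(a)` exceeds `aggregate(b)` even though the two outcomes are otherwise identical, so the aggregate inherits the preference about future utility functions from the people it sums over.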
Edit: for the many-person case, see http://www.stanford.edu/~hammond/HarsanyiFest.pdf