Edit: Apparently the contents of your comment changed drastically while I was drafting this response, but most of it still seems to apply.
My bad.
Actually, your personal preferences are your CEV, not some function that also takes into account other people’s CEVs.
I don’t think this is how Eliezer is using the term. From the wiki:
In developing friendly AI, one acting for our best interests, we would have to take care that it would have implemented, from the beginning, a coherent extrapolated volition of humankind. In calculating CEV, an AI would predict what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge.
So this is 1) a single utility function, not a utility function for each human, and 2) being an aggregate of everything humanity wants, it naturally includes information about what each human wants.
this risks situations where the aggregation makes a choice that everyone whose preferences got aggregated disagrees with. A weighted sum is the only way to aggregate utility functions that consistently avoids this. I’ve sketched out a proof of this, but I’m getting tired, so I’ll write it up tomorrow.
I would be very interested to see this proof! In particular, I want to know what assumptions you’re making. As I mentioned way up in the parent comment, I don’t see how a weighted sum captures a friendly AI that has preferences about the utility functions that humans use.
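(To make the quoted claim concrete, here is a toy sketch of my own construction, not the commenter's promised proof. An aggregator that is non-linear in the two utility functions, such as "maximize the minimum of the two expected utilities", can respect every unanimous preference, but it pays for that by violating the VNM independence axiom, so it is not itself a coherent VNM agent. A weighted sum is the form that satisfies both constraints at once.)

```python
# Toy sketch (illustrative, not the promised proof): aggregating two VNM
# agents by taking the *minimum* of their expected utilities respects
# unanimous preferences, yet the resulting aggregate violates the VNM
# independence axiom -- so no utility function, weighted sum or otherwise,
# represents it.

def expected_utility(utility, lottery):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(prob * utility[outcome] for outcome, prob in lottery.items())

# Two agents with opposite preferences over outcomes A and B.
u1 = {"A": 1.0, "B": 0.0}
u2 = {"A": 0.0, "B": 1.0}

def min_aggregate(lottery):
    """A non-linear aggregator: the worse-off agent's expected utility."""
    return min(expected_utility(u1, lottery), expected_utility(u2, lottery))

sure_A = {"A": 1.0}
sure_B = {"B": 1.0}
coin_flip = {"A": 0.5, "B": 0.5}

# The aggregate is indifferent between A and B (both score 0.0) ...
print(min_aggregate(sure_A), min_aggregate(sure_B))  # 0.0 0.0
# ... but strictly prefers the 50/50 lottery between them (score 0.5).
# A VNM agent indifferent between A and B must, by independence, be
# indifferent to any mixture of them, so this aggregate is not VNM.
print(min_aggregate(coin_flip))                      # 0.5
```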
So this is 1) a single utility function, not a utility function for each human, and 2) being an aggregate of everything humanity wants, it naturally includes information about what each human wants.
Okay, but it still aggregates a utility function-like thing for each human. I don’t care what you call it.
I want to know what assumptions you’re making.
For the case of aggregating two people’s preferences, only that 1) both people and the aggregation are VNM utility agents, 2) whenever both people prefer A to B, the aggregation prefers A to B, and 3) the previous assumption is non-vacuous. Given those, the aggregation must maximize a weighted sum of their utility functions. For the many-person case, I was using analogous assumptions, but I think there might be a flaw in my induction, so I’ll get back to you when I have a proof that actually works.
Edit: http://www.stanford.edu/~hammond/HarsanyiFest.pdf
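(What these assumptions pin down is essentially Harsanyi's aggregation theorem. A minimal statement of the two-person case, paraphrasing the assumptions above:)

```latex
% Two-person case, paraphrasing the assumptions above.
% Let u_1, u_2, u_A be VNM utility functions over lotteries for the two
% people and the aggregation, with E denoting expected utility.
%
% Assumptions:
%   (1) all three are VNM agents;
%   (2) Pareto: E u_1(p) > E u_1(q) and E u_2(p) > E u_2(q)
%       together imply E u_A(p) > E u_A(q);
%   (3) assumption (2) is non-vacuous.
%
% Conclusion: there exist weights w_1, w_2 >= 0, not both zero, and a
% constant c, such that
\[
  u_A = w_1 u_1 + w_2 u_2 + c .
\]
```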
I don’t see how a weighted sum captures a friendly AI that has preferences about the utility functions that humans use.
We currently have preferences about the utility functions that future humans use. So any linear aggregation of our current utility functions will also have preferences about the utility functions that future humans use.
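(To spell out this step in one line, with $x_f$ and $x_g$ as hypothetical worlds in which future humans adopt utility functions $f$ and $g$: a weighted sum inherits any member's preference between such worlds whenever the others are indifferent.)

```latex
% If u_A = w_1 u_1 + w_2 u_2 with w_1 > 0, and person 1 prefers the world
% x_f (future humans use utility function f) to x_g (they use g), while
% person 2 is indifferent, then the aggregate prefers x_f too:
\[
  u_1(x_f) > u_1(x_g), \quad u_2(x_f) = u_2(x_g)
  \;\Longrightarrow\;
  u_A(x_f) > u_A(x_g).
\]
```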