Sorry, I needed to clarify my thinking and my claim a lot further. This is in addition to the (what I assumed was obvious) claim that correct Bayesian thinkers should be able to converge on beliefs despite potentially having different values. I’m speculating that if terminal values are initially drawn from a known distribution, AND if “a different set of life experiences means that you are a different person with different values” but those values change based on experiences in understandable ways, then rational humans will act coherently enough that we should expect to be able to learn human values and their distribution, despite the existence of shifts.
Conditional on those speculative thoughts, I disagree with your conclusion that “that’s a really good reason to assume that the whole framework of getting the true human utility function is doomed.” Instead, I think we should be able to infer the distribution of values that humans actually have—even if individuals’ values change over time as a result of their experiences.
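To make the intuition concrete, here is a minimal toy sketch (all names and parameters are hypothetical, not anything proposed in the thread): suppose each person’s terminal value is a scalar drawn from a population distribution, and then drifts by a zero-mean, known-variance amount as a result of experience. Because the drift is understandable, the population-level distribution remains identifiable from the drifted observations:

```python
import random
import statistics

random.seed(0)

POP_MEAN, POP_STD = 0.0, 1.0  # hypothetical "true" distribution of initial terminal values
DRIFT_STD = 0.3               # understandable, zero-mean experience-driven value shift

def observe_agent():
    initial = random.gauss(POP_MEAN, POP_STD)  # value the agent starts with
    drift = random.gauss(0.0, DRIFT_STD)       # change caused by life experiences
    return initial + drift                     # what we actually get to observe

observations = [observe_agent() for _ in range(10_000)]

# Since drift is independent and zero-mean, observed values have variance
# POP_STD**2 + DRIFT_STD**2, so we can deconvolve the known drift and
# recover the population parameters despite the shifts.
est_mean = statistics.fmean(observations)
est_var = statistics.variance(observations)
est_pop_std = (est_var - DRIFT_STD**2) ** 0.5

print(f"estimated mean={est_mean:.2f}, std={est_pop_std:.2f}")
```

This is only an illustration of the claim’s shape: shifts per se don’t doom inference, so long as the mechanism of the shift is itself learnable.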
But what do you optimize then?
That’s an important question, but it’s also fundamentally hard, since it’s almost certainly true that human values are inconsistent—if not individually, then at an aggregate level. (You can’t reconcile opposite preferences, or maximize each person’s share of a finite resource.)
The best answer I have seen is Eric Drexler’s discussion of Pareto-topia, where he suggests that we can make huge progress, gaining utility according to all of the value systems held by humans, despite the fact that those systems are inconsistent.
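The Pareto-topia intuition can be sketched with a toy example (the agents, goods, and weights below are hypothetical, chosen only for illustration): two value systems that weight the same goods oppositely have conflicting optima, yet both strictly prefer a world where the total pie has grown.

```python
# Two inconsistent value systems: no single allocation maximizes both.
def u_alice(apples, oranges):
    return 2 * apples + 1 * oranges  # Alice mostly values apples

def u_bob(apples, oranges):
    return 1 * apples + 2 * oranges  # Bob mostly values oranges

status_quo = (10, 10)
after_growth = (15, 15)  # progress enlarges what is available to each

# Growth is a strict Pareto improvement: every value system gains,
# even though the two functions cannot both be perfectly maximized.
alice_gains = u_alice(*after_growth) > u_alice(*status_quo)
bob_gains = u_bob(*after_growth) > u_bob(*status_quo)
print(f"Alice gains: {alice_gains}, Bob gains: {bob_gains}")
```

The point is that "improve according to all value systems" is a much weaker—and more achievable—target than "maximize the one true utility function."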
That seems right. Though if you accept that human values are inconsistent and you won’t be able to optimize them directly, I still think “that’s a really good reason to assume that the whole framework of getting the true human utility function is doomed.”
By “true human utility function” I really do mean a single function that when perfectly maximized leads to the optimal outcome.
I think “human values are inconsistent” and “people with different experiences will have different values” and “there are distributional shifts which cause humans to be different than they would otherwise have been” are all different ways of pointing at the same problem.