And if we understand reward prediction error in terms of updates to our policy, then deliberately invoking happiness would be in tension with acting effectively in the world.
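To make the quoted framing concrete, here is a toy sketch of how a reward-prediction-error learner treats "invoke the reward signal directly" versus "earn reward by acting on the world." This is my illustration, not anything from the post; the action names and payoff numbers are invented.

```python
import random

random.seed(0)

# Two actions, one state. "work" earns reward by changing the world
# (a noisy payoff); "invoke" stimulates the reward channel directly,
# without the world changing. Names and numbers are made up.
ALPHA = 0.1      # learning rate
EPSILON = 0.1    # exploration rate
actions = ["work", "invoke"]
value = {a: 0.0 for a in actions}  # the agent's reward predictions

def reward(action):
    if action == "work":
        return random.gauss(1.0, 0.5)  # reward that tracks the world
    return 1.2                         # reward invoked directly

for _ in range(5000):
    if random.random() < EPSILON:
        a = random.choice(actions)
    else:
        a = max(value, key=value.get)
    # Reward prediction error: how much better or worse than predicted.
    delta = reward(a) - value[a]
    value[a] += ALPHA * delta

print(value)
# The update rule only sees the prediction error, so the policy drifts
# toward "invoke": it cannot distinguish reward earned by acting
# effectively in the world from reward produced by invoking the signal.
```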
And I think: “acting effectively to do what?”
I think there’s a completely implicit answer you give in this post: agents will give up everything good so that they can be more effective replicators in a grim Malthusian future.
Which… sure. This is why we should avoid a grim Malthusian future.
Edited to clarify that this isn’t what I’m saying. Added:
If there’s simply a tradeoff between them, we might still want to sacrifice accurate beliefs and effective action for happiness. But what I’m gesturing towards is the idea that happiness might not actually be a concept which makes much sense given a complete understanding of minds—as implied by the Buddhist view of happiness as an illusion, for example.
Alright. But it’s not like happiness is the Tooth Fairy. It’s an honest ingredient in useful models I have of human beings, at some level of abstraction. If you think future decision-makers might decide happiness “isn’t real,” what I hear is that they’re ditching those models of humans that include happiness, and deciding to use models that never mention or use it.
And I would consider this a fairly straightforward failure to align with my meta-preferences (my preferences about what should count as my preferences). I don’t want to be modeled in ways that don’t include happiness, and this is a crucial part of talking about my preferences, because there is no such thing as human preferences divorced from any model of the world used to represent them.
I agree that some people are imagining that we’ll end up representing human values in some implicitly-determined, or “objective” model of the world. And such a process might end up not thinking happiness is a good model ingredient. To me, this sounds like a solid argument to not do that.
Not everything is a type of bet.