But they lead to very different normalization outcomes, don’t they?
Apologies, I was wrong in my answer. The normalisation is “human-realistic”, in that the agent is estimating “the best they themselves could do” vs “the worst they themselves could do”.
Since this means we’ll almost certainly regret doing this, it strongly suggests that something is wrong with the idea.
This is an inevitable feature of any normalisation process that depends on the difference in future expected values. Suppose u is a utility that can be 1 or 0 within the next day; after that, any action or observation will only increase or decrease u by at most 10^-10. The utility v, in contrast, is 0 unless the human does the same action every day for ten years, at which point it becomes 1. The normalisation of u and v will be very different depending on whether you normalise now or in two days’ time.
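A toy numerical sketch of that dependence (the numbers, and the reading of the normalisation as “weight each utility by 1/spread”, are my own simplification, just to make the point concrete):

```python
# Toy sketch: how a spread-based normalisation of u and v changes with the day
# on which you normalise. All numbers are illustrative only.

def spread_u(day: int) -> float:
    """u settles to 0 or 1 within the first day; after that, any action or
    observation moves it by at most 1e-10."""
    return 1.0 if day < 1 else 1e-10

def spread_v(day: int) -> float:
    """v pays out 1 only if the human repeats the same action daily for ten
    years; assume that option is still open on both days considered here."""
    return 1.0

def weights(day: int) -> tuple[float, float]:
    """Normalise each utility so its best-vs-worst spread is 1, i.e. weight it by 1/spread."""
    return 1.0 / spread_u(day), 1.0 / spread_v(day)

print(weights(day=0))  # (1.0, 1.0): u and v get equal weight
print(weights(day=2))  # (1e10, 1.0): u's weight explodes once u has already settled
```

Normalising now treats u and v symmetrically; normalising in two days makes u dominate by ten orders of magnitude, which is exactly the sensitivity to the normalisation time described above.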
You might say that there’s an idealised time in the past where we should normalise it (a sort of veil of ignorance), but that just involves picking a time, or a counterfactual time.
Lastly, “regret” doesn’t quite mean the same thing as usual here, since this is regret over the relative weights given to preferences which we ourselves hold.
Now, there is another, maybe more natural way of normalising things: cash out the utilities as concrete examples, and see how intense our approval/disapproval of these examples is. But that approach doesn’t allow us to overcome, e.g., scope insensitivity.
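To see why, here is a sketch in the spirit of the classic scope-insensitivity findings (the intensity numbers are invented for illustration):

```python
# Sketch of "approval-intensity" normalisation inheriting scope insensitivity.
# The intensity values below are invented; only the pattern matters.

reported_intensity = {
    "save 2,000 birds": 7.8,      # approval on a 0-10 scale
    "save 200,000 birds": 8.1,    # 100x the scope, nearly the same reported intensity
}

ratio = reported_intensity["save 200,000 birds"] / reported_intensity["save 2,000 birds"]
print(round(ratio, 2))  # ~1.04: a normalisation built on these intensities would weight
                        # the larger outcome barely more than the smaller one, not ~100x more
```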
…if we eventually just figure out what our true/normative values are.
I am entirely convinced that there are no such things. There are maps from {lists of assumptions + human behaviour + elements of the human internal process} to sets of values, but different assumptions will give different values, and we have no principled way to distinguish between them, except for using our own contradictory and underdefined meta-preferences.
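A minimal sketch of what I mean by such a map (the behaviour and the two assumption functions are invented; the only point is that the same data yields different value-sets under different assumptions):

```python
# Two different "assumption functions" mapping the same human behaviour
# to different sets of values. Everything here is a made-up illustration.

behaviour = ["orders dessert", "says 'I really shouldn't'", "orders dessert again"]

def values_assuming_actions_reveal_preferences(obs):
    # Assumption: the human is (noisily) rational, so actions reflect what they value.
    return {"dessert": +1, "stated diet": 0}

def values_assuming_statements_reveal_preferences(obs):
    # Assumption: stated judgements are authoritative; the actions are weakness of will.
    return {"dessert": 0, "stated diet": +1}

print(values_assuming_actions_reveal_preferences(behaviour))
print(values_assuming_statements_reveal_preferences(behaviour))
# Same behaviour, incompatible value-attributions; nothing in the data alone picks one map.
```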
The normalisation is “human-realistic”, in that the agent is estimating “the best they themselves could do” vs “the worst they themselves could do”.
But this means the normalization depends on how capable the human is, which seems strange, especially in the context of AI. In other words, it doesn’t make sense that an AI would obtain different values from two otherwise identical humans who differ only in how capable they are.
I am entirely convinced that there are no such things.
In a previous post, you didn’t seem this certain about moral anti-realism:
Even if the moral realists are right, and there is a true R, thinking about it is still misleading. Because there is, as yet, no satisfactory definition of this true R, and it’s very hard to make something converge better onto something you haven’t defined. Shifting the focus from the unknown (and maybe unknowable, or maybe even non-existent) R, to the actual P, is important.
Did you move further in the anti-realist direction since then? If so, why?
There are maps from {lists of assumptions + human behaviour + elements of the human internal process} to sets of values, but different assumptions will give different values, and we have no principled way to distinguish between them, except for using our own contradictory and underdefined meta-preferences.
I agree this is the situation today, but I don’t see how we can be so sure that it won’t get better in the future. Philosophical progress is a thing, right?
But this means the normalization depends on how capable the human is, which seems strange, especially in the context of AI.
The min-max normalisation is supposed to measure how much a particular utility function “values” the human moving from being a u-antagonist to a u-maximiser. The full impact of that change is included; so if the human is about to program an AI, the effect is huge. You might see it as the AI asking “utility u—maximise, yes or no?”, and the spread between “yes” and “no” is normalised.
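In symbols, the picture is roughly this (the notation for the u-maximising and u-antagonist policies is mine, not from a formal write-up):

```latex
% Min-max normalisation: rescale u by the spread between the expected value of u
% if the human becomes a u-maximiser and if they become a u-antagonist.
\[
  \hat{u} \;=\; \frac{u - \mathbb{E}\!\left[u \mid \pi_u^{\mathrm{anti}}\right]}
                     {\mathbb{E}\!\left[u \mid \pi_u^{\mathrm{max}}\right]
                      - \mathbb{E}\!\left[u \mid \pi_u^{\mathrm{anti}}\right]}
\]
% If the human is about to program an AI, both expectations include everything the
% AI will go on to do, so the "yes vs no" spread in the denominator is huge.
```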
Did you move further in the anti-realist direction since then? If so, why?
How I describe my position can vary a lot. Essentially I think that there might be a partial order among sets of moral axioms, in that it seems plausible to me that you could say that set A is almost-objectively better than set B (more rigorously: according to criterion c, A>B, and criterion c seems a very strong candidate for an “objectively true” axiom; something comparable to the basic properties of equality https://en.wikipedia.org/wiki/Equality_(mathematics)#Basic_properties ).
But it seems clear there is not going to be a total order, nor a maximum element.
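A purely structural analogue of the kind of order I have in mind (my example, and nothing to do with which axioms are actually better): finite sets of natural numbers under inclusion.

```latex
% Finite subsets of the naturals, ordered by inclusion: a genuine partial order
% that is neither total nor has a maximum element.
\[
  \{1\} \subset \{1,2\}, \qquad
  \{1\}\ \text{and}\ \{2\}\ \text{are incomparable}, \qquad
  \forall A\ \text{finite}\ \ \exists n \notin A:\ A \subsetneq A \cup \{n\}.
\]
```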
I agree this is the situation today, but I don’t see how we can be so sure that it won’t get better in the future. Philosophical progress is a thing, right?
Progress in philosophy involves uncovering true things, not making things easier; mathematics is a close analogue. For example, computational logic would have been a lot simpler if in fact there existed an algorithm that figured out if a given Turing machine would halt. The fact that Turing’s result made everything more complicated didn’t mean that it was wrong.
Similarly, the only reason to expect that philosophy would discover moral realism to be true is if we currently had strong reasons to suppose that moral realism is true.