Humans have beliefs and values twisted together in all kinds of odd ways. In practice, increasing our understanding tends to go along with having a more individualist outlook, a greater power to impact the natural world, less concern about difficult-to-measure issues, and less respect for traditional practices and group identities (and often the creation of new group identities, and sometimes new traditions).
Now, I find those changes to be (generally) positive, and I’d like them to be more common. But these are value changes, and I understand why people with different values could object to them.
Your original argument, as I understood it, was something like: Explanation aims for a particular set of mental states in the student, which is also what manipulation does, and therefore explanation can’t be defined in a way that distinguishes it from manipulation. I pushed back on that. Now you’re saying that explanation tends to produce side effects on the listener’s values. Does this mean you’re allowing the possibility that explanation can be usefully defined in a way that distinguishes it from manipulation?
BTW, computer security researchers distinguish between “reject by default” (whitelisting) and “accept by default” (blacklisting). “Reject by default” is typically more secure. I’m more optimistic about trying to specify what it means to explain something (whitelisting) than what it means to manipulate someone in a way that’s improper (blacklisting). So maybe we’re shooting at different targets.
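To make the default-deny vs. default-allow distinction concrete, here is a minimal Python sketch (the command names and functions are hypothetical, purely for illustration). The point is that a whitelist rejects anything it hasn’t explicitly approved, while a blacklist accepts anything it didn’t think to forbid, so novel inputs slip through:

```python
# Hypothetical illustration of "reject by default" vs. "accept by default".

ALLOWED_COMMANDS = {"status", "list", "help"}    # whitelist: explicitly approved
FORBIDDEN_COMMANDS = {"rm", "shutdown"}          # blacklist: explicitly flagged

def allow_whitelist(cmd: str) -> bool:
    """Reject by default: only commands we explicitly vetted get through."""
    return cmd in ALLOWED_COMMANDS

def allow_blacklist(cmd: str) -> bool:
    """Accept by default: anything we didn't think to forbid gets through."""
    return cmd not in FORBIDDEN_COMMANDS

# A command nobody anticipated when writing the rules:
novel = "exfiltrate_data"
print(allow_whitelist(novel))   # False -- unknown input is rejected
print(allow_blacklist(novel))   # True  -- unknown input slips through
```

The analogy: specifying what counts as an explanation plays the role of the whitelist, while specifying what counts as improper manipulation plays the role of the blacklist.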
Tying all of this back to FAI… you say you find the value changes that come with greater understanding to be (generally) positive and you’d like them to be more common. I’m worried about the possibility that AGI will be a global catastrophic risk. I think there are good arguments that, by default, AGI will not be positive. Maybe, from a triage point of view, it makes sense to focus on minimizing the probability that AGI is a global catastrophic risk, and to postpone worrying about preventing things we think are likely to be positive until we’re pretty sure the global-catastrophic-risk aspect has been solved?
In Eliezer’s CEV paper, he writes:
In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
I haven’t seen anyone on Less Wrong argue against CEV as a vision for how the future of humanity should be determined. And CEV seems to involve having the future be controlled by humans who are more knowledgeable than current humans in some sense. But maybe you’re a CEV skeptic?
Well, now you’ve seen one ^_^ : https://www.lesswrong.com/posts/vgFvnr7FefZ3s3tHp/mahatma-armstrong-ceved-to-death
I’ve been going on about the problems with CEV (specifically with extrapolation) for years. This post could also be considered a CEV critique: https://www.lesswrong.com/posts/WeAt5TeS8aYc4Cpms/values-determined-by-stopping-properties
I think explanation can be defined (see https://agentfoundations.org/item?id=1249 ). I’m not confident “explanation with no manipulation” can be defined.