Gordon Seidoh Worley comments on Deconfusing Human Values Research Agenda v1

Gordon Seidoh Worley 27 Mar 2020 23:39 UTC
I’d be interested in specific examples of well-intentioned dictators that screwed things up (though I anticipate my objections will be that 1. they weren’t well-intentioned or 2. they didn’t have the power to actually impose decisions centrally, and had to spend most of their power ensuring that they remained in power).
Some examples of actions taken by dictators that I think were well intentioned and meant to further goals that seemed laudable and not about power grabbing to the dictator but had net negative outcomes for the people involved and the world:
- Joseph Stalin’s collectivization of farms
- Tokugawa Iemitsu’s closing off of Japan
- Hugo Chávez’s nationalization of many industries
I know you’re saying that; I just don’t see many arguments for it. From my perspective, you are asserting that Goodhart problems are robust rather than arguing for it. That’s fine, you can just call it an intuition you have, but to the extent you want to change my mind, restating it in different words is not very likely to work.
I’ve made my case for that here.
Do you really believe that you can predict facts about humans better just by reasoning about evolution (and using no information you’ve learned by looking at humans), relative to building a model by looking at humans (and using no information you’ve learned from the theory of evolution)? I suspect you actually mean some other thing, but idk what.
No, it’s not my goal that we not look at humans. Rather, I think we’re currently too focused on trying to figure everything out from only the kinds of evidence we can easily collect today, and that we also don’t have detailed enough models to know what other evidence is likely relevant. I think understanding whatever is going on with values is hard because relevant data lies further “down the stack”, if you will, than observations of behavior. I think that because of issues like latent preferences: by definition they exist because we didn’t have enough data to infer their existence, but they need not exist if we gather more data about how they are generated, such that we could discover them in advance by looking earlier in the process that generates them.
- Rohin Shah 28 Mar 2020 5:53 UTC
  Some examples of actions taken by dictators that I think were well intentioned and meant to further goals that seemed laudable and not about power grabbing to the dictator but had net negative outcomes for the people involved and the world:
  What’s your model for why those actions weren’t undone?
  
  To pop back up to the original question—if you think making your friend 10x more intelligent would be net negative, would you make them 10x dumber? Or perhaps it’s only good to make them 2x smarter, but after that more marginal intelligence is bad?
  It would be really shocking if we were at the optimal absolute level of intelligence, so I assume that you think we’re at the optimal relative level of intelligence, that is, the best situation is when your friends are about as intelligent as you are. In that case, let’s suppose that we increase/decrease all of your friends and your intelligence by a factor of X. For what range of X would you expect this intervention is net positive?
  (I’m aware that intelligence is not one-dimensional, but I feel like this is still a mostly meaningful question.)
  Just to be clear about my own position, a well-intentioned superintelligent AI system totally could make mistakes. However, it seems pretty unlikely that they’d be of the existentially catastrophic kind. Also, a given mistake could be net negative while the AI system overall should still be net positive.
  - Gordon Seidoh Worley 28 Mar 2020 20:27 UTC
    What’s your model for why those actions weren’t undone?
    Not quite sure what you’re asking here. In the first two cases they were eventually undone after people got fed up with the situation. The last is recent enough that I don’t take its not having been undone yet as evidence that people like it, only that they don’t have the power to change it. My view is that these changes stayed in place because the dictators and their successors continued to believe the good outweighed the harm, either when this was clearly contrary to the ground truth but served some narrow purpose viewed as more important, or when the ground truth was too hard to discover at the time and we only believe it was net harmful through the lens of historical analysis.
    To pop back up to the original question—if you think making your friend 10x more intelligent would be net negative, would you make them 10x dumber? Or perhaps it’s only good to make them 2x smarter, but after that more marginal intelligence is bad?
    It would be really shocking if we were at the optimal absolute level of intelligence, so I assume that you think we’re at the optimal relative level of intelligence, that is, the best situation is when your friends are about as intelligent as you are. In that case, let’s suppose that we increase/decrease all of your friends and your intelligence by a factor of X. For what range of X would you expect this intervention is net positive?
    I’m not claiming we’re at some optimal level of intelligence for any particular purpose, only that more intelligence leads to greater agency, which, in the absence of sufficient mechanisms to constrain actions to beneficial ones, results in greater risk of negative outcomes due to things like deviance and unilateral action. Thus I do in fact think we’d be safer from ourselves if we were dumber (screening off, for example, the existential risks humanity faces from outside threats like asteroids).
    By comparison, chimpanzees are some factor dumber than us and may not live what look to us like very happy lives, but they also aren’t at risk of making themselves extinct because one chimp really wanted a lot of bananas.
    I’m not sure how much smarter we could all get without putting ourselves at too much risk. I think there’s an anthropic argument to be made that we are below whatever level of intelligence is dangerous to ourselves absent greater safeguards, because in universes where we were above it we would have killed ourselves and wouldn’t exist. But I have little evidence for judging how much smarter is safe, given that, for example, being, say, 95th-percentile smart didn’t stop people from building things like atomic weapons or developing dangerous chemical applications. I would expect making my friends smarter to risk similarly bad outcomes. Making them dumber seems safer, especially when I’m thinking in the frame of AGI.