This seems way too handwavy. If q being close enough to 0 will cause a disaster, why isn’t 5% close enough to 0? How much do you expect switching from q=1 to q=5% to reduce Up? Why?
If moving from q=1 to q=5% reduces Up by a factor of 2, for example, and it turns out that Up is the correct utility function, that would be equivalent to incurring a 50% x-risk. Do you think that should be considered “ok” or “adequate”, or do you have some reason to think that Up wouldn’t be reduced nearly this much?
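(To spell out the arithmetic behind that equivalence, under the assumption, mine rather than anything stated above, that extinction is worth roughly 0 under Up: a guaranteed outcome worth half the Up-optimum has the same expected Up as a coin flip between the Up-optimal world and extinction,

$$\tfrac{1}{2}\,U_p(\text{opt}) \;=\; \tfrac{1}{2}\cdot U_p(\text{opt}) + \tfrac{1}{2}\cdot 0,$$

so from Up’s own standpoint, halving its value and accepting a 50% x-risk come to the same thing.)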
I’m finding these “is the correct utility function” claims hard to parse. Humans have a bit of Up and a bit of Uh. But we are underdefined systems; there is no specific value of q that is “true”. We can only assess the quality of a given q using other aspects of those underdefined human preferences.
This seems way too handwavy.
It is. Here’s an attempt at a more formal definition: humans have collections of underdefined and somewhat contradictory preferences (using “preferences” in a more general sense than preference utilitarianism). These preferences seem to be stronger in the negative sense than in the positive: humans seem to find the loss of a preference much worse than the gain. And the negative is much more salient, and often much more clearly defined, than the positive.
Given that maximising one preference tends to push the values of the others to extremes, human overall preferences seem better captured by a weighted mix of preferences (or a smooth min of preferences) than by any single preference, or small set of preferences. So it is not a good idea to be too close to the extremes (the extremes being places where some preferences get 0% weight).
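(A minimal sketch of what “weighted mix” and “smooth min” could look like here, with the individual preferences U_1, …, U_n, the weights q_i and the sharpness k all being my notation rather than anything fixed above:

$$U_{\text{mix}}(w) = \sum_i q_i\,U_i(w), \quad q_i > 0,\ \sum_i q_i = 1, \qquad U_{\text{smin}}(w) = -\tfrac{1}{k}\log\sum_i e^{-k\,U_i(w)},$$

where U_smin is the standard soft-min, tending to min_i U_i(w) as k grows. The relevant feature of either aggregator is that no U_i ever receives exactly 0% weight.)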
Now there may be some sense in which these extreme preferences are “correct”, according to some formal system. But that formal system must reject the actual preferences of humans today; so I don’t see why those extreme preferences should be followed at all, even if they are “correct” in that sense.
Ok, so the extremes are out; how about being very close to the extremes? Here is where it gets wishy-washy. We don’t have a full theory of human preferences. But, according to the picture I’ve sketched above, the important thing is that each preference gets some positive traction in our future. So, yes, the difference between q=1% and q=5% might not mean much (and a smooth min might be better anyway). But I believe I could say:
There are many weighted combinations of human preferences that are compatible with the picture I’ve sketched here. They give very different outcomes from the numerical perspective of the individual preferences, but all of those outcomes fall within an “acceptability” range.
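(One tentative way to cash this out, with ε a placeholder floor rather than a number anyone is committing to: take the acceptable weightings to be

$$\mathcal{Q}_\varepsilon = \Big\{\,q \;:\; q_i \ge \varepsilon \text{ for all } i,\ \textstyle\sum_i q_i = 1\,\Big\},$$

and call an outcome acceptable if it maximises \sum_i q_i U_i for some q in \mathcal{Q}_\varepsilon. Different q’s in this set score very differently under any single U_i, yet every preference retains positive traction in each of them.)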
Still a bit too handwavy. I’ll try and improve it again.