I no longer believe this claim quite as strongly as implied: see here and here. Shard theory presents a very compelling alternative account of human value formation, and it suggests that even the ultimate compilation of two different modern people’s values would likely yield different unitary utility functions.
I still think there’s a sense in which stone-age!humans and modern humans, if tasked with giving an AI a utility function that’d make all humans happy, would arrive at the same result (if given thousands of years to think). But it might be the same sense in which we and altruistic aliens would arrive at “satisfy the preferences of all sapient beings” or something. (Although I’m not fully sure our definition of “a sapient being” would be the same as randomly-chosen aliens’, but that’s a whole different line of thought.)
I think part of my skepticism about the original claim comes from the fact that I’m not sure any amount of time would let people living in some specific stone-age grouping come up with the concept of ‘sapient’ without other parts of their environment changing to enable other concepts to be constructed.
There might be a similar point translated into shard-theory terms, something like: ‘The available shards are very context-dependent, so persistent human values across very different contexts are implausible.’ SLT in particular probably involves some pretty different contexts.
Thanks, that makes sense.