Shard Question: How does the human brain ensure alignment with its values, and how can we use that information to ensure the alignment of an AI with its designers’ values?
Which does indeed beg the question, in the standard meaning of the phrase.
My point is that there is very much no alignment between different values! They are independent at best and contradictory in many cases. The appearance of coherent values is an illusion, a rationalization. The conflict between values sometimes leads to catastrophic, Fantasia-like outcomes at the margins (e.g., people with addiction don't want to be on drugs, yet are), but most of the time it results in mild akrasia (I am writing this instead of doing something that makes me money). This seems like a good analogy: http://max.mmlc.northwestern.edu/mdenner/Demo/texts/swan_pike_crawfish.htm
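To make the fable concrete, here is a toy sketch (purely illustrative; the directions and numbers are made up, and this is not any formalism from shard theory) of three "values" pulling the same decision in different directions:

```python
import math

# Each value pulls the same decision in its own direction (2-D toy model,
# like the swan, pike, and crawfish all harnessed to one cart).
values = {
    "swan (skyward)":  (0.0, 1.0),
    "pike (to water)": (-1.0, -0.2),
    "crawfish (back)": (1.0, -0.2),
}

# Net motivation is just the vector sum of all the pulls.
net_x = sum(dx for dx, _ in values.values())
net_y = sum(dy for _, dy in values.values())

for name, (dx, dy) in values.items():
    print(f"{name}: pull strength {math.hypot(dx, dy):.2f}")
print(f"net pull: ({net_x:.2f}, {net_y:.2f}), strength {math.hypot(net_x, net_y):.2f}")

# Each value pulls with strength ~1, but the pulls mostly cancel: the net
# strength is only 0.60. Lots of internal tension, little actual movement --
# the cart stays put, i.e. akrasia rather than coherent pursuit of any value.
```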
Hm, the meaning of "begging the question" is probably just a verbal dispute, but I don't think asking questions can in general beg the question, because questions don't have conclusions. There is no "assuming the conclusion is true" if there is no conclusion. Not a big deal, though!
I wouldn’t say values are independent (i.e., orthogonal) at best; they are often highly correlated, as when the values “have enjoyable experiences” and “satisfy hunger” both lead to eating tasty meals. I agree they are often contradictory, and this is one valid model of both catastrophic addiction and milder problems. I think any rigorous theory of “values” (shard theory or otherwise) will need to make sense of those phenomena, but I don’t see them as an issue for the claim “ensure alignment with its values,” because I don’t think alignment requires complete satisfaction of every value, which is almost always impossible.
Hm. I think you can dissolve the perceived question-begging by replacing “values” with its substance:
How does the genome, in the presence of e.g. modern Western culture, reliably form decision-influences which push the person to e.g. take actions which increase the welfare of their family and friends? (i.e. where do friendship-shards come from?)
We’re then asking a relatively well-defined question with a guaranteed-to-exist answer.