I disagree with how this post uses the word “values” throughout, rather than “desires” (or “preferences”), which (AFAICT) would better match how the term is being used here.
This has definitely been a point of confusion. There are a few ways one might reasonably interpret the phrase “human values”:
the common denominator of what all humans ever have cared about
the ethical consensus of (some subset of) humanity
the injunctions a particular human would verbally endorse
the cognitive artifacts inside each particular human that implement that human’s valuing-of-X, including the cases where they verbally endorse that valuing (along with a bunch of other kinds of preferences, both wanted and unwanted)
I think the shard theory workstream generally uses “human values” in the last sense.
I view values as action-determinants, the moment-to-moment internal tugs which steer my thinking and action. I cash out “values”/”subshards” as “contextually activated computations which are shaped into existence by past reinforcement, and which often steer towards their historical reinforcers.”
This is a considerably wider definition of “human values” than is usually considered. For example, I might have a narrowly activated value against taking COVID tests, because past reinforcement slapped down my decision to do so after I tested positive and realized I’d be isolating all alone. (The testing was part of why I had to isolate, after all, so I infer that my credit assignment tagged that decision for down-weighting.)
This unusual definitional broadness is a real cost, which is why I like talking about an anti-COVID-test subshard, instead of an anti-test “value.”
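To make the “contextually activated computation” framing concrete, here is a minimal toy sketch in Python. It is not from any shard theory codebase; every name and number in it is hypothetical, invented purely for illustration. A subshard fires only when its trigger context matches, active subshards’ “tugs” are summed to pick an action, and a crude credit-assignment step adjusts the weight of whichever subshards steered toward the reinforced decision:

```python
# Toy illustration only: a "subshard" as a contextually activated
# computation whose steering strength is shaped by past reinforcement.
# All names here are hypothetical, not from any shard theory codebase.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Subshard:
    name: str
    trigger: Callable[[dict], bool]   # fires only in matching contexts
    action: str                       # the behavior it tugs toward
    weight: float                     # steering strength, shaped by reinforcement

def choose(context: dict, subshards: list[Subshard]) -> str:
    """Sum the 'tugs' of every subshard active in this context."""
    tugs: dict[str, float] = {}
    for s in subshards:
        if s.trigger(context):
            tugs[s.action] = tugs.get(s.action, 0.0) + s.weight
    return max(tugs, key=tugs.get) if tugs else "default"

def reinforce(reward: float, context: dict, subshards: list[Subshard],
              chosen: str, lr: float = 0.5) -> None:
    """Crude credit assignment: re-weight every active subshard
    that tugged toward the chosen action."""
    for s in subshards:
        if s.trigger(context) and s.action == chosen:
            s.weight += lr * reward

# The COVID-test anecdote: after testing led to lonely isolation
# (negative reinforcement), the take-test tug gets down-weighted,
# leaving a contextually activated aversion to testing.
shards = [Subshard("take-test", lambda c: c.get("symptoms"), "test", 1.0),
          Subshard("avoid-isolation", lambda c: c.get("symptoms"), "skip-test", 0.3)]
ctx = {"symptoms": True}
first = choose(ctx, shards)              # -> "test"
reinforce(-2.0, ctx, shards, first)      # isolation was unpleasant
print(first, "->", choose(ctx, shards))  # now -> "skip-test"
```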
Hmm, I think I’d say “I might feel an aversion…” there.
“Desires & aversions” would work in a context where the sign was ambiguous. So would “preferences”.