lynettebye comments on The shard theory of human values

lynettebye 4 Nov 2022 18:39 UTC
1 point
0
Time inconsistency example: You’ve described shards as context-based predictions of getting reward. One way to model the example would be to imagine there is one shard predicting the chance of being rewarded in the situation where someone is offering you something right now, and another shard predicting the chance you will be rewarded if someone is promising they will give you something tomorrow.
For example, I place a substantially higher probability on getting to eat cake if someone is currently offering me the slice of cake, compared to someone promising that they will bring a slightly better cake to the office party tomorrow. (In the second case, they might get sick, or forget, or I might not make it to the party.)
- TurnTrout 7 Nov 2022 22:49 UTC
  2 points
  0
  “You’ve described shards as context-based predictions of getting reward.”
  I think you’re summarizing “Shard theory views ‘shards’ as contextually-activated predictors of low-level reward events (i.e. reward prediction errors).” If so, that’s not what I meant to communicate. On my view, shards usually aren’t reward predictors at all; they were simply shaped into existence by past reward events. Here’s how I’d analyze the situation:
  My cake shard would have been shaped into existence by past reinforcement events related to cake. My cake shard affects my decisions more strongly in situations which are similar to the past reinforcement events (e.g. because I internalized heuristics like “If I see cake, then be more likely to eat cake”), and therefore I’m more tempted by cake when I can see cake.
  - lynettebye 7 Nov 2022 23:43 UTC
    3 points
    2
    I wasn’t thinking of shards as reward prediction errors, but I can see how the language was confusing. What I meant is that when multiple shards are activated, they affect behavior according to how strongly and reliably they were reinforced in the past. Practically, this looks like competing predictions of reward (because past experience is strongly correlated with predictions of future experience), although technically it’s not a prediction—the shard is just based on the past experience and will influence behavior similarly even if you rationally know the context has changed. E.g. the cake shard will probably still reinforce eating cake even if you know that you just had mouth-changing surgery that means you don’t like cake anymore.
    (However, I would expect that shards evolve over time. So in this example, after enough repetitions that reliably fail to reinforce cake eating, the cake shard would eventually stop making you crave cake when you see cake.)
    So in my example, cleaner language might be: For example, in the past I more reliably got to eat cake when someone was offering me a slice right then than when someone promised to bring a slightly better cake to the office party the next day. So when the “someone is currently offering me something” shard and the “someone is promising me something” shard are both activated, the first shard affects my decisions more, because it was rewarded more reliably in the past.
    (One test of this theory might be whether people are more likely to take the bigger, later payout if they grew up in extremely reliable environments where they could always count on the adults to follow through on promises. In that case, their “someone is promising me something” shard should have been reinforced similarly to the “someone is currently offering me something” shard. This is basically one explanation given for the classic Marshmallow Experiment—kids waited if they trusted adults to follow through with the promised two marshmallows; kids ate the marshmallow immediately if they didn’t trust adults.)