TurnTrout comments on Understanding and avoiding value drift

TurnTrout 13 Sep 2022 1:02 UTC
I regret that this post doesn’t focus on practical advice derived from shard theory. Instead, I mostly focused on an ideal-agency trick (“pretend really hard to wholly fool your own credit assignment”), which is really cool but impracticable for real people. It joins the menagerie currently inhabited by, e.g., logical inductors, value handshakes, and open-source game theory.
I think that shard theory suggests a range of practical ways to improve your own value formation and rationality. For example, suppose I log in and see that my friend John complimented this post. This triggers a positive reward event. By default, I might (subconsciously) think “this feels good because John complimented me.” That framing makes me more likely to act in ways that make John (and others) approve of me.
However, that’s not how I want to structure my motivation. Instead, in this situation, I can focus on the cognition I want reinforced:
this feels good because John complimented me, which happened because I thought carefully this spring and came up with new ideas, and then communicated them clearly. I’m glad I thought carefully, that was great. I noticed confusion when Quintin claimed (IIRC) that wireheading always makes you more of a wireheader—I stopped to ask whether that was actually true. What do I think I know, and why do I think I know it? Noticing that confusion was also responsible for this moment.
I’m basically repeating and focusing on the parts I want reinforced. While I don’t have a tight first-principles argument that this conscious attention does in fact redirect my credit assignment in the right way, I really think it should, and so I’ve started this practice on that hunch.
- Martin Randall 1 Feb 2025 15:27 UTC
  This seems relatively common in parenting advice. Parents are advised to specifically praise the behavior they want to see more of, rather than give generic praise. Presumably, generic praise is more likely to be credit-assigned to the appearance of good behavior than to the behavior parents are actually trying to train.