David Udell comments on Shard Theory: An Overview

David Udell 12 Aug 2022 3:45 UTC
LW: 1 AF: 1
I don’t know, I think of the brain as doing credit assignment pretty well, but we may have quite different definitions of good and bad. Is there an example you were thinking of?
Say that the triggers for pleasure are hardwired. After a pleasurable event, how do only the computations that actually led to the pleasure get strengthened, and not whatever unrelated computations happened to be running at the same time? After all, the pleasure circuit is hardwired and can’t reason causally about which thoughts led to which outcomes.
(I’m not currently confident that pleasure is exactly the same thing as reinforcement, but the two are probably closely related, and pleasure is a nice and concrete thing to discuss.)
What’s to stop the human shards from being dominated and extinguished by the non-human shards? I.e., is there reason to expect equilibrium?
Nothing except those shards fighting for their own interests and succeeding to some extent.
You probably have many contending values that you hang on to now, and would even be pretty careful with write access to your own values, for instrumental convergence reasons. If you mostly expect outcomes where one shard eats all the others, why do you have a complex balance of values rather than a single core value?
- TurnTrout 15 Aug 2022 5:21 UTC
  LW: 3 AF: 3
  If you mostly expect outcomes where one shard eats all the others, why do you have a complex balance of values rather than a single core value?
  There’s a further question, “How do people behave when they’re given more power over and understanding of their internal cognitive structures?”, which could actually resolve to “People collapse onto one part of their values.” I just think it won’t resolve that way.