If I knew that Shard Theory were 100% true, I wouldn’t care whether I see direct benefits for Alignment: understanding human psychology is important for Alignment one way or another.
But since I’m not sure that Shard Theory is true, I would like to judge it by its direct benefits for Alignment as well. Maybe I would even judge its plausibility by its usefulness. (For this reason, I’m not sure that splitting the different aspects of Shard Theory across multiple posts is the best option.) An example of mixing “plausibility” and “usefulness”: neural nets don’t make predictions about human behavior, but they’re extremely useful and they’re the only thing that does anything human-like, so the idea of “neural nets as a model of humans” gets its plausibility from their usefulness.
If you think that the usual Alignment problems don’t apply to shards, the burden of proof is on you. You have to translate those problems into the language of the theory yourself.
The examples that explain human behavior are strange:
You choose what specific things to explain with your theory.
Other possible explanations aren’t explored.
You cut your data (“human behavior”) into very specific pieces, and you are not critical of the pieces.
The third point may be the most important one. Here are some specific examples:
Why do we care more about nearby visible strangers as opposed to distant strangers?
I’m not sure that’s true. Here we have already moved from facts to interpretations of facts.
We think that the answer is simple. First consider the relevant context. The person sees a drowning child. What shards activate? Consider the historical reinforcement events relevant to this context. Many of these events involved helping children and making them happy. These events mostly occurred face-to-face.
One thing that feels strange about those examples is that they seem to ignore people’s general ability to think. The theory may be getting too mechanical at too high a level of cognition.
Personally, I (TurnTrout) am more inclined to make plans with my friends when I’m already hanging out with them—when we are already physically near each other. But why?
I think anything can trigger making plans with friends, depending on your personality and on completely random factors. If that weren’t true, people would be as rigid as zombies.
Therefore, the sunflower-timidity-shard was grown from… Hm. It wasn’t grown. The claim isn’t true, and this shard doesn’t exist, because it’s not downstream of past reinforcement.
Thus: Shard theory does not explain everything, because shards are grown from previous reinforcement events and previous thoughts. Shard theory constrains anticipation around actual observed human nature.
I’m not sure Shard Theory really fails to explain this. You could say that sunflowers are very similar to other good things from the past, e.g. Beautiful Calm Nature and Movies and Flowers As Something Good and Places That Are 100% Not School or Workplace or Busy City. And I think that Shard Theory is actually in trouble either way:
If it explains everything, that’s a problem.
If it doesn’t explain “everything”, then it limits human thinking and emotions far too much to be realistic.
If you explain human behavior using both Shard Theory and general thinking ability, then it becomes unclear what’s caused by shards and what’s caused by other thoughts. And your argument that “we can’t use common sense” no longer holds.
Maybe that’s the key thing that makes me doubt the theory.
How does that reinforcement event history create a sunflower-timidity-shard?
I think such a reinforcement history could create a “Nature-timidity” shard, and sunflowers (and flowers in general) could be a strong symbol of nature.
By the way, I would like the post to discuss explanations of human behavior in more depth, e.g. by proposing a couple of shard-based explanations and comparing them to some non-shard-based ones. For example (I realize that this explanation is just an example, not a final one):
For example, perhaps there is a hardcoded reward circuit which is activated by a crude subcortical smile-detector and a hardcoded attentional bias towards objects with relatively large eyes. Then reinforcement events around making children happy would cause people to care about children. For example, an adult’s credit assignment might correctly credit decisions like “smiling at the child” and “helping them find their parents at a fair” as responsible for making the child smile. “Making the child happy” and “looking out for the child’s safety” are two reliable correlates of smiles, and so people probably reliably grow child-subshards around these correlates.
Does the theory say that a full-grown adult wouldn’t have enough mental machinery to care about children strongly enough if she lacked a “smile-detector” and a “large-eyes detector”, or a couple of specific past decisions?
If you saw someone vandalizing something important to your friend (e.g. her artwork), you would probably have a strong reaction simply because you understand what’s happening, or because of some more general shards (e.g. ones related to “effort”, to yourself, and to your friend). Wouldn’t a drowning child activate many more shards and/or other things, such as:
Caring about anything/anyone alive.
Seeing yourself in the child (empathy).
Caring about anyone who can care about the child.
Something bad happening. Emergency.
Fear of guilt/judgement.
Disgust at the possibilities.
Sorry if you’ve already written about this, but does Shard Theory fall under the umbrella of behaviorism?
Behaviorism is a systematic approach to understanding the behavior of humans and other animals.[1] It assumes that behavior is either a reflex evoked by the pairing of certain antecedent stimuli in the environment, or a consequence of that individual’s history, including especially reinforcement and punishment contingencies, together with the individual’s current motivational state and controlling stimuli. Although behaviorists generally accept the important role of heredity in determining behavior, they focus primarily on environmental events.