A shard is a contextually activated behavior-steering computation. Think of it as a circuit of neurons in your brain that is reinforced by the subcortex, gaining more staying power when positively reinforced and withering away in the face of negative reinforcement. In fact, whatever modulates shard strength in this way just is reinforcement/reward. Shards are born when a computation that is currently steering behavior steers into some reinforcement. So shards can only accrete around concepts currently present in the system's world model (presumably, the world model is shared among all the shards in a brain).
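As a toy illustration of that picture, here is a minimal sketch in Python. The `Shard` class, the dictionary standing in for the world model, and the strength-update rule are all my own assumptions, chosen only to make "contextually activated, reinforcement-modulated computation" concrete; none of it is claimed by shard theory itself.

```python
# Toy sketch: a shard as a context-triggered behavior whose "staying power"
# is modulated by reinforcement. All names and the update rule are illustrative.

from dataclasses import dataclass
from typing import Callable, Dict

WorldModel = Dict[str, bool]  # stand-in for the concepts the system currently represents


@dataclass
class Shard:
    name: str
    trigger: Callable[[WorldModel], bool]  # fires only in certain contexts
    behavior: str                          # the rote behavior it steers toward
    strength: float = 1.0                  # staying power / steering weight

    def activated(self, world: WorldModel) -> bool:
        return self.trigger(world)

    def reinforce(self, reward: float, lr: float = 0.1) -> None:
        # Whatever modulates strength like this is playing the role of reward:
        # positive reinforcement strengthens the shard, negative reinforcement withers it.
        self.strength = max(0.0, self.strength + lr * reward)


# A shard can only form around a concept ("candy_visible") that already
# exists in the world model.
sugar_shard = Shard("eat-candy", lambda w: w.get("candy_visible", False), "grab candy")

world = {"candy_visible": True}
if sugar_shard.activated(world):
    sugar_shard.reinforce(reward=+1.0)  # the steering computation steered into reinforcement
print(sugar_shard.strength)             # strength grew, so the shard persists
```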
Individually, shards are pretty dumb. A simple shard might just be an algorithm for executing some rote behavior, conditional on some observation, that harvests enough reinforcement to keep existing. Taken together, all of your shards are exactly as intelligent as you, a human-level intelligence. Large coalitions of shards can leverage the algorithms of their members, once they happen upon the strategy of cooperating with other shards to gain more steering control by preventing rival shards from being activated or born.
Interesting human behaviors, on shard theory, are the product of game-theoretic interaction among shards in the brain. The negotiation-game equilibria that shards (and coalitions of shards) reach can be arbitrarily good or bad; remember that individual shards are of sub-human intelligence. Cf. George Ainslie on the game-theoretic shape of addiction in humans.
Shards are factored utility functions: our utility functions are far too informationally complex to represent in the brain, and so our approach to reaching coherence is to have situationally activated computations that trigger when a relevant opportunity is observed (where apparent opportunities are chunked using the current conceptual scheme of the agent’s world model). So shard theory can be understood as an elaboration of the standard agent model for computationally bounded agents (of varying levels of coherence) like humans and deep RL agents.
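To make the "factored utility function" framing concrete, here is a minimal sketch, reusing the toy `Shard` class from above. The idea that the behavior actually executed is whichever one receives the most steering weight from the shards active in the current context is my own simplifying assumption, not a mechanism the theory commits to.

```python
# Toy sketch: no monolithic utility function is stored anywhere. Instead,
# steering is factored into situationally activated shards, and the behavior
# taken is whichever one the currently active shards back most strongly.

from collections import defaultdict
from typing import Dict, List


def steer(shards: List[Shard], world: Dict[str, bool]) -> str:
    votes: Dict[str, float] = defaultdict(float)
    for shard in shards:
        if shard.activated(world):  # opportunities are chunked via the world model
            votes[shard.behavior] += shard.strength
    # The behavior with the most accumulated steering weight wins in this context.
    return max(votes, key=votes.get) if votes else "do nothing"
```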
I’m pretty skeptical that sophisticated game theory happens between shards in the brain, and also that coalitions between shards are how value preservation in an AI will happen (rather than there being a single consequentialist shard, or many shards that merge into a consequentialist, or something I haven’t thought of).
To the extent that shard theory makes such claims, they seem to be interesting testable predictions.